Dev/mip 530/post algo execution cleanup persistency
Created by: apmariglis
When the Controller layer receives a request to execute an algorithm, it writes the context_id of the execution, along with the node_ids that will participate in it, to a file just before starting the execution (mipengine/controller/controller.py:line 71). This provides persistency in case the Controller layer is restarted.
The file is defined in mipengine/controller/config.toml under cleanup.contextids_cleanup_file.
The entries in the file are of the following format:
[6524532] #context_id
nodes = [ "testglobalnode", "testlocalnode1", "testlocalnode2",]
timestamp = "2022-03-24T15:58:31.724740+00:00"
released = false
The Cleaner class (mipengine/controller/cleaner.py) reads this file every nodes_cleanup_interval seconds (defined in mipengine/controller/config.toml under cleanup) and calls the "cleanup" task with the respective context_id on the relevant nodes if either of the following two criteria is met: the released flag is true, or at least as many seconds as defined in mipengine/controller/config.toml under cleanup.contextid_release_timelimit have passed since the timestamp.
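The two criteria can be sketched as a single predicate. This is only an illustration, not the actual Cleaner code: the function name needs_cleanup, its parameters, and the representation of an entry as a plain dict are assumptions made for the example. Only the field names (timestamp, released) and the time-limit setting come from the description above.

```python
from datetime import datetime, timedelta, timezone


def needs_cleanup(entry, release_timelimit, now=None):
    """Return True if the "cleanup" task should be called for this entry.

    entry: dict with the fields of one file entry (hypothetical representation).
    release_timelimit: seconds, as in cleanup.contextid_release_timelimit.
    now: current time; injectable so the function is easy to test.
    """
    # Criterion 1: the execution finished and the entry was released.
    if entry["released"]:
        return True

    # Criterion 2: the time limit has passed since the entry's timestamp.
    if now is None:
        now = datetime.now(timezone.utc)
    started = datetime.fromisoformat(entry["timestamp"])
    return now - started >= timedelta(seconds=release_timelimit)
```

The second criterion acts as a safety net: if the Controller goes down before it can set released to true, the entry is still cleaned up once the time limit expires.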
The released flag is set to true (by mipengine/controller/controller.py:line 91) when an algorithm execution finishes, successfully or not.
Note: To minimize the chance of the file getting corrupted, e.g. if the process that alters it goes down mid-write, changes are always written to a temporary file first, and the temporary file is then renamed to the original file. The rename is guaranteed to be atomic (on Linux), so the possibility of the file becoming corrupted is fairly small.
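A minimal sketch of the write-then-rename pattern follows. It is not the Controller's actual code: the helper name atomic_write_text is hypothetical, and the fsync call is an extra precaution added for the example. The temporary file is created in the same directory as the target, because a rename is only atomic within a single filesystem.

```python
import os
import tempfile


def atomic_write_text(path, text):
    """Replace the contents of `path` with `text` via a temporary file."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target (same filesystem).
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(text)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before renaming
        # Atomic on POSIX: readers see either the old or the new file.
        os.replace(tmp_path, path)
    except BaseException:
        # Do not leave a stray temporary file behind on failure.
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
```

If the process dies before the rename, the original file is untouched and at worst a leftover temporary file remains in the directory.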