Dev/mip 530/post algo execution cleanup persistency

Created by: apmariglis

When the Controller layer receives a request to execute an algorithm, it writes the context_id of the execution, along with the node_ids that will participate in it, to a file just before the execution starts (mipengine/controller/controller.py:line 71). This persists the information in case the Controller layer is restarted. The file is defined in mipengine/controller/config.toml under cleanup.contextids_cleanup_file.

The entries in the file have the following format:

[6524532] #context_id
nodes = [ "testglobalnode", "testlocalnode1", "testlocalnode2",]
timestamp = "2022-03-24T15:58:31.724740+00:00"
released = false
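
As an illustration of how such an entry could be produced, here is a minimal sketch. The helper name `new_context_entry` is hypothetical and not taken from the merge request; the only assumption about the real code is that the timestamp is a timezone-aware UTC time in ISO 8601 form, which matches the sample above.

```python
from datetime import datetime, timezone


def new_context_entry(node_ids, now=None):
    """Build the record stored under a context_id (hypothetical helper)."""
    # Assumption: timestamps are timezone-aware UTC, serialized with isoformat().
    now = now or datetime.now(timezone.utc)
    return {
        "nodes": list(node_ids),
        "timestamp": now.isoformat(),
        "released": False,  # flipped to True once the execution finishes
    }


entry = new_context_entry(
    ["testglobalnode", "testlocalnode1", "testlocalnode2"],
    now=datetime(2022, 3, 24, 15, 58, 31, 724740, tzinfo=timezone.utc),
)
print(entry["timestamp"])
```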

The Cleaner class (mipengine/controller/cleaner.py) reads this file every nodes_cleanup_interval seconds (defined in mipengine/controller/config.toml under cleanup). For each entry, it calls the "cleanup" task with the respective context_id on the relevant nodes if either of the following two criteria is met: the released flag is true, or at least as many seconds as defined in mipengine/controller/config.toml under cleanup.contextid_release_timelimit have passed since the timestamp. The released flag is set to true (mipengine/controller/controller.py:line 91) when an algorithm execution finishes, whether successfully or not.
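
The two criteria can be sketched as a small predicate. This is not the Cleaner's actual code: the function name `should_cleanup` and its parameters are hypothetical, and treating the time limit as inclusive (`>=`) is an assumption.

```python
from datetime import datetime, timedelta, timezone


def should_cleanup(entry, release_timelimit_seconds, now=None):
    """Return True if the "cleanup" task should be sent for this entry."""
    now = now or datetime.now(timezone.utc)
    # Criterion 1: the execution finished and released its context_id.
    if entry["released"]:
        return True
    # Criterion 2: the entry is older than the configured time limit.
    started = datetime.fromisoformat(entry["timestamp"])
    return now - started >= timedelta(seconds=release_timelimit_seconds)


sample = {
    "nodes": ["testglobalnode", "testlocalnode1", "testlocalnode2"],
    "timestamp": "2022-03-24T15:58:31.724740+00:00",
    "released": False,
}
one_hour_later = datetime(2022, 3, 24, 16, 58, 31, 724740, tzinfo=timezone.utc)
print(should_cleanup(sample, 7200, now=one_hour_later))
print(should_cleanup(sample, 1800, now=one_hour_later))
```

The second criterion acts as a safety net: if the Controller goes down before it can set released to true, the entry is still cleaned up once the time limit expires.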

Note: To minimize the chance of the file getting corrupted, e.g. if the process that alters it goes down mid-write, changes are always written to a temporary file, which is then renamed to the original file. The rename is guaranteed to be atomic (on Linux), so the possibility of the file becoming corrupted is fairly small.
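
The write-then-rename pattern described in the note can be sketched as follows. The function `atomic_write_text` is a hypothetical helper, not the code in this merge request. Creating the temporary file in the same directory as the target and calling `os.fsync` are choices of this sketch: the first keeps the rename on one filesystem, which is what makes it atomic on POSIX systems.

```python
import os
import tempfile


def atomic_write_text(path, text):
    """Replace the contents of `path` via a temporary file and a rename."""
    directory = os.path.dirname(os.path.abspath(path))
    # Same directory as the target, so the rename stays on one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(text)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the data to disk before renaming
        os.replace(tmp_path, path)  # atomic rename on POSIX
    except BaseException:
        # Leave the original file untouched and drop the partial temp file.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

With this approach, a reader sees either the complete old file or the complete new one, never a partially written mix.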
