Fix/cleaner fixes
Created by: apmariglis
Main points:
-
cleaner.py
Module logic is changed.

    # cleanup entry example
    context_id = "3502300"
    node_ids = [ "testglobalnode", "testlocalnode1", "testlocalnode2",]
    timestamp = "2022-05-23T14:40:34.203085+00:00"
    released = false

How it works: Just before an algorithm starts executing, `Cleaner::add_contextid_for_cleanup(context_id)` is called (from the Controller). This creates a new file (e.g. `cleanup_3502300.toml`) containing a cleanup entry like the one above. As soon as the algorithm execution finishes (whether it succeeds or fails), `Cleaner::release_context_id(context_id)` is called (from the Controller), which sets the `released` flag of the respective cleanup entry to `true`. When the `Cleaner` object is started (method `start`), it continuously loops through all the entries, finds the ones that either have their `released` flag set to `true` or whose `timestamp` has expired (see the `cleaner.py::_is_timestamp_expired` function), and processes them by queuing the cleanup task on the respective nodes with the respective `context_id`. When the cleanup tasks on all the nodes of an entry succeed, the entry file is deleted. Otherwise the `node_ids` list of the entry is updated to contain only the failed `node_ids`, and the entry is re-processed in the next iteration of the loop.
-
algorithm_execution_tasks_handler.py
The execution of the `cleanup` task is now split into two methods, `queue_cleanup` and `wait_queued_cleanup_complete`. This is more efficient, since the non-blocking `queue_cleanup` only queues the cleanup task on the node, and the blocking `wait_queued_cleanup_complete` can be called separately to check whether the task succeeded.
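
To make the flow described in the two points above concrete, here is a rough, self-contained sketch of one pass over a single cleanup entry. It is not the code from this PR: `CleanupEntry`, `FakeNodeTasksHandler`, `process_entry`, and the one-hour `CLEANUP_EXPIRY` window are hypothetical names and values chosen for illustration. The assumption that `wait_queued_cleanup_complete` signals failure by raising an exception is also a guess. Only `queue_cleanup`, `wait_queued_cleanup_complete`, and the entry fields come from the description.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone
from typing import Dict, List

# Hypothetical expiry window; the real threshold lives in
# cleaner.py::_is_timestamp_expired and is not stated in this description.
CLEANUP_EXPIRY = timedelta(hours=1)


@dataclass
class CleanupEntry:
    """In-memory mirror of one cleanup_<context_id>.toml entry."""
    context_id: str
    node_ids: List[str]
    timestamp: str  # ISO 8601 string, as in the entry example
    released: bool = False


def is_timestamp_expired(timestamp: str, now: datetime) -> bool:
    """Assumed expiry rule: the entry is older than CLEANUP_EXPIRY."""
    return now - datetime.fromisoformat(timestamp) > CLEANUP_EXPIRY


class FakeNodeTasksHandler:
    """Stand-in for a node's tasks handler (illustration only)."""

    def __init__(self, node_id: str, fails: bool = False):
        self.node_id = node_id
        self.fails = fails
        self.queued: List[str] = []

    def queue_cleanup(self, context_id: str) -> None:
        # Non-blocking: only records that the task was queued.
        self.queued.append(context_id)

    def wait_queued_cleanup_complete(self, context_id: str) -> None:
        # Blocking in a real system; here it returns at once and
        # raises to signal a failed cleanup (an assumption).
        if self.fails:
            raise RuntimeError(f"cleanup failed on {self.node_id}")


def process_entry(
    entry: CleanupEntry,
    handlers: Dict[str, FakeNodeTasksHandler],
    now: datetime,
) -> bool:
    """Process one entry; return True if its file could be deleted."""
    if not (entry.released or is_timestamp_expired(entry.timestamp, now)):
        return False  # algorithm still running and entry not expired

    # Queue on every node first, so no node waits on another.
    for node_id in entry.node_ids:
        handlers[node_id].queue_cleanup(entry.context_id)

    # Then collect the results, remembering which nodes failed.
    failed = []
    for node_id in entry.node_ids:
        try:
            handlers[node_id].wait_queued_cleanup_complete(entry.context_id)
        except RuntimeError:
            failed.append(node_id)

    # Keep only the failed nodes so the next loop iteration retries them.
    entry.node_ids = failed
    return not failed
```

Queuing on all nodes before waiting on any of them is where the split into two methods pays off: the blocking waits no longer serialize the work across nodes. In this sketch, a `False` return for a released or expired entry means it stays on disk with a shortened `node_ids` list and is retried on the next pass.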