Fix/cleaner fixes

Kostas FILIPPOPOLITIS requested to merge fix/cleaner_fixes into master

Created by: apmariglis

Main points:

  1. cleaner.py: the module logic has changed.

       # cleanup entry example
       context_id = "3502300"
       node_ids = ["testglobalnode", "testlocalnode1", "testlocalnode2"]
       timestamp = "2022-05-23T14:40:34.203085+00:00"
       released = false

    How it works: Just before an algorithm starts executing, the Controller calls Cleaner::add_contextid_for_cleanup(context_id). This creates a new file (e.g. cleanup_3502300.toml) containing a cleanup entry like the one above. As soon as the algorithm execution finishes, whether it succeeded or failed, the Controller calls Cleaner::release_context_id(context_id), which sets the released flag of the corresponding cleanup entry to true.

    Once the Cleaner object is started (method start), it continuously loops through all the entries, picks those whose released flag is true or whose timestamp has expired (see the cleaner.py::_is_timestamp_expired function), and processes them by queuing the cleanup task, with the entry's context_id, on each of the entry's nodes. When the cleanup tasks succeed on all the nodes of an entry, the entry file is deleted. Otherwise, the entry's node_ids list is updated to contain only the failed node_ids, and the entry is re-processed in the next iteration of the loop.
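
The loop described above can be sketched roughly as follows. This is only an illustration, not the code in cleaner.py: entries are kept in an in-memory dict instead of cleanup_<context_id>.toml files, and CleanupEntry, process_entries, run_cleanup and the one-hour CONTEXT_ID_EXPIRY window are hypothetical names and values chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import Callable, Dict, List

# Assumed expiry window; the real limit is whatever
# cleaner.py::_is_timestamp_expired uses.
CONTEXT_ID_EXPIRY = timedelta(hours=1)


@dataclass
class CleanupEntry:
    """In-memory stand-in for the contents of one cleanup entry file."""
    context_id: str
    node_ids: List[str]
    timestamp: datetime
    released: bool = False


def is_timestamp_expired(timestamp: datetime, now: datetime) -> bool:
    """True when the entry is older than the assumed expiry window."""
    return now - timestamp > CONTEXT_ID_EXPIRY


def process_entries(
    entries: Dict[str, CleanupEntry],
    run_cleanup: Callable[[str, str], bool],
    now: datetime,
) -> None:
    """One iteration of the cleanup loop.

    run_cleanup(node_id, context_id) is a hypothetical callable that
    returns True when the cleanup task succeeded on that node.
    """
    # Iterate over a copy of the keys, since entries may be deleted.
    for context_id in list(entries):
        entry = entries[context_id]
        if not (entry.released or is_timestamp_expired(entry.timestamp, now)):
            continue  # still in use and not expired: leave it alone
        failed = [n for n in entry.node_ids if not run_cleanup(n, context_id)]
        if failed:
            # Keep only the failed nodes; they are retried next iteration.
            entry.node_ids = failed
        else:
            # Corresponds to deleting the entry file.
            del entries[context_id]
```

Calling process_entries repeatedly with the same dict mimics successive iterations: an entry disappears only after every one of its nodes has reported a successful cleanup.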

  2. algorithm_execution_tasks_handler.py: the execution of the cleanup task is now split into two methods, queue_cleanup and wait_queued_cleanup_complete. This is more efficient: the non-blocking queue_cleanup queues the cleanup task on the node, and the blocking wait_queued_cleanup_complete can be called separately to check whether the task succeeded.
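
A minimal sketch of why the split helps, under stated assumptions: a ThreadPoolExecutor stands in for the real task queue, and NodeTasksHandlerSketch, cleanup_on_all_nodes and the method signatures are hypothetical, not those of algorithm_execution_tasks_handler.py. The point is that the task can be queued on every node first, and only then does the caller block on each result, so the nodes clean up concurrently.

```python
from concurrent.futures import Future, ThreadPoolExecutor
from typing import Callable, List, Optional


class NodeTasksHandlerSketch:
    """Hypothetical per-node handler; a thread pool replaces the task queue."""

    def __init__(
        self,
        node_id: str,
        cleanup_task: Callable[[str, str], None],
        executor: ThreadPoolExecutor,
    ) -> None:
        self.node_id = node_id
        self._cleanup_task = cleanup_task
        self._executor = executor

    def queue_cleanup(self, context_id: str) -> Future:
        """Non-blocking: submit the cleanup task and return immediately."""
        return self._executor.submit(self._cleanup_task, self.node_id, context_id)

    @staticmethod
    def wait_queued_cleanup_complete(
        queued: Future, timeout: Optional[float] = None
    ) -> bool:
        """Blocking: wait for a queued task; True on success, False otherwise."""
        try:
            queued.result(timeout=timeout)
        except Exception:
            # Covers both task errors and a timeout while waiting.
            return False
        return True


def cleanup_on_all_nodes(
    handlers: List[NodeTasksHandlerSketch], context_id: str
) -> List[str]:
    """Queue cleanup on every node, then wait; return the failed node_ids."""
    queued = [(h, h.queue_cleanup(context_id)) for h in handlers]
    return [
        h.node_id
        for h, result in queued
        if not h.wait_queued_cleanup_complete(result)
    ]
```

With a single blocking method, each node would have to finish before the next one is even contacted; here the waiting happens only after all tasks are already in flight.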
