
Origin/dev/mip 249 stop algo execution if node stops responding

Kostas FILIPPOPOLITIS requested to merge dev/MIP-249_task_timeouts into master

Created by: apmariglis

Roughly two types of exceptions can be raised when a Celery task takes longer to finish than a defined time limit. The first type is controlled by a time limit set on the executing end (the node), and the second type is controlled by a time limit set on the calling end (the controller/algorithm executor).

The first type (controlled by the executing end, the node) is either SoftTimeLimitExceeded or TimeLimitExceeded. A SoftTimeLimitExceeded is raised when the task has not finished within the time limit defined as celery.conf.task_soft_time_limit (in mipengine/node/node.py). The exception is raised inside the running task, so the worker does not kill the task: the task can catch the exception to clean up and keep running, which makes the soft limit serve as a kind of warning. If the task does not catch it, the task fails and the SoftTimeLimitExceeded is propagated to the caller that queued the task.

A TimeLimitExceeded is raised when the task has not finished within the time limit defined as celery.conf.task_time_limit (in mipengine/node/node.py). When this limit is exceeded, the worker terminates the task execution and a TimeLimitExceeded is propagated to the caller that queued the task.

The second type (controlled by the calling end, the controller/algorithm executor) is TimeoutError. It is raised when the code blocks on a call to the get method, waiting for the result of the task, for longer than the timeout passed to get.

So in general, when get() is called without arguments, a SoftTimeLimitExceeded or TimeLimitExceeded can be raised. When get(timeout) is called with a timeout argument, a TimeoutError can be raised as well.
