Origin/dev/mip 249 stop algo execution if node stops responding
Created by: apmariglis
There are roughly two types of exceptions that can be raised when a Celery task takes longer to finish than a defined time limit. The first type is controlled by a time limit set on the executing end (the node). The second type is controlled by a time limit set on the calling end (the controller/algorithm executor).
The first type (controlled by the executing end, the node) is either `SoftTimeLimitExceeded` or `TimeLimitExceeded`.
A `SoftTimeLimitExceeded` is raised when the task has not finished executing after the time limit defined as `celery.conf.task_soft_time_limit` (in `mipengine/node/node.py`). The exception is raised inside the running task, and the worker does not kill the process running it, so the soft limit serves as a kind of warning: the task can catch the exception, clean up, and keep going. If the task does not catch it, the task fails and the `SoftTimeLimitExceeded` is propagated to the caller who queued the task.
A `TimeLimitExceeded` is raised when the task has not finished executing after the time limit defined as `celery.conf.task_time_limit` (in `mipengine/node/node.py`). When this time limit is exceeded, a `TimeLimitExceeded` is raised (and propagated to the caller who queued the task) and the worker running the task terminates the task execution.
The second type (controlled by the calling end, the controller/algorithm executor) is `TimeoutError`. It is raised when the code is blocked on a call to the `get` method, waiting for the result of the task, and the result does not arrive within the timeout passed to `get`.
So in general, when `get()` is called without arguments, a `SoftTimeLimitExceeded` or `TimeLimitExceeded` can be raised. When `get(timeout)` is called with a `timeout` argument, a `TimeoutError` can be raised as well.