3.1.7. Nbor list dead-lock
A deadlock can occur if a task A depends on another task B being updated,
because B is in the nbor list of A with the logical flag needed set to true
for the B entry, while at the same time A is not in the nbor list of B, so
that B does not include A when doing check_nbors(). This can cause a
deadlock because the mechanism that puts task A in the ready queue is
that a thread working on B, after updating B's time past the one that task A
is waiting for, calls check_ready() on the A-task.
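The scenario can be reproduced in a small toy model. The sketch below is not the actual implementation: the Task class, the readiness rule (all needed nbors have reached the task's time) and the ready_queue list are simplifying assumptions, and only the names check_ready() and check_nbors() are taken from the text above.

```python
# Toy model of the asymmetric nbor-list situation (hypothetical, simplified).
class Task:
    def __init__(self, name, time=0.0):
        self.name = name
        self.time = time
        self.nbors = []      # list of (task, needed) pairs
        self.ready = False   # True once the task is in the ready queue

    def nbors_far_enough(self):
        # Assumed readiness rule: every needed nbor has reached this task's time
        return all(nb.time >= self.time for nb, needed in self.nbors if needed)


def check_ready(task, ready_queue):
    # Queue the task if it is not queued already and its needed nbors allow it
    if not task.ready and task.nbors_far_enough():
        task.ready = True
        ready_queue.append(task)


def check_nbors(task, ready_queue):
    # Called by the thread that has just updated `task`
    for nb, _needed in task.nbors:
        check_ready(nb, ready_queue)


def update(task, dt, ready_queue):
    # Advance the task in time, then notify the tasks on its nbor list
    task.time += dt
    check_nbors(task, ready_queue)


ready_queue = []
a = Task("A", time=1.0)
b = Task("B", time=0.5)
a.nbors.append((b, True))    # A needs B, but A is missing from B's nbor list

check_ready(a, ready_queue)  # B is behind A, so A is not queued
update(b, 1.0, ready_queue)  # B passes A's time, but B never checks A

# A could run now, but nobody will ever put it in the ready queue
deadlocked = b.time >= a.time and a not in ready_queue
```

In this model A stays out of the ready queue indefinitely, even though B has advanced far enough, because the only call that could queue A is the one B never makes.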
But if the reason that the A task is not on the nbor list of B is just a time
delay (e.g. because of latency in MPI communication), then one just needs to
make sure that the check_ready() call actually happens when A, after the
delay, gets put on the nbor list of B. This is ensured if adding
a task to an nbor list is always accompanied by a check_ready() call.
One call too many does not hurt.
So, a guarantee against deadlocks caused by nbor lists is to add a
check_nbors() call to the init_nbors() procedure. This works as long
as init_nbors() is the method used to update the nbor lists from
task_mesg_t%unpack(). If the method is changed, the new method needs to
do something similar.
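A minimal sketch of this safeguard, as a toy model rather than the real code: the Task class, the readiness rule and the ready_queue list are assumptions made for illustration, and the signature given to init_nbors() here is hypothetical. Only the names init_nbors(), check_nbors() and check_ready() come from the text.

```python
# Toy model: rebuilding an nbor list always triggers a re-check of the nbors.
class Task:
    def __init__(self, name, time=0.0):
        self.name = name
        self.time = time
        self.nbors = []      # list of (task, needed) pairs
        self.ready = False   # True once the task is in the ready queue

    def nbors_far_enough(self):
        # Assumed readiness rule: every needed nbor has reached this task's time
        return all(nb.time >= self.time for nb, needed in self.nbors if needed)


def check_ready(task, ready_queue):
    # The `ready` flag makes repeated calls harmless
    if not task.ready and task.nbors_far_enough():
        task.ready = True
        ready_queue.append(task)


def check_nbors(task, ready_queue):
    for nb, _needed in task.nbors:
        check_ready(nb, ready_queue)


def init_nbors(task, nbor_pairs, ready_queue):
    # (Re)build the nbor list, then check every task that was put on it
    task.nbors = list(nbor_pairs)
    check_nbors(task, ready_queue)


ready_queue = []
a = Task("A", time=1.0)
b = Task("B", time=1.5)      # B has already advanced past A's time
a.nbors.append((b, True))    # A needs B; B does not know about A yet

# The delayed message arrives and B's nbor list is rebuilt to include A
init_nbors(b, [(a, True)], ready_queue)
queued_after_init = [t.name for t in ready_queue]

# One call too many does not hurt: A is not queued a second time
check_nbors(b, ready_queue)
queued_after_extra_call = [t.name for t in ready_queue]
```

Because the re-check happens inside init_nbors(), the late arrival of A on B's nbor list is enough to queue A, and the guard on the ready flag keeps the extra calls from adding duplicates.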