We intend to build a lock-free, priority-aware work-stealing deque for dynamic task scheduling in parallel computing systems. Each worker thread maintains a priority deque and pushes/pops tasks locally, while stealing tasks from others based on priority. This is achieved using atomic operations (compare_exchange
, fetch_add
) in OpenMP and optionally MPI. We aim to benchmark the implementation on shared-memory (8–128 cores) and multi-node systems using MPI, testing throughput, steal attempts, and priority correctness.
Work stealing allows idle threads to dynamically balance load by stealing tasks from others. Traditional deques are extended here with priority awareness. Atomic operations replace locks to avoid contention. This system supports fine-grained concurrency, priority-driven task selection, and hybrid OpenMP+MPI environments for full scalability. Benchmarking includes throughput, steal success rate, and accuracy of priority execution.
while not done:
task = pop_local_highest_priority_task()
if task != NULL:
execute(task)
else:
victim = random_other_thread()
task = steal_lowest_priority_task_from(victim)
if task != NULL:
execute(task)
else:
spin_wait()
Each thread handles 1000 tasks with mixed priorities. After executing k
tasks, it may steal from others. We compare two approaches:
Stealing with priorities can cause inversion if not carefully controlled. We explore ways to balance fairness, correctness, and efficiency under saturation and priority skew.
Our implementation will be in C/C++ and optionally MPI (distributed memory). We refer to literature on lock-free data structures and atomics, and consult documentation on std::atomic
, __sync_*
, and __atomic_*
intrinsics. Testing is done on multi-core machines and GHC clusters.
compare_exchange
, fetch_add
) to avoid synchronization locks across threads in stealing tasks from the deque.n
deques per processor, each storing tasks tagged with its priority. Operations should be lock-free, and task stealing should choose the best priority task available in that processor.n
tasks at once, analyzing time taken, synchronization overhead, communication time, and other relevant metrics.n
tasks, attempts to steal higher-priority tasks from random processors. Evaluate whether this improves high-priority task completion without harming low-priority tasks.x86-64 multi-core architecture C/C++ for implementation.
n
deques in a lock-free way with each deque having tasks of different priority.n
tasks, tries to steal higher-priority tasks from a random processor’s queues.