next up previous contents
Next: Performance Up: Non-blocking Synchronization Primitives Previous: Software Implementation of

Hardware Contention Control

 

As a further extension, a processor can provide a conditional load instruction or Cload. The Cload instruction is a load instruction that succeeds only if the location being loaded does not have an advisory lock set on it, setting the advisory lock when it does succeed.

With Cload available, the version number is loaded initially using Cload rather than a normal load. If the Cload operation fails, the thread waits and retries, up to some maximum, and then uses the normal load instruction and proceeds. This waiting avoids performing the update concurrently with another process updating the same data structure. It also prevents potential starvation when one operation takes significantly longer than other operations, causing these other frequently occuring operations to perpetually abort the former. It appears particularly beneficial in large-scale shared memory systems where the time to complete a DCAS-governed operation can be significantly extended by wait times on memory because of contention, increasing the exposure time for another process to perform an interfering operation. Memory references that miss can take 100 times as long, or more, because of contention misses. Without Cload, a process can significantly delay the execution of another process by faulting in the data being used by the other process and possibly causing its DCAS to fail as well.

The cost of using Cload in the common case is simply testing whether the Cload succeeded, given that a load of the version number is required in any case.

Cload can be implemented using the cache-based advisory locking mechanism implemented in ParaDiGM [8]. Briefly, the processor advises the cache controller that a particular cache line is ``locked''. Normal loads and stores ignore the lock bit, but the Cload instruction tests and sets the cache-level lock for a given cache line or else fails if it is already set. A store operation clears the bit. This implementation costs an extra 3 bits of cache tags per cache line plus some logic in the cache controller. Judging by our experience with ParaDiGM, Cload is quite feasible to implement.



next up previous contents
Next: Performance Up: Non-blocking Synchronization Primitives Previous: Software Implementation of



Michael Greenwald
Thu Sep 19 12:18:20 PDT 1996