Our approach assumes an efficient implementation of DCAS functionality. In this section, we briefly outline an instruction set extension to the load-linked/ store-conditional) instructions to support DCAS. (A software implementation is discussed in Section 6.1.) With a processor supporting load-linked (LL) and store-conditional (SC) instructions, add two instructions:
DCAS is then implemented by the instruction sequence shown in Figure 3 (using R4000 instructions in addition to the LL/SC(P) instructions).
Figure 3: DCAS Implementation using LL/ SC and LLP/ SCP . Success or failure of SC (and thus of the DCAS operation) is returned in U1 or whatever general register holds the argument to SC. 1 denotes success, 0 failure. If the next instruction tries to read U1, the hardware interlocks (as it already does for LL/SC) if the result of SC is not already in U1.
The LL and LLP instructions in lines 1 and 2 ``link'' the loads with the respective stores issued by the following SC and SCP instructions. Lines 3 and 4 verify that (T0) and (T1) contain V0 and V1, respectively. The SCP and SC in lines 5 and 6 are conditional. They will not issue the stores unless (T0) and (T1) have been unchanged since lines 1 and 2. This guarantees that the results of CAS in lines 3 and 4 are still valid at line 6, or else the SC fails. Further, the store issued by a successful SCP is buffered pending a successful SC. Thus, SC in line 6 writes U1 and U0 to (T1) and (T0) atomically with the comparison to V0 and V1.
We have worked out a detailed design for the implementation of these two instructions in a RISC processor such as the R4000 but the description is omitted for brevity.