Massalin and Pu  describe the lock-free (non-blocking) implementation of the Synthesis V.1 multiprocessor kernel, using just CAS and DCAS, the same as our work. Their work supports our contention that DCAS is sufficient for the practical implementation of large systems using non-blocking synchronization. However, their work focused on using a small number of wait-free and lock-free data structures inside their operating system kernel. One reason their work has not been further emulated is their exploitation of application-specific optimizations to implement data structures. One example is their implementation of linked list with insertion and deletion from the middle of the list: it is efficient only because the usage within the Synthesis kernel is highly constrained and a single bit suffices where a reference count is normally needed. In contrast, our implementation of linked lists is general, and is usable by arbitrary application code.