Apple Brings Low-level Atomic Operations To Swi...
Atomic operations are a simple form of synchronization that work on simple data types. The advantage of atomic operations is that they do not block competing threads. For simple operations, such as incrementing a counter variable, this can lead to much better performance than taking a lock.
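As a minimal sketch of the counter case mentioned above, the following uses the portable C11 `<stdatomic.h>` interface rather than any Apple-specific API. The names `hit_count`, `hit_counter_increment`, and `hit_counter_read` are hypothetical and chosen only for illustration.

```c
#include <stdatomic.h>

/* Hypothetical shared counter; the name is illustrative only. */
static atomic_long hit_count = 0;

/* Atomically adds 1 and returns the new value.
 * Any thread may call this without taking a lock. */
long hit_counter_increment(void) {
    /* atomic_fetch_add returns the value held before the addition. */
    return atomic_fetch_add(&hit_count, 1) + 1;
}

/* Atomically reads the current value of the counter. */
long hit_counter_read(void) {
    return atomic_load(&hit_count);
}
```

No thread ever waits on another here: each increment is a single read-modify-write step, which is where the performance advantage over a lock comes from for operations this small.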
OS X and iOS include numerous operations to perform basic mathematical and logical operations on 32-bit and 64-bit values. Among these operations are atomic versions of the compare-and-swap, test-and-set, and test-and-clear operations. For a list of supported atomic operations, see the /usr/include/libkern/OSAtomic.h header file or see the atomic man page.
Synchronization helps ensure the correctness of your code, but does so at the expense of performance. The use of synchronization tools introduces delays, even in uncontested cases. Locks and atomic operations generally involve the use of memory barriers and kernel-level synchronization to ensure code is properly protected. And if there is contention for a lock, your threads could block and experience even greater delays.
Table 4-2 lists some of the approximate costs associated with mutexes and atomic operations in the uncontested case. These measurements represented average times taken over several thousand samples. As with thread creation times though, mutex acquisition times (even in the uncontested case) can vary greatly depending on processor load, the speed of the computer, and the amount of available system and program memory.
Nonblocking synchronization is a way to perform some types of operations and avoid the expense of locks. Although locks are an effective way to synchronize two threads, acquiring a lock is a relatively expensive operation, even in the uncontested case. By contrast, many atomic operations take a fraction of the time to complete and can be just as effective as a lock.
Atomic operations let you perform simple mathematical and logical operations on 32-bit or 64-bit values. These operations rely on special hardware instructions (and an optional memory barrier) to ensure that the given operation completes before the affected memory is accessed again. In the multithreaded case, you should always use the atomic operations that incorporate a memory barrier to ensure that the memory is synchronized correctly between threads.
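The role of the memory barrier can be sketched with C11 atomics, used here as a portable stand-in for the barrier variants of the functions discussed above. The names `payload`, `ready`, `publish`, and `try_consume` are assumptions made for this example.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int payload;               /* ordinary, non-atomic data */
static atomic_bool ready = false; /* flag that publishes the payload */

/* Writer: store the data, then set the flag with release ordering so the
 * payload write cannot be reordered after the flag write. */
void publish(int value) {
    payload = value;
    atomic_store_explicit(&ready, true, memory_order_release);
}

/* Reader: returns true and fills *out only once the flag is visible.
 * The acquire load guarantees the payload write is visible as well. */
bool try_consume(int *out) {
    if (!atomic_load_explicit(&ready, memory_order_acquire))
        return false;
    *out = payload;
    return true;
}
```

Without the release/acquire pair, a reader on another core could observe the flag as set while still seeing a stale `payload`, which is the kind of error the barrier-incorporating operations are meant to prevent.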
Table 4-3 lists the available atomic mathematical and logical operations and the corresponding function names. These functions are all declared in the /usr/include/libkern/OSAtomic.h header file, where you can also find the complete syntax. The 64-bit versions of these functions are available only in 64-bit processes.
The behavior of most atomic functions should be relatively straightforward and what you would expect. Listing 4-1, however, shows the behavior of atomic test-and-set and compare-and-swap operations, which are a little more complex. The first three calls to the OSAtomicTestAndSet function demonstrate how the bit manipulation formula is applied to an integer value and how its results might differ from what you would expect. The last two calls show the behavior of the OSAtomicCompareAndSwap32 function. In all cases, these functions are being called in the uncontested case when no other threads are manipulating the values.
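The following is not the listing referred to above but a portable emulation written with C11 atomics. It assumes the bit-numbering formula given in the OSAtomic.h documentation, namely that bit n selects mask (0x80 >> (n & 7)) in byte (n >> 3) of the target. The function names `test_and_set_bit` and `compare_and_swap32` are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Emulates the assumed bit-numbering formula: operate on bit
 * (0x80 >> (n & 7)) of byte (n >> 3), counted from base.
 * Returns the previous state of that bit. */
bool test_and_set_bit(uint32_t n, _Atomic unsigned char *base) {
    unsigned char mask = (unsigned char)(0x80u >> (n & 7u));
    unsigned char old = atomic_fetch_or(&base[n >> 3], mask);
    return (old & mask) != 0;
}

/* Replaces *value with new_value only if it currently equals old_value.
 * Returns true if the swap happened. */
bool compare_and_swap32(int32_t old_value, int32_t new_value,
                        _Atomic int32_t *value) {
    return atomic_compare_exchange_strong(value, &old_value, new_value);
}
```

Under this formula, bit 0 is the most significant bit of the first byte. On a little-endian machine, setting bit 0 of a zeroed 32-bit integer therefore yields 128, bit 7 yields 1, and bit 15 yields 256, which is the surprising part. The compare-and-swap succeeds only while the stored value still matches the expected one, so a second call with a stale expected value leaves the target unchanged.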
In addition to locks and semaphores, certain low-level synchronization primitives like test and set are also available, along with a number of other atomic operations. These additional operations are described in libkern/gen/OSAtomicOperations.c in the kernel sources. Such atomic operations may be helpful if you do not need something as robust as a full-fledged lock or semaphore. Since they are not general synchronization mechanisms, however, they are beyond the scope of this chapter.
Every (defined) read operation (load instructions, memcpy, atomic loads/read-modify-writes, etc.) R reads a series of bytes written by (defined) write operations (store instructions, atomic stores/read-modify-writes, memcpy, etc.). For the purposes of this section, initialized globals are considered to have a write of the initializer which is atomic and happens before any other read or write of the memory in question. For each byte of a read R, R_byte may see any write to the same byte, except:
If an atomic operation is marked syncscope("<target-scope>"), where <target-scope> is a target-specific synchronization scope, then it is target dependent if it synchronizes with and participates in the seq_cst total orderings of other operations.
Otherwise, an atomic operation that is not marked syncscope("singlethread") or syncscope("<target-scope>") synchronizes with and participates in the seq_cst total orderings of other operations that are not marked syncscope("singlethread") or syncscope("<target-scope>").
A fence A which has (at least) release ordering semantics synchronizes with a fence B with (at least) acquire ordering semantics if and only if there exist atomic operations X and Y, both operating on some atomic object M, such that A is sequenced before X, X modifies M (either directly or through some side effect of a sequence headed by X), Y is sequenced before B, and Y observes M. This provides a happens-before dependency between A and B. Rather than an explicit fence, one (but not both) of the atomic operations X or Y might provide a release or acquire (resp.) ordering constraint and still synchronize-with the explicit fence and establish the happens-before edge.
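The fence rule above can be sketched at the C level with `atomic_thread_fence`. The comments label fence A, fence B, the operations X and Y, and the atomic object M from the description. The names `message`, `flag`, `send_message`, and `receive_message` are hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

static int message;         /* plain data ordered by the two fences */
static atomic_int flag = 0; /* the atomic object M */

void send_message(int value) {
    message = value;
    atomic_thread_fence(memory_order_release);             /* fence A */
    atomic_store_explicit(&flag, 1, memory_order_relaxed); /* X modifies M */
}

bool receive_message(int *out) {
    /* Y observes M */
    if (atomic_load_explicit(&flag, memory_order_relaxed) != 1)
        return false;
    atomic_thread_fence(memory_order_acquire);             /* fence B */
    *out = message; /* ordered after the write in send_message */
    return true;
}
```

Both atomic accesses are relaxed, so the ordering comes entirely from the fences: A is sequenced before X, Y is sequenced before B, and Y reads the value written by X, so the write to `message` happens before the read.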
The success and failure ordering arguments specify how this cmpxchg synchronizes with other atomic operations. Both ordering parameters must be at least monotonic; the failure ordering cannot be either release or acq_rel.
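The C11 counterpart of this instruction takes the same pair of orderings, which makes the constraint easy to illustrate. Here is a sketch of a compare-exchange loop that raises a value to a maximum, where `memory_order_relaxed` plays the role of monotonic. The function name `raise_to_max` is hypothetical.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Raises *target to candidate if candidate is larger.
 * Returns true if this call performed the update. */
bool raise_to_max(atomic_int *target, int candidate) {
    int current = atomic_load_explicit(target, memory_order_relaxed);
    while (current < candidate) {
        if (atomic_compare_exchange_weak_explicit(
                target, &current, candidate,
                memory_order_acq_rel,   /* success ordering */
                memory_order_relaxed))  /* failure ordering: never release or acq_rel */
            return true;
        /* On failure, current now holds the freshly observed value,
         * and the loop condition is checked again. */
    }
    return false;
}
```

A failed exchange performs only a load, which is why a release component makes no sense for the failure ordering.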
Interestingly, update coalescing also eliminates the need to employ atomic operations during pointer updates in a concurrent setting. It thereby solves the third problem of naive reference counting (i.e., a costly overhead in a concurrent setting). Levanoni and Petrank presented an enhanced algorithm that may run concurrently with multithreaded applications employing only fine synchronization.[7]
Bacon describes a cycle-collection algorithm for reference counting with similarities to tracing collectors, including the same theoretical time bounds. It is based on the observation that a cycle can only be isolated when a reference count is decremented to a nonzero value. All objects on which this occurs are put on a roots list, and the program periodically searches through the objects reachable from the roots for cycles. It knows it has found a cycle that can be collected when decrementing all the reference counts on a cycle of references brings them all down to zero.[10] An enhanced version of this algorithm by Paz et al.[11] is able to run concurrently with other operations and improves its efficiency by using the update coalescing method of Levanoni and Petrank.[5][6]
The GObject object-oriented programming framework implements reference counting on its base types, including weak references. Reference incrementing and decrementing uses atomic operations for thread safety. A significant amount of the work in writing bindings to GObject from high-level languages lies in adapting GObject reference counting to work with the language's own memory management system.
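Atomic reference counting of this kind can be sketched in plain C11. This is not GObject's implementation or API; the type `counted_obj` and the functions `counted_obj_new`, `counted_obj_ref`, and `counted_obj_unref` are hypothetical names used only to show the pattern.

```c
#include <stdatomic.h>
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical reference-counted object; not a GObject type. */
typedef struct {
    atomic_uint ref_count;
    int payload;
} counted_obj;

/* Allocates an object that starts with a single reference. */
counted_obj *counted_obj_new(int payload) {
    counted_obj *obj = malloc(sizeof *obj);
    if (obj == NULL)
        return NULL;
    atomic_init(&obj->ref_count, 1);
    obj->payload = payload;
    return obj;
}

/* Takes an additional reference. The caller already holds one,
 * so a relaxed increment is sufficient. */
counted_obj *counted_obj_ref(counted_obj *obj) {
    atomic_fetch_add_explicit(&obj->ref_count, 1, memory_order_relaxed);
    return obj;
}

/* Drops one reference. Frees the object and returns true when the
 * last reference is released. */
bool counted_obj_unref(counted_obj *obj) {
    /* acq_rel makes earlier writes by other owners visible before free. */
    if (atomic_fetch_sub_explicit(&obj->ref_count, 1,
                                  memory_order_acq_rel) == 1) {
        free(obj);
        return true;
    }
    return false;
}
```

Because `atomic_fetch_sub_explicit` returns the previous count, exactly one thread observes the value 1 and becomes responsible for freeing the object, even when several threads release references at the same time.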
Interrupts may be delivered to the CPU while a task is executing; that too may cause unexpected concurrent access to per-CPU data. To prevent this problem, the developer can disable interrupt delivery with local_irq_disable() and then enable it with local_irq_enable(). If the code is running in a context where the interrupts might already be disabled, the developer should use local_irq_save() and local_irq_restore(); this variant saves and restores the previous status in addition to disabling or enabling the interrupts. It is worth noting that disabling interrupts also disables preemption. While interrupts are disabled, the code is running in atomic context and developers need to be careful to avoid, among other things, any operations that may sleep or call into the scheduler.