lib/rcuref.c

Source file repositories/reference/linux-study-clean/lib/rcuref.c

File Facts

System
Linux kernel
Corpus path
lib/rcuref.c
Extension
.c
Size
9770 bytes
Lines
281
Domain
Kernel Services
Bucket
lib
Inferred role
Kernel Services: exported/initcall integration point
Status
integration implementation candidate

Why This File Exists

Shared kernel service surface used by multiple subsystems, including helpers, cryptography, virtualization support, and async I/O infrastructure.

Dependency Surface

Detected Declarations

Annotated Snippet

*	if (!atomic_dec_return(&->refcnt)) {
 *		remove_ptr(p);
 *		kfree_rcu((p, rcu);
 *	}
 *
 * atomic_inc_not_zero() is implemented with a try_cmpxchg() loop which has
 * O(N^2) behaviour under contention with N concurrent operations.
 *
 * rcuref uses atomic_add_negative_relaxed() for the fast path, which scales
 * better under contention.
 *
 * Why not refcount?
 * =================
 *
 * In principle it should be possible to make refcount use the rcuref
 * scheme, but the destruction race described below cannot be prevented
 * unless the protected object is RCU managed.
 *
 * Theory of operation
 * ===================
 *
 * rcuref uses an unsigned integer reference counter. As long as the
 * counter value is greater than or equal to RCUREF_ONEREF and not larger
 * than RCUREF_MAXREF the reference is alive:
 *
 * ONEREF   MAXREF               SATURATED             RELEASED      DEAD    NOREF
 * 0        0x7FFFFFFF 0x8000000 0xA0000000 0xBFFFFFFF 0xC0000000 0xE0000000 0xFFFFFFFF
 * <---valid --------> <-------saturation zone-------> <-----dead zone----->
 *
 * The get() and put() operations do unconditional increments and
 * decrements. The result is checked after the operation. This optimizes
 * for the fast path.
 *
 * If the reference count is saturated or dead, then the increments and
 * decrements are not harmful as the reference count still stays in the
 * respective zones and is always set back to STATURATED resp. DEAD. The
 * zones have room for 2^28 racing operations in each direction, which
 * makes it practically impossible to escape the zones.
 *
 * Once the last reference is dropped the reference count becomes
 * RCUREF_NOREF which forces rcuref_put() into the slowpath operation. The
 * slowpath then tries to set the reference count from RCUREF_NOREF to
 * RCUREF_DEAD via a cmpxchg(). This opens a small window where a
 * concurrent rcuref_get() can acquire the reference count and bring it
 * back to RCUREF_ONEREF or even drop the reference again and mark it DEAD.
 *
 * If the cmpxchg() succeeds then a concurrent rcuref_get() will result in
 * DEAD + 1, which is inside the dead zone. If that happens the reference
 * count is put back to DEAD.
 *
 * The actual race is possible due to the unconditional increment and
 * decrements in rcuref_get() and rcuref_put():
 *
 *	T1				T2
 *	get()				put()
 *					if (atomic_add_negative(-1, &ref->refcnt))
 *		succeeds->			atomic_cmpxchg(&ref->refcnt, NOREF, DEAD);
 *
 *	atomic_add_negative(1, &ref->refcnt);	<- Elevates refcount to DEAD + 1
 *
 * As the result of T1's add is negative, the get() goes into the slow path
 * and observes refcnt being in the dead zone which makes the operation fail.
 *
 * Possible critical states:
 *
 *	Context Counter	References	Operation
 *	T1	0	1		init()
 *	T2	1	2		get()
 *	T1	0	1		put()
 *	T2     -1	0		put() tries to mark dead
 *	T1	0	1		get()
 *	T2	0	1		put() mark dead fails
 *	T1     -1	0		put() tries to mark dead
 *	T1    DEAD	0		put() mark dead succeeds
 *	T2    DEAD+1	0		get() fails and puts it back to DEAD
 *
 * Of course there are more complex scenarios, but the above illustrates
 * the working principle. The rest is left to the imagination of the
 * reader.
 *
 * Deconstruction race
 * ===================
 *
 * The release operation must be protected by prohibiting a grace period in
 * order to prevent a possible use after free:
 *
 *	T1				T2
 *	put()				get()
 *	// ref->refcnt = ONEREF
 *	if (!atomic_add_negative(-1, &ref->refcnt))

Annotation

Implementation Notes