linux/kernel/sched/fair.c
Imported from
_research/manual-study-linux/file-notes/linux__kernel__sched__fair.c.md.
File Notes: kernel/sched/fair.c
Status: reviewed.
Purpose
Implements the fair scheduling class, now centered on EEVDF selection: virtual runtime, virtual lag, virtual deadlines, augmented RB-tree selection, fair runtime accounting, group traversal, and fair-class callbacks.
Key Types And Functions
avg_vruntime(): computes weighted average virtual runtime.entity_lag()/update_entity_lag(): compute and clamp lag.vruntime_eligible(): eligibility test without lossy division.pick_eevdf(): earliest eligible virtual deadline selection.update_deadline()andupdate_curr(): runtime/deadline accounting.put_prev_task_fair()/set_next_task_fair(): class switch hooks.DEFINE_SCHED_CLASS(fair): fair class callback table.
Data Flow
The fair class tracks service through each scheduling entity’s virtual runtime.
update_curr() charges elapsed execution to the current entity, advances
vruntime by a fair-weighted delta, and checks whether the entity consumed its
request and needs a new virtual deadline.
EEVDF selection first requires eligibility: the entity must be owed service, represented as non-negative lag. Among eligible entities, the task with the earliest virtual deadline wins. The implementation uses an augmented RB tree so selection can prune subtrees by minimum virtual runtime while entities are ordered by deadline.
Group scheduling is handled by walking scheduling entities upward through their
hierarchy in put_prev_task_fair() and set_next_task_fair(). The core still
sees the same class callback surface, while fair-specific hierarchy and
bandwidth details stay inside fair.c.
Invariants And Safety Contracts
- Virtual lag is clamped to bounded ranges to avoid runaway effects from enqueue/dequeue/reweight changes.
- Eligibility is checked with weighted sums, not a naive divided average, to avoid precision loss.
- Runtime accounting and deadline updates occur when current execution is charged, not as detached bookkeeping.
- Fair class methods are registered through
DEFINE_SCHED_CLASS(fair)and called under scheduler-core lock contracts.
Rust Translation Guidance
Represent fair scheduling as an algorithm module over a runqueue-owned tree. Keep virtual runtime, lag, deadline, and slice fields together on a scheduling entity type. If using Rust, the augmented tree must be a deliberate intrusive or arena-backed structure so scheduler entities can move without invalidating references.
AI-Native Systems Guidance
An AI job scheduler can adapt EEVDF concepts: each agent job accrues service, lag determines whether it is owed compute, and a virtual deadline lets latency-sensitive jobs ask for shorter slices without abandoning fairness.
Evidence
avg_vruntime()computes weighted average virtual runtime atkernel/sched/fair.c:768-814.entity_lag()clamps lag against a slice-derived limit atkernel/sched/fair.c:818-841;update_entity_lag()handles delayed dequeue constraints atkernel/sched/fair.c:843-875.vruntime_eligible()avoids precision loss from direct average division atkernel/sched/fair.c:877-915.- The EEVDF comment states the two criteria and augmented RB-tree strategy at
kernel/sched/fair.c:1117-1135. update_deadline()computesdeadline = vruntime + slice / weightand requests reschedule after consumption atkernel/sched/fair.c:1234-1260.update_curr()charges execution, advancesvruntime, and refreshes deadlines atkernel/sched/fair.c:1982-2007.- Previous and next fair task hooks walk group entities at
kernel/sched/fair.c:9972-10005andkernel/sched/fair.c:15027-15065. - The fair class callback table is defined at
kernel/sched/fair.c:15352-15400.