Skip to content

linux/kernel/sched/fair.c

Imported from _research/manual-study-linux/file-notes/linux__kernel__sched__fair.c.md.

File Notes: kernel/sched/fair.c

Status: reviewed.

Purpose

Implements the fair scheduling class, now centered on EEVDF selection: virtual runtime, virtual lag, virtual deadlines, augmented RB-tree selection, fair runtime accounting, group traversal, and fair-class callbacks.

Key Types And Functions

  • avg_vruntime(): computes weighted average virtual runtime.
  • entity_lag() / update_entity_lag(): compute and clamp lag.
  • vruntime_eligible(): eligibility test without lossy division.
  • pick_eevdf(): earliest eligible virtual deadline selection.
  • update_deadline() and update_curr(): runtime/deadline accounting.
  • put_prev_task_fair() / set_next_task_fair(): class switch hooks.
  • DEFINE_SCHED_CLASS(fair): fair class callback table.

Data Flow

The fair class tracks service through each scheduling entity’s virtual runtime. update_curr() charges elapsed execution to the current entity, advances vruntime by a fair-weighted delta, and checks whether the entity consumed its request and needs a new virtual deadline.

EEVDF selection first requires eligibility: the entity must be owed service, represented as non-negative lag. Among eligible entities, the task with the earliest virtual deadline wins. The implementation uses an augmented RB tree so selection can prune subtrees by minimum virtual runtime while entities are ordered by deadline.

Group scheduling is handled by walking scheduling entities upward through their hierarchy in put_prev_task_fair() and set_next_task_fair(). The core still sees the same class callback surface, while fair-specific hierarchy and bandwidth details stay inside fair.c.

Invariants And Safety Contracts

  • Virtual lag is clamped to bounded ranges to avoid runaway effects from enqueue/dequeue/reweight changes.
  • Eligibility is checked with weighted sums, not a naive divided average, to avoid precision loss.
  • Runtime accounting and deadline updates occur when current execution is charged, not as detached bookkeeping.
  • Fair class methods are registered through DEFINE_SCHED_CLASS(fair) and called under scheduler-core lock contracts.

Rust Translation Guidance

Represent fair scheduling as an algorithm module over a runqueue-owned tree. Keep virtual runtime, lag, deadline, and slice fields together on a scheduling entity type. If using Rust, the augmented tree must be a deliberate intrusive or arena-backed structure so scheduler entities can move without invalidating references.

AI-Native Systems Guidance

An AI job scheduler can adapt EEVDF concepts: each agent job accrues service, lag determines whether it is owed compute, and a virtual deadline lets latency-sensitive jobs ask for shorter slices without abandoning fairness.

Evidence

  • avg_vruntime() computes weighted average virtual runtime at kernel/sched/fair.c:768-814.
  • entity_lag() clamps lag against a slice-derived limit at kernel/sched/fair.c:818-841; update_entity_lag() handles delayed dequeue constraints at kernel/sched/fair.c:843-875.
  • vruntime_eligible() avoids precision loss from direct average division at kernel/sched/fair.c:877-915.
  • The EEVDF comment states the two criteria and augmented RB-tree strategy at kernel/sched/fair.c:1117-1135.
  • update_deadline() computes deadline = vruntime + slice / weight and requests reschedule after consumption at kernel/sched/fair.c:1234-1260.
  • update_curr() charges execution, advances vruntime, and refreshes deadlines at kernel/sched/fair.c:1982-2007.
  • Previous and next fair task hooks walk group entities at kernel/sched/fair.c:9972-10005 and kernel/sched/fair.c:15027-15065.
  • The fair class callback table is defined at kernel/sched/fair.c:15352-15400.