linux/kernel/sched/rt.c

Imported from _research/manual-study-linux/file-notes/linux__kernel__sched__rt.c.md.

File Notes: kernel/sched/rt.c

Status: reviewed.

Purpose

Implements the real-time scheduling class: static-priority SCHED_FIFO and SCHED_RR scheduling, priority-array queues, runtime throttling, group scheduling, push logic, and the RT class callbacks.

Key Types And Functions

  • init_rt_rq(): initializes priority queues and bitmap state.
  • init_rt_bandwidth() / RT period timer: runtime budget machinery.
  • sched_rt_runtime_exceeded(): throttling check.
  • enqueue_task_rt() / dequeue_task_rt(): class queue operations.
  • pick_next_rt_entity() / pick_task_rt(): priority selection.
  • put_prev_task_rt() / set_next_task_rt(): switch hooks.
  • DEFINE_SCHED_CLASS(rt): RT class callback table.

Data Flow

RT runqueues use a priority array: one list per RT priority plus a bitmap for finding the first active priority. init_rt_rq() initializes the lists, clears the bitmap, sets a delimiter bit just past the last priority so the bitmap search always terminates, resets highest-priority state, initializes pushable task tracking, zeroes the runtime and throttling fields, and starts with no queued RT tasks.
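The initialization described above can be sketched in Rust, the target language of the translation guidance later in these notes. This is a minimal illustration under stated assumptions, not the kernel's code: PrioArray, set_bit, enqueue, first_set_bit, and is_empty are hypothetical names, plain task ids stand in for sched_rt_entity list nodes, and MAX_RT_PRIO is taken to be 100 as in mainline kernels. The point it tries to show is the delimiter bit: because one bit past the last priority is always set, the bit search needs no separate empty check.

```rust
use std::collections::VecDeque;

/// Number of RT priority levels (assumed to be 100, as in mainline kernels).
const MAX_RT_PRIO: usize = 100;
/// Words needed to hold MAX_RT_PRIO bits plus one delimiter bit.
const BITMAP_WORDS: usize = (MAX_RT_PRIO + 1 + 63) / 64;

/// Hypothetical stand-in for the kernel's RT priority array.
struct PrioArray {
    bitmap: [u64; BITMAP_WORDS],
    queues: Vec<VecDeque<u32>>, // one FIFO of task ids per priority
}

impl PrioArray {
    /// Empty lists, cleared bitmap, delimiter bit set.
    fn new() -> Self {
        let mut array = PrioArray {
            bitmap: [0; BITMAP_WORDS],
            queues: (0..MAX_RT_PRIO).map(|_| VecDeque::new()).collect(),
        };
        array.set_bit(MAX_RT_PRIO); // delimiter for the bit search
        array
    }

    fn set_bit(&mut self, bit: usize) {
        self.bitmap[bit / 64] |= 1u64 << (bit % 64);
    }

    /// Append a task at `prio` and mark that priority as active.
    fn enqueue(&mut self, prio: usize, task: u32) {
        assert!(prio < MAX_RT_PRIO, "priority out of range");
        self.queues[prio].push_back(task);
        self.set_bit(prio);
    }

    /// Index of the lowest set bit; a lower index means a higher priority.
    fn first_set_bit(&self) -> usize {
        for (word_idx, word) in self.bitmap.iter().enumerate() {
            if *word != 0 {
                return word_idx * 64 + word.trailing_zeros() as usize;
            }
        }
        unreachable!("the delimiter bit is always set")
    }

    /// The array is empty exactly when the search lands on the delimiter.
    fn is_empty(&self) -> bool {
        self.first_set_bit() == MAX_RT_PRIO
    }
}
```

On a freshly built array, first_set_bit() returns MAX_RT_PRIO, which callers can treat as "nothing queued".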

Enqueue and dequeue operations manipulate sched_rt_entity membership in these priority lists. Group scheduling walks nested RT entities so parent groups reflect child activity. The top RT runqueue contributes its task count to the generic rq->nr_running only while it has runnable tasks and is not throttled.
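The top-level accounting rule can be illustrated with a small Rust sketch. It is a simplification under assumptions, not a transcription of the kernel: Rq and TopRtRq are hypothetical structs that keep only the fields needed here, and the function names are borrowed from the kernel's helpers purely for orientation.

```rust
/// Hypothetical, simplified stand-in for the generic per-CPU runqueue.
struct Rq {
    nr_running: u32,
}

/// Hypothetical, simplified stand-in for the top-level RT runqueue.
struct TopRtRq {
    rt_nr_running: u32, // RT tasks currently on this queue
    rt_queued: bool,    // whether the count is reflected in Rq::nr_running
    rt_throttled: bool, // whether the runtime budget is exhausted
}

/// Publish the RT task count to the generic runqueue, unless the queue is
/// already counted, throttled, or empty.
fn enqueue_top_rt_rq(rq: &mut Rq, rt_rq: &mut TopRtRq) {
    if rt_rq.rt_queued || rt_rq.rt_throttled {
        return;
    }
    if rt_rq.rt_nr_running > 0 {
        rq.nr_running += rt_rq.rt_nr_running;
        rt_rq.rt_queued = true;
    }
}

/// Withdraw `count` RT tasks from the generic runqueue's total.
fn dequeue_top_rt_rq(rq: &mut Rq, rt_rq: &mut TopRtRq, count: u32) {
    if !rt_rq.rt_queued {
        return;
    }
    rq.nr_running -= count;
    rt_rq.rt_queued = false;
}
```

Calling enqueue_top_rt_rq() on a throttled queue leaves nr_running untouched, and calling it twice on an eligible queue counts the tasks only once.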

Runtime accounting uses per-group bandwidth settings. update_curr_rt() charges elapsed execution through common scheduler accounting, while sched_rt_runtime_exceeded() checks budget, balances runtime, marks throttled queues, and dequeues throttled queues from the top runqueue.
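A rough Rust sketch of the budget check follows. It is a simplified model under stated assumptions: RtBudget and its methods are hypothetical, runtime balancing between queues is left out entirely, and the replenish step is a single-period approximation of what the period timer does rather than the kernel's exact logic.

```rust
/// Sentinel meaning "no runtime limit" (the kernel calls this RUNTIME_INF).
const RUNTIME_INF: u64 = u64::MAX;

/// Hypothetical per-queue budget state; all times are in nanoseconds.
struct RtBudget {
    rt_time: u64,    // runtime consumed so far
    rt_runtime: u64, // runtime allowed per period, or RUNTIME_INF
    rt_period: u64,  // length of one accounting period
    throttled: bool,
}

impl RtBudget {
    /// Charge elapsed execution time to the queue.
    fn charge(&mut self, delta_ns: u64) {
        self.rt_time = self.rt_time.saturating_add(delta_ns);
    }

    /// Returns true when the queue is throttled and should be removed
    /// from the top-level runqueue.
    fn runtime_exceeded(&mut self) -> bool {
        if self.throttled {
            return true;
        }
        // An unlimited budget, or one covering the whole period, never throttles.
        if self.rt_runtime == RUNTIME_INF || self.rt_runtime >= self.rt_period {
            return false;
        }
        if self.rt_time > self.rt_runtime {
            self.throttled = true;
            return true;
        }
        false
    }

    /// Simplified period-timer step: pay back one period's worth of runtime
    /// and lift the throttle once consumption is under budget again.
    fn replenish(&mut self) {
        self.rt_time -= self.rt_time.min(self.rt_runtime);
        if self.throttled && self.rt_time < self.rt_runtime {
            self.throttled = false;
        }
    }
}
```

With the mainline default settings (sched_rt_runtime_us = 950000, sched_rt_period_us = 1000000), a model like this would throttle a queue once it has consumed more than 950 ms within a 1 s period.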

Selection is direct: pick_next_rt_entity() finds the first set priority bit and returns the first entity in that priority list; nested group queues are walked until a task entity is reached.
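The selection walk can be sketched in Rust as follows. This is an illustrative model, not the kernel's data layout: RtEntity, RtRq, and their methods are hypothetical, a u128 stands in for the kernel bitmap (100 priorities fit in 128 bits), and group entities own their nested queue directly instead of pointing to it.

```rust
use std::collections::VecDeque;

/// Number of RT priority levels (assumed to be 100, as in mainline kernels).
const MAX_RT_PRIO: usize = 100;

/// Hypothetical scheduling entity: either a task or a nested group queue.
enum RtEntity {
    Task(u32),        // a runnable task, identified by a made-up id
    Group(Box<RtRq>), // a group entity that owns a nested runqueue
}

/// Hypothetical RT runqueue: bit p is set when queues[p] is non-empty.
struct RtRq {
    bitmap: u128,
    queues: Vec<VecDeque<RtEntity>>,
}

impl RtRq {
    fn new() -> Self {
        RtRq {
            bitmap: 0,
            queues: (0..MAX_RT_PRIO).map(|_| VecDeque::new()).collect(),
        }
    }

    fn enqueue(&mut self, prio: usize, entity: RtEntity) {
        assert!(prio < MAX_RT_PRIO, "priority out of range");
        self.queues[prio].push_back(entity);
        self.bitmap |= 1u128 << prio;
    }

    /// First set bit selects the priority; the list head is the FIFO winner.
    fn pick_next_entity(&self) -> Option<&RtEntity> {
        if self.bitmap == 0 {
            return None;
        }
        let prio = self.bitmap.trailing_zeros() as usize;
        self.queues[prio].front()
    }
}

/// Descend through group entities until a task entity is reached.
fn pick_task(top: &RtRq) -> Option<u32> {
    let mut rq = top;
    loop {
        match rq.pick_next_entity()? {
            RtEntity::Task(id) => return Some(*id),
            RtEntity::Group(child) => rq = &**child,
        }
    }
}
```

In this model a group is enqueued at the priority of its best child, so a group at priority 10 holding tasks at 10 and 30 wins over a plain task at priority 50, and the walk returns the child queued at 10.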

Invariants And Safety Contracts

  • A throttled RT runqueue must not be enqueued at the top level.
  • Priority bitmap and per-priority lists must remain consistent.
  • Group dequeue proceeds top-down, from the outermost parent toward the entity being removed, because a parent's priority depends on its child entries.
  • RT runtime controls are a safety feature: unlimited RT execution can starve non-RT tasks of any progress.

Rust Translation Guidance

Use a fixed-size priority-array abstraction with internal consistency checks between the bitmap and the lists. Runtime throttling should be modeled as a state of the queue, not as a side flag that enqueue paths can ignore. Group scheduling needs explicit parent updates or a tree walk that callers cannot skip.
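One possible shape for such an abstraction is sketched below. It is an assumption-laden example rather than a recommended design: PrioArray and its methods are hypothetical, a u128 stands in for the bitmap, and task ids replace intrusive list nodes. Keeping the fields private means the bitmap and the lists can only change together through enqueue() and dequeue(), and a debug assertion re-checks the invariant after each mutation.

```rust
use std::collections::VecDeque;

/// Number of RT priority levels (assumed to be 100, as in mainline kernels).
const MAX_RT_PRIO: usize = 100;

/// Hypothetical fixed-size priority array with private fields.
pub struct PrioArray {
    bitmap: u128, // bit p set <=> queues[p] is non-empty
    queues: [VecDeque<u32>; MAX_RT_PRIO],
}

impl PrioArray {
    pub fn new() -> Self {
        PrioArray {
            bitmap: 0,
            queues: std::array::from_fn(|_| VecDeque::new()),
        }
    }

    pub fn enqueue(&mut self, prio: usize, task: u32) {
        assert!(prio < MAX_RT_PRIO, "priority out of range");
        self.queues[prio].push_back(task);
        self.bitmap |= 1u128 << prio;
        self.debug_check();
    }

    /// Removes `task` from the list at `prio`; returns whether it was present.
    pub fn dequeue(&mut self, prio: usize, task: u32) -> bool {
        assert!(prio < MAX_RT_PRIO, "priority out of range");
        let queue = &mut self.queues[prio];
        let Some(pos) = queue.iter().position(|&t| t == task) else {
            return false;
        };
        queue.remove(pos);
        if queue.is_empty() {
            // Last entry at this priority: clear its bit.
            self.bitmap &= !(1u128 << prio);
        }
        self.debug_check();
        true
    }

    /// Highest active priority (lowest index), if any task is queued.
    pub fn highest_prio(&self) -> Option<usize> {
        if self.bitmap == 0 {
            None
        } else {
            Some(self.bitmap.trailing_zeros() as usize)
        }
    }

    /// True when every bitmap bit agrees with its list's emptiness.
    pub fn is_consistent(&self) -> bool {
        self.queues.iter().enumerate().all(|(prio, queue)| {
            let bit_set = (self.bitmap & (1u128 << prio)) != 0;
            bit_set == !queue.is_empty()
        })
    }

    fn debug_check(&self) {
        debug_assert!(self.is_consistent(), "bitmap and lists diverged");
    }
}
```

The full scan in is_consistent() is linear in the number of priorities, which is why this sketch runs it only in debug builds; a real translation might prefer a cheaper per-priority check.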

AI-Native Systems Guidance

Privileged AI jobs that claim low-latency requirements should still be subject to RT-style runtime budgets. A system should make high-priority classes deterministic but bounded: priority alone cannot override the need for forward progress and recovery.

Evidence

  • init_rt_rq() initializes per-priority queues, bitmap delimiter, highest priority, pushable tasks, and throttling fields at kernel/sched/rt.c:68-95.
  • RT bandwidth is timer-backed through init_rt_bandwidth() at kernel/sched/rt.c:125-134.
  • sched_rt_runtime_exceeded() throttles over-budget RT queues and dequeues them at kernel/sched/rt.c:863-904.
  • update_curr_rt() skips non-RT current tasks and charges RT execution at kernel/sched/rt.c:970-990.
  • Top-level RT enqueue/dequeue updates rq->nr_running and avoids throttled queues at kernel/sched/rt.c:1010-1047.
  • RT entity enqueue/dequeue manipulates priority lists and group stacks at kernel/sched/rt.c:1331-1429; enqueue_task_rt() is at kernel/sched/rt.c:1432-1445.
  • pick_next_rt_entity() uses the first bitmap bit and FIFO list head at kernel/sched/rt.c:1682-1698; pick_task_rt() wraps it at kernel/sched/rt.c:1715-1725.
  • RT switch and callback table logic appears at kernel/sched/rt.c:1727-1745 and kernel/sched/rt.c:2601-2637.