linux/kernel/sched/core.c
Imported from _research/manual-study-linux/file-notes/linux__kernel__sched__core.c.md.
File Notes: kernel/sched/core.c
Status: reviewed.
Purpose
Implements the scheduler core: per-CPU runqueue storage, task/runqueue locking,
fork-time scheduler initialization, wakeup, task selection, context switching,
and the public schedule() entrypoint.
Key Types And Functions
- runqueues: per-CPU struct rq storage.
- ___task_rq_lock() / _task_rq_lock(): task-to-runqueue locking protocol.
- sched_fork(), sched_cgroup_fork(), sched_post_fork(): task creation scheduler phases.
- wake_up_new_task(): first publication to runnable state.
- pick_next_task(): core pick path, with core-scheduling support when enabled.
- __schedule() and schedule(): main scheduling loop.
- context_switch() and finish_task_switch(): memory/register switch and post-switch cleanup.
Data Flow
Each CPU owns a runqueue, declared with DEFINE_PER_CPU_SHARED_ALIGNED(struct rq, runqueues).
Task operations lock the task’s pi_lock, find its current
runqueue, acquire the runqueue lock, then verify the task did not migrate while
the lock was being acquired. If it did, the code releases and retries.
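The lock, verify, retry sequence described above can be sketched in user-space Rust. This is an illustrative model, not the kernel's code: Task, RunQueue, TaskRqGuard, and this task_rq_lock() are hypothetical names, std::sync::Mutex stands in for the kernel's raw spinlocks, and the sketch assumes a migrator changes task.cpu only while holding the source runqueue lock.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Mutex, MutexGuard};

/// Hypothetical stand-in for `struct rq`.
pub struct RunQueue {
    pub nr_running: usize,
}

/// Hypothetical stand-in for a task: `cpu` indexes the runqueue it is on.
pub struct Task {
    pub pi_lock: Mutex<()>,
    pub cpu: AtomicUsize,
}

/// Holds both locks. Fields drop in declaration order, so the runqueue
/// lock is released before `pi_lock`.
pub struct TaskRqGuard<'a> {
    pub rq: MutexGuard<'a, RunQueue>,
    pub cpu: usize,
    _pi: MutexGuard<'a, ()>,
}

/// Lock `task.pi_lock` and the runqueue the task is on, retrying if the
/// task moved between reading `task.cpu` and acquiring the runqueue lock.
pub fn task_rq_lock<'a>(task: &'a Task, rqs: &'a [Mutex<RunQueue>]) -> TaskRqGuard<'a> {
    loop {
        let pi = task.pi_lock.lock().unwrap();
        let cpu = task.cpu.load(Ordering::Acquire);
        let rq = rqs[cpu].lock().unwrap();
        // Verify under the lock. Assumption: migration updates `cpu` only
        // while holding the source runqueue lock, so this check is stable.
        if task.cpu.load(Ordering::Acquire) == cpu {
            return TaskRqGuard { rq, cpu, _pi: pi };
        }
        // Both guards drop here (rq first, then pi), releasing the locks
        // before the next attempt.
    }
}
```

A caller would hold the returned guard for the duration of the task operation; dropping it releases both locks, which mirrors the RAII approach suggested in the Rust translation guidance.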
Fork setup marks the task TASK_NEW, resets inherited scheduling policy where
needed, chooses a scheduler class, initializes runtime accounting, and keeps
the child off CPU. Cgroup fork setup assigns the scheduler task group and CPU,
then calls class-specific task_fork. wake_up_new_task() changes the state
to TASK_RUNNING, selects a CPU, locks the runqueue, activates the task,
traces the wakeup, and invokes class preemption logic.
schedule() calls __schedule_loop(), which disables preemption and invokes
__schedule() until rescheduling is no longer needed. __schedule() locks the
runqueue, handles blocking/deactivation, calls pick_next_task(), publishes
rq->curr, traces the switch, and enters context_switch() when prev != next.
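The control flow of that loop can be modeled in a few lines of Rust. This is a single-threaded sketch under stated assumptions, not the kernel's logic: the Rq struct and its fields are hypothetical, the pick policy is plain FIFO rather than per-class selection, and the previous task is assumed to stay runnable so it is requeued instead of deactivated.

```rust
use std::collections::VecDeque;

/// Hypothetical single-CPU model; field names echo the notes, not `struct rq`.
pub struct Rq {
    pub curr: u32,            // id of the task currently on the CPU
    pub queue: VecDeque<u32>, // other runnable task ids
    pub need_resched: bool,
    pub preempt_count: u32,
    pub context_switches: u32,
}

impl Rq {
    /// Stand-in for `__schedule()`: one pick-and-switch pass.
    fn schedule_once(&mut self) {
        // Contract from the notes: the caller has disabled preemption.
        assert!(self.preempt_count > 0, "schedule_once needs preemption disabled");
        self.need_resched = false;

        let prev = self.curr;
        // Assumption: prev is still runnable, so it goes back on the queue.
        self.queue.push_back(prev);
        // Assumption: FIFO pick; the kernel consults scheduling classes.
        let next = self.queue.pop_front().expect("prev was just queued");

        if next != prev {
            self.curr = next;           // publish the new current task
            self.context_switches += 1; // where context_switch() would run
        }
    }

    /// Stand-in for `schedule()`: repeat until no reschedule is pending.
    pub fn schedule(&mut self) {
        loop {
            self.preempt_count += 1; // models disabling preemption
            self.schedule_once();
            self.preempt_count -= 1; // models re-enabling it without rescheduling
            if !self.need_resched {
                break;
            }
        }
    }
}
```

In this model nothing sets need_resched during a pass, so the loop body runs once per call; in the kernel a wakeup or tick arriving during the pass is what makes the loop iterate again.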
context_switch() runs pre-switch hooks, switches the active memory map,
handles membarrier/rseq obligations, prepares lock transfer, and calls
switch_to() for the architecture register/stack switch. The returning task
then completes cleanup through finish_task_switch().
Invariants And Safety Contracts
- TASK_NEW prevents a newly forked task from being run or externally woken before scheduler initialization finishes.
- Task/runqueue locking retries if a task migrates during lock acquisition.
- __schedule() must be called with preemption disabled.
- The runqueue lock and memory barriers around rq->curr are part of the user/kernel memory-ordering contract, not just scheduler-local details.
- Dead tasks drop their final task reference only after the last context switch away from them.
Rust Translation Guidance
Build task creation as a phase-typed pipeline: allocated task, scheduler initialized task, cgroup-attached task, published task, runnable task. Use RAII guards for task and runqueue locks, with retry semantics for migration. Keep the architecture context switch as a very small unsafe boundary; surround it with safe accounting, memory-map, and lifecycle code.
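One way to read the phase-typed pipeline is as a typestate pattern, sketched below. Everything here is hypothetical: the marker types, the Task<Phase> struct, and the mapping of each kernel function name onto exactly one phase transition are assumptions made for illustration, and a Vec<u32> stands in for the runqueue.

```rust
use std::marker::PhantomData;

// Phase markers, named after the phases listed in the guidance.
pub struct Allocated;
pub struct SchedInitialized;
pub struct CgroupAttached;
pub struct Published;
pub struct Runnable;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum State {
    New,
    Running,
}

/// Hypothetical task whose creation phase is tracked in the type.
pub struct Task<Phase> {
    pub pid: u32,
    pub state: State,
    pub cpu: Option<usize>,
    _phase: PhantomData<Phase>,
}

impl<P> Task<P> {
    /// Keep the data and change only the phase type.
    fn advance<Q>(self) -> Task<Q> {
        Task { pid: self.pid, state: self.state, cpu: self.cpu, _phase: PhantomData }
    }
}

impl Task<Allocated> {
    pub fn new(pid: u32) -> Self {
        Task { pid, state: State::New, cpu: None, _phase: PhantomData }
    }
    /// Scheduler initialization; the task stays in `State::New`.
    pub fn sched_fork(self) -> Task<SchedInitialized> {
        self.advance()
    }
}

impl Task<SchedInitialized> {
    /// Attach the task group (omitted here) and the initial CPU.
    pub fn sched_cgroup_fork(mut self, cpu: usize) -> Task<CgroupAttached> {
        self.cpu = Some(cpu);
        self.advance()
    }
}

impl Task<CgroupAttached> {
    pub fn sched_post_fork(self) -> Task<Published> {
        self.advance()
    }
}

impl Task<Published> {
    /// First transition to runnable: flip the state and enqueue.
    pub fn wake_up_new_task(mut self, rq: &mut Vec<u32>) -> Task<Runnable> {
        self.state = State::Running;
        rq.push(self.pid);
        self.advance()
    }
}
```

Because each method consumes self and exists only on one phase, calling wake_up_new_task() on a Task<Allocated> is a compile error, which encodes the TASK_NEW invariant in the type system instead of a runtime check.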
AI-Native Systems Guidance
Agent job schedulers should copy the lifecycle shape, not the code: validate and initialize jobs before publication, use explicit runnable-state transitions, emit tracepoints around wakeup/switch, and make the context switch between jobs a policy-observable event.
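As a rough illustration of that lifecycle shape, the sketch below models a job scheduler with explicit state transitions and an in-memory trace. All names (JobScheduler, JobState, TraceEvent, submit, wake, switch_next) are hypothetical, jobs are assumed to run to completion once switched out, and the trace is a plain Vec rather than a real tracing backend.

```rust
use std::collections::{HashMap, VecDeque};

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum JobState {
    New,      // initialized but not yet visible to the dispatcher
    Runnable, // published to the run queue
    Running,
    Done,
}

#[derive(Debug, Clone, PartialEq, Eq)]
pub enum TraceEvent {
    Wakeup { job: u32 },
    Switch { prev: Option<u32>, next: u32 },
}

#[derive(Default)]
pub struct JobScheduler {
    states: HashMap<u32, JobState>,
    run_queue: VecDeque<u32>,
    current: Option<u32>,
    pub trace: Vec<TraceEvent>,
}

impl JobScheduler {
    /// Register a job in the `New` state; it cannot be dispatched yet.
    pub fn submit(&mut self, job: u32) -> Result<(), String> {
        if self.states.contains_key(&job) {
            return Err(format!("job {job} already submitted"));
        }
        self.states.insert(job, JobState::New);
        Ok(())
    }

    /// Publish a `New` job: the only transition that makes it runnable.
    pub fn wake(&mut self, job: u32) -> Result<(), String> {
        match self.states.get_mut(&job) {
            Some(state) if *state == JobState::New => {
                *state = JobState::Runnable;
                self.run_queue.push_back(job);
                self.trace.push(TraceEvent::Wakeup { job });
                Ok(())
            }
            Some(state) => Err(format!("job {job} is {state:?}, expected New")),
            None => Err(format!("job {job} was never submitted")),
        }
    }

    /// Switch to the next runnable job and record the switch.
    pub fn switch_next(&mut self) -> Option<u32> {
        let next = self.run_queue.pop_front()?;
        let prev = self.current.replace(next);
        if let Some(p) = prev {
            // Assumption: a job that is switched out has finished.
            self.states.insert(p, JobState::Done);
        }
        self.states.insert(next, JobState::Running);
        self.trace.push(TraceEvent::Switch { prev, next });
        Some(next)
    }

    pub fn state(&self, job: u32) -> Option<JobState> {
        self.states.get(&job).copied()
    }
}
```

A submitted job that has not been woken is never returned by switch_next(), and every wakeup and switch leaves a trace entry that a policy layer could inspect.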
Evidence
- Per-CPU runqueues are defined at kernel/sched/core.c:131-132.
- Task/runqueue locking retries on task migration at kernel/sched/core.c:732-749 and explains acquire/release ordering at kernel/sched/core.c:759-775.
- ttwu_runnable() serializes against schedule() and either restores a queued task to running state or falls back to full wakeup at kernel/sched/core.c:3857-3888.
- sched_fork() marks the child TASK_NEW, resets inherited policy, selects the class, and initializes scheduling state at kernel/sched/core.c:4803-4871.
- sched_cgroup_fork() attaches task group and initial CPU at kernel/sched/core.c:4874-4901.
- wake_up_new_task() publishes the first runnable state and enqueues the task at kernel/sched/core.c:4934-4965.
- prepare_task_switch(), finish_task_switch(), and schedule_tail() define switch pairing and first-run behavior at kernel/sched/core.c:5278-5439.
- context_switch() handles memory-map and register/stack switching at kernel/sched/core.c:5441-5505.
- pick_next_task() delegates to __pick_next_task() unless core scheduling is enabled at kernel/sched/core.c:6210-6265 and kernel/sched/core.c:6664-6669.
- __schedule() locks the runqueue, chooses next, publishes rq->curr, and calls context_switch() at kernel/sched/core.c:7055-7236; schedule() wraps it at kernel/sched/core.c:7312-7325.