
Architecture Map

Imported from _research/manual-study-linux/architecture-map.md.

Linux Architecture Map

This document accumulates confirmed architecture facts from reviewed source notes. Do not add speculative design ideas here; put Rust translation in rust-translation.md and AI-native interpretation in ai-native-systems.md.

Confirmed System Shape

Rust Kernel API Membrane

The in-kernel Rust surface is intentionally centralized in the kernel crate. Rust modules depend on this crate, and C APIs that lack a wrapper are expected to be wrapped there before use. The crate is no_std, refuses to build unless CONFIG_RUST is set, and gates subsystem wrappers with cfg attributes on the module declarations at the crate root.

Evidence: file-notes/linux__rust__kernel__lib.rs.md.

Refcounted And Pinned Ownership

Linux’s Rust ownership layer does not treat Arc as a generic userspace utility. The kernel Arc is backed by the kernel Refcount, has no weak references, saturates its count on overflow instead of aborting, and always pins its contents. Companion types cover the borrowed and unique states: ArcBorrow avoids extra indirection where a refcount-capable borrow is enough, and UniqueArc models exclusive pre-publication ownership.

Evidence: file-notes/linux__rust__kernel__sync__arc.rs.md.
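
The saturation behavior can be illustrated with a small userspace model. This is a sketch only, not the kernel implementation: SaturatingCount, SATURATED, and the method names are hypothetical, the sentinel value is chosen for convenience, and the counter is not atomic, unlike the kernel's.

```rust
/// Userspace model of a saturating reference count.
/// Hypothetical type for illustration; the kernel's counter is atomic.
#[derive(Debug)]
pub struct SaturatingCount {
    value: u32,
}

/// Hypothetical sentinel; the kernel's actual saturation value differs.
pub const SATURATED: u32 = u32::MAX;

impl SaturatingCount {
    /// Start at `value` references (a fresh object would start at 1).
    pub fn with_value(value: u32) -> Self {
        SaturatingCount { value }
    }

    pub fn get(&self) -> u32 {
        self.value
    }

    /// Take a reference. At the sentinel the count sticks instead of wrapping.
    pub fn inc(&mut self) {
        self.value = self.value.saturating_add(1);
    }

    /// Drop a reference. Returns true only when the last reference went away.
    /// A saturated count never reaches zero, so the object is leaked, not freed.
    pub fn dec(&mut self) -> bool {
        if self.value == SATURATED || self.value == 0 {
            return false;
        }
        self.value -= 1;
        self.value == 0
    }
}
```

The design point the model captures is that overflow degrades into a leak rather than a wraparound, which would otherwise allow a premature free.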

Current Context Versus Durable Task Handle

The Rust task wrapper splits current-context access from long-lived task handles. CurrentTask is scoped to the active task context and is deliberately neither Send nor Sync, so it cannot leave that context; durable task references use ARef<Task>, which follows the C get_task_struct / put_task_struct lifetime model.

Evidence: file-notes/linux__rust__kernel__task.rs.md.
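
The split can be approximated in userspace Rust. In this sketch, std::sync::Arc stands in for ARef<Task>, and the Task, CurrentTask, and with_current definitions are hypothetical simplifications rather than the kernel's types. A raw-pointer PhantomData marker is one common way to make a type neither Send nor Sync.

```rust
use std::marker::PhantomData;
use std::sync::Arc;

/// Hypothetical stand-in for a task object.
pub struct Task {
    pub pid: u32,
}

/// Borrowed view of the running task. The raw-pointer marker makes this
/// type !Send and !Sync, so it cannot be moved to another thread.
pub struct CurrentTask<'a> {
    task: &'a Arc<Task>,
    _not_thread_safe: PhantomData<*mut ()>,
}

impl<'a> CurrentTask<'a> {
    pub fn pid(&self) -> u32 {
        self.task.pid
    }

    /// Take a counted, durable handle (Arc stands in for ARef<Task> here).
    pub fn to_handle(&self) -> Arc<Task> {
        Arc::clone(self.task)
    }
}

/// Scope access to the "current" task to the duration of the closure.
pub fn with_current<R>(running: &Arc<Task>, f: impl FnOnce(&CurrentTask<'_>) -> R) -> R {
    let cur = CurrentTask {
        task: running,
        _not_thread_safe: PhantomData,
    };
    f(&cur)
}
```

Only the handle returned by to_handle can outlive the closure or cross threads; trying to send the CurrentTask itself to std::thread::spawn would fail to compile.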

Staged Process Creation

Linux task creation is a staged pipeline: validate clone flags and namespace constraints, duplicate task state, copy or share resources, reserve a PID and scheduler/cgroup state, make the task visible, then wake it. If a stage fails before the task becomes visible, error labels unwind the already-initialized subsystems in reverse order; after the no-failure point, publication proceeds through locks and post-fork hooks.

Evidence: file-notes/linux__kernel__fork.c.md.
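
The stage-then-unwind shape described above can be sketched as a toy model. The Stage names, the log strings, and the create_task function are hypothetical and heavily simplified; the real path has many more stages and uses C error labels rather than a loop.

```rust
/// Hypothetical, simplified stages that each acquire something to undo.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Stage {
    DupTaskState,
    CopyResources,
    ReservePid,
}

pub const STAGES: [Stage; 3] = [Stage::DupTaskState, Stage::CopyResources, Stage::ReservePid];

/// Run the stages in order. `fail_at` injects a failure for demonstration.
/// On failure, completed stages are undone in reverse order and the task is
/// never published. On success, the task is published and then woken.
pub fn create_task(fail_at: Option<Stage>) -> Result<Vec<String>, (Stage, Vec<String>)> {
    let mut log = Vec::new();
    let mut done: Vec<Stage> = Vec::new();
    for stage in STAGES.iter().copied() {
        if Some(stage) == fail_at {
            // Unwind like falling through a chain of C error labels.
            for undone in done.iter().rev() {
                log.push(format!("undo {:?}", undone));
            }
            return Err((stage, log));
        }
        log.push(format!("do {:?}", stage));
        done.push(stage);
    }
    // Past this point nothing may fail: publish, then wake.
    log.push("publish".to_string());
    log.push("wake".to_string());
    Ok(log)
}
```

Calling create_task(Some(Stage::ReservePid)) records the two completed stages followed by their undo entries in reverse order, and never records "publish".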

Per-CPU Runqueues And Scheduler Classes

Linux scheduler state is centered on per-CPU struct rq instances. Each runqueue embeds fair, RT, and deadline queues, tracks hot current/runnable state, and is protected by explicit runqueue locking rules. Scheduling policy is factored through struct sched_class, a C operation table whose callbacks also document lock preconditions.

Evidence: file-notes/linux__kernel__sched__sched.h.md, file-notes/linux__kernel__sched__core.c.md.
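
As a rough illustration of the operation-table idea, the sketch below models a runqueue with per-class queues and a table of function pointers consulted in priority order. All names here (RunQueue, SchedClass, CLASSES, pick_next_task) are hypothetical stand-ins, tasks are bare IDs, and locking is omitted entirely.

```rust
/// Hypothetical per-CPU runqueue holding one queue of task IDs per class.
pub struct RunQueue {
    pub dl: Vec<u32>,
    pub rt: Vec<u32>,
    pub fair: Vec<u32>,
}

/// Operation table: each class supplies its own callbacks.
pub struct SchedClass {
    pub name: &'static str,
    pub pick_next: fn(&RunQueue) -> Option<u32>,
}

fn pick_dl(rq: &RunQueue) -> Option<u32> {
    rq.dl.first().copied()
}

fn pick_rt(rq: &RunQueue) -> Option<u32> {
    rq.rt.first().copied()
}

fn pick_fair(rq: &RunQueue) -> Option<u32> {
    rq.fair.first().copied()
}

/// Classes listed from highest to lowest priority in this model.
pub static CLASSES: [SchedClass; 3] = [
    SchedClass { name: "deadline", pick_next: pick_dl },
    SchedClass { name: "rt", pick_next: pick_rt },
    SchedClass { name: "fair", pick_next: pick_fair },
];

/// Ask each class in order; the first one with a runnable task wins.
pub fn pick_next_task(rq: &RunQueue) -> Option<(&'static str, u32)> {
    CLASSES
        .iter()
        .find_map(|class| (class.pick_next)(rq).map(|task| (class.name, task)))
}
```

The core loop never needs to know how a class chooses its task, which is the late-binding property the operation table provides.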

Scheduler Lifecycle Pipeline

The core scheduler owns task lifecycle transitions. Fork-time setup keeps a task in TASK_NEW; cgroup fork setup attaches task-group and CPU state; wake_up_new_task() publishes the first runnable state; schedule() enters __schedule() under preemption discipline; and context_switch() performs memory-map and architecture register/stack handoff before post-switch cleanup.

Evidence: file-notes/linux__kernel__sched__core.c.md.

Fair Scheduling Uses EEVDF Accounting

The fair class tracks virtual runtime, lag, and virtual deadlines. A fair task is eligible when its lag is non-negative, meaning it has received no more service than it is owed; among eligible tasks, the one with the earliest virtual deadline wins. Linux implements this with augmented RB-tree state plus runtime/deadline accounting on the current scheduling entity.

Evidence: file-notes/linux__kernel__sched__fair.c.md, file-notes/linux__Documentation__scheduler__sched-eevdf.rst.md.
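
The selection rule can be written down as a small linear-scan model. This is a sketch under simplifying assumptions: the Entity struct and function names are hypothetical, eligibility is computed against the weighted average virtual runtime of all entities, and a plain slice replaces the augmented RB tree.

```rust
/// Hypothetical scheduling entity with the three tracked quantities.
#[derive(Debug, Clone, PartialEq)]
pub struct Entity {
    pub id: u32,
    pub weight: i64,
    pub vruntime: i64,
    pub deadline: i64,
}

/// An entity is eligible when its lag is non-negative, i.e. its vruntime is
/// at or below the weighted average V = sum(w * v) / sum(w).
/// Division is avoided by comparing vruntime * sum(w) against sum(w * v).
pub fn is_eligible(e: &Entity, all: &[Entity]) -> bool {
    let total_weight: i64 = all.iter().map(|x| x.weight).sum();
    let weighted_sum: i64 = all.iter().map(|x| x.weight * x.vruntime).sum();
    e.vruntime * total_weight <= weighted_sum
}

/// Among eligible entities, pick the one with the earliest virtual deadline.
pub fn pick_eevdf(all: &[Entity]) -> Option<u32> {
    all.iter()
        .filter(|e| is_eligible(e, all))
        .min_by_key(|e| e.deadline)
        .map(|e| e.id)
}
```

Note that an entity with the earliest deadline overall is still skipped if it has already run ahead of its fair share.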

RT And Deadline Classes Are Distinct Policy Engines

Real-time scheduling uses priority arrays, bitmaps, FIFO lists, and runtime throttling. Deadline scheduling uses Earliest Deadline First (EDF) combined with the Constant Bandwidth Server (CBS), backed by deadline-ordered RB trees, bandwidth counters, throttling, replenishment, and parameter validation.

Evidence: file-notes/linux__kernel__sched__rt.c.md, file-notes/linux__kernel__sched__deadline.c.md, file-notes/linux__Documentation__scheduler__sched-rt-group.rst.md, file-notes/linux__Documentation__scheduler__sched-deadline.rst.md.
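
The real-time side (priority array, bitmap, FIFO lists) can be sketched as follows. This is an illustrative model only: RtQueue and LEVELS are hypothetical, the model uses 8 priority levels to fit a u8 bitmap, lower index means higher priority, and pick_next also dequeues the task, which is a simplification.

```rust
use std::collections::VecDeque;

/// Hypothetical number of priority levels; small so the bitmap fits in a u8.
pub const LEVELS: usize = 8;

/// One FIFO list per priority level plus a bitmap of non-empty levels.
pub struct RtQueue {
    bitmap: u8,
    lists: Vec<VecDeque<u32>>,
}

impl RtQueue {
    pub fn new() -> Self {
        RtQueue {
            bitmap: 0,
            lists: (0..LEVELS).map(|_| VecDeque::new()).collect(),
        }
    }

    /// Append a task to its level's FIFO and mark the level as occupied.
    /// Panics if `prio` is not below LEVELS.
    pub fn enqueue(&mut self, prio: usize, task: u32) {
        self.lists[prio].push_back(task);
        self.bitmap |= 1u8 << prio;
    }

    /// Find the highest-priority occupied level with one bit scan, then take
    /// the task at the head of that level's FIFO.
    pub fn pick_next(&mut self) -> Option<u32> {
        if self.bitmap == 0 {
            return None;
        }
        let prio = self.bitmap.trailing_zeros() as usize;
        let task = self.lists[prio].pop_front();
        if self.lists[prio].is_empty() {
            self.bitmap &= !(1u8 << prio);
        }
        task
    }
}
```

The bitmap makes finding the next level independent of how many tasks are queued, while each FIFO preserves arrival order within a priority.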

Address Spaces, VMAs, And Fault Descriptors

Linux memory management separates process address-space state (mm_struct), virtual memory region state (vm_area_struct), and per-fault state (struct vm_fault). VMAs carry range, permissions, backing object, callbacks, anonymous state, and lock/refcount state. Fault handlers return a bitmask (vm_fault_t) rather than a single code, so the fault path can distinguish OOM, retry, SIGBUS/SIGSEGV, COW completion, and lock-release completion.

Evidence: file-notes/linux__include__linux__mm_types.h.md, file-notes/linux__include__linux__mm.h.md.
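
A bitmask result type of this kind might look like the following in Rust. The FaultResult type, its bit positions, and the ERROR_MASK grouping are assumptions made for illustration; they do not reproduce the kernel's actual flag values.

```rust
use std::ops::BitOr;

/// Hypothetical fault result: each outcome is one bit, so several can be
/// reported in a single return value.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FaultResult(pub u32);

impl FaultResult {
    // Bit positions are illustrative, not the kernel's.
    pub const OOM: FaultResult = FaultResult(1 << 0);
    pub const SIGBUS: FaultResult = FaultResult(1 << 1);
    pub const SIGSEGV: FaultResult = FaultResult(1 << 2);
    pub const RETRY: FaultResult = FaultResult(1 << 3);
    pub const DONE_COW: FaultResult = FaultResult(1 << 4);
    pub const COMPLETED: FaultResult = FaultResult(1 << 5);

    /// Outcomes this model treats as errors.
    pub const ERROR_MASK: FaultResult =
        FaultResult(Self::OOM.0 | Self::SIGBUS.0 | Self::SIGSEGV.0);

    /// True when every bit of `other` is set in `self`.
    pub fn contains(self, other: FaultResult) -> bool {
        (self.0 & other.0) == other.0
    }

    /// True when any error bit is set.
    pub fn is_error(self) -> bool {
        (self.0 & Self::ERROR_MASK.0) != 0
    }
}

impl BitOr for FaultResult {
    type Output = FaultResult;

    fn bitor(self, rhs: FaultResult) -> FaultResult {
        FaultResult(self.0 | rhs.0)
    }
}
```

A caller can then test individual outcomes with contains and check the whole error class with is_error, without a chain of equality comparisons.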

Page Faults Are Typed Control Flow

The page-fault path is a staged control flow: validate access, enter memcg fault context, walk/allocate page-table levels, attempt huge-page paths, fall back to PTE dispatch, then handle missing, swap, NUMA, write-protect, anonymous, file-backed, shared, and COW cases. PTE mutation is protected by page-table locks and race rechecks.

Evidence: file-notes/linux__mm__memory.c.md.
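
The final PTE-level dispatch step can be sketched as a match over a simplified entry state. The Pte and Handler enums and the dispatch function are hypothetical; the sketch covers only the last stage of the path and leaves out page-table walking, huge pages, locking, and race rechecks.

```rust
/// Hypothetical, simplified view of a page-table entry at fault time.
#[derive(Debug, Clone, Copy)]
pub enum Pte {
    Missing,
    Swapped,
    Present { writable: bool, numa_hint: bool },
}

/// Hypothetical names for the case handlers.
#[derive(Debug, PartialEq)]
pub enum Handler {
    AnonymousFault,
    FileFault,
    SwapIn,
    NumaHint,
    WriteProtect,
    AccessBitsUpdate,
}

/// Choose a handler from the entry state, the kind of mapping, and the access.
pub fn dispatch(pte: Pte, vma_is_anonymous: bool, is_write: bool) -> Handler {
    match pte {
        // Nothing mapped yet: anonymous memory and file-backed memory differ.
        Pte::Missing if vma_is_anonymous => Handler::AnonymousFault,
        Pte::Missing => Handler::FileFault,
        // Entry exists but the page lives in swap.
        Pte::Swapped => Handler::SwapIn,
        // Present but marked for NUMA hinting.
        Pte::Present { numa_hint: true, .. } => Handler::NumaHint,
        // Write to a read-only entry: the COW / write-protect case.
        Pte::Present { writable: false, .. } if is_write => Handler::WriteProtect,
        // Otherwise only access/dirty bookkeeping remains.
        Pte::Present { .. } => Handler::AccessBitsUpdate,
    }
}
```

Because the match is exhaustive, adding a new entry state to the model forces every dispatch site to handle it, which mirrors the "typed control flow" framing above.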

Slab Caches Are Validated Object Allocators

Linux uses named slab caches for repeated fixed-size kernel objects. Cache creation validates context, flags, usercopy ranges, merge policy, alignment, and aliases under slab_mutex. Cache destruction waits for deferred RCU/free activity and verifies object teardown before releasing cache metadata.

Evidence: file-notes/linux__mm__slab_common.c.md, file-notes/linux__Documentation__core-api__memory-allocation.rst.md.
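
A few of the creation-time checks can be modeled as a plain validation function. CacheArgs, CacheError, and validate are hypothetical, and the rules shown (non-empty name, non-zero size, power-of-two alignment, usercopy window inside the object) are a simplified subset chosen for illustration; context, flag, merge, and alias checks are not modeled.

```rust
/// Hypothetical reasons a cache description is rejected.
#[derive(Debug, PartialEq)]
pub enum CacheError {
    EmptyName,
    ZeroSize,
    BadAlign,
    UsercopyOutOfRange,
}

/// Hypothetical description of a cache of fixed-size objects.
pub struct CacheArgs<'a> {
    pub name: &'a str,
    pub object_size: usize,
    /// 0 means "no explicit alignment request" in this model.
    pub align: usize,
    /// Start and length of the region that may be copied to or from userspace.
    pub useroffset: usize,
    pub usersize: usize,
}

/// Reject malformed descriptions before any cache metadata would be created.
pub fn validate(args: &CacheArgs<'_>) -> Result<(), CacheError> {
    if args.name.is_empty() {
        return Err(CacheError::EmptyName);
    }
    if args.object_size == 0 {
        return Err(CacheError::ZeroSize);
    }
    if args.align != 0 && !args.align.is_power_of_two() {
        return Err(CacheError::BadAlign);
    }
    // The usercopy window must lie entirely inside the object.
    match args.useroffset.checked_add(args.usersize) {
        Some(end) if end <= args.object_size => Ok(()),
        _ => Err(CacheError::UsercopyOutOfRange),
    }
}
```

Validating the usercopy window up front means later copies only need to be checked against the window, not re-derived from the object layout.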

AI Contribution Provenance

Linux process guidance treats AI assistance as provenance, not authorship certification. AI tools follow normal kernel development process, humans retain DCO responsibility, and AI assistance is attributed with agent/model/tool metadata.

Evidence: file-notes/linux__Documentation__process__coding-assistants.rst.md.

Cross-Cutting Patterns To Track

  • Table-driven subsystem registration.
  • Operation tables and late binding.
  • Reference-counted object lifetimes.
  • Per-CPU state and locality.
  • Locking hierarchy and RCU read-side behavior.
  • User/kernel boundary validation.
  • Capability and namespace checks.
  • Build-time feature selection.
  • Runtime observability hooks.

Evidence Index

  • file-notes/linux__rust__kernel__lib.rs.md
  • file-notes/linux__rust__kernel__sync__arc.rs.md
  • file-notes/linux__rust__kernel__task.rs.md
  • file-notes/linux__kernel__fork.c.md
  • file-notes/linux__kernel__sched__sched.h.md
  • file-notes/linux__kernel__sched__core.c.md
  • file-notes/linux__kernel__sched__fair.c.md
  • file-notes/linux__kernel__sched__rt.c.md
  • file-notes/linux__kernel__sched__deadline.c.md
  • file-notes/linux__Documentation__scheduler__sched-design-CFS.rst.md
  • file-notes/linux__Documentation__scheduler__sched-eevdf.rst.md
  • file-notes/linux__Documentation__scheduler__sched-rt-group.rst.md
  • file-notes/linux__Documentation__scheduler__sched-deadline.rst.md
  • file-notes/linux__include__linux__mm_types.h.md
  • file-notes/linux__include__linux__mm.h.md
  • file-notes/linux__mm__memory.c.md
  • file-notes/linux__mm__slab_common.c.md
  • file-notes/linux__Documentation__admin-guide__mm__concepts.rst.md
  • file-notes/linux__Documentation__core-api__memory-allocation.rst.md
  • file-notes/linux__Documentation__process__coding-assistants.rst.md