Linux Repo Atlas
Imported from _research/manual-study-linux/linux-repo-atlas.md.
Status: repo-wide atlas started. This is the whole-source-tree map that the subsystem dossiers must hang under. It covers the top-level Linux repository, what each major directory owns, where the C implementation entry points live, and how cross-directory execution paths actually move.
This is not a replacement for file-by-file subsystem analysis. It is the repo-wide skeleton: the thing that prevents the research from becoming a pile of isolated chapters.
Source Surface
Reviewed source root: repositories/reference/linux-study-clean.
Top-level source directories present in this checkout:
arch, block, certs, crypto, Documentation, drivers, fs, include, init, io_uring, ipc, kernel, lib, LICENSES, mm, net, rust, samples, scripts, security, sound, tools, usr, virt
Top-level control files:
Kconfig, Kbuild, Makefile, MAINTAINERS, COPYING, .clang-format, .rustfmt.toml, .clippy.toml
What The Linux Repo Is Structurally
The Linux repository is not organized like a single application. It is closer to a platform source distribution with several overlapping layers:
- Architecture glue: CPU entry, exception handling, page tables, atomics, boot.
- Core kernel services: scheduling, fork/exit, timers, workqueues, locking, RCU, signals, cgroups, namespaces, modules, printk, tracing.
- Memory management: virtual memory, page tables, page cache, reclaim, slab, physical page allocation, mmap, swap.
- Object and resource membranes: VFS, block layer, net sockets, device model, security hooks.
- Protocol and hardware implementations: filesystems, network protocols, drivers, crypto algorithms, sound, virtualization.
- Build/config/documentation/tooling: Kconfig, Kbuild, scripts, docs, tools, samples, tests, BPF/perf helpers.
Most real execution paths cross at least four directories. A syscall does not
stay in kernel/; a read from disk can cross arch/, kernel/, fs/, mm/,
block/, drivers/, security/, and include/.
Directory-By-Directory Map
init/
Owns early kernel startup. The central C file is init/main.c, where
architecture setup hands off into generic kernel initialization. This directory
is where Linux transitions from bootloader/architecture-specific setup into the
generic kernel world.
Key responsibilities:
- start_kernel() orchestration.
- Early command-line parsing.
- Boot CPU and scheduler setup.
- Initcall execution.
- Kernel thread/user init handoff.
Representative code path:

    asmlinkage __visible __init __no_sanitize_address __noreturn
    void start_kernel(void)
    {
        set_task_stack_end_magic(&init_task);
        smp_setup_processor_id();
        debug_objects_early_init();
        cgroup_init_early();
        local_irq_disable();
        boot_cpu_init();
        page_address_init();
        pr_notice("%s", linux_banner);
        setup_arch(&command_line);
        setup_boot_config();
        setup_command_line(command_line);
        setup_nr_cpu_ids();
        setup_per_cpu_areas();
        smp_prepare_boot_cpu();
        build_all_zonelists(NULL);
        page_alloc_init();
        ...
    }

This is the pattern of init/main.c: early global state is initialized before
normal allocation, scheduling, interrupts, devices, and userspace exist.
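The ordering discipline described above, where each step may rely only on state set up by earlier steps, can be modeled in user space. The following is a minimal sketch, not kernel code: the names toy_initcall, toy_level, and run_initcalls are hypothetical, and the real kernel gathers initcalls through linker sections rather than a hand-written array.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical levels, loosely modeled on early/core/device ordering. */
enum toy_level { TOY_EARLY, TOY_CORE, TOY_DEVICE, TOY_LEVEL_COUNT };

struct toy_initcall {
    enum toy_level level;
    int (*fn)(void);            /* returns 0 on success, negative on failure */
};

static int allocator_ready;
static int scheduler_ready;
static int device_ready;

static int toy_alloc_init(void)
{
    allocator_ready = 1;
    return 0;
}

static int toy_sched_init(void)
{
    if (!allocator_ready)       /* depends on the earlier level */
        return -1;
    scheduler_ready = 1;
    return 0;
}

static int toy_device_init(void)
{
    if (!scheduler_ready)       /* depends on the core level */
        return -1;
    device_ready = 1;
    return 0;
}

/* Listed out of order on purpose: the level, not the position, decides. */
static const struct toy_initcall toy_calls[] = {
    { TOY_DEVICE, toy_device_init },
    { TOY_EARLY,  toy_alloc_init  },
    { TOY_CORE,   toy_sched_init  },
};

/* Run every call level by level; return the number of failed calls. */
int run_initcalls(void)
{
    int failures = 0;

    for (int level = 0; level < TOY_LEVEL_COUNT; level++) {
        for (size_t i = 0; i < sizeof(toy_calls) / sizeof(toy_calls[0]); i++) {
            if (toy_calls[i].level != (enum toy_level)level)
                continue;
            if (toy_calls[i].fn() != 0)
                failures++;
        }
    }
    return failures;
}
```

Calling run_initcalls() once should leave all three readiness flags set, because the level loop runs the allocator before the scheduler and the scheduler before the device, regardless of table order.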
arch/
Owns architecture-specific implementations. Generic kernel code depends on
arch/ for CPU entry paths, syscall ABI, interrupt/trap handling, page-table
layout, atomics, cache/TLB operations, context switching, and boot setup.
Important subdirectories:
- arch/x86: x86 boot, syscall, traps, page faults, KVM glue.
- arch/arm64: arm64 exception entry, MMU, boot, platform ABI.
- arch/riscv, arch/powerpc, arch/s390, etc.: equivalent platform ports.
Key cross-directory role:
- arch/*/kernel/ enters generic kernel/ code.
- arch/*/mm/ calls generic mm/ page fault handling.
- arch/*/include/asm/ supplies low-level definitions included by include/linux/**.
include/
Owns public and internal kernel interfaces. This is where Linux encodes its cross-subsystem contracts: structs, operation tables, flags, inline helpers, architecture-independent APIs, and UAPI headers.
Important areas:
- include/linux/: internal kernel APIs.
- include/uapi/: userspace ABI exposed through headers.
- include/asm-generic/: generic architecture fallback definitions.
- include/net/, include/trace/, include/crypto/, include/drm/, etc.
Representative VFS contract from include/linux/fs.h:

    struct file {
        spinlock_t                      f_lock;
        fmode_t                         f_mode;
        const struct file_operations    *f_op;
        struct address_space            *f_mapping;
        void                            *private_data;
        struct inode                    *f_inode;
        unsigned int                    f_flags;
        const struct cred               *f_cred;
        union {
            const struct path           f_path;
            struct path                 __f_path;
        };
        loff_t                          f_pos;
        file_ref_t                      f_ref;
    };

The important idea: include/ is not just declarations. It is the kernel’s
type-level architecture.
kernel/
Owns central kernel policy and runtime services.
Important areas:
- kernel/sched/: scheduler classes, runqueues, context switching.
- kernel/fork.c: process/thread creation.
- kernel/exit.c: process exit and reaping.
- kernel/pid.c: PID allocation and lookup.
- kernel/time/: timers, clocks, timekeeping.
- kernel/workqueue.c: deferred process-context work.
- kernel/locking/: mutexes, spinlocks, lock debugging.
- kernel/rcu/: RCU grace periods and callback execution.
- kernel/events/: perf events.
- kernel/trace/: tracing infrastructure.
- kernel/bpf/: BPF syscall surface and program/object management.
- kernel/cgroup/: cgroup hierarchy and resource domain plumbing.
- kernel/module/: module loading and unloading.
Representative process creation flow:

    SYSCALL_DEFINE5(clone, unsigned long, clone_flags, unsigned long, newsp,
                    int __user *, parent_tidptr,
                    unsigned long, tls,
                    int __user *, child_tidptr)
    {
        struct kernel_clone_args args = {
            .flags       = (clone_flags & ~CSIGNAL),
            .pidfd       = parent_tidptr,
            .child_tid   = child_tidptr,
            .parent_tid  = parent_tidptr,
            .exit_signal = (clone_flags & CSIGNAL),
            .stack       = newsp,
            .tls         = tls,
        };

        return kernel_clone(&args);
    }

The syscall wrapper builds a structured internal argument object. Deeper code then copies credentials, namespaces, files, signal state, scheduler state, memory mappings, and architecture thread state.
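The wrapper's split of clone_flags into a behavior mask and an exit signal can be reproduced in user space. This is a minimal sketch under stated assumptions: toy_clone_args and toy_split_clone_flags are hypothetical names, and the two constants mirror the CSIGNAL and CLONE_VM values from the Linux UAPI header linux/sched.h.

```c
#include <assert.h>
#include <signal.h>

/* Values mirrored from the Linux UAPI header <linux/sched.h>. */
#define TOY_CSIGNAL   0x000000ffUL  /* low byte: signal sent to the parent on exit */
#define TOY_CLONE_VM  0x00000100UL  /* share the address space with the parent */

/* Hypothetical stand-in for two fields of struct kernel_clone_args. */
struct toy_clone_args {
    unsigned long flags;        /* behavior flags with the signal byte removed */
    int exit_signal;            /* signal number carried in the low byte */
};

/* Split a raw clone_flags word the same way the syscall wrapper does. */
struct toy_clone_args toy_split_clone_flags(unsigned long clone_flags)
{
    struct toy_clone_args args = {
        .flags = clone_flags & ~TOY_CSIGNAL,
        .exit_signal = (int)(clone_flags & TOY_CSIGNAL),
    };

    /* After the split, no signal bits may remain in the flag word. */
    assert((args.flags & TOY_CSIGNAL) == 0);
    return args;
}
```

Passing TOY_CLONE_VM | SIGCHLD yields flags equal to TOY_CLONE_VM and exit_signal equal to SIGCHLD, which is why callers can encode both in one word.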
mm/
Owns memory management: process address spaces, physical pages, page cache, faults, reclaim, mmap, swap, slab, vmalloc, huge pages, and memory policy.
Important files already deep-reviewed:
- mm/memory.c: core page fault and page-table handling.
- mm/mmap.c: VMA creation, lookup, unmap, fork duplication.
- mm/page_alloc.c: zoned buddy allocator.
- mm/vmscan.c: reclaim engine.
- mm/filemap.c: page cache and file-backed mmap faults.
- mm/slab_common.c: object cache lifecycle.
Representative fault dispatch:

    if (!vmf->pte)
        return do_pte_missing(vmf);

    if (!pte_present(vmf->orig_pte))
        return do_swap_page(vmf);

    if (pte_protnone(vmf->orig_pte) && vma_is_accessible(vmf->vma))
        return do_numa_page(vmf);

    if (vmf->flags & (FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE)) {
        if (!pte_write(entry))
            return do_wp_page(vmf);
        else if (likely(vmf->flags & FAULT_FLAG_WRITE))
            entry = pte_mkdirty(entry);
    }

This shows why mm/ is both a policy layer and a concurrency layer: every
branch must preserve page-table consistency, VMA lifetime, reverse mappings,
memcg accounting, and fault retry semantics.
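The decision ladder in the excerpt can be isolated from locking and accounting to show only its ordering. The sketch below is a user-space model under stated assumptions: toy_pte, toy_fault_path, and toy_classify_fault are hypothetical, and the combined FAULT_FLAG_WRITE|FAULT_FLAG_UNSHARE test is simplified to a single write_fault flag.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical labels for the branches of the excerpt. */
enum toy_fault_path {
    TOY_PTE_MISSING,        /* models do_pte_missing() */
    TOY_SWAP_IN,            /* models do_swap_page() */
    TOY_NUMA_HINT,          /* models do_numa_page() */
    TOY_WRITE_PROTECT,      /* models do_wp_page() */
    TOY_ACCESS_UPDATE,      /* falls through to the dirty/young update */
};

/* Hypothetical flattened view of the page-table entry state. */
struct toy_pte {
    bool mapped;            /* a PTE slot exists (models vmf->pte != NULL) */
    bool present;           /* models pte_present() */
    bool protnone;          /* models pte_protnone() */
    bool writable;          /* models pte_write() */
};

/* Classify a fault by testing conditions in the same order as the excerpt. */
enum toy_fault_path toy_classify_fault(const struct toy_pte *pte,
                                       bool vma_accessible, bool write_fault)
{
    assert(pte != NULL);

    if (!pte->mapped)
        return TOY_PTE_MISSING;
    if (!pte->present)
        return TOY_SWAP_IN;
    if (pte->protnone && vma_accessible)
        return TOY_NUMA_HINT;
    if (write_fault && !pte->writable)
        return TOY_WRITE_PROTECT;
    return TOY_ACCESS_UPDATE;
}
```

The order matters: a non-present entry is never inspected for write permission, so a swapped-out page always takes the swap-in path first, even on a write fault.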
fs/
Owns the virtual filesystem layer plus individual filesystem implementations.
Important generic files:
- fs/open.c: open/openat/openat2, close, file activation.
- fs/read_write.c: read/write/readv/writev/copy paths.
- fs/namei.c: pathname resolution, dcache lookup, create/unlink/rename.
- fs/inode.c: inode allocation, cache lookup, lifecycle, eviction.
- fs/super.c: superblock allocation, mount, shutdown, freeze/thaw.
- fs/file_table.c: struct file allocation and lifetime.
- fs/dcache.c: dentry cache.
Representative read dispatch:

    if (!(file->f_mode & FMODE_READ))
        return -EBADF;
    if (!(file->f_mode & FMODE_CAN_READ))
        return -EINVAL;
    if (unlikely(!access_ok(buf, count)))
        return -EFAULT;

    ret = rw_verify_area(READ, file, pos, count);
    if (ret)
        return ret;

    if (file->f_op->read)
        ret = file->f_op->read(file, buf, count, pos);
    else if (file->f_op->read_iter)
        ret = new_sync_read(file, buf, count, pos);
    else
        ret = -EINVAL;

The VFS pattern is file-mode check, user-pointer check, range and LSM/fsnotify permission check (inside rw_verify_area()), then operation-table dispatch.
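The check-then-dispatch shape of this path can be shown with a small user-space operation table. This is a sketch under stated assumptions, not the kernel's implementation: every toy_ name is hypothetical, the pointer check is reduced to a NULL test, and file position and rw_verify_area() are left out.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <string.h>
#include <sys/types.h>

#define TOY_FMODE_READ 0x1u     /* hypothetical "opened for reading" bit */

struct toy_file;

/* Hypothetical operation table with the two read entry points. */
struct toy_file_operations {
    ssize_t (*read)(struct toy_file *file, char *buf, size_t count);
    ssize_t (*read_iter)(struct toy_file *file, char *buf, size_t count);
};

struct toy_file {
    unsigned int f_mode;
    const struct toy_file_operations *f_op;
    const char *data;           /* backing content for the in-memory example */
};

/* Check mode and buffer first, then dispatch through the table. */
ssize_t toy_vfs_read(struct toy_file *file, char *buf, size_t count)
{
    assert(file != NULL && file->f_op != NULL);

    if (!(file->f_mode & TOY_FMODE_READ))
        return -EBADF;
    if (buf == NULL)
        return -EFAULT;

    if (file->f_op->read)
        return file->f_op->read(file, buf, count);
    if (file->f_op->read_iter)
        return file->f_op->read_iter(file, buf, count);
    return -EINVAL;
}

/* Example implementation: copy at most count bytes of the backing string. */
ssize_t toy_mem_read(struct toy_file *file, char *buf, size_t count)
{
    size_t len = strlen(file->data);
    size_t n = count < len ? count : len;

    memcpy(buf, file->data, n);
    return (ssize_t)n;
}

const struct toy_file_operations toy_mem_fops = { .read = toy_mem_read };
const struct toy_file_operations toy_empty_fops = { .read = NULL };
```

The generic function never knows what kind of object it reads from; swapping the table pointer changes behavior without touching toy_vfs_read(), which is the property the VFS relies on.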
block/
Owns generic block I/O: bios, requests, queues, multiqueue dispatch, request completion, partitions, elevator/scheduler hooks, and block-device helpers.
Important areas:
- block/blk-core.c: request submission and queue lifecycle.
- block/blk-mq.c: multiqueue tag allocation and dispatch.
- block/bio.c: bio allocation, splitting, cloning.
- include/linux/blkdev.h: block device public contracts.
Cross-directory path:
fs/ creates buffered or direct IO, mm/ supplies page-cache folios or direct
user pages, block/ converts IO into bios/requests, and drivers/ submit the
work to hardware.
drivers/
Owns hardware and pseudo-device implementations. It is the largest tree because Linux supports many buses, devices, and platforms.
Important common areas:
- drivers/base/: driver core, device model, buses, classes, sysfs device lifecycle, probe/remove.
- drivers/pci/: PCI enumeration and driver binding.
- drivers/platform/: platform devices/drivers.
- drivers/net/: NIC drivers.
- drivers/gpu/: DRM/GPU drivers.
- drivers/block/, drivers/nvme/, drivers/scsi/, drivers/ata/: storage.
- drivers/usb/, drivers/hid/, drivers/input/: external/user IO.
- drivers/irqchip/, drivers/clocksource/, drivers/clk/: platform infrastructure.
Representative driver-core idea:
- struct device is the runtime object.
- struct device_driver is the implementation.
- struct bus_type supplies match/probe/remove policy.

The driver model is not merely “call probe.” It creates object lifetime, reference ownership, sysfs representation, power-management hooks, DMA/IOMMU context, firmware links, and deferred probing.
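The division of labor between device, driver, and bus can be sketched as a match-then-probe loop. This is a deliberately reduced user-space model: toy_device, toy_driver, toy_bus, and toy_bind are hypothetical, and it omits everything the paragraph above lists beyond matching and probing (lifetime, sysfs, power management, deferred probing).

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

struct toy_device;

/* Hypothetical driver: the implementation that may claim a device. */
struct toy_driver {
    const char *name;
    int (*probe)(struct toy_device *dev);   /* 0 on success */
};

/* Hypothetical device: the runtime object, unbound until a probe succeeds. */
struct toy_device {
    const char *name;
    const struct toy_driver *driver;
};

/* Hypothetical bus: supplies the matching policy. */
struct toy_bus {
    int (*match)(const struct toy_device *dev, const struct toy_driver *drv);
};

/* Example policy: a driver matches a device with the same name. */
int toy_match_by_name(const struct toy_device *dev, const struct toy_driver *drv)
{
    return strcmp(dev->name, drv->name) == 0;
}

const struct toy_bus toy_name_bus = { .match = toy_match_by_name };

/* Bind dev to the first driver that matches and probes successfully. */
int toy_bind(const struct toy_bus *bus, struct toy_device *dev,
             const struct toy_driver *drivers, size_t count)
{
    assert(bus != NULL && dev != NULL);

    for (size_t i = 0; i < count; i++) {
        if (!bus->match(dev, &drivers[i]))
            continue;
        if (drivers[i].probe(dev) != 0)
            continue;                       /* matched, but probe declined */
        dev->driver = &drivers[i];
        return 0;
    }
    return -1;                              /* device stays unbound */
}

int toy_probe_ok(struct toy_device *dev)   { (void)dev; return 0; }
int toy_probe_fail(struct toy_device *dev) { (void)dev; return -1; }
```

Matching is a bus decision and probing is a driver decision, so a device can match a driver by identity and still remain unbound when the probe fails.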
net/
Owns networking core, protocol stacks, socket integration, packet buffers, qdiscs, routing, netfilter, and device ingress/egress glue.
Important areas:
- net/socket.c: socket syscalls and file/socket conversion.
- net/core/: sk_buff, network devices, packet RX/TX core.
- net/ipv4/, net/ipv6/: IP/TCP/UDP stacks.
- net/netfilter/: firewall/NAT hooks.
- net/sched/: traffic control.
- include/linux/skbuff.h, include/linux/netdevice.h, include/linux/net.h.
Representative socket pattern:

    sock = sockfd_lookup_light(fd, &err, &fput_needed);
    if (sock) {
        err = sock->ops->sendmsg(sock, msg, size);
        fput_light(sock->file, fput_needed);
    }

Network syscalls validate file descriptors, recover the socket object, then dispatch through protocol operation tables.
security/
Owns Linux Security Module dispatch and security subsystem integration.
Important areas:
- security/security.c: LSM hook dispatch.
- security/selinux/, security/apparmor/, security/smack/: major LSMs.
- security/keys/: key management.
- security/integrity/: IMA/EVM integrity policy.
- include/linux/lsm_hook_defs.h: hook catalog.
Security is not one check at the syscall boundary. Hooks are distributed across VFS, networking, process creation, credentials, IPC, module loading, BPF, and mount paths.
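Hook dispatch of this kind can be sketched as a loop over registered callbacks in which the first denial ends the walk. The following is a user-space illustration under stated assumptions: toy_hook_fn, toy_call_hooks, and TOY_MAY_WRITE are hypothetical, and the model covers only integer-returning hooks whose default result is 0.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_MAY_WRITE 0x2       /* hypothetical "write access requested" bit */

/* Hypothetical hook signature: 0 allows, a negative errno denies. */
typedef int (*toy_hook_fn)(int requested_mode);

/* Call each registered hook in order; the first non-zero result wins. */
int toy_call_hooks(const toy_hook_fn *hooks, size_t count, int requested_mode)
{
    assert(hooks != NULL || count == 0);

    for (size_t i = 0; i < count; i++) {
        int rc = hooks[i](requested_mode);

        if (rc != 0)
            return rc;          /* one module's denial is final */
    }
    return 0;                   /* no module objected */
}

/* Example module that never objects. */
int toy_allow_all(int requested_mode)
{
    (void)requested_mode;
    return 0;
}

/* Example module that refuses any write request. */
int toy_deny_write(int requested_mode)
{
    return (requested_mode & TOY_MAY_WRITE) ? -EACCES : 0;
}
```

Because the same dispatcher can be called from any code path, a subsystem only needs to place the call at the right point; the policy itself stays inside the registered modules.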
crypto/
Owns cryptographic algorithm registration and implementations.
Important areas:
- algorithm implementations such as hashes, ciphers, AEAD.
- crypto API registration and template composition.
- asymmetric keys and certificate validation support.
Used by:
- IPsec and networking.
- dm-crypt and storage.
- filesystem encryption.
- module signature verification.
- integrity/security subsystems.
ipc/
Owns System V IPC and POSIX-ish IPC kernel implementations: message queues, semaphores, shared memory, namespaces, permissions, and lifecycle.
Cross-directory connections:
- kernel/nsproxy.c and namespace code provide IPC namespaces.
- security/ gates IPC operations with LSM hooks.
- mm/ backs shared memory.
io_uring/
Owns the modern async IO interface. It crosses VFS, networking, block, memory, task_work, polling, fixed buffers, registered files, and completion queues.
Important concept:
io_uring is not only async read/write. It is a submission/completion runtime
inside the kernel with registered resources and operation-specific dispatch.
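The submission/completion idea can be illustrated with a pair of fixed-size rings and an opcode switch. This is a single-threaded toy model, not the io_uring ABI: all toy_ names and opcodes are hypothetical, and it has no shared memory mapping, memory barriers, or registered resources.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_RING_ENTRIES 4u     /* power of two so index masking works */
#define TOY_RING_MASK (TOY_RING_ENTRIES - 1u)

enum toy_opcode { TOY_OP_NOP, TOY_OP_ADD };

/* Hypothetical submission entry: an operation plus a caller-chosen tag. */
struct toy_sqe { enum toy_opcode opcode; int32_t a, b; uint64_t user_data; };

/* Hypothetical completion entry: the tag comes back with a result. */
struct toy_cqe { uint64_t user_data; int32_t res; };

struct toy_ring {
    struct toy_sqe sq[TOY_RING_ENTRIES];
    struct toy_cqe cq[TOY_RING_ENTRIES];
    uint32_t sq_head, sq_tail;  /* free-running counters, masked on use */
    uint32_t cq_head, cq_tail;
};

/* Producer side: queue one request; -1 if the submission ring is full. */
int toy_submit(struct toy_ring *r, struct toy_sqe sqe)
{
    assert(r != NULL);
    if (r->sq_tail - r->sq_head == TOY_RING_ENTRIES)
        return -1;
    r->sq[r->sq_tail++ & TOY_RING_MASK] = sqe;
    return 0;
}

/* "Kernel" side: consume requests, dispatch by opcode, post completions. */
int toy_process(struct toy_ring *r)
{
    int done = 0;

    while (r->sq_head != r->sq_tail &&
           r->cq_tail - r->cq_head < TOY_RING_ENTRIES) {
        const struct toy_sqe *sqe = &r->sq[r->sq_head++ & TOY_RING_MASK];
        struct toy_cqe cqe = { .user_data = sqe->user_data, .res = -1 };

        switch (sqe->opcode) {
        case TOY_OP_NOP: cqe.res = 0; break;
        case TOY_OP_ADD: cqe.res = sqe->a + sqe->b; break;
        }
        r->cq[r->cq_tail++ & TOY_RING_MASK] = cqe;
        done++;
    }
    return done;
}

/* Consumer side: take one completion; returns 1 if one was available. */
int toy_reap(struct toy_ring *r, struct toy_cqe *out)
{
    if (r->cq_head == r->cq_tail)
        return 0;
    *out = r->cq[r->cq_head++ & TOY_RING_MASK];
    return 1;
}
```

The user_data tag is what lets the submitter pair a completion with its request, since completions need not arrive in submission order in a real asynchronous runtime.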
lib/
Owns common kernel library code: data structures, string helpers, checksums, parsing helpers, radix/xarray support, bitmap helpers, math, compression, and test modules.
This is where many subsystems get shared primitives instead of depending on a standard C library.
rust/
Owns the in-kernel Rust support layer. It is an API membrane around selected kernel concepts: allocation, error handling, ownership wrappers, sync, task/device abstractions, and module integration.
Important files already partly reviewed:
- rust/kernel/lib.rs
- rust/kernel/sync/arc.rs
- rust/kernel/task.rs
Rust in Linux is not a second kernel. It is a safe wrapper layer over selected C kernel APIs, with explicit unsafe boundaries.
scripts/
Owns build and developer tooling:
- Kconfig parser/config tools.
- Kbuild helpers.
- modpost/module tools.
- checkpatch and static analysis helpers.
- documentation generation helpers.
- syscall/header generation.
This directory explains why Linux is not just C files plus a Makefile; it is a configuration and code-generation system.
tools/
Owns userspace tools shipped with the kernel tree: perf, BPF tools,
testing helpers, tracing tools, objtool, bootconfig tools, and more.
These are not linked into the kernel image, but they are part of the repo because they validate, inspect, or interact with kernel features.
Documentation/
Owns source-adjacent design, API, maintainer, user, and subsystem documentation. Some docs are user-facing, some are maintainer-facing, and some are implementation contracts for kernel developers.
Important areas:
- Documentation/core-api
- Documentation/filesystems
- Documentation/networking
- Documentation/driver-api
- Documentation/scheduler
- Documentation/mm
- Documentation/locking
- Documentation/RCU
- Documentation/bpf
- Documentation/kbuild
- Documentation/rust
sound/
Owns ALSA and audio device support. It includes core sound infrastructure, PCI/USB/platform audio drivers, codec support, sequencing, and userspace ABI glue.
virt/
Owns generic virtualization support beyond architecture-specific KVM code.
Architecture-specific virtualization lives under arch/*/kvm; generic pieces
and helpers live here.
certs/, usr/, samples/, LICENSES/
- certs/: certificate material and build integration for trusted keyrings and module/signature validation.
- usr/: initramfs/user image build support.
- samples/: example kernel code and BPF/sample integrations.
- LICENSES/: SPDX license texts and exceptions used by source annotations.
Cross-Directory Execution Paths
Boot To First Userspace
- arch/* assembly/C entry establishes CPU mode, page tables, and calls generic boot.
- init/main.c:start_kernel() initializes generic kernel subsystems.
- mm/ initializes physical page allocation and VM.
- kernel/sched/ initializes scheduling.
- kernel/time/, kernel/rcu/, kernel/workqueue.c, kernel/softirq.c initialize deferred work and time.
- drivers/ and bus subsystems initialize through initcalls.
- fs/ mounts root.
- init/ starts the first userspace process.
open() Path
- arch/*/entry dispatches syscall.
- fs/open.c receives open, openat, or openat2.
- build_open_flags() converts raw flags into struct open_flags.
- fs/namei.c resolves the pathname with RCU lookup first, then refcounted fallback if necessary.
- security/ LSM hooks check path, inode, and file permissions.
- fs/open.c:do_dentry_open() activates struct file.
- Filesystem-specific inode_operations and file_operations complete the object-specific behavior.
- The file is installed into the process descriptor table.
read() From A Regular File
- fs/read_write.c:ksys_read() resolves the fd to struct file.
- vfs_read() checks mode, user buffer, range, LSM, and fsnotify.
- It dispatches to file->f_op->read or read_iter.
- Generic filesystem code often enters page cache in mm/filemap.c.
- Page-cache misses allocate folios in mm/page_alloc.c.
- Filesystem read code submits IO through block/.
- Block layer dispatches requests to drivers/.
- Completion wakes the original task and updates file position/accounting.
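The hit-or-fill behavior in the middle of this path can be modeled with a tiny direct-mapped cache in front of a simulated device. This is a sketch under stated assumptions: toy_cache, toy_folio, and toy_device_read are hypothetical, the device content is synthetic, and real page-cache indexing, locking, and readahead are left out.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define TOY_PAGE_SIZE   16u
#define TOY_CACHE_SLOTS 4u

/* Hypothetical cached page: valid once filled from the device. */
struct toy_folio {
    int valid;
    size_t index;                   /* which page of the file this holds */
    char data[TOY_PAGE_SIZE];
};

struct toy_cache {
    struct toy_folio slots[TOY_CACHE_SLOTS];
    unsigned int device_reads;      /* counts simulated block-layer reads */
};

/* Simulated device: page N is filled with the letter 'a' + (N mod 26). */
static void toy_device_read(struct toy_cache *cache, size_t index, char *dst)
{
    memset(dst, 'a' + (int)(index % 26), TOY_PAGE_SIZE);
    cache->device_reads++;
}

/* Return the page for index, going to the device only on a miss. */
const char *toy_cache_read(struct toy_cache *cache, size_t index)
{
    struct toy_folio *slot;

    assert(cache != NULL);
    slot = &cache->slots[index % TOY_CACHE_SLOTS];

    if (!slot->valid || slot->index != index) {
        /* Miss: fill the slot, replacing whatever page was there. */
        toy_device_read(cache, index, slot->data);
        slot->index = index;
        slot->valid = 1;
    }
    return slot->data;
}
```

A second read of the same page leaves device_reads unchanged, which is the effect the page cache has on the lower half of the path: the block layer and driver are reached only on a miss.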
Page Fault On File-Backed Mapping
- arch/*/mm/fault.c handles CPU fault and locates VMA.
- mm/memory.c:handle_mm_fault() validates and dispatches.
- Missing PTE routes to file VMA vm_ops->fault.
- Regular files use mm/filemap.c:filemap_fault().
- Page cache lookup may allocate and read a folio.
- Filesystem/block/driver path may perform IO.
- mm/memory.c installs the PTE after revalidating races.
Packet Receive
- NIC driver under drivers/net/ receives interrupt/NAPI work.
- Packet buffer is represented as struct sk_buff.
- net/core/dev.c runs ingress processing.
- Protocol stack in net/ipv4 or net/ipv6 parses headers and finds socket.
- security/ and netfilter hooks may inspect/deny/transform.
- Socket receive queue wakes user process waiting in net/socket.c syscall path.
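The step where ingress processing hands a frame to the IPv4 or IPv6 stack amounts to a demultiplex on the Ethernet type field. The sketch below shows that one decision in user space: toy_demux and the toy_l3 enum are hypothetical, the constants mirror the standard EtherType values for IPv4 and IPv6, and VLAN tags and all other protocols are treated as drops for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_ETH_HLEN   14u      /* destination MAC, source MAC, EtherType */
#define TOY_ETH_P_IP   0x0800u  /* EtherType for IPv4 */
#define TOY_ETH_P_IPV6 0x86DDu  /* EtherType for IPv6 */

enum toy_l3 { TOY_L3_DROP, TOY_L3_IPV4, TOY_L3_IPV6 };

/* Pick the layer-3 handler for a raw Ethernet frame. */
enum toy_l3 toy_demux(const uint8_t *frame, size_t len)
{
    uint16_t ethertype;

    assert(frame != NULL);
    if (len < TOY_ETH_HLEN)
        return TOY_L3_DROP;     /* too short to carry a full header */

    /* The EtherType sits at bytes 12..13 in network (big-endian) order. */
    ethertype = (uint16_t)((frame[12] << 8) | frame[13]);

    switch (ethertype) {
    case TOY_ETH_P_IP:
        return TOY_L3_IPV4;
    case TOY_ETH_P_IPV6:
        return TOY_L3_IPV6;
    default:
        return TOY_L3_DROP;
    }
}
```

Reading the two bytes explicitly avoids any dependence on host byte order, which is why the same function gives the same answer on little-endian and big-endian machines.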
Module Load
- Userspace calls finit_module/init_module syscall.
- kernel/module/ parses module ELF and metadata.
- security/ checks module loading policy and signature/integrity.
- certs/ keyrings may validate signatures.
- kernel/module/ resolves symbols, applies relocations, and runs init.
- Driver modules register with drivers/base, bus-specific code, filesystems, networking, or other subsystem registries.
What Full-Repo Completion Requires
A true full Linux repo dossier needs two layers at the same time:
- A complete atlas of the whole tree, so every major directory and execution path has a place.
- Deep file-by-file subsystem chapters, where each selected source file is explained with concrete C snippets, control flow, state mutation, locking, lifetime, error paths, and cross-subsystem calls.
The current verified deep coverage is not whole-repo complete. It currently covers scheduler and memory best. The atlas is the repo-wide spine; the remaining work is to attach equally deep dossiers for VFS, block, networking, drivers, sync, timers/IRQ/workqueues, observability, security/isolation, build, Rust, architecture, boot/init, crypto, IPC, io_uring, tools, and sound.
Source Notes
- progress-ledger.csv
- coverage-map.md
- architecture-map.md
- scheduler-process.md
- memory-management.md
- vfs-filesystems.md
- networking-sockets.md
- drivers-device-model.md
- sync-rcu-locking.md
- timers-workqueues.md
- observability-bpf-trace.md
- security-isolation.md
- build-kconfig-modules.md
- rust-kernel-layer.md
- block-storage.md
- architecture-layer.md