linux/mm/memory.c

Imported from _research/manual-study-linux/file-notes/linux__mm__memory.c.md.

File Notes: mm/memory.c

Status: reviewed.

Purpose

Implements core virtual-memory fault handling: page-table walks, anonymous faults, file-backed faults, copy-on-write, write-protect handling, PTE install, huge-page fallback, memcg fault entry/exit, and fault accounting.

Key Types And Functions

  • handle_mm_fault(): public MM fault entry after VMA lookup/locking.
  • __handle_mm_fault(): constructs struct vm_fault and walks page tables.
  • handle_pte_fault(): PTE-level dispatcher.
  • do_pte_missing(): routes a missing PTE to the anonymous or the file-backed fault path.
  • do_anonymous_page(): zero-page and private anonymous allocation path.
  • __do_fault(), do_read_fault(), do_cow_fault(), do_shared_fault(), do_fault(): faults on file-backed and special mappings, serviced through the VMA's vm_ops.
  • do_wp_page() and wp_page_copy(): write-protect and COW path.
  • finish_fault(): installs prepared page into page tables.

Data Flow

Architecture-specific fault code finds a VMA and enters handle_mm_fault() with either the VMA lock or mmap_lock already held. handle_mm_fault() sanitizes the fault flags, checks architecture permissions, enters memcg user-fault handling for user faults, then routes to hugetlb handling or __handle_mm_fault().
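The ordering of the top-level steps can be sketched as a small userspace model. This is only an illustration: Vma, FaultFlags, MemcgScope, Outcome, and handle_fault are hypothetical names, flag sanitization is left out, and the permission check is reduced to a single writable bit.

```rust
// Simplified userspace model of the top-level fault entry.
// All types and names here are hypothetical, not kernel APIs.

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Outcome {
    Sigsegv, // permission check failed
    Hugetlb, // routed to the hugetlb handler
    Generic, // routed to the generic page-table walk
}

#[derive(Clone, Copy)]
pub struct Vma {
    pub writable: bool,
    pub is_hugetlb: bool,
}

#[derive(Clone, Copy)]
pub struct FaultFlags {
    pub write: bool,
    pub user: bool,
}

/// Tracks the user-fault scope so the enter/exit pairing can be observed.
#[derive(Default)]
pub struct MemcgScope {
    pub depth: u32,   // currently open scopes
    pub entered: u32, // total scopes ever opened
}

pub fn handle_fault(vma: &Vma, flags: FaultFlags, memcg: &mut MemcgScope) -> Outcome {
    // Step 1: reject the access before any accounting scope is opened.
    if flags.write && !vma.writable {
        return Outcome::Sigsegv;
    }
    // Step 2: user faults are bracketed by a memcg enter/exit pair.
    if flags.user {
        memcg.depth += 1;
        memcg.entered += 1;
    }
    // Step 3: route to the hugetlb path or the generic walk.
    let outcome = if vma.is_hugetlb {
        Outcome::Hugetlb
    } else {
        Outcome::Generic
    };
    if flags.user {
        memcg.depth -= 1;
    }
    outcome
}
```

A rejected access never opens the accounting scope, which is why the permission check comes first in this model.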

__handle_mm_fault() builds a struct vm_fault, walks PGD/P4D/PUD/PMD levels, allocates missing page-table levels, tries huge-page paths where allowed, and falls back to PTE-level handling.
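To make the walk concrete, the sketch below splits a virtual address into per-level table indices and allocates missing intermediate tables on the way down. It assumes x86-64 style 4-level paging with 4 KiB pages (12 offset bits, 9 index bits per level, P4D folded); Node, table_index, and walk_alloc are hypothetical names, and the huge-page attempts are not modeled.

```rust
use std::collections::HashMap;

// Assumption: x86-64 style 4-level paging with 4 KiB pages.
pub const PAGE_SHIFT: u32 = 12;
pub const BITS_PER_LEVEL: u32 = 9;
pub const LEVELS: usize = 4; // 0 = PGD, 1 = PUD, 2 = PMD, 3 = PTE

/// Index into the table at `level` for `addr`. `level` must be below LEVELS.
pub fn table_index(addr: u64, level: usize) -> usize {
    let shift = PAGE_SHIFT + BITS_PER_LEVEL * (LEVELS - 1 - level) as u32;
    ((addr >> shift) & ((1u64 << BITS_PER_LEVEL) - 1)) as usize
}

/// A sparse page-table node: only populated entries are stored.
#[derive(Default)]
pub struct Node {
    children: HashMap<usize, Node>,
}

/// Walks from the root (PGD) down to the PTE table covering `addr`,
/// allocating any missing table. Returns how many tables were allocated.
pub fn walk_alloc(root: &mut Node, addr: u64) -> usize {
    let mut allocated = 0;
    let mut node = root;
    // Levels 0..=2 each select a child table; level 3 would select the PTE slot.
    for level in 0..LEVELS - 1 {
        let idx = table_index(addr, level);
        if !node.children.contains_key(&idx) {
            allocated += 1;
        }
        node = node.children.entry(idx).or_default();
    }
    allocated
}
```

In this model the first fault in an empty address space allocates three tables, a second fault in the same 2 MiB region allocates none, and a fault in the neighbouring 2 MiB region allocates only a new PTE table.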

handle_pte_fault() reads the original PTE, dispatches missing entries to do_pte_missing(), swap entries to do_swap_page(), NUMA entries to do_numa_page(), write-protect faults to do_wp_page(), or marks an existing mapping accessed/dirty.
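The dispatch described above fits a single match over the PTE snapshot. The following is a minimal sketch, assuming a much coarser PTE classification than the kernel uses; PteState, Handler, and dispatch are hypothetical names, and the Handler variants only name the kernel function each case would reach.

```rust
/// Coarse classification of the PTE value read at fault time (hypothetical).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum PteState {
    None,                       // nothing mapped yet
    Swap,                       // non-present entry describing swapped-out data
    NumaHint,                   // present but made inaccessible for NUMA balancing
    Present { writable: bool }, // ordinary mapping
}

/// Which handler the fault is routed to.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Handler {
    DoPteMissing,
    DoSwapPage,
    DoNumaPage,
    DoWpPage,
    MarkAccessed, // accessed/dirty update on an existing mapping
}

pub fn dispatch(orig: PteState, write: bool) -> Handler {
    match orig {
        PteState::None => Handler::DoPteMissing,
        PteState::Swap => Handler::DoSwapPage,
        PteState::NumaHint => Handler::DoNumaPage,
        // Only a write to a read-only mapping is a write-protect fault.
        PteState::Present { writable: false } if write => Handler::DoWpPage,
        PteState::Present { .. } => Handler::MarkAccessed,
    }
}
```

Because the match is exhaustive, adding a new PTE state forces every dispatcher to be revisited at compile time.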

For anonymous memory, a read fault maps the zero page when allowed. A write fault prepares anon VMA state, allocates a folio, marks it uptodate before publishing the PTE, takes the page-table lock, rechecks for races, and installs the entry. File faults call vm_ops->fault() and then finish_fault() to install the returned page. COW faults allocate and copy first, then update the PTE only if it still matches the entry read at the start of the fault.
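The copy-first, recheck-then-publish ordering of the COW path can be sketched with a mutex standing in for the page-table lock. This is a toy model under stated assumptions: Pte, Memory, CowResult, and cow_fault are hypothetical, frames are plain byte vectors, and reference counting and freeing of the old frame are ignored.

```rust
use std::sync::Mutex;

/// Toy PTE: a frame number plus a writable bit (hypothetical layout).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct Pte {
    pub frame: usize,
    pub writable: bool,
}

/// Toy physical memory: one byte vector per frame.
pub struct Memory {
    pub frames: Vec<Vec<u8>>,
}

#[derive(Debug, PartialEq, Eq)]
pub enum CowResult {
    Installed(usize), // number of the new frame, now mapped writable
    Raced,            // PTE changed under us; the copy is discarded
}

/// `orig` is the PTE snapshot taken when the fault started.
pub fn cow_fault(mem: &mut Memory, slot: &Mutex<Pte>, orig: Pte) -> CowResult {
    // 1. Allocate and copy before taking the page-table lock.
    let copy = mem.frames[orig.frame].clone();

    // 2. Take the lock and compare the live PTE against the snapshot.
    let mut pte = slot.lock().unwrap();
    if *pte != orig {
        return CowResult::Raced; // `copy` is dropped here
    }

    // 3. Publish the private copy while still holding the lock.
    mem.frames.push(copy);
    let new_frame = mem.frames.len() - 1;
    *pte = Pte { frame: new_frame, writable: true };
    CowResult::Installed(new_frame)
}
```

A second caller that still holds the old snapshot loses the comparison in step 2 and leaves memory untouched, which is the behaviour the recheck exists to guarantee.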

Invariants And Safety Contracts

  • Fault handlers may drop mmap_lock, after which the VMA can be freed; once __handle_mm_fault() returns, the caller must not dereference vma.
  • PTE updates recheck original entries under the page-table lock before committing.
  • Anonymous VMA preparation may need to retry under mmap_lock if only the per-VMA lock is held.
  • finish_fault() expects a locked page and consumes the mapping reference on success.
  • File-backed fault handling preallocates PTE pages before taking folio locks to avoid reclaim/writeback deadlocks.
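The retry contract for anonymous VMA preparation can be sketched as follows. The model assumes the simplest possible policy (always retry when only the per-VMA lock is held and preparation is still needed); LockMode, Prep, AnonVma, anon_prepare, and fault_with_fallback are hypothetical names.

```rust
/// Which lock the fault path currently holds (hypothetical model).
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum LockMode {
    PerVma,
    MmapLock,
}

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub enum Prep {
    Ready, // anon VMA state is usable
    Retry, // caller must retry with the stronger lock
}

pub struct AnonVma {
    pub prepared: bool,
}

/// Preparation may only mutate the VMA under mmap_lock in this model.
pub fn anon_prepare(vma: &mut AnonVma, mode: LockMode) -> Prep {
    if vma.prepared {
        return Prep::Ready;
    }
    match mode {
        LockMode::PerVma => Prep::Retry,
        LockMode::MmapLock => {
            vma.prepared = true;
            Prep::Ready
        }
    }
}

/// Tries the per-VMA fast path first, then falls back to mmap_lock.
/// Returns the final result and the number of attempts made.
pub fn fault_with_fallback(vma: &mut AnonVma) -> (Prep, u32) {
    let mut attempts = 1;
    let mut result = anon_prepare(vma, LockMode::PerVma);
    if result == Prep::Retry {
        attempts += 1;
        result = anon_prepare(vma, LockMode::MmapLock);
    }
    (result, attempts)
}
```

Only the first fault on a VMA pays for the second attempt; later faults find the state prepared and stay on the fast path.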

Rust Translation Guidance

Represent fault handling as a typed state machine: VMA locked, page-table level walked, PTE locked, folio locked, PTE installed. Use a FaultResult bitflag type and make retry/drop-lock outcomes explicit. Unsafe page-table writes should be narrowly isolated behind guards that prove the relevant VMA and PTE locks are held.
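One possible shape for this guidance is a typestate chain plus a hand-rolled bitflag newtype, sketched below. Everything here is an assumption for illustration: the stage markers cover only three of the listed states (the folio-locked stage is omitted for brevity), the bit values are not the kernel's VM_FAULT_* values, and the methods carry no real locking.

```rust
use std::marker::PhantomData;
use std::ops::BitOr;

/// Bitflag-style fault result. Bit values are illustrative only.
#[derive(Debug, Clone, Copy, PartialEq, Eq)]
pub struct FaultResult(u32);

impl FaultResult {
    pub const NONE: FaultResult = FaultResult(0);
    pub const RETRY: FaultResult = FaultResult(1 << 0);
    pub const LOCK_DROPPED: FaultResult = FaultResult(1 << 1);
    pub const MAJOR: FaultResult = FaultResult(1 << 2);

    /// True if every bit set in `other` is also set in `self`.
    pub fn contains(self, other: FaultResult) -> bool {
        (self.0 & other.0) == other.0
    }
}

impl BitOr for FaultResult {
    type Output = FaultResult;
    fn bitor(self, rhs: FaultResult) -> FaultResult {
        FaultResult(self.0 | rhs.0)
    }
}

// Zero-sized stage markers (hypothetical).
pub struct VmaLocked;
pub struct Walked;
pub struct PteLocked;

/// A fault in stage `S`. Each transition consumes the previous stage.
pub struct Fault<S> {
    pub addr: u64,
    _stage: PhantomData<S>,
}

impl Fault<VmaLocked> {
    pub fn new(addr: u64) -> Self {
        Fault { addr, _stage: PhantomData }
    }
    pub fn walk(self) -> Fault<Walked> {
        Fault { addr: self.addr, _stage: PhantomData }
    }
}

impl Fault<Walked> {
    pub fn lock_pte(self) -> Fault<PteLocked> {
        Fault { addr: self.addr, _stage: PhantomData }
    }
}

impl Fault<PteLocked> {
    /// Only callable in the PteLocked stage. Consuming `self` means no
    /// handle to the fault survives the install or the retry.
    pub fn install(self, raced: bool) -> FaultResult {
        if raced {
            FaultResult::RETRY | FaultResult::LOCK_DROPPED
        } else {
            FaultResult::NONE
        }
    }
}
```

Calling install() on a Fault<Walked> does not compile, which is the property the guards are meant to provide; the unsafe page-table write would live inside install() in a real translation.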

AI-Native Systems Guidance

AI runtimes can use the page-fault model for lazy state: an access to missing state can resolve to synthesize-zero, fetch backing data, clone-on-write, retry, or fail with a typed error. Runtime code should not keep stale region handles across any operation that can drop the region lock.
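A minimal sketch of such a lazy store follows, covering synthesize-zero, fetch, clone-on-write, and a typed failure (retry is not modeled). LazyStore, Policy, and FaultError are hypothetical names; clone-on-write leans on the standard library's Rc::make_mut, which copies the buffer only when another handle still shares it.

```rust
use std::collections::HashMap;
use std::rc::Rc;

#[derive(Debug, PartialEq, Eq)]
pub enum FaultError {
    NoBacking, // nothing can supply this key
}

/// How a missing slot is filled on first access (hypothetical).
pub enum Policy {
    SynthesizeZero(usize),     // produce a zeroed buffer of this length
    Fetch(fn(u64) -> Vec<u8>), // load from backing data
    Fail,
}

pub struct LazyStore {
    pub policy: Policy,
    pub slots: HashMap<u64, Rc<Vec<u8>>>,
}

impl LazyStore {
    /// Read access: fill the slot on a miss, then hand out a shared handle.
    pub fn read(&mut self, key: u64) -> Result<Rc<Vec<u8>>, FaultError> {
        if let Some(v) = self.slots.get(&key) {
            return Ok(Rc::clone(v));
        }
        let data = match &self.policy {
            Policy::SynthesizeZero(len) => vec![0u8; *len],
            Policy::Fetch(f) => f(key),
            Policy::Fail => return Err(FaultError::NoBacking),
        };
        let rc = Rc::new(data);
        self.slots.insert(key, Rc::clone(&rc));
        Ok(rc)
    }

    /// Write access: resolve a miss first, then clone-on-write if any
    /// reader still shares the buffer. Panics if `offset` is out of range.
    pub fn write(&mut self, key: u64, offset: usize, byte: u8) -> Result<(), FaultError> {
        self.read(key)?;
        let slot = self.slots.get_mut(&key).ok_or(FaultError::NoBacking)?;
        Rc::make_mut(slot)[offset] = byte;
        Ok(())
    }
}
```

A reader that obtained a handle before a write keeps seeing the old bytes, while later reads see the new ones, mirroring private COW semantics.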

Evidence

  • PTE race handling and pte_unmap_same() are at mm/memory.c:3526-3547.
  • COW copy high-level flow is documented at mm/memory.c:3836-3852.
  • do_wp_page() separates shared writable mappings from private COW/reuse at mm/memory.c:4240-4315.
  • do_pte_missing() dispatches anonymous versus file-backed faults at mm/memory.c:4561-4567.
  • do_anonymous_page() handles zero-page mapping and private folio allocation at mm/memory.c:5282-5365.
  • __do_fault() preallocates PTE pages before folio locking and calls vm_ops->fault() at mm/memory.c:5393-5428.
  • finish_fault() installs prepared pages and handles rmap, memcg, and LRU integration at mm/memory.c:5603-5617.
  • Read, COW, and shared file-fault paths are at mm/memory.c:5840-5954.
  • handle_pte_fault() dispatches missing/swap/NUMA/write faults and updates access flags at mm/memory.c:6328-6408.
  • __handle_mm_fault() constructs struct vm_fault and walks page-table levels at mm/memory.c:6411-6515.
  • handle_mm_fault() performs top-level permission, memcg, hugetlb, and accounting flow at mm/memory.c:6644-6716.