linux/mm/page_alloc.c
Imported from
_research/manual-study-linux/file-notes/linux__mm__page_alloc.c.md.
File Notes: mm/page_alloc.c
Status: reviewed.
Purpose
Implements Linux’s zoned buddy page allocator: physical page free-list management, order-based splitting and coalescing, GFP-to-allocation policy, watermark/reserve handling, direct reclaim/compaction retries, OOM fallback, and public page allocation/free APIs.
Key Types And Functions
- __free_one_page(): returns a page block to the buddy allocator and merges with free buddies.
- __rmqueue_smallest(): removes the smallest fitting free block and splits it as needed.
- gfp_to_alloc_flags(): converts GFP mask intent into allocator control flags.
- should_reclaim_retry(): decides whether direct reclaim should retry.
- __alloc_pages_slowpath(): handles reclaim, compaction, reserves, OOM, and no-fail allocation behavior after the fast path misses.
- __alloc_frozen_pages_noprof() and __alloc_pages_noprof(): primary page allocation entry points.
- __free_pages(): public page free path.
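The split step attributed to __rmqueue_smallest() can be illustrated with a small Rust model. This is a sketch under stated assumptions, not the kernel's code: the function name rmqueue_smallest, the Vec-based free lists, and the use of bare page-frame numbers (PFNs) are all hypothetical, and migratetypes and locking are left out. It takes the smallest free block that fits, keeps the lower half at each split, and returns each upper half to the free list of its order.

```rust
/// Hypothetical order-indexed free lists: `free_lists[o]` holds the start
/// PFNs of free blocks that each span `1 << o` pages.
fn rmqueue_smallest(free_lists: &mut [Vec<u64>], order: usize) -> Option<u64> {
    // Find the smallest order at or above the request that has a free block.
    let found = (order..free_lists.len()).find(|&o| !free_lists[o].is_empty())?;
    let pfn = free_lists[found].pop()?;

    // Split downward: keep the lower half and hand each unused upper half
    // back to the free list of its own order.
    for o in (order..found).rev() {
        free_lists[o].push(pfn + (1u64 << o));
    }
    Some(pfn)
}
```

Requesting an order-0 page from a model that holds a single order-3 block at PFN 0 returns PFN 0 and leaves free blocks at PFN 4 (order 2), PFN 2 (order 1), and PFN 1 (order 0).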
Data Flow
The allocator maintains per-zone free lists by order. Freeing enters
__free_one_page(), accounts the free range, repeatedly checks whether the
matching buddy is also free, removes mergeable buddies from their list, combines
the range into the next higher order, and finally inserts the combined block
back into the correct free list.
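The merge loop described above can be sketched in Rust. This is a simplified model rather than the kernel's implementation: Zone, free_one_page, MAX_ORDER, and the BTreeSet-based free lists are hypothetical names chosen for illustration, and accounting, migratetype checks, and locking are omitted. The buddy of a block is found by flipping the bit that corresponds to its order in the page-frame number.

```rust
use std::collections::BTreeSet;

/// Highest order tracked by this toy zone (an assumption for the sketch).
const MAX_ORDER: usize = 4;

/// Hypothetical zone model: one set of free block start PFNs per order.
struct Zone {
    free_lists: Vec<BTreeSet<u64>>,
}

impl Zone {
    fn new() -> Self {
        Zone { free_lists: vec![BTreeSet::new(); MAX_ORDER + 1] }
    }

    /// Free a block of `1 << order` pages starting at `pfn`, merging with
    /// free buddies. Returns the (pfn, order) of the block finally inserted.
    fn free_one_page(&mut self, mut pfn: u64, mut order: usize) -> (u64, usize) {
        while order < MAX_ORDER {
            // The buddy differs from this block only in bit `order`.
            let buddy = pfn ^ (1u64 << order);
            // Stop as soon as the buddy is not free at this order.
            if !self.free_lists[order].remove(&buddy) {
                break;
            }
            // The merged block starts at the lower of the two PFNs.
            pfn &= buddy;
            order += 1;
        }
        self.free_lists[order].insert(pfn);
        (pfn, order)
    }
}
```

In this model, freeing pages 0, 1, 2, and 3 one at a time at order 0 ends with a single order-2 block at PFN 0.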
Allocation starts in __alloc_frozen_pages_noprof(). It validates the order,
masks the GFP flags, prepares the allocation context, applies fragmentation
avoidance, then tries get_page_from_freelist(). If the fast path cannot meet
watermarks, it enters __alloc_pages_slowpath(), wakes kswapd, retries the
freelist under adjusted flags, decides whether reserves are allowed, performs
direct reclaim and compaction when permitted, evaluates retry rules, invokes OOM
handling if necessary, and implements __GFP_NOFAIL looping.
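The fast-path and slow-path split can be modeled with a deliberately small Rust sketch. Everything here is an assumption made for illustration: ToyZone, alloc_pages, AllocOutcome, the reclaim batch of 32 pages, and the retry cap are hypothetical, and the real slow path also covers kswapd wakeups, compaction, reserves, OOM handling, and __GFP_NOFAIL, none of which are modeled.

```rust
/// Hypothetical, simplified result of one allocation request.
#[derive(Debug, PartialEq)]
enum AllocOutcome {
    FastPath,
    AfterReclaim { retries: u32 },
    Failed,
}

/// Toy memory state: free pages, a low watermark, and reclaimable pages.
struct ToyZone {
    free: u64,
    low_watermark: u64,
    reclaimable: u64,
}

/// Retry cap for the sketch (an assumption, not a kernel constant lookup).
const MAX_RECLAIM_RETRIES: u32 = 16;

impl ToyZone {
    /// Succeeds only if taking `pages` keeps free memory at or above the watermark.
    fn try_freelist(&mut self, pages: u64) -> bool {
        if self.free >= pages + self.low_watermark {
            self.free -= pages;
            true
        } else {
            false
        }
    }

    /// Reclaims up to `batch` pages and reports how many were recovered.
    fn direct_reclaim(&mut self, batch: u64) -> u64 {
        let got = batch.min(self.reclaimable);
        self.reclaimable -= got;
        self.free += got;
        got
    }
}

fn alloc_pages(zone: &mut ToyZone, pages: u64, may_reclaim: bool) -> AllocOutcome {
    if zone.try_freelist(pages) {
        return AllocOutcome::FastPath;
    }
    // Slow path: reclaim in bounded rounds and stop once no progress is made.
    let mut retries = 0;
    while may_reclaim && retries < MAX_RECLAIM_RETRIES {
        let progress = zone.direct_reclaim(32);
        retries += 1;
        if zone.try_freelist(pages) {
            return AllocOutcome::AfterReclaim { retries };
        }
        if progress == 0 {
            break;
        }
    }
    AllocOutcome::Failed
}
```

The loop stops either at the retry cap or when a reclaim round recovers nothing, which mirrors the idea that retries are only worthwhile while progress is plausible.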
Invariants And Safety Contracts
- Allocation order must not exceed MAX_PAGE_ORDER.
- Free-side merging only proceeds while the buddy is free and compatible.
- Reserve access is derived from GFP policy, task context, cpusets, and memory pressure state rather than directly exposed to callers.
- Slowpath allocation must avoid unbounded retries except for explicit no-fail requests.
- Public free requires a valid struct page * and a known order.
Rust Translation Guidance
Represent physical allocation as typed zones with order-indexed free lists and
an AllocPolicy derived from caller intent. A Rust version should keep reserve
access, reclaim permission, compaction permission, and no-fail behavior as
explicit policy fields. Free-page merging should be isolated behind a zone lock
guard and return a structured merge result for diagnostics.
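One possible shape for such a policy type is sketched below. The names Intent and AllocPolicy, the three intent variants, and the field set are assumptions for illustration; the mapping is only loosely inspired by how GFP masks behave and is not a translation of gfp_to_alloc_flags().

```rust
/// Hypothetical caller intent, loosely modeled on common GFP usage patterns.
#[derive(Clone, Copy, Debug, PartialEq)]
enum Intent {
    /// May sleep, reclaim, and compact.
    Normal,
    /// Must not sleep; may dip into reserves instead.
    Atomic,
    /// Must not fail; the allocator keeps retrying.
    NoFail,
}

/// Explicit policy fields derived once from the caller's intent.
#[derive(Debug, PartialEq)]
struct AllocPolicy {
    may_reclaim: bool,
    may_compact: bool,
    reserve_access: bool,
    no_fail: bool,
}

impl From<Intent> for AllocPolicy {
    fn from(intent: Intent) -> Self {
        match intent {
            Intent::Normal => AllocPolicy {
                may_reclaim: true,
                may_compact: true,
                reserve_access: false,
                no_fail: false,
            },
            // Atomic callers cannot block, so reclaim and compaction are off.
            Intent::Atomic => AllocPolicy {
                may_reclaim: false,
                may_compact: false,
                reserve_access: true,
                no_fail: false,
            },
            // No-fail callers must be able to reclaim in order to make progress.
            Intent::NoFail => AllocPolicy {
                may_reclaim: true,
                may_compact: true,
                reserve_access: false,
                no_fail: true,
            },
        }
    }
}
```

Deriving the policy in one place keeps reserve access out of the caller's hands, in line with the invariant that reserves are granted by policy rather than requested directly.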
AI-Native Systems Guidance
The page allocator maps cleanly to AI runtimes with tiered memory pools. Fast paths should allocate from local free lists. Slow paths should wake background reclaim, compact fragmented state, retry only when progress is plausible, and escalate to typed pressure/OOM outcomes instead of hanging indefinitely.
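A minimal sketch of typed outcomes for a tiered pool is shown below. TieredPools, PoolOutcome, and the convention that tier 0 is the local tier are hypothetical choices for illustration, and background reclaim and compaction are not modeled. The point is only that the caller receives a typed result it can act on rather than blocking.

```rust
/// Hypothetical typed result of a tiered-pool allocation.
#[derive(Debug, PartialEq)]
enum PoolOutcome {
    /// Served from the local tier (tier 0).
    Local,
    /// Local tier was under pressure; served from a slower tier.
    Spilled { tier: usize },
    /// No tier could satisfy the request.
    Exhausted,
}

/// Free bytes per tier, ordered from fastest (index 0) to slowest.
struct TieredPools {
    free_bytes: Vec<u64>,
}

impl TieredPools {
    fn alloc(&mut self, bytes: u64) -> PoolOutcome {
        // Try tiers in order and report which one served the request.
        for (tier, free) in self.free_bytes.iter_mut().enumerate() {
            if *free >= bytes {
                *free -= bytes;
                return if tier == 0 {
                    PoolOutcome::Local
                } else {
                    PoolOutcome::Spilled { tier }
                };
            }
        }
        PoolOutcome::Exhausted
    }
}
```

A runtime could treat Spilled as a pressure signal that triggers background reclaim on the local tier, and Exhausted as the point where an OOM policy decides what to evict or reject.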
Evidence
- The buddy allocator algorithm is described in comments at mm/page_alloc.c:913-934.
- __free_one_page() begins at mm/page_alloc.c:936 and performs accounting plus merge-loop coalescing at mm/page_alloc.c:954-1005.
- The merged block is placed back onto the free list at mm/page_alloc.c:1007-1019.
- __rmqueue_smallest() searches order free lists and splits larger blocks at mm/page_alloc.c:1884-1912.
- gfp_to_alloc_flags() translates GFP masks into allocator flags at mm/page_alloc.c:4476-4526.
- Reserve permission helpers are at mm/page_alloc.c:4543-4563.
- should_reclaim_retry() evaluates watermark progress and retry limits at mm/page_alloc.c:4570-4655.
- __alloc_pages_slowpath() is the reclaim/compaction/OOM retry path at mm/page_alloc.c:4724-5023.
- The main zoned buddy allocator entry __alloc_frozen_pages_noprof() is documented as the allocator heart at mm/page_alloc.c:5265-5267 and runs at mm/page_alloc.c:5268-5331.
- __alloc_pages_noprof() wraps allocation with page refcount setup at mm/page_alloc.c:5333-5343.
- Public __free_pages() is documented and implemented at mm/page_alloc.c:5401-5425.