linux/mm/page_alloc.c

Imported from _research/manual-study-linux/file-notes/linux__mm__page_alloc.c.md.

File Notes: mm/page_alloc.c

Status: reviewed.

Purpose

Implements Linux’s zoned buddy page allocator: physical page free-list management, order-based splitting and coalescing, GFP-to-allocation policy, watermark/reserve handling, direct reclaim/compaction retries, OOM fallback, and public page allocation/free APIs.

Key Types And Functions

  • __free_one_page(): returns a page block to the buddy allocator and merges with free buddies.
  • __rmqueue_smallest(): removes the smallest fitting free block and splits it as needed.
  • gfp_to_alloc_flags(): converts GFP mask intent into allocator control flags.
  • should_reclaim_retry(): decides whether direct reclaim should retry.
  • __alloc_pages_slowpath(): handles reclaim, compaction, reserves, OOM, and no-fail allocation behavior after the fast path misses.
  • __alloc_frozen_pages_noprof() and __alloc_pages_noprof(): primary page allocation entry points.
  • __free_pages(): public page free path.
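
The splitting behavior attributed to __rmqueue_smallest() above can be illustrated with a toy model. This is a minimal sketch, not kernel code: the Zone type, the BTreeSet-backed free lists, and the small MAX_ORDER constant are all assumptions made for illustration, and migratetypes and locking are left out.

```rust
use std::collections::BTreeSet;

/// Highest order in this toy model (hypothetical; not the kernel's MAX_PAGE_ORDER).
const MAX_ORDER: usize = 4;

/// Toy zone: one set of free block start PFNs per order.
struct Zone {
    free: Vec<BTreeSet<u64>>,
}

impl Zone {
    fn new() -> Self {
        Zone { free: vec![BTreeSet::new(); MAX_ORDER + 1] }
    }

    /// Take the smallest free block that can satisfy `order`. If the block
    /// found is larger than requested, split it repeatedly: keep the low
    /// half and return the high half to the next lower free list.
    fn rmqueue_smallest(&mut self, order: usize) -> Option<u64> {
        for current in order..=MAX_ORDER {
            let first = self.free[current].iter().next().copied();
            if let Some(pfn) = first {
                self.free[current].remove(&pfn);
                let mut high = current;
                while high > order {
                    high -= 1;
                    // The upper half of the block becomes a free block of order `high`.
                    self.free[high].insert(pfn + (1u64 << high));
                }
                return Some(pfn);
            }
        }
        None // No block of sufficient order is free.
    }
}
```

Starting from a single free order-3 block at PFN 0, an order-0 request returns PFN 0 and leaves free blocks at PFN 4 (order 2), PFN 2 (order 1), and PFN 1 (order 0).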

Data Flow

The allocator maintains per-zone free lists indexed by order. Freeing enters __free_one_page(), which accounts for the freed range and then loops: it checks whether the matching buddy is also free, removes a mergeable buddy from its list, and combines the pair into a block of the next higher order. When no further merge is possible, it inserts the combined block into the free list for its final order.
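
The merge loop can be sketched in a few lines. The following is a simplified model under stated assumptions: Zone, MergeResult, and the small MAX_ORDER are hypothetical names, the buddy is found by flipping the order bit of the PFN, and the kernel's additional compatibility checks on the buddy are omitted. Callers are assumed to pass an order no greater than MAX_ORDER.

```rust
use std::collections::BTreeSet;

/// Highest order in this toy model (hypothetical; not the kernel's MAX_PAGE_ORDER).
const MAX_ORDER: usize = 4;

/// Structured result of a free operation, useful for diagnostics.
#[derive(Debug, PartialEq)]
struct MergeResult {
    pfn: u64,      // start of the block that was finally inserted
    order: usize,  // order of that block
    merges: usize, // number of buddy merges performed
}

/// Toy zone: one set of free block start PFNs per order.
struct Zone {
    free: Vec<BTreeSet<u64>>,
}

impl Zone {
    fn new() -> Self {
        Zone { free: vec![BTreeSet::new(); MAX_ORDER + 1] }
    }

    /// Free a block and coalesce it with free buddies as far as possible.
    fn free_one_page(&mut self, mut pfn: u64, mut order: usize) -> MergeResult {
        let mut merges = 0;
        while order < MAX_ORDER {
            // The buddy of a block differs from it only in bit `order` of the PFN.
            let buddy = pfn ^ (1u64 << order);
            // Stop as soon as the buddy is not on the free list for this order.
            if !self.free[order].remove(&buddy) {
                break;
            }
            // The merged block starts at the lower of the two PFNs.
            pfn &= buddy;
            order += 1;
            merges += 1;
        }
        self.free[order].insert(pfn);
        MergeResult { pfn, order, merges }
    }
}
```

Freeing PFN 0 and then PFN 1 at order 0 yields one order-1 block at PFN 0. Freeing the order-1 block at PFN 2 afterwards merges again into an order-2 block at PFN 0.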

Allocation starts in __alloc_frozen_pages_noprof(). It validates the order, masks the GFP flags, prepares the allocation context, applies fragmentation avoidance, then tries get_page_from_freelist(). If that fast path returns no page, typically because no eligible zone meets its watermark, control passes to __alloc_pages_slowpath(). The slow path wakes kswapd, retries the freelist under adjusted flags, decides whether reserves are allowed, performs direct reclaim and compaction when permitted, evaluates retry rules, and invokes OOM handling if necessary. For __GFP_NOFAIL requests it keeps looping instead of returning failure.
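
The staged control flow can be modeled as a fast attempt followed by a bounded reclaim-and-retry loop. This sketch is a heavy simplification: Sim, Request, Outcome, and MAX_NO_PROGRESS are hypothetical names, memory state is reduced to two counters, and kswapd wakeups, compaction, reserves, OOM handling, and the no-fail loop are deliberately left out.

```rust
/// Result of one allocation attempt in this toy model.
#[derive(Debug, PartialEq)]
enum Outcome {
    Allocated { stage: &'static str },
    Failed,
}

/// Hypothetical stand-in for GFP-derived policy.
struct Request {
    may_reclaim: bool,
}

/// Simulated memory state: free pages, pages reclaim could recover, and a watermark.
struct Sim {
    free: u32,
    reclaimable: u32,
    watermark: u32,
}

impl Sim {
    /// Succeeds only while free pages stay above the watermark.
    fn try_freelist(&mut self) -> bool {
        if self.free > self.watermark {
            self.free -= 1;
            true
        } else {
            false
        }
    }

    /// Recovers up to two pages per call and reports how many it got.
    fn direct_reclaim(&mut self) -> u32 {
        let got = self.reclaimable.min(2);
        self.reclaimable -= got;
        self.free += got;
        got
    }
}

/// Assumed bound on consecutive reclaim passes that make no progress.
const MAX_NO_PROGRESS: u32 = 16;

fn alloc_page(sim: &mut Sim, req: &Request) -> Outcome {
    // Fast path: a single freelist attempt.
    if sim.try_freelist() {
        return Outcome::Allocated { stage: "fast" };
    }
    // Callers that may not reclaim fail immediately.
    if !req.may_reclaim {
        return Outcome::Failed;
    }
    // Slow path: reclaim, retry, and give up once progress stalls.
    let mut no_progress = 0;
    loop {
        let progress = sim.direct_reclaim();
        if sim.try_freelist() {
            return Outcome::Allocated { stage: "slow" };
        }
        no_progress = if progress == 0 { no_progress + 1 } else { 0 };
        if no_progress > MAX_NO_PROGRESS {
            return Outcome::Failed;
        }
    }
}
```

The key property is that the loop terminates: every pass either allocates, makes reclaim progress, or advances a counter toward the retry bound.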

Invariants And Safety Contracts

  • Allocation order must not exceed MAX_PAGE_ORDER (the limit is inclusive).
  • Free-side merging only proceeds while the buddy is free and compatible.
  • Reserve access is derived from GFP policy, task context, cpusets, and memory pressure state rather than directly exposed to callers.
  • Slowpath allocation must avoid unbounded retries except for explicit no-fail requests.
  • The public free path requires a valid struct page * and the order the block was allocated with.

Rust Translation Guidance

Represent physical allocation as typed zones with order-indexed free lists and an AllocPolicy derived from caller intent. A Rust version should keep reserve access, reclaim permission, compaction permission, and no-fail behavior as explicit policy fields. Free-page merging should be isolated behind a zone lock guard and return a structured merge result for diagnostics.
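
One possible shape for such a policy type is shown here. Everything in it is an assumption for illustration: the Intent variants, the AllocPolicy fields, the PolicyError type, and the mapping between them are hypothetical and do not reproduce the kernel's actual GFP-to-flag translation.

```rust
/// Hypothetical caller intent, a coarse stand-in for a GFP mask.
#[derive(Debug, Clone, Copy, PartialEq)]
enum Intent {
    Atomic, // must not sleep
    Normal, // ordinary allocation
    NoFail, // caller cannot handle failure
}

/// Explicit policy fields instead of an opaque flag word.
#[derive(Debug, PartialEq)]
struct AllocPolicy {
    may_use_reserves: bool,
    may_reclaim: bool,
    may_compact: bool,
    no_fail: bool,
}

/// Intent combinations that this sketch refuses to translate.
#[derive(Debug, PartialEq)]
enum PolicyError {
    NoFailCannotBlock,
}

/// Derive the policy from intent plus task context (`can_block`).
fn derive_policy(intent: Intent, can_block: bool) -> Result<AllocPolicy, PolicyError> {
    match intent {
        // Atomic callers may dip into reserves but never reclaim or compact.
        Intent::Atomic => Ok(AllocPolicy {
            may_use_reserves: true,
            may_reclaim: false,
            may_compact: false,
            no_fail: false,
        }),
        // Normal callers reclaim and compact only when the context can block.
        Intent::Normal => Ok(AllocPolicy {
            may_use_reserves: false,
            may_reclaim: can_block,
            may_compact: can_block,
            no_fail: false,
        }),
        // A no-fail request that cannot block is rejected up front.
        Intent::NoFail if !can_block => Err(PolicyError::NoFailCannotBlock),
        Intent::NoFail => Ok(AllocPolicy {
            may_use_reserves: false,
            may_reclaim: true,
            may_compact: true,
            no_fail: true,
        }),
    }
}
```

Rejecting contradictory intent at derivation time keeps the allocation path itself free of context checks, and the explicit fields make each permission visible at the call site.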

AI-Native Systems Guidance

The page allocator maps cleanly to AI runtimes with tiered memory pools. Fast paths should allocate from local free lists. Slow paths should wake background reclaim, compact fragmented state, retry only when progress is plausible, and escalate to typed pressure/OOM outcomes instead of hanging indefinitely.
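
A typed outcome for a tiered pool might look like the following. This is a sketch under assumptions: TieredAllocator, Pool, and AllocOutcome are hypothetical names, tier 0 is taken to be the local pool, and background reclaim and compaction are not modeled.

```rust
/// Typed result so callers can react to pressure instead of blocking.
#[derive(Debug, PartialEq)]
enum AllocOutcome {
    /// Served from the local (tier 0) pool.
    Granted { tier: usize },
    /// Served from a slower tier; the caller may want to shed load.
    Pressure { tier: usize },
    /// No tier could satisfy the request.
    OutOfMemory,
}

/// One memory tier, reduced to a count of free blocks.
struct Pool {
    free_blocks: usize,
}

/// Pools ordered from fastest (index 0) to slowest.
struct TieredAllocator {
    tiers: Vec<Pool>,
}

impl TieredAllocator {
    /// Try each tier in order and report where the request was served.
    fn alloc(&mut self, blocks: usize) -> AllocOutcome {
        for (tier, pool) in self.tiers.iter_mut().enumerate() {
            if pool.free_blocks >= blocks {
                pool.free_blocks -= blocks;
                return if tier == 0 {
                    AllocOutcome::Granted { tier }
                } else {
                    AllocOutcome::Pressure { tier }
                };
            }
        }
        AllocOutcome::OutOfMemory
    }
}
```

Because every call returns one of three variants in bounded time, a scheduler built on top can distinguish a healthy allocation from one that spilled to a slower tier or failed outright.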

Evidence

  • The buddy allocator algorithm is described in comments at mm/page_alloc.c:913-934.
  • __free_one_page() begins at mm/page_alloc.c:936 and performs accounting plus merge-loop coalescing at mm/page_alloc.c:954-1005.
  • The merged block is placed back onto the free list at mm/page_alloc.c:1007-1019.
  • __rmqueue_smallest() searches order free lists and splits larger blocks at mm/page_alloc.c:1884-1912.
  • gfp_to_alloc_flags() translates GFP masks into allocator flags at mm/page_alloc.c:4476-4526.
  • Reserve permission helpers are at mm/page_alloc.c:4543-4563.
  • should_reclaim_retry() evaluates watermark progress and retry limits at mm/page_alloc.c:4570-4655.
  • __alloc_pages_slowpath() is the reclaim/compaction/OOM retry path at mm/page_alloc.c:4724-5023.
  • The main zoned buddy allocator entry __alloc_frozen_pages_noprof() is documented as the allocator heart at mm/page_alloc.c:5265-5267 and runs at mm/page_alloc.c:5268-5331.
  • __alloc_pages_noprof() wraps allocation with page refcount setup at mm/page_alloc.c:5333-5343.
  • Public __free_pages() is documented and implemented at mm/page_alloc.c:5401-5425.