AI-Native Systems

Imported from _research/manual-study-linux/ai-native-systems.md.

AI-Native Systems Notes

This document captures design ideas for AI-aware systems inspired by Linux implementation patterns. Each entry must state its evidence level: confirmed, interpretive, or speculative.

Initial Design Themes

  • Agent-visible runtime state should be exposed through structured interfaces, as kernel observability is, rather than only scraped from logs.
  • Agent operations need policy hooks similar in spirit to LSM and capability checks.
  • Long-running AI work needs scheduler/resource boundaries similar to cgroups and namespaces.
  • Code-editing agents need auditable operation tables: what action was requested, what authority allowed it, what state changed, and what rollback path exists.
  • AI-readable telemetry should be planned as a subsystem, not bolted onto UI logs.
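Evidence level for this sketch: speculative. The operation-table theme above names four things to record per action: the request, the authority, the state change, and the rollback path. A minimal way to picture that record follows. All names here (OperationRecord, OperationTable) are hypothetical and do not come from any reviewed source.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class OperationRecord:
    """One audited row (hypothetical schema, for illustration only)."""
    action: str                    # what action was requested
    authority: str                 # what authority allowed it
    state_change: str              # what state changed
    rollback: Callable[[], None]   # how to undo the change


@dataclass
class OperationTable:
    records: List[OperationRecord] = field(default_factory=list)

    def apply(self, record: OperationRecord, do: Callable[[], None]) -> None:
        # Perform the change first, then record it, so the table only
        # lists operations that actually happened.
        do()
        self.records.append(record)

    def rollback_all(self) -> None:
        # Undo in reverse order of application.
        while self.records:
            self.records.pop().rollback()
```

Recording the rollback callable next to the authority string means an auditor can answer "who allowed this and how is it undone" from a single row.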

Evidence Levels

  • confirmed: directly supported by reviewed Linux source or documentation.
  • interpretive: design conclusion based on reviewed source.
  • speculative: useful idea not yet source-backed.

Pending Track Notes

API Membrane For Agents

Evidence level: interpretive.

The boundary drawn by the Linux kernel's Rust crate suggests that AI systems should expose stable, audited capability wrappers to agents rather than raw system operations. An agent should request a capability through a typed wrapper and a policy layer rather than reach directly into unrestricted runtime internals.

Evidence: file-notes/linux__rust__kernel__lib.rs.md.
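Evidence level for this sketch: speculative. The wrapper-plus-policy idea above could look roughly like the following. Membrane, CapabilityDenied, and the policy table layout are assumptions made for illustration; nothing here mirrors an API in the kernel crate.

```python
from typing import Callable, Dict, List, Set, Tuple


class CapabilityDenied(Exception):
    """Raised when policy does not grant the requested capability."""


class Membrane:
    """Hypothetical agent-facing boundary: every call is checked and audited."""

    def __init__(self, policy: Dict[str, Set[str]]):
        self._policy = policy                       # agent id -> allowed capability names
        self._caps: Dict[str, Callable[..., object]] = {}
        self.audit_log: List[Tuple[str, str, bool]] = []

    def register(self, name: str, fn: Callable[..., object]) -> None:
        self._caps[name] = fn

    def request(self, agent: str, name: str, *args):
        allowed = name in self._policy.get(agent, set())
        # Log the decision whether or not it is granted.
        self.audit_log.append((agent, name, allowed))
        if not allowed:
            raise CapabilityDenied(f"{agent} may not call {name}")
        return self._caps[name](*args)
```

The agent never holds a reference to the underlying function, only the right to ask for it by name.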

Ownership-Aware Agent Handles

Evidence level: interpretive.

Agent resource handles should encode whether the agent has a borrowed view, a unique editable object, or a shared published object. This mirrors the ArcBorrow / UniqueArc / Arc separation and gives safer semantics for edits, rollback, and publication.

Evidence: file-notes/linux__rust__kernel__sync__arc.rs.md.
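Evidence level for this sketch: speculative. The three handle kinds above can be approximated with runtime checks. Rust enforces the equivalent distinctions at compile time; Python cannot, so this only illustrates the intended semantics. UniqueDraft, SharedObject, and BorrowedView are hypothetical names.

```python
from types import MappingProxyType


class UniqueDraft:
    """Exclusive, editable object that nobody else can see yet."""

    def __init__(self, data):
        self._data = dict(data)
        self._consumed = False

    def edit(self, key, value):
        if self._consumed:
            raise RuntimeError("draft already published")
        self._data[key] = value

    def publish(self):
        # Publishing consumes the draft: no further edits through it.
        if self._consumed:
            raise RuntimeError("draft already published")
        self._consumed = True
        return SharedObject(self._data)


class SharedObject:
    """Published object: shareable, exposed read-only."""

    def __init__(self, data):
        self._view = MappingProxyType(data)

    def borrow(self):
        return BorrowedView(self._view)


class BorrowedView:
    """Short-lived read-only view, valid only inside its `with` block."""

    def __init__(self, view):
        self._view = view
        self._live = False

    def __enter__(self):
        self._live = True
        return self

    def __exit__(self, *exc):
        self._live = False
        return False

    def get(self, key):
        if not self._live:
            raise RuntimeError("borrow used outside its scope")
        return self._view[key]
```

Edits happen only before publication, which gives rollback a simple meaning: discard the draft.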

Scoped Authority Versus Durable Authority

Evidence level: interpretive.

Ambient authority should be scoped like CurrentTask; durable authority should require explicit conversion into a refcounted/audited handle. This maps to agent leases, session permissions, and detached background jobs.

Evidence: file-notes/linux__rust__kernel__task.rs.md.
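Evidence level for this sketch: speculative. One way to picture the scoped-versus-durable split is a session context whose authority expires on exit, plus an explicit, logged conversion into a refcounted lease. ScopedAuthority, Lease, and session are hypothetical names.

```python
from contextlib import contextmanager


class Lease:
    """Durable, refcounted authority that can outlive the session (hypothetical)."""

    def __init__(self, agent, permission):
        self.agent = agent
        self.permission = permission
        self.refs = 1

    def acquire(self):
        self.refs += 1
        return self

    def release(self):
        self.refs -= 1

    @property
    def valid(self):
        return self.refs > 0


class ScopedAuthority:
    """Ambient authority that is only meaningful inside its session scope."""

    def __init__(self, agent, permissions):
        self.agent = agent
        self.permissions = frozenset(permissions)
        self.active = True

    def check(self, permission):
        return self.active and permission in self.permissions

    def to_lease(self, permission, audit_log):
        # Explicit conversion: must happen while the scope is live, and is audited.
        if not self.check(permission):
            raise PermissionError(f"{self.agent} cannot lease {permission}")
        audit_log.append(("lease_granted", self.agent, permission))
        return Lease(self.agent, permission)


@contextmanager
def session(agent, permissions):
    auth = ScopedAuthority(agent, permissions)
    try:
        yield auth
    finally:
        auth.active = False   # scoped authority dies with the session
```

A detached background job would carry the Lease, never the ScopedAuthority.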

Agent Job Creation Pipeline

Evidence level: interpretive.

Agent jobs should follow a process-creation pipeline: validate policy and namespace/resource constraints, allocate job state, install observability and cleanup guards, publish the job, then start execution. Before publication, failures unwind; after publication, cancellation uses runtime protocols.

Evidence: file-notes/linux__kernel__fork.c.md.
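Evidence level for this sketch: speculative. The pipeline above has a clear publication point: failures before it unwind in reverse order, and anything after it goes through a cancellation protocol. The function names, the dict-based job state, and the inject_failure flag are hypothetical and exist only to make the unwind path visible.

```python
def create_job(spec, policy_allows, registry, trace):
    # 1. Validate policy and resource constraints before allocating anything.
    if not policy_allows(spec):
        raise PermissionError(f"policy rejected job {spec['name']}")

    undo = []  # cleanup actions, run in reverse order on pre-publication failure
    try:
        # 2. Allocate job state.
        job = {"name": spec["name"], "state": "allocated"}
        trace.append("allocate")
        undo.append(lambda: trace.append("free"))

        # 3. Install observability and cleanup guards.
        trace.append("install_guards")
        undo.append(lambda: trace.append("remove_guards"))

        if spec.get("inject_failure"):  # hypothetical test hook
            raise RuntimeError("setup failed before publication")

        # 4. Publish: from here on, other components can see the job.
        registry[job["name"]] = job
        trace.append("publish")
    except Exception:
        for action in reversed(undo):
            action()
        raise

    # 5. Start execution.
    job["state"] = "running"
    return job


def cancel(job):
    # After publication there is no unwinding, only a runtime request.
    job["state"] = "cancel_requested"
```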

Agent Runqueues And Policy Classes

Evidence level: interpretive.

Agent work should be scheduled through explicit runqueues and policy classes, not only through ad hoc async task queues. A policy class can define enqueue, dequeue, pick, preempt, account, and completion hooks. That makes scheduling decisions inspectable and enforceable.

Evidence: file-notes/linux__kernel__sched__sched.h.md, file-notes/linux__kernel__sched__core.c.md.
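Evidence level for this sketch: speculative. The hook list above can be expressed as an abstract policy class, with a runqueue that consults classes in a fixed order and records each decision. PolicyClass, FifoPolicy, and Runqueue are hypothetical names, and the hook set is a simplification of the one listed in the paragraph.

```python
from abc import ABC, abstractmethod
from collections import deque


class PolicyClass(ABC):
    """Hypothetical hook table for one scheduling policy."""

    @abstractmethod
    def enqueue(self, job): ...

    @abstractmethod
    def dequeue(self, job): ...

    @abstractmethod
    def pick(self): ...

    def should_preempt(self, running, candidate):
        return False

    def account(self, job, ticks):
        pass

    def complete(self, job):
        pass


class FifoPolicy(PolicyClass):
    def __init__(self):
        self.queue = deque()
        self.usage = {}

    def enqueue(self, job):
        self.queue.append(job)

    def dequeue(self, job):
        self.queue.remove(job)

    def pick(self):
        return self.queue[0] if self.queue else None

    def account(self, job, ticks):
        self.usage[job] = self.usage.get(job, 0) + ticks


class Runqueue:
    def __init__(self, classes):
        self.classes = classes    # list of (name, policy), highest class first
        self.decisions = []       # inspectable record of every pick

    def pick_next(self):
        for name, policy in self.classes:
            job = policy.pick()
            if job is not None:
                self.decisions.append((name, job))
                return job
        return None
```

Because every pick is appended to decisions, "why did this job run" has an answer that does not depend on reading logs.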

Fair Agent Scheduling With Lag And Deadlines

Evidence level: interpretive.

EEVDF (Earliest Eligible Virtual Deadline First) suggests a practical AI runtime policy: track the service each agent job has received, use lag to identify jobs that are owed service, and let latency-sensitive jobs request shorter slices, which gives them earlier virtual deadlines. This avoids pure FIFO queues without giving interactive jobs unlimited priority.

Evidence: file-notes/linux__kernel__sched__fair.c.md, file-notes/linux__Documentation__scheduler__sched-eevdf.rst.md.
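Evidence level for this sketch: speculative. A heavily simplified model of the lag-and-deadline idea follows. It assumes lag is weight times the gap between the weighted-average virtual runtime and the job's own virtual runtime, that a job is eligible when its lag is non-negative, and that the virtual deadline is virtual runtime plus requested slice divided by weight. The kernel's fair.c is far more involved; AgentJob and the helper names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class AgentJob:
    name: str
    weight: float = 1.0
    slice_ticks: float = 4.0   # requested slice; shorter means more latency-sensitive
    vruntime: float = 0.0      # weighted service received so far

    @property
    def deadline(self):
        return self.vruntime + self.slice_ticks / self.weight


def avg_vruntime(jobs):
    total_weight = sum(j.weight for j in jobs)
    return sum(j.weight * j.vruntime for j in jobs) / total_weight


def lag(job, jobs):
    # Positive lag: the job has received less than its fair share.
    return job.weight * (avg_vruntime(jobs) - job.vruntime)


def pick(jobs):
    eligible = [j for j in jobs if lag(j, jobs) >= 0]
    # Fallback guards against floating-point rounding leaving no job eligible.
    return min(eligible or jobs, key=lambda j: j.deadline)


def run(job, ticks):
    job.vruntime += ticks / job.weight
```

With one batch job (slice 4) and one interactive job (slice 1), the interactive job is picked first because its deadline is earlier, but after it runs its lag turns negative and the batch job becomes the only eligible one. That is the "no unlimited priority" property described above.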

Bounded Privileged Agents

Evidence level: interpretive.

Real-time Linux scheduling shows why privileged classes still need runtime budgets. High-priority agent jobs should have explicit period/runtime controls and leave recovery capacity for control-plane work.

Evidence: file-notes/linux__kernel__sched__rt.c.md, file-notes/linux__Documentation__scheduler__sched-rt-group.rst.md.
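Evidence level for this sketch: speculative. The period/runtime control above can be pictured as a budget that refills once per period and refuses work once the runtime allowance is spent. RuntimeBudget and its tick-based clock are hypothetical; the 95-of-100 figure in the usage note is chosen only as an example.

```python
class RuntimeBudget:
    """Hypothetical per-class budget: at most `runtime` ticks per `period` ticks."""

    def __init__(self, period, runtime):
        if not 0 < runtime <= period:
            raise ValueError("need 0 < runtime <= period")
        self.period = period
        self.runtime = runtime
        self.window_start = 0
        self.used = 0

    def try_run(self, now, ticks):
        elapsed = now - self.window_start
        if elapsed >= self.period:
            # A new period has begun: align the window and replenish.
            self.window_start = now - elapsed % self.period
            self.used = 0
        if self.used + ticks > self.runtime:
            return False   # throttled: the remainder is left for other work
        self.used += ticks
        return True
```

With RuntimeBudget(period=100, runtime=95), a privileged class can never consume the last 5 ticks of any period, which is the recovery capacity reserved for control-plane work.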

Deadline-Based Agent SLAs

Evidence level: interpretive.

Deadline scheduling maps to AI jobs with service-level objectives: runtime, deadline, period, admission control, throttling on overrun, and replenishment. This is a better model for bounded inference or automation windows than a single global priority queue.

Evidence: file-notes/linux__kernel__sched__deadline.c.md, file-notes/linux__Documentation__scheduler__sched-deadline.rst.md.
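Evidence level for this sketch: speculative. Admission control for such SLAs can be approximated as a utilization test: a job is admitted only if the sum of runtime/period over all admitted jobs stays within a capacity bound. This is a simplification of what a real deadline scheduler checks; Sla and Admission are hypothetical names, and throttling and replenishment are not modeled here.

```python
from dataclasses import dataclass
from fractions import Fraction


@dataclass(frozen=True)
class Sla:
    runtime: int    # ticks of service needed per period
    deadline: int   # ticks after release by which the service must finish
    period: int     # ticks between releases

    def utilization(self):
        return Fraction(self.runtime, self.period)


class Admission:
    def __init__(self, capacity=Fraction(1)):
        self.capacity = capacity
        self.total = Fraction(0)

    def admit(self, sla):
        if not 0 < sla.runtime <= sla.deadline <= sla.period:
            raise ValueError("need 0 < runtime <= deadline <= period")
        if self.total + sla.utilization() > self.capacity:
            return False   # rejecting up front beats missing deadlines later
        self.total += sla.utilization()
        return True
```

Fractions keep the utilization sum exact, so the admission decision is not affected by floating-point drift.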

Lazy Agent State Faults

Evidence level: interpretive.

Linux page faults show how a runtime can materialize state only when it is accessed. Agent runtimes can apply the same model to long contexts, workspace snapshots, retrieval chunks, and generated artifacts: a fault on missing state can resolve to a zero-fill, a backing-store fetch, a copy-on-write clone, a retry, or a typed failure.

Evidence: file-notes/linux__mm__memory.c.md, file-notes/linux__include__linux__mm.h.md.
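Evidence level for this sketch: speculative. The fault outcomes above can be modeled with a small store that materializes entries on first access. LazyState and StateFault are hypothetical names; values are plain strings, and the retry outcome is left out to keep the sketch short.

```python
class StateFault(Exception):
    """Typed failure: the key is outside every mapped source."""


class LazyState:
    def __init__(self, backing, anonymous=()):
        self.backing = backing            # read-only backing store (never mutated here)
        self.anonymous = set(anonymous)   # keys that zero-fill on first touch
        self.resident = {}                # materialized state
        self.private = set()              # keys this agent owns a private copy of
        self.faults = []                  # record of how each fault was resolved

    def read(self, key):
        if key not in self.resident:
            self._fault_in(key)
        return self.resident[key]

    def write(self, key, value):
        if key not in self.resident:
            self._fault_in(key)
        if key not in self.private:
            # Copy-on-write: detach from the backing store before modifying.
            self.private.add(key)
            self.faults.append(("cow", key))
        self.resident[key] = value

    def _fault_in(self, key):
        if key in self.backing:
            self.resident[key] = self.backing[key]
            self.faults.append(("fetch", key))
        elif key in self.anonymous:
            self.resident[key] = ""
            self.private.add(key)
            self.faults.append(("zero_fill", key))
        else:
            raise StateFault(key)
```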

Region Permissions And Drop-Lock Outcomes

Evidence level: interpretive.

Agent-accessible state should be divided into regions with permissions, backing store, callbacks, and fault results. If a fault operation can drop a lock or retry, stale region handles must be invalidated, just as Linux warns against dereferencing a VMA after mmap_lock may have been dropped.

Evidence: file-notes/linux__include__linux__mm_types.h.md, file-notes/linux__mm__memory.c.md.
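Evidence level for this sketch: speculative. One simple way to invalidate stale handles is a generation counter: every handle remembers the generation at lookup time, and any operation that may have dropped the lock bumps the counter. RegionTable, RegionHandle, and StaleHandle are hypothetical names, and this is not a description of how the kernel tracks VMAs.

```python
class StaleHandle(Exception):
    """The region table may have changed since this handle was looked up."""


class RegionTable:
    def __init__(self):
        self.generation = 0
        self.regions = {}   # region name -> set of permissions

    def map(self, name, perms):
        self.regions[name] = set(perms)
        self.generation += 1

    def lookup(self, name):
        if name not in self.regions:
            raise KeyError(name)
        return RegionHandle(self, name, self.generation)

    def drop_lock(self):
        # Called by any fault path that released the lock or will retry:
        # from now on, every previously issued handle is suspect.
        self.generation += 1


class RegionHandle:
    def __init__(self, table, name, generation):
        self.table = table
        self.name = name
        self.generation = generation

    def perms(self):
        if self.generation != self.table.generation:
            raise StaleHandle(self.name)
        return self.table.regions[self.name]
```

The caller's recovery path is always the same: look the region up again and re-check permissions.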

Allocator Classes For Agent Runtimes

Evidence level: interpretive.

Repeated AI runtime objects should use allocator classes: fixed-size prompt segments, trace spans, tool-call records, embedding chunks, and job descriptors can be cached with accounting and export/usercopy policy attached.

Evidence: file-notes/linux__mm__slab_common.c.md, file-notes/linux__Documentation__core-api__memory-allocation.rst.md.
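Evidence level for this sketch: speculative. An allocator class in this sense is a named cache for one object type, with reuse, accounting, and an export policy attached to the cache rather than to each object. ObjectCache and its fields are hypothetical names.

```python
class ObjectCache:
    """Hypothetical per-type cache with accounting and an export policy."""

    def __init__(self, name, factory, reset, exportable=False):
        self.name = name
        self.factory = factory        # builds a fresh object
        self.reset = reset            # scrubs an object before reuse
        self.exportable = exportable  # may objects of this class leave the runtime?
        self._free = []
        self.created = 0              # accounting: objects ever constructed
        self.in_use = 0               # accounting: objects currently handed out

    def alloc(self):
        if self._free:
            obj = self._free.pop()
        else:
            obj = self.factory()
            self.created += 1
        self.in_use += 1
        return obj

    def release(self, obj):
        self.reset(obj)               # never leak old contents to the next user
        self._free.append(obj)
        self.in_use -= 1

    def export(self, obj):
        if not self.exportable:
            raise PermissionError(f"objects from {self.name} may not be exported")
        return dict(obj)
```

For example, ObjectCache("trace_span", dict, dict.clear) reuses span records, while a cache created with exportable=True could hold tool-call records that are allowed to be copied out to a user.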

Provenance As System Metadata

Evidence level: confirmed (process guidance), plus interpretive design conclusions.

Agent identity, model version, tool use, review status, and human acceptance should be first-class fields on changes. Linux’s process guidance separates attribution of AI assistance from human Developer Certificate of Origin (DCO) certification.

Evidence: file-notes/linux__Documentation__process__coding-assistants.rst.md.
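Evidence level for this sketch: speculative. The fields listed above could be carried on every change as a structured record, with attribution of the agent kept separate from the human who accepts responsibility. ChangeProvenance, its field names, and the status values are hypothetical and are not taken from the kernel's process documentation.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class ChangeProvenance:
    agent: str                          # which agent assisted
    model_version: str                  # which model version it ran
    tools: Tuple[str, ...] = ()         # tools the agent used
    review_status: str = "unreviewed"   # hypothetical values: unreviewed / reviewed
    accepted_by: Optional[str] = None   # human who certifies the change, if any

    def ready_to_merge(self) -> bool:
        # Attribution alone is never enough: a human must review and accept.
        return self.review_status == "reviewed" and self.accepted_by is not None
```

Keeping accepted_by as its own field makes it impossible to confuse "an agent produced this" with "a person vouches for this".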