Documentation/networking/msg_zerocopy.rst
Source file repositories/reference/linux-study-clean/Documentation/networking/msg_zerocopy.rst
File Facts
- System
- Linux kernel
- Corpus path
Documentation/networking/msg_zerocopy.rst- Extension
.rst- Size
- 9078 bytes
- Lines
- 266
- Domain
- Support Tooling And Documentation
- Bucket
- Documentation
- Inferred role
- Support Tooling And Documentation: documentation
- Status
- atlas-only
Why This File Exists
Repository support layer: documentation, build tooling, samples, user-space helper tools, generated initramfs support, licenses, and validation utilities.
- Repository support layer: documentation, build tooling, samples, user-space helper tools, generated initramfs support, licenses, and validation utilities.
- Defines or uses C structs; map object ownership, embedded links, reference counts, and lock ownership.
Dependency Surface
- No C-style include directives detected by the generator.
Detected Declarations
- No top-level syscall, struct, function, initcall, or export declaration detected by the generator.
Annotated Snippet
============
MSG_ZEROCOPY
============
Intro
=====
The MSG_ZEROCOPY flag enables copy avoidance for socket send calls.
The feature is currently implemented for TCP, UDP and VSOCK (with
virtio transport) sockets.
Opportunity and Caveats
-----------------------
Copying large buffers between user process and kernel can be
expensive. Linux supports various interfaces that eschew copying,
such as sendfile and splice. The MSG_ZEROCOPY flag extends the
underlying copy avoidance mechanism to common socket send calls.
Copy avoidance is not a free lunch. As implemented, with page pinning,
it replaces per byte copy cost with page accounting and completion
notification overhead. As a result, MSG_ZEROCOPY is generally only
effective at writes over around 10 KB.
Page pinning also changes system call semantics. It temporarily shares
the buffer between process and network stack. Unlike with copying, the
process cannot immediately overwrite the buffer after system call
return without possibly modifying the data in flight. Kernel integrity
is not affected, but a buggy program can possibly corrupt its own data
stream.
The kernel returns a notification when it is safe to modify data.
Converting an existing application to MSG_ZEROCOPY is not always as
trivial as just passing the flag, then.
More Info
---------
Much of this document was derived from a longer paper presented at
netdev 2.1. For more in-depth information see that paper and talk,
the excellent reporting over at LWN.net or read the original code.
paper, slides, video
https://netdevconf.org/2.1/session.html?debruijn
LWN article
https://lwn.net/Articles/726917/
patchset
[PATCH net-next v4 0/9] socket sendmsg MSG_ZEROCOPY
https://lore.kernel.org/netdev/20170803202945.70750-1-willemdebruijn.kernel@gmail.com
Interface
=========
Passing the MSG_ZEROCOPY flag is the most obvious step to enable copy
avoidance, but not the only one.
Socket Setup
------------
The kernel is permissive when applications pass undefined flags to the
send system call. By default it simply ignores these. To avoid enabling
copy avoidance mode for legacy processes that accidentally already pass
this flag, a process must first signal intent by setting a socket option:
::
Annotation
- Atlas domain: Support Tooling And Documentation / Documentation.
- Implementation status: atlas-only.
Implementation Notes
- This generated page is the file-by-file coverage layer; curated subsystem chapters should link here when they synthesize a multi-file control flow.
- Core OS pages should be promoted from atlas-only to deep-reviewed when they explain data structures, invariants, locking, lifecycle, and C implementation snippets.
- Driver-family pages are intentionally pattern-oriented unless they are part of the selected PCIe/NVMe representative device path.