Documentation/networking/multi-pf-netdev.rst
Source file repositories/reference/linux-study-clean/Documentation/networking/multi-pf-netdev.rst
File Facts
- System
- Linux kernel
- Corpus path
Documentation/networking/multi-pf-netdev.rst- Extension
.rst- Size
- 7399 bytes
- Lines
- 175
- Domain
- Support Tooling And Documentation
- Bucket
- Documentation
- Inferred role
- Support Tooling And Documentation: documentation
- Status
- atlas-only
Why This File Exists
Repository support layer: documentation, build tooling, samples, user-space helper tools, generated initramfs support, licenses, and validation utilities.
- Repository support layer: documentation, build tooling, samples, user-space helper tools, generated initramfs support, licenses, and validation utilities.
Dependency Surface
- No C-style include directives detected by the generator.
Detected Declarations
- No top-level syscall, struct, function, initcall, or export declaration detected by the generator.
Annotated Snippet
.. SPDX-License-Identifier: GPL-2.0
.. include:: <isonum.txt>
===============
Multi-PF Netdev
===============
Contents
========
- `Background`_
- `Overview`_
- `mlx5 implementation`_
- `Channels distribution`_
- `Observability`_
- `Steering`_
- `Mutually exclusive features`_
Background
==========
The Multi-PF NIC technology enables several CPUs within a multi-socket server to connect directly to
the network, each through its own dedicated PCIe interface. Through either a connection harness that
splits the PCIe lanes between two cards or by bifurcating a PCIe slot for a single card. This
results in eliminating the network traffic traversing over the internal bus between the sockets,
significantly reducing overhead and latency, in addition to reducing CPU utilization and increasing
network throughput.
Overview
========
The feature adds support for combining multiple PFs of the same port in a Multi-PF environment under
one netdev instance. It is implemented in the netdev layer. Lower-layer instances like pci func,
sysfs entry, and devlink are kept separate.
Passing traffic through different devices belonging to different NUMA sockets saves cross-NUMA
traffic and allows apps running on the same netdev from different NUMAs to still feel a sense of
proximity to the device and achieve improved performance.
mlx5 implementation
===================
Multi-PF or Socket-direct in mlx5 is achieved by grouping PFs together which belong to the same
NIC and has the socket-direct property enabled, once all PFs are probed, we create a single netdev
to represent all of them, symmetrically, we destroy the netdev whenever any of the PFs is removed.
The netdev network channels are distributed between all devices, a proper configuration would utilize
the correct close NUMA node when working on a certain app/CPU.
We pick one PF to be a primary (leader), and it fills a special role. The other devices
(secondaries) are disconnected from the network at the chip level (set to silent mode). In silent
mode, no south <-> north traffic flowing directly through a secondary PF. It needs the assistance of
the leader PF (east <-> west traffic) to function. All Rx/Tx traffic is steered through the primary
to/from the secondaries.
Currently, we limit the support to PFs only, and up to two PFs (sockets).
Channels distribution
=====================
We distribute the channels between the different PFs to achieve local NUMA node performance
on multiple NUMA nodes.
Each combined channel works against one specific PF, creating all its datapath queues against it. We
distribute channels to PFs in a round-robin policy.
::
Example for 2 PFs and 5 channels:
+--------+--------+
| ch idx | PF idx |
Annotation
- Atlas domain: Support Tooling And Documentation / Documentation.
- Implementation status: atlas-only.
Implementation Notes
- This generated page is the file-by-file coverage layer; curated subsystem chapters should link here when they synthesize a multi-file control flow.
- Core OS pages should be promoted from atlas-only to deep-reviewed when they explain data structures, invariants, locking, lifecycle, and C implementation snippets.
- Driver-family pages are intentionally pattern-oriented unless they are part of the selected PCIe/NVMe representative device path.