Documentation/gpu/drm-ras.rst
File Facts
- System: Linux kernel
- Corpus path: Documentation/gpu/drm-ras.rst
- Extension: .rst
- Size: 4012 bytes
- Lines: 114
- Domain: Support Tooling And Documentation
- Bucket: Documentation
- Inferred role: Support Tooling And Documentation: documentation
- Status: atlas-only
Why This File Exists
Repository support layer: documentation, build tooling, samples, user-space helper tools, generated initramfs support, licenses, and validation utilities.
Dependency Surface
- No C-style include directives detected by the generator.
Detected Declarations
- No top-level syscall, struct, function, initcall, or export declaration detected by the generator.
Annotated Snippet
.. SPDX-License-Identifier: GPL-2.0+

============================
DRM RAS over Generic Netlink
============================

The DRM RAS (Reliability, Availability, Serviceability) interface provides a
standardized way for GPU/accelerator drivers to expose error counters and
other reliability nodes to user space via Generic Netlink. This allows
diagnostic tools, monitoring daemons, or test infrastructure to query hardware
health in a uniform way across different DRM drivers.

Key Goals:

* Provide a standardized RAS solution for GPU and accelerator drivers, enabling
  data center monitoring and reliability operations.
* Implement a single ``drm-ras`` Generic Netlink family that conforms to modern
  Netlink YAML specifications and centralizes all RAS-related communication in
  one namespace.
* Support a basic error counter interface, addressing immediate, essential
  monitoring needs.
* Offer a flexible, future-proof interface that can be extended to support
  additional types of RAS data.
* Allow multiple nodes per driver, enabling drivers to register separate
  nodes for different IP blocks, sub-blocks, or other logical subdivisions
  as applicable.

.. contents::

Nodes
=====

Nodes are logical abstractions representing an error type or error source within
the device. Currently, only error counter nodes are supported.

Drivers are responsible for registering and unregistering nodes via the
drm_ras_node_register() and drm_ras_node_unregister() APIs.
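
To make the node lifecycle concrete, here is a minimal user-space model in Python. It is a hypothetical sketch, not the kernel API: ``ErrorCounterNode``, ``RasRegistry``, their methods, and the device and node names are illustrative assumptions. In a driver, registration goes through drm_ras_node_register() and drm_ras_node_unregister(), whose real signatures are documented by the kernel-doc in the next section. The model also assumes node IDs are handed out sequentially at registration time, which the real implementation may not do.

```python
from dataclasses import dataclass, field


@dataclass
class ErrorCounterNode:
    """Hypothetical model of one error-counter node (e.g. one per IP block)."""
    device_name: str
    node_name: str
    # error-id -> [error name, current count]; illustrative layout only
    counters: dict = field(default_factory=dict)
    node_id: int = -1  # -1 means "not registered"


class RasRegistry:
    """Hypothetical stand-in for the core code that tracks registered nodes."""

    def __init__(self):
        self._nodes = {}
        self._next_id = 0

    def register(self, node):
        # Assumption: IDs are assigned sequentially and never reused.
        node.node_id = self._next_id
        self._next_id += 1
        self._nodes[node.node_id] = node
        return node.node_id

    def unregister(self, node):
        # Raises KeyError if the node is not currently registered.
        del self._nodes[node.node_id]
        node.node_id = -1

    def nodes(self):
        """Registered nodes, in registration order."""
        return list(self._nodes.values())


# One driver registering separate nodes for two hypothetical IP blocks.
registry = RasRegistry()
gt = ErrorCounterNode("0000:03:00.0", "gt", {1: ["correctable", 0]})
mem = ErrorCounterNode("0000:03:00.0", "memory", {1: ["uncorrectable", 0]})
registry.register(gt)
registry.register(mem)
registry.unregister(gt)  # e.g. when the driver tears the node down
print([n.node_name for n in registry.nodes()])
```

Keeping one node per IP block, as in this sketch, mirrors the goal of letting a driver expose separate nodes for different hardware subdivisions; the registry itself only has to map IDs to nodes.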

Node Management
---------------

.. kernel-doc:: drivers/gpu/drm/drm_ras.c
   :doc: DRM RAS Node Management

.. kernel-doc:: drivers/gpu/drm/drm_ras.c
   :internal:

Generic Netlink Usage
=====================

The interface is implemented as a Generic Netlink family named ``drm-ras``.
User space tools can:

* List registered nodes with the ``list-nodes`` command.
* List all error counters in a node with the ``get-error-counter`` command,
  using ``node-id`` as a parameter.
* Query specific error counter values with the ``get-error-counter`` command,
  using both ``node-id`` and ``error-id`` as parameters.
* Clear specific error counters with the ``clear-error-counter`` command,
  using both ``node-id`` and ``error-id`` as parameters.
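
The semantics of the four operations above can be illustrated with a small in-memory Python model. This is a sketch under assumptions: the ``NODES`` table, the node and error names, the counter values, and the ``error-name``/``error-value`` reply keys are made up; only the command names and the ``node-id``/``error-id`` parameters come from this document. It shows ``get-error-counter`` returning every counter of a node when ``error-id`` is omitted, and a single counter when it is given.

```python
# Hypothetical in-memory state: node-id -> node, each holding error counters
# keyed by error-id as [error name, value]. All names/values are made up.
NODES = {
    0: {"node-name": "correctable-errors",
        "counters": {1: ["gt-error", 3], 2: ["memory-error", 0]}},
    1: {"node-name": "uncorrectable-errors",
        "counters": {1: ["gt-error", 1]}},
}


def list_nodes():
    """list-nodes: one entry per registered node, ordered by node-id."""
    return [{"node-id": nid, "node-name": node["node-name"]}
            for nid, node in sorted(NODES.items())]


def get_error_counter(node_id, error_id=None):
    """get-error-counter: all counters of a node, or one if error_id is given."""
    counters = NODES[node_id]["counters"]
    ids = sorted(counters) if error_id is None else [error_id]
    return [{"error-id": eid, "error-name": counters[eid][0],
             "error-value": counters[eid][1]} for eid in ids]


def clear_error_counter(node_id, error_id):
    """clear-error-counter: both node-id and error-id are required."""
    NODES[node_id]["counters"][error_id][1] = 0


print(list_nodes())
print(get_error_counter(0))     # every counter in node 0
print(get_error_counter(0, 1))  # a single counter
clear_error_counter(0, 1)
print(get_error_counter(0, 1))  # same counter after clearing
```

A real tool issues these requests over the ``drm-ras`` Generic Netlink family, for example with the YNL tools driven by the YAML specification, instead of reading local state.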

YAML-based Interface
--------------------

The interface is described in the YAML specification
``Documentation/netlink/specs/drm_ras.yaml``. This YAML is used to
auto-generate user space bindings via ``tools/net/ynl/pyynl/ynl_gen_c.py``, and
it drives the structure of the netlink attributes and operations.
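
Whatever bindings are generated, requests travel as ordinary Generic Netlink messages: a 16-byte ``struct nlmsghdr``, a 4-byte ``struct genlmsghdr``, then attributes encoded as length/type/value records aligned to 4 bytes. The Python sketch below hand-encodes a ``get-error-counter`` request to show that layout. The header layouts are the standard ones from ``<linux/netlink.h>`` and ``<linux/genetlink.h>``, but ``FAMILY_ID``, the command and attribute numbers, the version, and the choice of u32 for ``node-id`` and ``error-id`` are hypothetical placeholders: the real family ID is resolved at runtime through the ``nlctrl`` controller family, and the real numbers and types come from ``drm_ras.yaml``.

```python
import struct

# Standard netlink constants.
NLM_F_REQUEST = 0x1
NLA_ALIGNTO = 4

# HYPOTHETICAL values: the real family id is looked up via "nlctrl", and the
# command/attribute numbers and types are defined by drm_ras.yaml.
FAMILY_ID = 0x20
CMD_GET_ERROR_COUNTER = 2
ATTR_NODE_ID = 1
ATTR_ERROR_ID = 2
GENL_VERSION = 1


def nla_u32(attr_type, value):
    """Encode one u32 attribute; nla_len covers header + payload, not padding."""
    payload = struct.pack("=I", value)  # netlink uses host byte order
    hdr = struct.pack("=HH", 4 + len(payload), attr_type)
    pad = b"\0" * (-len(payload) % NLA_ALIGNTO)  # none needed for a u32
    return hdr + payload + pad


def genl_request(cmd, attrs, seq=1):
    """Prefix attributes with genlmsghdr and nlmsghdr (sender port id left 0)."""
    genl_hdr = struct.pack("=BBH", cmd, GENL_VERSION, 0)
    body = genl_hdr + b"".join(attrs)
    nl_hdr = struct.pack("=IHHII", 16 + len(body), FAMILY_ID,
                         NLM_F_REQUEST, seq, 0)
    return nl_hdr + body


msg = genl_request(CMD_GET_ERROR_COUNTER,
                   [nla_u32(ATTR_NODE_ID, 0), nla_u32(ATTR_ERROR_ID, 1)])
print(len(msg), msg.hex())
```

In practice, code generated from the specification performs this encoding, so tools do not hard-code numbers the way this sketch does.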
Usage Notes
-----------
Annotation
- Atlas domain: Support Tooling And Documentation / Documentation.
- Implementation status: atlas-only.
Implementation Notes
- This generated page is the file-by-file coverage layer; curated subsystem chapters should link here when they synthesize a multi-file control flow.
- Core OS pages should be promoted from atlas-only to deep-reviewed when they explain data structures, invariants, locking, lifecycle, and C implementation snippets.
- Driver-family pages are intentionally pattern-oriented unless they are part of the selected PCIe/NVMe representative device path.