Documentation/admin-guide/device-mapper/log-writes.rst

Source file repositories/reference/linux-study-clean/Documentation/admin-guide/device-mapper/log-writes.rst

File Facts

System
Linux kernel
Corpus path
Documentation/admin-guide/device-mapper/log-writes.rst
Extension
.rst
Size
5170 bytes
Lines
146
Domain
Support Tooling And Documentation
Bucket
Documentation
Inferred role
Support Tooling And Documentation: documentation
Status
atlas-only

Why This File Exists

Repository support layer: documentation, build tooling, samples, user-space helper tools, generated initramfs support, licenses, and validation utilities.

Annotated Snippet

=============
dm-log-writes
=============

This target takes two devices: one that all IO passes through normally, and one
to which all write operations are logged.  It is intended for file system
developers who want to verify the integrity of metadata or data as the file
system is written to.  A log_write_entry is written for every WRITE request, and
the target can also take arbitrary data from userspace and insert it into the
log.  The data in each WRITE request is copied into the log so that a replay
reproduces the writes exactly as they originally happened.

Log Ordering
============

Writes are logged in order of completion, and only once we are sure the write
is no longer in cache.  This means that normal WRITE requests are not actually
logged until the next REQ_PREFLUSH request.  This lets userspace replay the log
in a way that corresponds to what is on disk rather than what is in cache, which
makes it easier to detect improper waiting/flushing.

Each WRITE request is attached to a list when it completes.  When a
REQ_PREFLUSH request arrives, this list is spliced onto the flush request, and
once the FLUSH request completes we log all of those WRITEs followed by the
FLUSH.  Only WRITEs that had already completed when the REQ_PREFLUSH was issued
are included, in order to simulate the worst-case scenario with regard to power
failures.  Consider the following example (W means write, C means complete):

	W1,W2,W3,C3,C2,Wflush,C1,Cflush

The log would show the following:

	W3,W2,flush,W1....

Again, this simulates what is actually on disk, which allows us to detect
cases where a power failure at a particular point in time would create an
inconsistent file system.
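
The ordering rule above can be sketched as a small simulation.  This is a toy
model, not the kernel code: the function name ``simulate_log`` and the string
encoding of events are invented here purely for illustration.

```python
def simulate_log(events):
    """Toy model of the log ordering described above (not the kernel code).

    Events are strings: "W<n>" (write issued), "C<n>" (write completed),
    "Wflush" (flush issued) and "Cflush" (flush completed).
    Returns (log, pending): the entries logged so far, and the completed
    writes still waiting for a later flush.
    """
    completed = []  # writes that completed but are not yet logged
    attached = []   # writes spliced onto the in-flight flush
    log = []
    for ev in events:
        if ev == "Wflush":
            # Only writes already completed at this point ride on the flush.
            attached, completed = completed, []
        elif ev == "Cflush":
            # Log the attached writes first, then the flush itself.
            log.extend(attached)
            log.append("flush")
            attached = []
        elif ev.startswith("C"):
            completed.append("W" + ev[1:])
        # "W<n>" issue events need no bookkeeping in this model.
    return log, completed


if __name__ == "__main__":
    log, pending = simulate_log("W1,W2,W3,C3,C2,Wflush,C1,Cflush".split(","))
    print(",".join(log), "| pending:", ",".join(pending))
```

For the example sequence, the model logs ``W3,W2,flush`` and leaves ``W1``
pending until a later flush, which matches the log shown above.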

Any REQ_FUA requests bypass this flushing mechanism and are logged as soon as
they complete, since those requests bypass the device cache.

Any REQ_OP_DISCARD requests are treated like WRITE requests.  If they were
instead logged as soon as they completed, the log would contain all the DISCARD
requests first, then the WRITE requests, and then the FLUSH request.  Consider
the following example:

	WRITE block 1, DISCARD block 1, FLUSH

If we logged DISCARD when it completed, the replay would look like this:

	DISCARD 1, WRITE 1, FLUSH

which is not what actually happened, and the problem would not be caught during
the log replay.
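
The difference between the two policies can be sketched as follows.  This is a
simplified model under stated assumptions: the name ``log_sequence`` and the
``defer_discard`` switch are hypothetical, and every request is assumed to
complete in the order it was issued.

```python
def log_sequence(ops, defer_discard=True):
    """Toy model: return the log order for a list of (kind, block) ops.

    With defer_discard=True, DISCARDs are held back like WRITEs and logged
    at the next FLUSH.  With defer_discard=False, they are logged as soon
    as they complete, which is the problematic alternative.
    """
    pending = []  # completed requests waiting for the next flush
    log = []
    for kind, block in ops:
        if kind == "FLUSH":
            log.extend(pending)
            pending = []
            log.append("FLUSH")
        elif kind == "DISCARD" and not defer_discard:
            log.append(f"DISCARD {block}")
        else:
            pending.append(f"{kind} {block}")
    return log


if __name__ == "__main__":
    ops = [("WRITE", 1), ("DISCARD", 1), ("FLUSH", None)]
    print(log_sequence(ops, defer_discard=True))
    print(log_sequence(ops, defer_discard=False))
```

With ``defer_discard=True`` the log keeps the WRITE ahead of the DISCARD; with
``defer_discard=False`` the DISCARD jumps ahead of the WRITE, as in the replay
shown above.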

Target interface
================

i) Constructor

   log-writes <dev_path> <log_dev_path>

   ============= ==============================================
   dev_path      Device that all of the IO will go to normally.
   log_dev_path  Device where the log entries are written to.
   ============= ==============================================
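
As an illustration, the constructor arguments can be assembled into a
device-mapper table line.  The sketch below assumes the general
``<start sector> <number of sectors> <target> <arguments>`` table layout used
by ``dmsetup``, which is not spelled out in this section; the helper name
``log_writes_table`` and the device paths are placeholders.

```python
def log_writes_table(num_sectors, dev_path, log_dev_path, start=0):
    """Build a device-mapper table line for the log-writes target.

    Assumes the usual "<start> <length> <target> <args>" table layout.
    num_sectors is the size of the mapped device in 512-byte sectors.
    """
    if num_sectors <= 0:
        raise ValueError("num_sectors must be positive")
    return f"{start} {num_sectors} log-writes {dev_path} {log_dev_path}"


if __name__ == "__main__":
    # /dev/sdb and /dev/sdc are placeholder paths, not taken from the text.
    print(log_writes_table(2048, "/dev/sdb", "/dev/sdc"))
```

The resulting string is the kind of line that would be passed to ``dmsetup``
when creating the mapped device.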

ii) Status

    <#logged entries> <highest allocated sector>

    =========================== ========================

Annotation

Implementation Notes