arch/alpha/lib/ev6-copy_page.S
Source file repositories/reference/linux-study-clean/arch/alpha/lib/ev6-copy_page.S
File Facts
- System: Linux kernel
- Corpus path: arch/alpha/lib/ev6-copy_page.S
- Extension: .S
- Size: 4343 bytes
- Lines: 206
- Domain: Architecture Layer
- Bucket: arch/alpha
- Inferred role: Architecture Layer: exported/initcall integration point
- Status: integration implementation candidate
Why This File Exists
CPU and platform-specific kernel glue: boot entry, traps, syscall entry, interrupts, page tables, context switch, and low-level barriers.
- Exports symbols or registers init work; inspect boot/module ordering and who consumes the exported contract.
Dependency Surface
linux/export.h
Detected Declarations
export copy_page
Annotated Snippet
was written by an unnamed ev6 hardware designer and forwarded to me
via Steven Hobbs <hobbs@steven.zko.dec.com>.
First Problem: STQ overflows.
-----------------------------
It would be nice if EV6 handled every resource overflow efficiently,
but for some it doesn't. Including store queue overflows. It causes
a trap and a restart of the pipe.
To get around this we sometimes use (to borrow a term from a VSSAD
researcher) "aeration". The idea is to slow the rate at which the
processor receives valid instructions by inserting nops in the fetch
path. In doing so, you can prevent the overflow and actually make
the code run faster. You can, of course, take advantage of the fact
that the processor can fetch at most 4 aligned instructions per cycle.
I inserted enough nops to force it to take 10 cycles to fetch the
loop code. In theory, EV6 should be able to execute this loop in
9 cycles but I was not able to get it to run that fast -- the initial
conditions were such that I could not reach this optimum rate on
(chaotic) EV6. I wrote the code such that everything would issue
in order.
Second Problem: Dcache index matches.
-------------------------------------
If you are going to use this routine on random aligned pages, there
is a 25% chance that the pages will be at the same dcache indices.
This results in many nasty memory traps without care.
The solution is to schedule the prefetches to avoid the memory
conflicts. I schedule the wh64 prefetches farther ahead of the
read prefetches to avoid this problem.
Third Problem: Needs more prefetching.
--------------------------------------
In order to improve the code I added deeper prefetching to take the
most advantage of EV6's bandwidth.
I also prefetched the read stream. Note that adding the read prefetch
forced me to add another cycle to the inner-most kernel - up to 11
from the original 8 cycles per iteration. We could improve performance
further by unrolling the loop and doing multiple prefetches per cycle.
I think that the code below will be very robust and fast code for the
purposes of copying aligned pages. It is slower when both source and
destination pages are in the dcache, but it is my guess that this is
less important than the dcache miss case. */
#include <linux/export.h>
.text
.align 4
.global copy_page
.ent copy_page
copy_page:
.prologue 0
/* Prefetch 5 read cachelines; write-hint 10 cache lines. */
wh64 ($16)
ldl $31,0($17)
ldl $31,64($17)
lda $1,1*64($16)
wh64 ($1)
ldl $31,128($17)
ldl $31,192($17)
lda $1,2*64($16)
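The snippet stops partway through the prefetch preamble. On Alpha, a load into `$31` (the zero register) discards its result and acts as a read prefetch, and `wh64` hints that a whole 64-byte line is about to be written. The sketch below is a rough model, not the kernel's code. It lists the offsets that the comment "Prefetch 5 read cachelines; write-hint 10 cache lines" implies, then extrapolates a steady-state schedule in which each copied line is paired with a read prefetch 5 lines ahead and a write hint 10 lines ahead. The 8 KB page size, the steady-state pairing, and the names `preamble` and `schedule` are assumptions for illustration, since the loop body is not shown in the snippet.

```python
LINE = 64         # cache-line stride, visible in the snippet's offsets (0, 64, 128, ...)
PAGE = 8192       # assumption: 8 KB Alpha page; not stated in the snippet
READ_AHEAD = 5    # lines, from "Prefetch 5 read cachelines"
WRITE_AHEAD = 10  # lines, from "write-hint 10 cache lines"


def preamble():
    """Offsets touched before the copy loop starts, per the snippet's comment."""
    reads = [i * LINE for i in range(READ_AHEAD)]
    hints = [i * LINE for i in range(WRITE_AHEAD)]
    return reads, hints


def schedule(lines=PAGE // LINE):
    """Assumed steady state: (read_prefetch, write_hint) offsets per copied line.

    None means the target would fall past the end of the page, so nothing
    is issued for it.
    """
    plan = []
    for i in range(lines):
        read_off = (i + READ_AHEAD) * LINE
        hint_off = (i + WRITE_AHEAD) * LINE
        plan.append((
            read_off if read_off < PAGE else None,
            hint_off if hint_off < PAGE else None,
        ))
    return plan


if __name__ == "__main__":
    reads, hints = preamble()
    print("read prefetch offsets:", reads)
    print("write hint offsets:   ", hints)
    print("first loop line pairs with:", schedule()[0])
```

In this model the write hints always run 5 lines ahead of the read prefetches, which is the separation the quoted comment relies on to keep the two streams from conflicting when source and destination map to the same dcache indices.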
Annotation
- Immediate include surface: `linux/export.h`.
- Detected declarations: `export copy_page`.
- Atlas domain: Architecture Layer / arch/alpha.
- Implementation status: integration implementation candidate.
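Two figures in the quoted comment can be checked with simple arithmetic: the 10-cycle fetch time produced by nop padding ("aeration"), and the 25% chance that two pages land on the same dcache indices. The sketch below is only a back-of-the-envelope model. The fetch width of 4 comes from the comment. The loop's count of 30 useful instructions is a hypothetical figure, and the cache geometry (64 KB, two-way set-associative dcache with 8 KB pages) is an assumption about EV6 that the file itself does not state.

```python
import math
from fractions import Fraction

FETCH_WIDTH = 4  # "at most 4 aligned instructions per cycle", from the quoted comment


def fetch_cycles(n_instructions, width=FETCH_WIDTH):
    """Cycles needed to fetch a loop body that starts on an aligned fetch block."""
    return math.ceil(n_instructions / width)


def nops_for_target(real_instructions, target_cycles, width=FETCH_WIDTH):
    """Fewest nops that stretch an aligned loop body to target_cycles of fetch."""
    shortest_body = (target_cycles - 1) * width + 1
    return max(0, shortest_body - real_instructions)


def same_index_probability(cache_bytes, ways, page_bytes):
    """Chance that two randomly placed, page-aligned pages share dcache indices."""
    way_bytes = cache_bytes // ways  # address range covered by one pass over the index
    page_slots = max(1, way_bytes // page_bytes)
    return Fraction(1, page_slots)


if __name__ == "__main__":
    # Hypothetical loop with 30 useful instructions, padded to a 10-cycle fetch.
    padding = nops_for_target(30, 10)
    print(padding, fetch_cycles(30 + padding))

    # Assumed EV6 geometry: 64 KB, 2-way dcache; 8 KB pages.
    print(same_index_probability(64 * 1024, 2, 8 * 1024))
```

Under the assumed geometry, each way of the dcache spans 32 KB, so a page can start at one of four index positions and two independent pages match with probability 1/4, which agrees with the 25% quoted in the comment.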
Implementation Notes
- This generated page is the file-by-file coverage layer; curated subsystem chapters should link here when they synthesize a multi-file control flow.
- Core OS pages should be promoted from atlas-only to deep-reviewed when they explain data structures, invariants, locking, lifecycle, and C implementation snippets.
- Driver-family pages are intentionally pattern-oriented unless they are part of the selected PCIe/NVMe representative device path.