.. SPDX-License-Identifier: GPL-2.0

==========================
PAT (Page Attribute Table)
==========================

The x86 Page Attribute Table (PAT) allows memory attributes to be set at
page-level granularity. PAT is complementary to the MTRR settings, which allow
memory types to be set over physical address ranges. However, PAT is more
flexible than MTRR because it can set attributes at the page level and because
there are no hardware limitations on the number of such attribute settings
allowed. The added flexibility comes with a guideline: the same physical
memory must not be mapped with aliased memory types through multiple virtual
addresses.

PAT allows for different types of memory attributes. The most commonly used
ones that are currently supported are:

=== ==============
WB  Write-back
UC  Uncached
WC  Write-combined
WT  Write-through
UC- Uncached Minus
=== ==============


PAT APIs
========

There are many different APIs in the kernel that allow setting of memory
attributes at the page level. In order to avoid aliasing, these interfaces
should be used thoughtfully. Below is a table of the interfaces available,
their intended usage and their memory attribute relationships. Internally,
these APIs use a reserve_memtype()/free_memtype() interface on the physical
address range to avoid any aliasing.

+------------------------+----------+--------------+------------------+
| API                    |    RAM   |  ACPI,...    |  Reserved/Holes  |
+------------------------+----------+--------------+------------------+
| ioremap                |    --    |    UC-       |       UC-        |
+------------------------+----------+--------------+------------------+
| ioremap_cache          |    --    |    WB        |       WB         |
+------------------------+----------+--------------+------------------+
| ioremap_uc             |    --    |    UC        |       UC         |
+------------------------+----------+--------------+------------------+
| ioremap_wc             |    --    |    --        |       WC         |
+------------------------+----------+--------------+------------------+
| ioremap_wt             |    --    |    --        |       WT         |
+------------------------+----------+--------------+------------------+
| set_memory_uc,         |    UC-   |    --        |       --         |
| set_memory_wb          |          |              |                  |
+------------------------+----------+--------------+------------------+
| set_memory_wc,         |    WC    |    --        |       --         |
| set_memory_wb          |          |              |                  |
+------------------------+----------+--------------+------------------+
| set_memory_wt,         |    WT    |    --        |       --         |
| set_memory_wb          |          |              |                  |
+------------------------+----------+--------------+------------------+
| pci sysfs resource     |    --    |    --        |       UC-        |
+------------------------+----------+--------------+------------------+
| pci sysfs resource_wc  |    --    |    --        |       WC         |
| is IORESOURCE_PREFETCH |          |              |                  |
+------------------------+----------+--------------+------------------+
| pci proc               |    --    |    --        |       UC-        |
| !PCIIOC_WRITE_COMBINE  |          |              |                  |
+------------------------+----------+--------------+------------------+
| pci proc               |    --    |    --        |       WC         |
| PCIIOC_WRITE_COMBINE   |          |              |                  |
+------------------------+----------+--------------+------------------+
| /dev/mem               |    --    |  WB/WC/UC-   |    WB/WC/UC-     |
| read-write             |          |              |                  |
+------------------------+----------+--------------+------------------+
| /dev/mem               |    --    |    UC-       |       UC-        |
| mmap SYNC flag         |          |              |                  |
+------------------------+----------+--------------+------------------+
| /dev/mem               |    --    |  WB/WC/UC-   |    WB/WC/UC-     |
| mmap !SYNC flag        |          |              |                  |
| and                    |          |(from existing|  (from existing  |
| any alias to this area |          |alias)        |      alias)      |
+------------------------+----------+--------------+------------------+
| /dev/mem               |    --    |    WB        |       WB         |
| mmap !SYNC flag        |          |              |                  |
| no alias to this area  |          |              |                  |
| and                    |          |              |                  |
| MTRR says WB           |          |              |                  |
+------------------------+----------+--------------+------------------+
| /dev/mem               |    --    |    --        |       UC-        |
| mmap !SYNC flag        |          |              |                  |
| no alias to this area  |          |              |                  |
| and                    |          |              |                  |
| MTRR says !WB          |          |              |                  |
+------------------------+----------+--------------+------------------+


Advanced APIs for drivers
=========================

A. Exporting pages to users with remap_pfn_range, io_remap_pfn_range,
   vmf_insert_pfn.

Drivers wanting to export some pages to userspace do so by using the mmap
interface and a combination of:

  1) pgprot_noncached()
  2) io_remap_pfn_range() or remap_pfn_range() or vmf_insert_pfn()

With PAT support, a new API, pgprot_writecombine(), is provided. Drivers can
continue to use the above sequence, with either pgprot_noncached() or
pgprot_writecombine() in step 1, followed by step 2.

In addition, step 2 internally tracks the region as UC or WC in the memtype
list in order to ensure that there are no conflicting mappings.

Note that this set of APIs only works with IO (non-RAM) regions. If a driver
wants to export a RAM region, it has to call set_memory_uc() or set_memory_wc()
as step 0 above, track the usage of those pages, and call set_memory_wb()
before the pages are returned to the free pool.

MTRR effects on PAT / non-PAT systems
=====================================

The following table provides the effects of using write-combining MTRRs when
using ioremap*() calls on x86 for both non-PAT and PAT systems. Ideally
mtrr_add() usage will be phased out in favor of arch_phys_wc_add(), which will
be a no-op on PAT-enabled systems. The region over which an arch_phys_wc_add()
call is made should already have been ioremapped with WC attributes or PAT
entries; this can be done by using ioremap_wc() / set_memory_wc(). Devices which
combine areas of IO memory desired to remain uncacheable with areas where
write-combining is desirable should consider use of ioremap_uc() followed by
set_memory_wc() to white-list effective write-combined areas. Such use is
nevertheless discouraged as the effective memory type is considered
implementation defined, yet this strategy can be used as a last resort on
devices with size-constrained regions where MTRR write-combining would
otherwise not be effective.
::

  ====  =======  ===  =========================  =====================
  MTRR  Non-PAT  PAT  Linux ioremap value        Effective memory type
  ====  =======  ===  =========================  =====================
        PAT                                        Non-PAT |  PAT
        |PCD                                               |
        ||PWT                                              |
        |||                                                |
  WC    000      WB   _PAGE_CACHE_MODE_WB             WC   |   WC
  WC    001      WC   _PAGE_CACHE_MODE_WC             WC*  |   WC
  WC    010      UC-  _PAGE_CACHE_MODE_UC_MINUS       WC*  |   UC
  WC    011      UC   _PAGE_CACHE_MODE_UC             UC   |   UC
  ====  =======  ===  =========================  =====================

  (*) denotes implementation defined and is discouraged

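The MTRR table above can also be read as a small lookup. The following is a
minimal userspace sketch, not kernel code: the struct, the array and the
helper effective_type_under_wc_mtrr() are hypothetical names introduced only
for illustration, and the values are copied directly from the table.

```c
#include <stddef.h>
#include <string.h>

/* One row of the table: a WC MTRR combined with a Linux cache mode. */
struct wc_mtrr_effect {
	const char *cache_mode;   /* "Linux ioremap value" column */
	const char *non_pat;      /* effective type on a non-PAT system */
	const char *pat;          /* effective type on a PAT system */
	int non_pat_impl_defined; /* 1 where the table marks the result (*) */
};

/* Values transcribed from the table; nothing here is measured. */
const struct wc_mtrr_effect wc_mtrr_effects[] = {
	{ "_PAGE_CACHE_MODE_WB",       "WC", "WC", 0 },
	{ "_PAGE_CACHE_MODE_WC",       "WC", "WC", 1 },
	{ "_PAGE_CACHE_MODE_UC_MINUS", "WC", "UC", 1 },
	{ "_PAGE_CACHE_MODE_UC",       "UC", "UC", 0 },
};

/*
 * Hypothetical helper: return the effective memory type for a mapping
 * that falls under a WC MTRR, or NULL if the cache mode is not listed.
 */
const char *effective_type_under_wc_mtrr(const char *cache_mode,
					 int pat_enabled)
{
	size_t n = sizeof(wc_mtrr_effects) / sizeof(wc_mtrr_effects[0]);

	for (size_t i = 0; i < n; i++) {
		if (strcmp(wc_mtrr_effects[i].cache_mode, cache_mode) == 0)
			return pat_enabled ? wc_mtrr_effects[i].pat
					   : wc_mtrr_effects[i].non_pat;
	}
	return NULL;
}
```

For example, a _PAGE_CACHE_MODE_UC_MINUS mapping under a WC MTRR is looked up
as UC with PAT and as (implementation defined) WC without PAT.
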
.. note:: In the PAT APIs table above, "--" means "Not suggested usage for
  the API". Some of the --'s are strictly enforced by the kernel. Some others
  are not really enforced today, but may be enforced in the future.

For ioremap and PCI access through /sys or /proc, the actual type returned
can be more restrictive if there is any existing aliasing for that address.
For example, if there is an existing uncached mapping, a new ioremap_wc() can
return an uncached mapping in place of the write-combined mapping requested.

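The aliasing behaviour described above can be pictured with a toy userspace
model. This is only a sketch and is not the kernel's reserve_memtype()
implementation: the types, the resolve_memtype() helper and the rule "an
existing overlapping reservation decides the type" are assumptions chosen to
match the ioremap_wc() example in the text.

```c
#include <stddef.h>

/* Hypothetical memory type tags for the model. */
enum memtype { MT_WB, MT_WC, MT_WT, MT_UC_MINUS, MT_UC };

/* A tracked physical range [start, end) and the type it was mapped with. */
struct reservation {
	unsigned long long start;
	unsigned long long end;
	enum memtype type;
};

/*
 * Toy model: decide which type a new mapping of [start, end) receives.
 * If nothing overlaps, the requested type is granted. If existing
 * reservations overlap and agree on a type, that type is returned
 * instead. If they disagree, the request fails with -1.
 */
int resolve_memtype(const struct reservation *list, size_t n,
		    unsigned long long start, unsigned long long end,
		    enum memtype requested, enum memtype *actual)
{
	int found = 0;
	enum memtype found_type = requested;

	for (size_t i = 0; i < n; i++) {
		/* Skip reservations that do not overlap the new range. */
		if (list[i].end <= start || list[i].start >= end)
			continue;
		if (found && list[i].type != found_type)
			return -1;
		found = 1;
		found_type = list[i].type;
	}
	*actual = found_type;
	return 0;
}
```

In this model, requesting MT_WC over a range that already carries an MT_UC
reservation yields MT_UC, mirroring the example in the paragraph above.
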
set_memory_[uc|wc|wt] and set_memory_wb should be used in pairs, where the
driver first makes a region uc, wc or wt and switches it back to wb after use.

Over time, writes to /proc/mtrr will be deprecated in favor of using PAT-based
interfaces. Users writing to /proc/mtrr are encouraged to use the above
interfaces instead.

Drivers should use ioremap_[uc|wc] to access PCI BARs with [uc|wc] access
types.

Drivers should use set_memory_[uc|wc|wt] to set access type for RAM ranges.


PAT debugging
=============

With CONFIG_DEBUG_FS enabled, the PAT memtype list can be examined by::

  # mount -t debugfs debugfs /sys/kernel/debug
  # cat /sys/kernel/debug/x86/pat_memtype_list
  PAT memtype list:
  uncached-minus @ 0x7fadf000-0x7fae0000
  uncached-minus @ 0x7fb19000-0x7fb1a000
  uncached-minus @ 0x7fb1a000-0x7fb1b000
  uncached-minus @ 0x7fb1b000-0x7fb1c000
  uncached-minus @ 0x7fb1c000-0x7fb1d000
  uncached-minus @ 0x7fb1d000-0x7fb1e000
  uncached-minus @ 0x7fb1e000-0x7fb25000
  uncached-minus @ 0x7fb25000-0x7fb26000
  uncached-minus @ 0x7fb26000-0x7fb27000
  uncached-minus @ 0x7fb27000-0x7fb28000
  uncached-minus @ 0x7fb28000-0x7fb2e000
  uncached-minus @ 0x7fb2e000-0x7fb2f000
  uncached-minus @ 0x7fb2f000-0x7fb30000
  uncached-minus @ 0x7fb31000-0x7fb32000
  uncached-minus @ 0x80000000-0x90000000

This list shows physical address ranges and various PAT settings used to
access those physical address ranges.

Another, more verbose way of getting PAT-related debug messages is with the
"debugpat" boot parameter. With this parameter, various debug messages are
printed to the dmesg log.

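Entries in the memtype list shown above can be post-processed in userspace.
The following is a minimal sketch that parses one line in the
"type @ start-end" form of the sample listing. The struct and both helpers
are hypothetical names, and treating the end address as exclusive is an
assumption suggested by the adjacent ranges in the listing.

```c
#include <stdio.h>

/* One parsed entry of the memtype list (hypothetical layout). */
struct memtype_entry {
	char type[32];            /* e.g. "uncached-minus" */
	unsigned long long start; /* first physical address of the range */
	unsigned long long end;   /* assumed to be exclusive */
};

/*
 * Parse a line such as "uncached-minus @ 0x7fadf000-0x7fae0000".
 * Returns 0 on success and -1 if the line does not match, which is
 * what happens for the "PAT memtype list:" header line.
 */
int parse_memtype_line(const char *line, struct memtype_entry *e)
{
	if (sscanf(line, " %31s @ 0x%llx-0x%llx",
		   e->type, &e->start, &e->end) != 3)
		return -1;
	return 0;
}

/* Size of the range in bytes, under the exclusive-end assumption. */
unsigned long long memtype_entry_size(const struct memtype_entry *e)
{
	return e->end - e->start;
}
```

Under that assumption, the first entry of the sample listing covers a single
4 KiB page and the last entry covers 256 MiB.
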
PAT Initialization
==================

The following table describes how PAT is initialized under various
configurations. The PAT MSR must be updated by Linux in order to support WC
and WT attributes. Otherwise, the PAT MSR has the value programmed in it
by the firmware. Note that Xen enables the WC attribute in the PAT MSR for
guests.

==== ===== ========================== ========= =======
MTRR PAT   Call Sequence              PAT State PAT MSR
==== ===== ========================== ========= =======
E    E     MTRR -> PAT init           Enabled   OS
E    D     MTRR -> PAT init           Disabled  -
D    E     MTRR -> PAT disable        Disabled  BIOS
D    D     MTRR -> PAT disable        Disabled  -
-    np/E  PAT -> PAT disable         Disabled  BIOS
-    np/D  PAT -> PAT disable         Disabled  -
E    !P/E  MTRR -> PAT init           Disabled  BIOS
D    !P/E  MTRR -> PAT disable        Disabled  BIOS
!M   !P/E  MTRR stub -> PAT disable   Disabled  BIOS
==== ===== ========================== ========= =======

Legend

========= =======================================
E         Feature enabled in CPU
D         Feature disabled/unsupported in CPU
np        "nopat" boot option specified
!P        CONFIG_X86_PAT option unset
!M        CONFIG_MTRR option unset
Enabled   PAT state set to enabled
Disabled  PAT state set to disabled
OS        PAT initializes PAT MSR with OS setting
BIOS      PAT keeps PAT MSR with BIOS setting
========= =======================================

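To make the remark about updating the PAT MSR more concrete, the following
userspace sketch decodes a PAT MSR value into its eight entries. It is not
kernel code: the enum and helper names are hypothetical. The type encodings
and the power-on default value are taken from the Intel SDM rather than from
this document, and what a particular firmware actually programs may differ.

```c
#include <stdint.h>

/* Memory type encodings used in PAT MSR entries (per the Intel SDM). */
enum pat_type {
	PAT_UC       = 0x00,
	PAT_WC       = 0x01,
	PAT_WT       = 0x04,
	PAT_WP       = 0x05,
	PAT_WB       = 0x06,
	PAT_UC_MINUS = 0x07,
};

/* Power-on default PAT MSR value (per the Intel SDM). */
#define PAT_MSR_POWER_ON_DEFAULT 0x0007040600070406ULL

/* Entry index selected by the PAT, PCD and PWT page-table bits. */
unsigned pat_index(unsigned pat, unsigned pcd, unsigned pwt)
{
	return ((pat & 1u) << 2) | ((pcd & 1u) << 1) | (pwt & 1u);
}

/* Extract entry i (0..7); each entry is the low 3 bits of one byte. */
unsigned pat_entry(uint64_t msr, unsigned i)
{
	return (unsigned)((msr >> (8 * i)) & 0x7);
}

/* Return 1 if any of the eight entries holds the given memory type. */
int pat_msr_has_type(uint64_t msr, enum pat_type type)
{
	for (unsigned i = 0; i < 8; i++) {
		if (pat_entry(msr, i) == (unsigned)type)
			return 1;
	}
	return 0;
}
```

Decoding the power-on default this way gives WB, WT, UC- and UC for entries
0 to 3 and no WC entry at all, which is consistent with the statement above
that the MSR has to be reprogrammed before WC mappings can be expressed.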