=======================
Display Core Next (DCN)
=======================

To equip our readers with the basic knowledge of how AMD Display Core Next
(DCN) works, we need to start with an overview of the hardware pipeline. Below
you can see a picture that provides a DCN overview; keep in mind that this is a
generic diagram, and we have variations per ASIC.

.. kernel-figure:: dc_pipeline_overview.svg

Based on this diagram, we can walk through each block and briefly describe
it:

* **Display Controller Hub (DCHUB)**: This is the gateway between the Scalable
  Data Port (SDP) and DCN. This component has multiple features, such as memory
  arbitration, rotation, and cursor manipulation.

* **Display Pipe and Plane (DPP)**: This block provides pre-blend pixel
  processing such as color space conversion, linearization of pixel data, tone
  mapping, and gamut mapping.

* **Multiple Pipe/Plane Combined (MPC)**: This component performs blending of
  multiple planes, using global or per-pixel alpha.

* **Output Pixel Processing (OPP)**: Processes and formats pixels to be sent to
  the display.

* **Output Pipe Timing Combiner (OPTC)**: It generates the output timing, and
  can combine streams or split them across pipes. CRC values are generated in
  this block.

* **Display Output (DIO)**: Encodes the output for the display connected to our
  GPU.

* **Display Writeback (DWB)**: It provides the ability to write the output of
  the display pipe back to memory as video frames.

* **Multi-Media HUB (MMHUBBUB)**: Memory controller interface for DMCUB and DWB
  (note that DWB is not hooked up yet).

* **DCN Management Unit (DMU)**: It provides registers with access control and
  routes display controller interrupts to the SOC host interrupt unit. This
  block includes the Display Micro-Controller Unit - version B (DMCUB), which
  is handled via firmware.

* **DCN Clock Generator Block (DCCG)**: It provides the clocks and resets
  for all of the display controller clock domains.

* **Azalia (AZ)**: Audio engine.

The above diagram is an architecture generalization of DCN, which means that
every ASIC has variations around this base model. Notice that the display
pipeline is connected to the Scalable Data Port (SDP) via DCHUB; you can see
the SDP as the element from our Data Fabric that feeds the display pipe.

Always approach the DCN architecture as something flexible that can be
configured and reconfigured in multiple ways; in other words, each block can be
set up or bypassed according to userspace demands. For example, if we want to
drive an 8K@60Hz display with DSC enabled, our DCN may require 4 DPPs and 2
OPPs. It is DC's responsibility to determine the best configuration for each
specific scenario. Orchestrating all of these components requires a
sophisticated communication interface, which is highlighted in the diagram by
the edges that connect the blocks; each connection between these blocks
represents:

1. Pixel data interface (red): Represents the pixel data flow;
2. Global sync signals (green): A set of synchronization signals composed of
   VStartup, VUpdate, and VReady;
3. Config interface: Responsible for configuring blocks;
4. Sideband signals: All other signals that do not fit the previous categories.

These signals are essential and play an important role in DCN. Nevertheless,
Global Sync deserves an extra level of detail, described in a later section.

All of these components are represented by a data structure named dc_state.
From DCHUB to MPC, we have a representation called dc_plane; from MPC to OPTC,
we have dc_stream; and the output (DIO) is handled by dc_link. Keep in mind
that HUBP accesses a surface using a specific format read from memory, and our
dc_plane should work to convert all pixels in the plane to something that can
be sent to the display via dc_stream and dc_link.

Front End and Back End
----------------------

The display pipeline can be broken down into two components that are usually
referred to as **Front End (FE)** and **Back End (BE)**, where FE consists of:

* DCHUB (mainly referring to a subcomponent named HUBP)
* DPP
* MPC

On the other hand, BE consists of:

* OPP
* OPTC
* DIO (DP/HDMI stream encoder and link encoder)

OPP and OPTC are the two blocks joining FE and BE. On a side note, there is a
one-to-one mapping of link encoder to PHY, but we can configure the DCN to
choose which link encoder connects to which PHY. FE's main responsibility is to
change, blend and compose pixel data, while BE's job is to frame a generic
pixel stream into a specific display's pixel stream.

Data Flow
---------

Initially, data is passed in from VRAM through the Data Fabric (DF) in native
pixel formats. The data stays in its native format until it reaches HUBP in
DCHUB, where HUBP unpacks the different pixel formats and outputs them to DPP
as uniform streams through 4 channels (1 for alpha + 3 for colors).

The Converter and Cursor (CNVC) in DPP then normalizes the data representation
and converts it to a DCN-specific floating-point format (i.e., different from
the IEEE floating-point format). In the process, CNVC also applies a degamma
function to transform the data from non-linear to linear space, which
simplifies the floating-point calculations that follow. Data stays in this
floating-point format from DPP to OPP.

Starting at OPP, because color transformation and blending have been completed
(i.e., alpha can be dropped), and the end sinks do not require the precision
and dynamic range that floating point provides (i.e., all displays are in
integer depth format), bit-depth reduction/dithering kicks in. In OPP, we also
apply a regamma function to reintroduce the gamma that was removed earlier.
Eventually, we output data in integer format at DIO.

AMD Hardware Pipeline
---------------------

When discussing graphics on Linux, the **pipeline** term can sometimes be
overloaded with multiple meanings, so it is important to define what we mean
when we say **pipeline**. In the DCN driver, we use the term **hardware
pipeline**, **pipeline**, or just **pipe** as an abstraction for a sequence of
DCN blocks instantiated to address some specific configuration. The DC core
treats DCN blocks as individual resources, meaning we can build a pipeline by
taking resources from all the individual hardware blocks to compose one
pipeline. In actuality, we can't connect an arbitrary block from one pipe to a
block from another pipe; they are routed linearly, except for DSC, which can be
arbitrarily assigned as needed. We have this pipeline concept to help optimize
bandwidth utilization.

.. kernel-figure:: pipeline_4k_no_split.svg

Additionally, let's take a look at parts of the DTN log (see
'Documentation/gpu/amdgpu/display/dc-debug.rst' for more information) since
this log can help us to see part of this pipeline behavior in real-time::

 HUBP:  format  addr_hi  width  height  ...
 [ 0]:      8h      81h   3840    2160
 [ 1]:      0h       0h      0       0
 [ 2]:      0h       0h      0       0
 [ 3]:      0h       0h      0       0
 [ 4]:      0h       0h      0       0
 ...
 MPCC:  OPP  DPP  ...
 [ 0]:   0h   0h  ...

The first thing to notice from the diagram and the DTN log is that we have
different clock domains for each part of the DCN blocks. In this example, we
have just a single **pipeline**, where the data flows from DCHUB to DIO, as we
intuitively expect. Nonetheless, DCN is flexible, as mentioned before, and we
can split this single pipe differently, as described in the below diagram:

.. kernel-figure:: pipeline_4k_split.svg

Now, if we inspect the DTN log again we can see some interesting changes::

 HUBP:  format  addr_hi  width  height  ...
 [ 0]:      8h      81h   1920    2160  ...
 ...
 [ 4]:      0h       0h      0       0  ...
 [ 5]:      8h      81h   1920    2160  ...
 ...
 MPCC:  OPP  DPP  ...
 [ 0]:   0h   0h  ...
 [ 5]:   0h   5h  ...

In the above example, we split the display pipeline into two vertical halves of
1920x2160 (i.e., 3840x2160 in total), and as a result, we could reduce the
clock frequency in the DPP part. This is not only useful for saving power but
also for better handling the required throughput. The idea to keep in mind here
is that the pipe configuration can vary a lot according to the display
configuration, and it is DML's responsibility to set up all the required
configuration parameters for the multiple scenarios supported by our hardware.

Global Sync
-----------

Many DCN registers are double buffered, most importantly the surface address.
This allows us to update DCN hardware atomically for page flips, as well as
for most other updates that don't require enabling or disabling of new pipes.

(Note: There are many scenarios where DC will decide to reserve extra pipes
in order to support outputs that need a very high pixel clock, or for
power saving purposes.)

These atomic register updates are driven by global sync signals in DCN. In
order to understand how atomic updates interact with DCN hardware, and how DCN
signals page flip and vblank events, it is helpful to understand how global
sync is programmed.

Global sync consists of three signals, VSTARTUP, VUPDATE, and VREADY. These are
calculated by the Display Mode Library - DML (drivers/gpu/drm/amd/display/dc/dml)
based on a large number of parameters, and ensure our hardware is able to feed
the DCN pipeline without underflows or hangs in any given system configuration.
The global sync signals always happen during VBlank, are independent from the
VSync signal, and do not overlap each other.

VUPDATE is the only signal that is of interest to the rest of the driver stack
or userspace clients, as it signals the point at which hardware latches
atomically programmed (i.e. double buffered) registers. Even though it is
independent of the VSync signal, we use VUPDATE to signal the VSync event, as
it provides the best indication of how atomic commits and hardware interact.

Since DCN hardware is double-buffered, the DC driver is able to program the
hardware at any point during the frame.

The picture below illustrates the global sync signals:

.. kernel-figure:: global_sync_vblank.svg

These signals affect core DCN behavior. Programming them incorrectly will lead
to a number of negative consequences, most of them quite catastrophic.

The following picture shows how global sync allows for a mailbox style of
updates, i.e. it allows for multiple re-configurations between VUpdate events,
where only the last configuration programmed before the VUpdate signal
becomes effective.

.. kernel-figure:: config_example.svg