# Radeon Evergreen/Northern Islands Acceleration

#### Trademarks

AMD, the AMD Arrow logo, Athlon, and combinations thereof, ATI, ATI logo, Radeon, and Crossfire are trademarks of Advanced Micro Devices, Inc.

Microsoft and Windows are registered trademarks of Microsoft Corporation.

Other product names used in this publication are for identification purposes only and may be trademarks of their respective companies.

#### Disclaimer

The contents of this document are provided in connection with Advanced Micro Devices, Inc. ("AMD") products. AMD makes no representations or warranties with respect to the accuracy or completeness of the contents of this publication and reserves the right to make changes to specifications and product descriptions at any time without notice. No license, whether express, implied, arising by estoppel, or otherwise, to any intellectual property rights are granted by this publication. Except as set forth in AMD's Standard Terms and Conditions of Sale, AMD assumes no liability whatsoever, and disclaims any express or implied warranty, relating to its products including, but not limited to, the implied warranty of merchantability, fitness for a particular purpose, or infringement of any intellectual property right. AMD's products are not designed, intended, authorized or warranted for use as components in systems intended for surgical implant into the body, or in other applications intended to support or sustain life, or in any other application in which the failure of AMD's product could create a situation where personal injury, death, or severe property or environmental damage may occur. AMD reserves the right to discontinue or make changes to its products at any time without notice.

© 2011 Advanced Micro Devices, Inc. All rights reserved.

| Section | Title                                               |
|---------|-----------------------------------------------------|
| 1       | Introduction                                        |
| 2       | Scissors                                            |
| 2.1     | Overview                                            |
| 2.2     | Scissor Rectangles                                  |
| 2.3     | Clip Rectangles (Auxiliary Scissor)                 |
| 3       | Compute Shaders                                     |
| 3.1     | Overview                                            |
| 3.2     | State Requirements for Compute                      |
| 4       | Cayman Shader Changes                               |
| 4.1     | Changes from Evergreen to Cayman                    |
| 4.2     | ALU.Trans Changes                                   |
| 5       | Unified Interpolation                               |
| 5.1     | R7xx Starting Condition                             |
| 5.2     | Evergreen/Cayman Starting Condition                 |
| 5.3     | Mapping from R7xx to Evergreen/Cayman               |
| 6       | DB Programming                                      |
| 6.1     | Compressed Depth/Stencil Textures (DSTs)            |
| 6.2     | Turning Off the Shader for Depth Only Rendering     |
| 7       | Hierarchical Z (HiZ) and Hierarchical Stencil (HiS) |
| 7.1     | Relevant Registers                                  |
| 7.2     | Driver Hints to the Hardware                        |
| 7.3     | HTile Buffers                                       |
| 7.4     | HiZ                                                 |
| 7.5     | HiS                                                 |
| 8       | CB Programming                                      |
| 8.1     | Normal Operation                                    |
| 8.2     | Dual-Source Mode                                    |
| 8.3     | Fast Clear                                          |
| 8.4     | Eliminate Fast Clear/Decompress/FMASK Decompress    |
| 8.5     | Resolve                                             |
| 8.6     | Compute Shader                                      |
| 9       | PM4                                                 |
| 9.1     | Introduction                                        |
| 9.2     | Initialization Packets                              |
| 9.3     | Command Buffer Packets                              |
| 9.4     | State Management Packets                            |
| 9.5     | Command Predication Packets                         |
| 9.6     | Synchronization Packets                             |
| 9.7     | Misc Packets                                        |

# 1. Introduction

This guide is targeted at those who are familiar with GPU programming and the Radeon programming model. It is recommended that you read the r6xx/r7xx programming guide first as this guide builds on the information in that one. Much of the information in this guide is relevant to previous ASICs as well and is noted where applicable.

# 2. Scissors

## 2.1 Overview

There are three 2D coordinate systems relevant for the scissors. All three coordinate systems have the x-axis pointing right and the y-axis pointing down.

- 1) Hardware Screen Coordinates: This coordinate system is defined by the SC number system definition. It is the one in which the SC process is performed. The other coordinate systems are "located" within this coordinate system. The screen and window coordinate systems can be offset by a programmable amount in the HW screen coordinate system to allow a maximum amount of guard band for all legitimate window/screen sizes.
- 2) Screen Coordinates: This coordinate system is typically only relevant when rendering a window into a primary surface (i.e. front/back buffer). When rendering to an off-screen buffer, window and screen are typically the same coordinate system. When rendering to a primary surface, this coordinate system is typically defined by the primary surface size and has a range of 0 to 16K-1. The origin (0,0) of screen coordinates is located at (0,0) or at a programmable offset in HW screen coordinates. The window coordinate system is typically located within the screen coordinate system. The offset between the window coordinates and the screen coordinates is defined by a "Window Offset" register controllable by the driver.
- 3) Window Coordinates: This coordinate system is defined by the output of the viewport transform. Typically, (0,0) represents the upper left corner of the visible window and (Xmax,Ymax) represents the lower right corner of the visible window. Coordinates may be less than 0 or greater than Xmax,Ymax due to the clipping guard band.

## 2.2 Scissor Rectangles

There are four scissor rectangles supported by Evergreen. Each ranges from 0 to 16K for right and bottom and from 0 to 16K-1 for left and top.

The first is termed the SCREEN scissor because it is specified in screen coordinates.

The second is the WINDOW scissor. It is specified in window coordinates and can be (conditionally) offset by the window offset. The offset is applied if the state bit PA\_SC\_WINDOW\_SCISSOR\_TL.WINDOW\_OFFSET\_DISABLE is not set.

The third is the GENERIC scissor. It is also specified in window coordinates and can also be (conditionally) offset by the window offset. The offset is applied if the state bit PA\_SC\_GENERIC\_SCISSOR\_TL.WINDOW\_OFFSET\_DISABLE is not set.

The fourth is the VIEWPORT scissor, an array of 16 scissors indexed by the viewport array index. The viewport array index is generally an output of the Geometry Shader. The VIEWPORT scissor can also be (conditionally) offset by the window offset. The offset is applied if the state bit PA\_SC\_VPORT\_SCISSOR\_\*\_TL.WINDOW\_OFFSET\_DISABLE is not set. The VIEWPORT scissor can also be enabled/disabled by the state bit PA\_SC\_MODE\_CNTL.VPORT\_SCISSOR\_ENABLE.

The evergreen scissor rectangles are specified as an upper left x, y and a lower right x, y value in window coordinates. The scissor will be inclusive on LEFT and TOP and exclusive on RIGHT and BOTTOM (i.e. a scissor definition of UL 10,10 and LR 20,20 will draw row and column 10 and will discard row and column 20).
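The inclusive/exclusive rule can be illustrated with a small containment test. This is only a sketch: the `scissor_rect` type and `scissor_contains` function are hypothetical names introduced for illustration and do not correspond to any register layout.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical rectangle: upper-left corner and lower-right corner. */
typedef struct {
    int left, top;      /* inclusive edges */
    int right, bottom;  /* exclusive edges */
} scissor_rect;

/* Returns true if pixel (x, y) is kept by the scissor:
 * LEFT/TOP are inclusive, RIGHT/BOTTOM are exclusive. */
bool scissor_contains(const scissor_rect *r, int x, int y)
{
    assert(r != NULL);
    return x >= r->left && x < r->right &&
           y >= r->top  && y < r->bottom;
}
```

With UL 10,10 and LR 20,20, pixel (10,10) is drawn while any pixel in row 20 or column 20 is discarded, matching the example in the text.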

## 2.3 Clip Rectangles (Auxiliary Scissor)

There are 4 clip rectangles provided by Evergreen. Unlike the scissor rectangles, the clip rectangles can be programmed to discard pixels based on whether they fall inside or outside a rectangle, and they may be combined to form unions and intersections of the 4 clip rectangles. The determination of inside or outside of a rectangle is identical to that of the scissor rectangle (i.e. LEFT/TOP inclusive, RIGHT/BOTTOM exclusive).

# 3. Compute Shaders

## 3.1 Overview

Setting up the GPU for compute shaders is very similar to the setup for 3D graphics. When setting compute state, bit 1 in the PM4 packet 3 header needs to be set to 1 to denote a compute shader. To kick off a compute thread, DISPATCH\_\* packets are used rather than DRAW\_\* packets.

On R7xx, compute shaders are a special version of the ES shader. Compute inputs are constant buffers, vertex buffers, textures (ES resources), and global memory pointed to by SX\_MEMORY\_EXPORT\_BASE. Outputs can go to global memory pointed to by SX\_MEMORY\_EXPORT\_BASE, to the ESGS ring, or to STRMOUT buffers.

On Evergreen/Cayman, compute shaders are a special CS shader type that runs on resources shared with the LS shader. The inputs are constant buffers, vertex buffers, textures (LS resources), and global memory pointed to by SX\_MEMORY\_EXPORT\_BASE (on Evergreen) or SX\_SCATTER\_EXPORT\_BASE (on Cayman). The outputs can go to global memory or to color buffers CB0 to CB11. CB9 to CB11 do not have full color buffer capabilities and can only be used as RATs (Random Access Targets). RATs can also "read" memory, either through normal read commands or by returning a result from an atomic command (see the RAT opcodes with \_RET). Since there is no return path from the CB to the shader, each "read" command also sends a return address (an offset within the corresponding CB\_IMMED<n>\_BASE surface). When a "read" happens, the CB reads from the RT, writes the value to the IMMED surface, and signals the end of the operation in the same way it signals confirmation of a normal write. The shader can wait for this write confirmation and then issue a read via the TC to this location to retrieve the "read" value.

SX memory exports use different instructions than CB RAT and can operate only on DWORDs where CB RATs can also work on bytes and shorts.

## 3.2 State Requirements for Compute

The GPU requires the same state setup for compute as for graphics, with the following differences:

#### 1. Packet 3 header

Set bit 1 in the packet 3 header for compute state.

#### 2. CB

See the CB programming section below.

#### 3. SX

Program SX\_MEMORY\_EXPORT\_BASE (R7xx/Evergreen) or SX\_SCATTER\_EXPORT\_BASE (Cayman) if using global memory exports.

#### 4. DB

Program DB\_RENDER\_CONTROL.COLOR\_DISABLE = 1.

#### 5. GDS (Global Data Share)

If using GDS, program GDS\_ORDERED\_WAVE\_PER\_SE.COUNT = 1, GDS\_ADDR\_BASE, and GDS\_ADDR\_SIZE.

#### 6. SQ

Program SQ\_THREAD\_RESOURCE\_MGMT\_2.NUM\_LS\_THREADS and SQ\_STACK\_RESOURCE\_MGMT\_3.NUM\_LS\_STACK\_ENTRIES to allocate resources for compute shaders. Also program the LSTMP ring registers: SQ\_LSTMP\_RING\_BASE, SQ\_LSTMP\_RING\_SIZE. On multi-SE (shader engine) ASICs, the SQ rings are per-SE, so they need to be set separately for each SE. On single-SE ASICs, the SQ rings only need to be programmed once.

Programming multi-SE ASICs:

- Set GRBM\_GFX\_INDEX.INSTANCE\_INDEX = 0; GRBM\_GFX\_INDEX.SE\_INDEX = 0; GRBM\_GFX\_INDEX.INSTANCE\_BROADCAST\_WRITES = 1; GRBM\_GFX\_INDEX.SE\_BROADCAST\_WRITES = 0;
- Emit GRBM\_GFX\_INDEX.
- Emit SQ\_LSTMP\_RING\_BASE, SQ\_LSTMP\_RING\_SIZE for SE0.
- Set GRBM\_GFX\_INDEX.SE\_INDEX = 1;
- Emit GRBM\_GFX\_INDEX.
- Emit SQ\_LSTMP\_RING\_BASE, SQ\_LSTMP\_RING\_SIZE for SE1.
- Set GRBM\_GFX\_INDEX.SE\_INDEX = 0; GRBM\_GFX\_INDEX.SE\_BROADCAST\_WRITES = 1;
- Emit GRBM\_GFX\_INDEX.
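The per-SE sequence above can be sketched as a loop. Everything except the register and field names is an assumption made for illustration: `cmd_stream`, `emit_grbm_gfx_index`, `emit_lstmp_ring` and `program_lstmp_rings` are hypothetical helpers that only record what would be written, and no register bit packing is modelled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* GRBM_GFX_INDEX fields as named in the text (bit positions not modelled). */
typedef struct {
    uint32_t instance_index;
    uint32_t se_index;
    uint32_t instance_broadcast_writes;
    uint32_t se_broadcast_writes;
} grbm_gfx_index;

/* One recorded LSTMP ring write; target_se is -1 for a broadcast write. */
typedef struct {
    int      target_se;
    uint32_t ring_base;
    uint32_t ring_size;
} lstmp_write;

/* Hypothetical stand-in for a command stream: it only logs the writes. */
typedef struct {
    grbm_gfx_index current;
    lstmp_write    log[8];
    size_t         count;
} cmd_stream;

void emit_grbm_gfx_index(cmd_stream *cs, grbm_gfx_index idx)
{
    cs->current = idx;
}

void emit_lstmp_ring(cmd_stream *cs, uint32_t base, uint32_t size)
{
    assert(cs->count < 8);
    lstmp_write *w = &cs->log[cs->count++];
    w->target_se = cs->current.se_broadcast_writes ? -1 : (int)cs->current.se_index;
    w->ring_base = base;
    w->ring_size = size;
}

/* Programs one LSTMP ring per shader engine, then restores SE broadcast. */
void program_lstmp_rings(cmd_stream *cs, const uint32_t *bases,
                         uint32_t size, uint32_t num_se)
{
    grbm_gfx_index idx = {0, 0, 1, 0};  /* instance broadcast on, SE broadcast off */
    for (uint32_t se = 0; se < num_se; ++se) {
        idx.se_index = se;
        emit_grbm_gfx_index(cs, idx);
        emit_lstmp_ring(cs, bases[se], size);
    }
    idx.se_index = 0;
    idx.se_broadcast_writes = 1;        /* later writes reach every SE again */
    emit_grbm_gfx_index(cs, idx);
}
```

The final GRBM\_GFX\_INDEX write matters: without it, subsequent state would still be steered to a single SE.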

#### 7. LDS (Local Data Store)

Program SQ\_LDS\_RESOURCE\_MGMT.NUM\_LS\_LDS, SQ\_LDS\_ALLOC.SIZE and SQ\_LDS\_ALLOC.HS\_NUM\_WAVES.

#### 8. VGT

Program the following VGT registers as follows:

- VGT\_GS\_MODE.MODE = GS\_OFF;
- VGT\_GS\_MODE.COMPUTE\_MODE = 1;
- VGT\_GS\_MODE.PARTIAL\_THD\_AT\_EOI = 1;
- VGT\_SHADER\_STAGES\_EN.LS\_EN = CS\_STAGE\_ON;
- VGT\_SHADER\_STAGES\_EN.HS\_EN = HS\_STAGE\_OFF;
- VGT\_SHADER\_STAGES\_EN.ES\_EN = ES\_STAGE\_OFF;
- VGT\_SHADER\_STAGES\_EN.GS\_EN = GS\_STAGE\_OFF;
- VGT\_SHADER\_STAGES\_EN.VS\_EN = VS\_STAGE\_REAL;

#### 9. PA

Program the following PA registers as follows:

- PA\_SU\_LINE\_CNTL.WIDTH = 0;
- PA\_SU\_SC\_MODE\_CNTL.CULL\_BACK = 1;
- PA\_SU\_SC\_MODE\_CNTL.CULL\_FRONT = 1;
- PA\_SU\_SC\_MODE\_CNTL.FACE = 1;
- PA\_SU\_SC\_MODE\_CNTL.POLY\_MODE = 0;
- PA\_SU\_SC\_MODE\_CNTL.POLYMODE\_FRONT\_PTYPE = 2;
- PA\_SU\_SC\_MODE\_CNTL.POLYMODE\_BACK\_PTYPE = 2;
- PA\_SU\_SC\_MODE\_CNTL.POLY\_OFFSET\_FRONT\_ENABLE = 0;
- PA\_SU\_SC\_MODE\_CNTL.POLY\_OFFSET\_BACK\_ENABLE = 0;
- PA\_SU\_SC\_MODE\_CNTL.POLY\_OFFSET\_PARA\_ENABLE = 0;
- PA\_SU\_SC\_MODE\_CNTL.VTX\_WINDOW\_OFFSET\_ENABLE = 0;
- PA\_SU\_SC\_MODE\_CNTL.PROVOKING\_VTX\_LAST = 0;
- PA\_SU\_SC\_MODE\_CNTL.PERSP\_CORR\_DIS = 0;
- PA\_SU\_SC\_MODE\_CNTL.MULTI\_PRIM\_IB\_ENA = 0;
- PA\_SU\_POINT\_SIZE = 0;
- PA\_SU\_POINT\_MINMAX = 0;

#### 10. SPI

Program the following SPI registers as follows:

- SPI\_COMPUTE\_INPUT\_CNTL.DISABLE\_INDEX\_PACK = 0;
- SPI\_COMPUTE\_INPUT\_CNTL.TID\_IN\_GROUP\_ENA = 0;
- SPI\_COMPUTE\_INPUT\_CNTL.TGID\_ENA = 0;

#### **11. Dispatch a Compute Thread**

Here, instances is the number of thread groups and indices is the thread group size. Program the following registers:

- VGT\_PRIMITIVE\_TYPE.PRIM\_TYPE = DI\_PT\_POINTLIST;
- VGT\_COMPUTE\_START\_X = 0;
- VGT\_COMPUTE\_START\_Y = 0;
- VGT\_COMPUTE\_START\_Z = 0;
- SPI\_COMPUTE\_NUM\_THREAD\_X = indices;
- SPI\_COMPUTE\_NUM\_THREAD\_Y = 1;
- SPI\_COMPUTE\_NUM\_THREAD\_Z = 1;
- VGT\_NUM\_INDICES = indices;
- VGT\_COMPUTE\_THREAD\_GROUP\_SIZE = indices;

#### **DISPATCH\_DIRECT** packet:

- DW0: Packet 3 header
- DW1: instances
- DW2: 1
- DW3: 1
- DW4: VGT\_DISPATCH\_INITIATOR.COMPUTE\_SHADER\_EN = 1

See the PM4 section for more on the DISPATCH\_\* packets.
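As a rough sketch, the five dwords listed above could be assembled as follows. Several details are assumptions rather than statements from this guide: the caller supplies an already encoded packet 3 header (opcode and count are not modelled here), the dispatch packet itself is assumed to carry the compute bit, and COMPUTE\_SHADER\_EN is assumed to sit at bit 0 of VGT\_DISPATCH\_INITIATOR. `build_dispatch_direct` is a hypothetical helper name.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Bit 1 of the packet 3 header denotes compute (from section 3.1). */
#define PKT3_COMPUTE_BIT  (1u << 1)

/* ASSUMPTION: COMPUTE_SHADER_EN is bit 0 of VGT_DISPATCH_INITIATOR. */
#define COMPUTE_SHADER_EN (1u << 0)

/* Fills out[] with a DISPATCH_DIRECT packet and returns its dword count.
 * base_header is a caller-built packet 3 header; instances is the number
 * of thread groups. */
size_t build_dispatch_direct(uint32_t *out, size_t capacity,
                             uint32_t base_header, uint32_t instances)
{
    assert(out != NULL && capacity >= 5);
    out[0] = base_header | PKT3_COMPUTE_BIT; /* DW0: header, compute bit set */
    out[1] = instances;                      /* DW1: thread groups           */
    out[2] = 1;                              /* DW2                          */
    out[3] = 1;                              /* DW3                          */
    out[4] = COMPUTE_SHADER_EN;              /* DW4: dispatch initiator      */
    return 5;
}
```

The header encoding should be taken from the PM4 section rather than from this sketch.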

# 4. Cayman Shader Changes

## 4.1 Changes from Evergreen to Cayman

This is a brief summary of the ISA changes from Evergreen to Cayman. See the Cayman and Evergreen ISA documents for further details.

- Vfetch instructions issue through TC.
- Scalar ALU slot (ALU.Trans) removed
- New CF\_INST\_PUSH\_WQM, CF\_INST\_POP\_WQM, CF\_INST\_ELSE\_WQM instructions, and removal of WHOLE\_QUAD\_MODE bit from certain instructions.
- Changes in stack allocation requirements.
- Restrictions on POP\_COUNT settings.
- END\_OF\_PROGRAM bit replaced with CF\_INST\_END.
- CF\_INST\_ALU\_BREAK and CF\_INST\_ALU\_CONTINUE are removed and replaced with new EXECUTE\_MASK\_OP modes.
- Added CF\_INST\_ALU\_VALID\_PIXEL\_MODE.
- Added CF\_INST\_REACTIVATE and CF\_INST\_ALU\_REACTIVATE\_BEFORE.
- The GROUP\_SEQ\* ALU synchronization instructions were removed.
- New CF\_INST\_JUMP\_ANY.
- Coalesced scatter reads and the option to write data to LDS.
- Coalesced and structured vfetch and the option to write data to LDS.
- Added MOVA\_DST.

## 4.2 ALU.Trans Changes

These evergreen t-slot only ops are implemented in all vector slots.

- MUL\_LIT
- FLT\_TO\_UINT
- INT\_TO\_FLT
- UINT\_TO\_FLT

These evergreen t-slot only opcodes become vector ops, with all four slots expecting the arguments on sources a and b. Result is broadcast to all channels.

- MULLO\_INT
- MULHI\_INT
- MULLO\_UINT
- MULHI\_UINT

These evergreen t-slot only opcodes become vector ops in the z, y, and x slots.

- EXP\_IEEE
- LOG\_IEEE/CLAMPED
- RECIP\_IEEE/CLAMPED/FF/INT/UINT/\_64/CLAMPED\_64
- RECIPSQRT\_IEEE/CLAMPED/FF/\_64/CLAMPED\_64
- SQRT\_IEEE/\_64

- SIN/COS

The w slot may have an independent co-issued operation, or if the result is required to be in the w slot, the opcode above may be issued in the w slot as well. The compiler must issue the source argument to slots z, y, and x.

# 5. Unified Interpolation

In R7xx and prior hardware, pixel shader input attribute interpolation (the interpolation of per-vertex attributes to the pixel locations) was done in dedicated interpolation hardware. The controls for this process defined how many attributes were to be interpolated and many unique controls on the method of interpolation for each attribute. The process was initiated once a pixel vector was prepared for the shader core. Prior to any execution of the pixel shader, the interpolation process would compute and write all of the input attributes into the pixel shader general-purpose registers (GPRs).

## 5.1 R7xx Starting Condition



[Figure: R7xx starting condition. Label: GPR Pixel N]

The R7xx starting condition for the pixel shader is that all of the input attributes are already interpolated to the pixel center (or centroid or a given sample) and are present in the GPRs ready for use. This includes an additional set of data (not from the VS) such as screen-space position, front-face, barycentric parameters, fog terms, and a per-pixel index value.

## 5.2 Evergreen/Cayman Starting Condition

| Local Data Store (LDS): 384b per entry (3 x 128b (4 x 32-bit float or fixed)) | GPR Pixel N: 128b per entry (4 x 32-bit float or fixed) |
|-------------------------------------------------------------------------------|---------------------------------------------------------|
| Prim 0 Attribute 0: V0, V1-V0, V2-V0                                          | I,J center / I,J centroid                               |
| Prim 1 Attribute 0: V0, V1-V0, V2-V0                                          | I,J persp / I,J linear                                  |
| Prim N Attribute 0: V0, V1-V0, V2-V0                                          | I/W, J/W, 1.0/W                                         |
| Prim 0 Attribute 1: V0, V1-V0, V2-V0                                          | ...                                                     |
| Prim 1 Attribute 1: V0, V1-V0, V2-V0                                          | ...                                                     |
| Prim N Attribute 1: V0, V1-V0, V2-V0                                          | ...                                                     |
| ...                                                                           |                                                         |

The Evergreen/Cayman starting condition for the pixel shader is that the GPRs contain the perspective-correct and/or linear barycentric coordinates interpolated to the pixel center, centroid, and/or sample, potentially along with the same terms still containing w (I/W, J/W) and 1.0/W at the pixel center (how these are used is detailed later). The pixel shader then has access to the local data store (LDS), which is common storage available to all of the pixels of a given pixel vector. The LDS contains the vertex shader output attribute values as V0, V1-V0, and V2-V0, where V0 is the attribute value at the provoking vertex, V1 is the attribute value at one of the other vertices, and V2 is the attribute value at the third vertex. The vertex data is provided in gradient form (i.e. V0 subtracted from V1 and V2) because that form plugs directly into the interpolation equation (described later).
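To see why the gradient form is convenient, here is a minimal sketch of barycentric interpolation from the LDS terms. The exact equation used by the hardware is described elsewhere in the guide; the standard form V0 + I\*(V1-V0) + J\*(V2-V0) is assumed here, as is the pairing of I with V1-V0 and J with V2-V0. The `lds_attribute` type and `interpolate_attribute` function are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>

/* One attribute as stored in the LDS: V0 plus two gradients,
 * each with 4 components (x, y, z, w). */
typedef struct {
    float v0[4];
    float v1_minus_v0[4];
    float v2_minus_v0[4];
} lds_attribute;

/* Interpolates the attribute at barycentric coordinates (i, j):
 * out = V0 + i * (V1 - V0) + j * (V2 - V0), per component. */
void interpolate_attribute(const lds_attribute *a, float i, float j, float out[4])
{
    assert(a != NULL && out != NULL);
    for (int c = 0; c < 4; ++c) {
        out[c] = a->v0[c] + i * a->v1_minus_v0[c] + j * a->v2_minus_v0[c];
    }
}
```

Because the subtraction is already done when the data is written to the LDS, each component costs two multiply-adds in the shader.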

The terms which are made available to the pixel shader which did not come directly from the vertex shader (such as position in screen space, front-face status, barycentric parameters, per-pixel index value) will still be placed in the GPRs (i.e. NOT in the LDS) at the specified locations, similar to R7xx.

#### 5.2.1 <u>LDS Data</u>

Each pixel vector may use up to 33 attributes: 32 API-specified values plus the hardware-generated ST term (**param\_gen**). Each pixel vector allocates enough space in the LDS for the attribute data, (num\_interp + **param\_gen**) \* prim\_count\_for\_pixel\_vector \* 12 dwords (the 12 is 4 components per vector \* 3 terms (V0, V1-V0, V2-V0) per gradient), plus any extra LDS space the driver wants to use for the PS (register SQ\_LDS\_ALLOC\_PS). SQ\_LDS\_ALLOC\_PS is in units of 4 dwords to keep the PS params lined up on nice boundaries. Parameter data is written starting at lds\_base + SQ\_LDS\_ALLOC\_PS.
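The allocation arithmetic can be written out as a short sketch. The function names are hypothetical, and sizes are assumed to be in dwords (each of the 12 terms being one 32-bit component).

```c
#include <assert.h>
#include <stdint.h>

/* Dwords needed for attribute data of one pixel vector:
 * 4 components * 3 terms (V0, V1-V0, V2-V0) = 12 dwords per attribute per prim. */
uint32_t ps_lds_attr_dwords(uint32_t num_interp, uint32_t param_gen,
                            uint32_t prim_count)
{
    assert(param_gen <= 1);  /* param_gen adds at most one attribute */
    return (num_interp + param_gen) * prim_count * 12u;
}

/* Total dwords including the driver-reserved PS area;
 * sq_lds_alloc_ps is in units of 4 dwords. */
uint32_t ps_lds_total_dwords(uint32_t num_interp, uint32_t param_gen,
                             uint32_t prim_count, uint32_t sq_lds_alloc_ps)
{
    return sq_lds_alloc_ps * 4u +
           ps_lds_attr_dwords(num_interp, param_gen, prim_count);
}
```

For example, 2 interpolants plus param\_gen over 4 primitives need 3 \* 4 \* 12 = 144 dwords of attribute data.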

Each of the 32 API terms can

- be overridden with **default values** if no semantic match with VS outputs. The SPI will place the appropriate default values in V0 and gradients of 0 in the LDS
- be **flat shaded** if corresponding flat shade bit is set (and global flat shade enable is set). The SPI will place the provoking vertex values in V0 and gradients of 0 in the LDS

The first 20 API terms can

- be overridden as a **point sprite texture**.
- support cylindrical wrapping.

**VS FOG** can be written to one of the first NUM\_INTERP-1 LDS locations. It is enabled by setting PASS\_FOG\_THROUGH\_PS, it uses VS\_EXPORTS\_FOG and VS\_OUT\_FOG\_VEC\_ADDR to know if/where fog is in the param cache, and the result is written to the LDS at FOG\_ADDR (range is 0 to NUM\_INTERP-1).

The pixel shader may reserve some amount of LDS storage for its own use (separately from the interpolation attribute storage). This amount is in quanta of 4 dwords to retain the xyzw granularity for the attribute data. The PS storage is in front of (at a lower address than) the attribute data because the number of primitives (which affects the amount of storage required for the attribute data) is variable per pixel shader.

#### 5.2.2 <u>GPR Data</u>

#### <u>IJ data</u>

- Can write up to 4 GPRs with IJ data.
- Up to 6 sets of IJ for linear/perspective \* center/centroid/sample, each IJ takes 2 GPR channels.
- I/W, J/W, 1/W values used for 'true' pull model interpolation take up 3 GPR channels.
- The data will be packed by IJ pairs for the number of IJ pairs enabled in the following priority order
  - Perspective Sample
  - Perspective Center
  - Perspective Centroid
  - Linear Sample
  - Linear Center
  - Linear Centroid

Followed by an aligned vector with I/W, J/W, 1/W in the x,y,z channels of the subsequent vector.
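The packing order can be sketched as follows, assuming two IJ pairs per GPR (XY then ZW) in the priority order listed above. The `ij_pair` enum, `ij_slot` type and `pack_ij_pairs` function are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* IJ pairs in the packing priority order given in the text. */
enum ij_pair {
    PERSP_SAMPLE, PERSP_CENTER, PERSP_CENTROID,
    LINEAR_SAMPLE, LINEAR_CENTER, LINEAR_CENTROID,
    IJ_PAIR_COUNT
};

/* Where a pair lands: GPR index and first channel (0 = X/Y, 2 = Z/W).
 * Both fields are -1 for a disabled pair. */
typedef struct {
    int gpr;
    int first_channel;
} ij_slot;

/* Packs enabled pairs two per GPR and returns the number of GPRs used,
 * including the aligned I/W, J/W, 1/W vector when pull_model is set.
 * *pull_model_gpr receives that vector's GPR index, or -1 if disabled. */
int pack_ij_pairs(const bool enabled[IJ_PAIR_COUNT], bool pull_model,
                  ij_slot slots[IJ_PAIR_COUNT], int *pull_model_gpr)
{
    assert(pull_model_gpr != NULL);
    int packed = 0;
    for (int p = 0; p < IJ_PAIR_COUNT; ++p) {
        if (enabled[p]) {
            slots[p].gpr = packed / 2;
            slots[p].first_channel = (packed % 2) * 2;
            ++packed;
        } else {
            slots[p].gpr = -1;
            slots[p].first_channel = -1;
        }
    }
    int ij_gprs = (packed + 1) / 2;  /* round up to whole GPRs */
    *pull_model_gpr = pull_model ? ij_gprs : -1;
    return ij_gprs + (pull_model ? 1 : 0);
}
```

With perspective sample, perspective centroid and linear center enabled, the third pair starts a new GPR and the pull-model vector follows in the next one.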

#### **Front Face**, loaded at SPI\_PS\_IN\_CONTROL\_1.FRONT\_FACE\_ADDR.

- X = front face
- Y = prim type
- Z = pixel coverage
- W = gen\_index

#### **Pixel Coverage**

- SPI stores 32 bit pixel coverage per quad, needs to load 8 bits per pixel to the GPR.
- 8 bit pixel coverage mask loaded into Z channel of Front Face vector.
- Always loaded if Front Face vector is present

#### <u>Gen Index Pix</u>

- Loaded into the W channel of Front Face vector.
- Enabled by SPI\_PS\_IN\_CONTROL\_1.GEN\_INDEX\_PIX
- Always loaded if Front Face vector is present

#### **Floating Point Position**

- floating point position (pick center/centroid/sample) X,Y,Z (if PROVIDE\_Z), W
- GPR dest address specified by SPI\_PS\_IN\_CONTROL\_0.POSITION\_ADDR

#### **Fixed Point Position + Misc**

- fixed point position (no frac bits). XY = pix XY, Z = rendertarget array index, W = iterated sample num
- GPR dest address is specified by SPI\_PS\_IN\_CONTROL\_1.FIXED\_PT\_POSITION\_ADDR.

#### Line Stipple Texture Coord

- SPI calcs 32 bit line stipple texture coordinate, stores it in the position buffer.
- X = 32b tex coord
- Y = prim type (always loaded)

- GPR dest addr is specified by SPI\_PS\_IN\_CONTROL\_2.LINE\_STIPPLE\_TEX\_ADDR.

The SPI will allocate (MAX(#IJregs-1, FF, POS, FXD, STIP) + 1) GPRs, where each address term counts only if its enable bit is set:

- FF = FRONT\_FACE\_ADDR & FRONT\_FACE\_ENA
- POS = POSITION\_ADDR & POSITION\_ENA
- FXD = FIXED\_PT\_POSITION\_ADDR & FIXED\_PT\_POSITION\_ENA
- STIP = LINE\_STIPPLE\_TEX\_ADDR & LINE\_STIPPLE\_TEX\_ENA

and

#IJregs = (((SPI\_BARYC\_CNTL.PERSP\_CENTER\_ENA + SPI\_BARYC\_CNTL.PERSP\_CENTROID\_ENA + SPI\_BARYC\_CNTL.PERSP\_SAMPLE\_ENA + SPI\_BARYC\_CNTL.LINEAR\_CENTER\_ENA + SPI\_BARYC\_CNTL.LINEAR\_CENTROID\_ENA + SPI\_BARYC\_CNTL.LINEAR\_SAMPLE\_ENA) + 1) / 2) + SPI\_BARYC\_CNTL.PERSP\_PULL\_MODEL\_ENA

The inner sum adds up the number of IJ pairs, the "+ 1) / 2" step determines the number of GPRs for those pairs (two pairs per GPR, rounded up), and the final term adds 1 for I/W, J/W, 1/W.

#### Example 1:

- All IJ enables set
- PERSP\_PULL\_MODEL\_ENA (I/W,J/W,1/W)
- FRONT\_FACE\_ENA = 1, FRONT\_FACE\_ADDR = 4 (PrimType, Coverage Mask, GenIndx are free with presence of FF vector)
- POSITION\_ENA = 1, POSITION\_ADDR = 5, SPI\_INPUT\_Z = 0 (therefore Z is not valid)
- LINE\_STIPPLE\_TEX\_ENA = 1, LINE\_STIPPLE\_TEX\_ADDR = 6 (PrimType is free in STIPPLE vector)

| GPR / Channel | X                | Y                | Z                 | W                 |
|---------------|------------------|------------------|-------------------|-------------------|
| 0             | I_persp_sample   | J_persp_sample   | I_persp_center    | J_persp_center    |
| 1             | I_persp_centroid | J_persp_centroid | I_linear_sample   | J_linear_sample   |
| 2             | I_linear_center  | J_linear_center  | I_linear_centroid | J_linear_centroid |
| 3             | I/W              | J/W              | 1/W               |                   |
| 4             | FF               | PT               | MASK              | INDX              |
| 5             | POS.X            | POS.Y            |                   | POS.W             |
| 6             | STIP             | PT               |                   |                   |

#### Example 2:

- IJ persp sample = 1
- IJ persp centroid = 1
- IJ linear center = 1
- PERSP\_PULL\_MODEL\_ENA (I/W,J/W,1/W)
- FRONT\_FACE\_ENA = 1, FRONT\_FACE\_ADDR = 3 (PrimType, Coverage Mask, GenIndx are free with presence of FF vector)
- FIXED\_PT\_POSITION\_ENA = 1, FIXED\_PT\_POSITION\_ADDR = 4

| GPR / Channel | X               | Y               | Z                | W                |
|---------------|-----------------|-----------------|------------------|------------------|
| 0             | I_persp_sample  | J_persp_sample  | I_persp_centroid | J_persp_centroid |
| 1             | I_linear_center | J_linear_center |                  |                  |
| 2             | I/W             | J/W             | 1/W              |                  |
| 3             | FF              | PT              | MASK             | INDX             |
| 4             | FXDPOS.X        | FXDPOS.Y        | RT ARRAY INDX    | ITER SAMPLE      |
## 5.3 Mapping from R7xx to Evergreen/Cayman

This is a summary of the changes that will need to happen in order to remap fixed function interpolation from R7xx to the unified interpolation methods of Evergreen. Some of the renderstate controls remain the same as R7xx, a few change meaning slightly, and some are removed and replaced by shader instructions. Since all interpolation (including non-pull-model) now happens in the PS, there must be changes to the driver/compiler as well as to validation test suites and tools to run even the simplest of cases on Evergreen/Cayman.

#### 5.3.1 <u>Renderstate Changes</u>

This section will list all of the R7xx interpolation-related renderstate and denote which ones remain, which ones change and which ones are removed for Evergreen.

#### **Existing Registers**

The SPI\_VS\_OUT\_ID\_0-9 registers are unchanged from R7xx; they are still used to define the semantic "name" of each of the VS outputs for matching in the PS input list.

The SPI\_PS\_INPUT\_CNTL\_\*(0-31) have some fields that are no longer applicable:

SEMANTIC ID – same as R7xx. The destination in the LDS for Evergreen for the attribute data will correspond to the relative input value (same as GPR locations from R7xx)

DEFAULT\_VAL - same as R7xx (affects the gradient values written into the LDS)

FLAT\_SHADE - same as R7xx. (See Global Flat Shade Enable below).

SEL\_CENTROID - Removed as this is handled by shader instruction by picking appropriate I,J values

SEL\_LINEAR - Removed as this is handled by shader instruction by picking appropriate I,J values

CYL\_WRAP - same as R7xx (affects the gradient values written into the LDS)

PT\_SPRITE\_TEX - same as R7xx (affects the gradient values written into the LDS)

SEL\_SAMPLE - Removed as this is handled by shader instruction by picking appropriate I,J values.

The SPI\_VS\_OUT\_CONFIG registers are unchanged from R7xx; they are still used to define the form, number, and fog specifics of the VS outputs.

The SPI\_PS\_IN\_CONTROL\_0 registers are affected as follows:

NUM\_INTERP is now the number of interpolants to process into the LDS (instead of the GPRs).

POSITION\_ENA/CENTROID/ADDR/SAMPLE are the same as R7xx as this data is placed into the GPRs still (not LDS) because it does not come from the VS, but is generated by the Scan Converter.

PARAM\_GEN/ADDR are different in that now

- a) the data is placed into the LDS (instead of GPR) and
- b) there is only 1 set of data as the persp/linear and center/centroid/sample is controlled by the appropriate i/j selection in the shader.

BARYC\_SAMPLE\_CNTL/AT\_SAMPLE\_ENA are removed from here and redefined in SPI\_BARYC\_CNTL (see below).

PERSP/LINEAR\_GRADIENT\_ENA are the same as R7xx.

The SPI\_PS\_IN\_CONTROL\_1 registers are affected as follows:

GEN\_INDEX\_PIX is the same as before, (\_ADDR) is removed as the value is placed in FRONT\_FACE\_ADDR.W

FRONT\_FACE\_ENA is slightly different in that it used to "override" an interpolant in the GPRs, but now the interpolants go into the LDS and the FF still goes to GPRs, so it no longer overrides anything, it simply is placed in the GPRs @ FRONT\_FACE\_ADDR.X.

FRONT\_FACE\_ALL\_BITS and FRONT\_FACE\_ADDR are unchanged.

FOG\_ADDR now refers to LDS location instead of GPR.

FIXED\_PT\_POSITION\_\*/POSITION\_ULC are unchanged.

The SPI\_INTERP\_CONTROL\_0 registers are unchanged from R7xx. The effects of these registers are applied to the attribute gradients placed in the LDS. See below for comments on Flat Shading specifics.

The SPI\_INPUT\_Z register is unchanged from R7xx.

The SPI\_FOG\_CNTL registers are generally removed as there are no longer any fog computations performed in the SPI. The PASS\_FOG\_THROUGH\_PS field remains with a new meaning (something like "INPUT\_VS\_FOG") but retains the same name. Fundamentally, the PASS\_FOG\_THROUGH\_PS bit will place the VS output fog (location specified in SPI\_VS\_OUT\_CONFIG) into the LDS at location SPI\_PS\_IN\_CONTROL\_1.FOG\_ADDR.

The SPI\_FOG\_FUNC\_SCALE and SPI\_FOG\_FUNC\_BIAS are removed for Evergreen as no fog calculations are performed in the SPI for Evergreen.

#### NEW REGISTERS For Evergreen/Cayman

The SPI\_BARYC\_CNTL register is new to Evergreen (although it replaces the old SPI\_PS\_IN\_CONTROL\_0.BARYC\_SAMPLE\_CNTL and BARYC\_AT\_SAMPLE\_ENA). This register is used to define which IJ pairs are provided in the GPRs.

Note that the CENTER and CENTROID "enables" are actually more than one bit. The "2" value setting allows the driver to select "center" to be in the "centroid" location and vice versa. This ability is provided for special AA cases (and potentially other uses) to allow the driver to make centroids be at center or centers be at centroid without having to change the underlying shader code (in other words, "center" can be put in the "centroid" ij location and vice versa).

The SPI\_PS\_IN\_CONTROL\_2 register is new to Evergreen. This register is used to enable the provision of a line-stipple value into the GPRs (along with the GPR address).

# 6. DB Programming

## 6.1 Compressed Depth/Stencil Textures (DSTs)

When a DST is being bound to the texture unit, a decompress must happen if the surface is compressed. This can be done either by copying the DST to a color buffer, using the DB to copy and decompress the buffer, or it can be done by having the DB do an in-place decompress. On evergreen, all uncompressed Z formats can be read by the texture path, so the DB to CB copy only needs to be done when converting to a linear tiling mode which the DB doesn't support.

#### 6.1.1 <u>In-place DB decompress</u>

There are two methods for the DB to do a decompress differing in performance depending on the circumstances. In both, the htile buffer and depth buffers should remain attached the same as when drawing.

- 1) Rasterize all tiles and decompress while rasterizing.
  - a) Z\_ENABLE=0
  - b) STENCIL\_ENABLE=0
  - c) DEPTH\_COMPRESS\_DISABLE=1 (only if depth is needed in the texture)
  - d) STENCIL\_COMPRESS\_DISABLE=1 (only if stencil is needed)
  - e) DB\_RENDER\_OVERRIDE.NOOP\_CULL\_DISABLE=1
  - f) DB\_RENDER\_OVERRIDE.DISABLE\_PIXEL\_RATE\_TILES=1
  - g) CB\_COLOR\_CONTROL.MODE=CB\_DISABLE
  - h) Draw full screen rectangle
- 2) Rasterize only the tiles that are not already decompressed, and decompress on flush.
  - e) DB\_RENDER\_OVERRIDE.NOOP\_CULL\_DISABLE=0
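The two in-place methods can be written down as small state tables. The sketch below is illustrative only. It assumes that method 2 reuses every setting of method 1 except step e), which is how the lettering above reads but is not stated outright. The function name and the dictionary layout are hypothetical; the keys are the field names from the list, not a real driver interface.

```python
# Field names are copied from the steps above; the dict layout is a
# hypothetical illustration, not a driver API.

def in_place_decompress_state(need_depth, need_stencil, all_tiles):
    """Return the state for an in-place DB decompress pass.

    all_tiles=True  -> method 1 (rasterize every tile)
    all_tiles=False -> method 2 (assumed identical except NOOP_CULL_DISABLE)
    """
    state = {
        "Z_ENABLE": 0,
        "STENCIL_ENABLE": 0,
        "DB_RENDER_OVERRIDE.NOOP_CULL_DISABLE": 1 if all_tiles else 0,
        "DB_RENDER_OVERRIDE.DISABLE_PIXEL_RATE_TILES": 1,
        "CB_COLOR_CONTROL.MODE": "CB_DISABLE",
    }
    # The compress-disable bits are only set for the planes the texture needs.
    if need_depth:
        state["DEPTH_COMPRESS_DISABLE"] = 1
    if need_stencil:
        state["STENCIL_COMPRESS_DISABLE"] = 1
    return state
```

After applying the returned state, the driver would draw the full screen rectangle from step h).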

#### 6.1.2 <u>DB Copy + Decompress</u>

- 1) Rasterize all tiles and decompress while rasterizing.
  - a) Set the DB\_{Z,STENCIL}\_READ\_BASE registers to the source and the DB\_{Z,STENCIL}\_WRITE\_BASE registers to the destination.
  - b) Z\_ENABLE=0
  - c) STENCIL\_ENABLE=0
  - d) [DEPTH|STENCIL]\_COMPRESS\_DISABLE=1 (For either or both)
  - e) DB\_RENDER\_OVERRIDE.NOOP\_CULL\_DISABLE=1 (same as in-place decomp)
  - f) DB\_RENDER\_OVERRIDE.DISABLE\_PIXEL\_RATE\_TILES=1
  - g) CB\_COLOR\_CONTROL.MODE=CB\_DISABLE
  - h) FORCE\_[Z|STENCIL]\_VALID=1 (makes it read tiles that are already decompressed)
  - i) FORCE\_[Z|STENCIL]\_DIRTY=1 (makes it write all tiles even if already decompressed)
  - j) PRESERVE\_COMPRESSION=1 (preserves the htile buffer for later use with the compressed buffer)
  - k) Draw full screen rectangle

#### 6.1.3 <u>Copy Depth/Stencil to a Color Buffer</u>

If the CB path is chosen, the copy can be done in place or to a separate buffer. In-place copying works well if the depth buffer will not be used again without being cleared first; a separate buffer is better otherwise, or if it would overflow to memory.

- a) Z\_ENABLE=0
- b) STENCIL\_ENABLE=0
- c) DEPTH\_COMPRESS\_DISABLE=unchanged
- d) STENCIL\_COMPRESS\_DISABLE=unchanged
- e) DB\_RENDER\_CONTROL.DEPTH\_COPY=1 (only if needed or if in-place and needs to be rendered to again)
- f) DB\_RENDER\_CONTROL.STENCIL\_COPY=1 (same)
- g) DB\_COPY\_CENTROID=1
- h) DB\_COPY\_SAMPLE=0
- i) CB\_TARGET\_MASK=1
- j) Attach MRT0 to be the same or separate buffer with a format of COLOR\_8\_24, COLOR\_24\_8, COLOR\_16, COLOR\_32\_FLOAT, or COLOR\_X24\_8\_32\_FLOAT
- k) No blending, fog, etc.
- l) CB0\_COLOR\_INFO.SOURCE\_FORMAT=EXPORT\_4C\_32BPC
- m) Draw full screen rectangle

#### 6.1.4 Using a DST again after texturing

If rendering is continued on a DST that was attached to the texture pipe, it must be set up to be used by the DB again. If it was not decompressed in place, then nothing needs to be done. If it was decompressed in place via the DB as described above, then recompressing is not possible for depth, and will happen as stencil is reused anyway, so nothing needs to be done. If it is still in a color tiling format, it must be pulled in through a texture and exported to the DB.

- a) Attach the DST to a texture.
- b) Create a shader that loads a DST sample and exports Z into oDepth.r and stencil in the 8 LSBs into oDepth.g
- c) Shader Compiler should then say to set
  - a. SQ\_PGM\_EXPORTS\_PS.EXPORT\_MODE = 1 (only depth export and no color exports)
  - b. DB\_SHADER\_CONTROL.Z\_EXPORT\_ENABLE=1
  - c. DB\_SHADER\_CONTROL.STENCIL\_REF\_EXPORT\_ENABLE=1
- d) Z\_ENABLE=1
- e) Z\_FUNC=ALWAYS
- f) Z\_WRITE\_ENABLE=1
- g) BACKFACE\_ENABLE=0 (or draw a front facing rect)
- h) STENCIL\_ENABLE=1
- i) STENCIL\_FUNC=REF\_ALWAYS
- j) STENCIL\_WRITE\_MASK=0xFF
- k) STENCILZPASS=STENCIL\_REPLACE
- l) CB\_COLOR\_CONTROL.MODE=CB\_DISABLE
- m) Draw full screen rect

# 6.2 Turning off the Shader for Depth Only Rendering

The DB needs some state validation to accelerate Z/Stencil only rendering or null pixel shaders for when no color buffers are being written. R6xx automatically did the state validation to figure out if any color was going to be written and could shut off the pixel shader if none of color, depth and stencil needed shader output. R7xx-evergreen needs a hint to know when no color buffers can be written. The DB still factors in whether the pixel shader's output affects the depth surfaces. Cayman adds back state validation similar to R6xx, which can be disabled via DB\_RENDER\_OVERRIDE2.DISABLE\_COLOR\_ON\_VALIDATION to return to R7xx-evergreen behavior or for explicit control via CB\_COLOR\_CONTROL.MODE.

# 7. Hierarchical Z (HiZ) and Hierarchical Stencil (HiS)

#### 7.1 Relevant Registers

- DB\_SHADER\_CONTROL
- DB\_DEPTH\_INFO
- DB\_HTILE\_DATA\_BASE
- DB\_HTILE\_SURFACE
- DB\_PREFETCH\_LIMIT
- DB\_PRELOAD\_CONTROL
- DB\_RENDER\_CONTROL
- DB\_RENDER\_OVERRIDE
- DB\_SRESULTS\_COMPARE\_STATE0
- DB\_SRESULTS\_COMPARE\_STATE1
- DB\_DEPTH\_CLEAR
- DB\_STENCIL\_CLEAR

# 7.2 Driver Hints to the Hardware

DB\_SHADER\_CONTROL.Z\_ORDER is a hint to the hw as to which Z order to use. For most cases, it should be programmed to EARLY\_Z\_THEN\_LATE\_Z. It should be determined from the size of the fragment shader. Very short shaders may benefit from LATE\_Z, while very long shaders may benefit from RE\_Z. EARLY\_Z\_THEN\_LATE\_Z and EARLY\_Z\_THEN\_RE\_Z will attempt to use EARLY\_Z, but if the hw is not able to use EARLY\_Z due to the current state, it will use LATE\_Z or RE\_Z. Note that setting DB\_RENDER\_OVERRIDE.FORCE\_HIZ\_ENABLE/FORCE\_HIS\_ENABLE0/1 to FORCE\_OFF disables the overrides and allows HiZ/HiS to be determined by DB\_SHADER\_CONTROL.

# 7.3 HTILE buffers

The HTile buffer is a separate surface that holds the meta-data for compression and hierarchical optimizations. An HTile is a 32-bit word that represents the compression and hierarchical information for an 8x8, 4x8, 8x4 or 4x4 region of the screen as specified by HTILE\_WIDTH and HTILE\_HEIGHT. Each DB has an 8k htile cache (8k htiles, not bytes). 8x8 mode with FULL\_CACHE=1 and 4 DBs provides 2 million pixels (8\*8 pixels \* 4 DBs \* 8192 htiles). By contrast 4x4 mode with FULL\_CACHE=0 and 4 DBs provides 262144 pixels (4\*4 pixels \* 4 DBs \* 8192 htiles \* 1/2 cache). On evergreen/cayman only 8x8 mode is supported.
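The coverage figures quoted above follow directly from the cache size. As a minimal sketch (the function name is hypothetical, and FULL\_CACHE=0 is taken to mean half of the cache, as the 1/2 factor in the text suggests):

```python
HTILES_PER_DB_CACHE = 8192  # "8k htiles, not bytes"

def htile_cache_coverage(htile_w, htile_h, num_dbs, full_cache):
    """Pixels covered by the htile caches of all DBs combined."""
    # FULL_CACHE=0 is assumed to halve the usable cache.
    htiles = HTILES_PER_DB_CACHE if full_cache else HTILES_PER_DB_CACHE // 2
    return htile_w * htile_h * num_dbs * htiles

print(htile_cache_coverage(8, 8, 4, True))    # 8x8 mode, FULL_CACHE=1
print(htile_cache_coverage(4, 4, 4, False))   # 4x4 mode, FULL_CACHE=0
```

The first call gives 2097152 pixels (the "2 million" above) and the second gives 262144.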

The following algorithm is used to determine the htile settings:

- tile pipes per DB = number of tile pipes / number of DBs
- max pixels per DB = (DB size in pixels / number of tile pipes) \* (tile pipes per DB)
- width per DB = (DB width in pixels / number of tile pipes) \* (tile pipes per DB)

#### **6xx/7xx:**

| Max pixels per DB | Width per DB (pixels) | HTILE W x H | FULL_CACHE | LINEAR | PRELOAD | PREFETCH W x H | PRELOAD_WINDOW |
|-------------------|-----------------------|-------------|------------|--------|---------|----------------|----------------|
| <=64k             |                       | 4x4         | 0          | 1      | 1       | 0x0            | 0              |
| <=128k            |                       | 4x4         | 1          | 1      | 1       | 0x0            | 0              |
| <=256k            |                       | 8x4         | 1          | 1      | 1       | 0x0            | 0              |
| <=512k            |                       | 8x8         | 1          | 1      | 1       | 0x0            | 0              |
| >512k             | <=512                 | 8x8         | 1          | 0      | 1       | 16x4           | 1              |
| >512k             | <=1024                | 8x8         | 1          | 0      | 1       | 16x2           | 1              |
| >512k             | >1024                 | 8x8         | 1          | 0      | 1       | 16x0           | 1              |

#### **Evergreen/Cayman:**

| Max pixels per DB | Width per DB (pixels) | HTILE W x H | FULL_CACHE | LINEAR | PRELOAD | PREFETCH W x H | PRELOAD_WINDOW |
|-------------------|-----------------------|-------------|------------|--------|---------|----------------|----------------|
| <=256k                  |                             | 8x8            | 1          | 1      | 1       | 0x0               | 0              |
| <=512k                  |                             | 8x8            | 1          | 1      | 1       | 0x0               | 0              |
| >512k                   | <=512                       | 8x8            | 1          | 0      | 1       | 16x4              | 1              |
| >512k                   | <=1024                      | 8x8            | 1          | 0      | 1       | 16x2              | 1              |
| >512k                   | >1024                       | 8x8            | 1          | 0      | 1       | 16x0              | 1              |
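The Evergreen/Cayman table can be turned into a small lookup. The sketch below is one possible reading, not a definitive implementation. It assumes that "k" means 1024 (consistent with 512k = 8\*8 pixels \* 8192 htiles) and that the divisions round down, neither of which the text states. The function name and the keys of the returned dictionary are hypothetical and simply mirror the table columns.

```python
K = 1024  # assumption: "k" in the table is 1024 pixels

def evergreen_htile_settings(width, height, num_tile_pipes, num_dbs):
    """Look up Evergreen/Cayman htile settings for a depth buffer."""
    # Integer division is an assumption; the text does not say how to round.
    pipes_per_db = num_tile_pipes // num_dbs
    max_pixels_per_db = (width * height // num_tile_pipes) * pipes_per_db
    width_per_db = (width // num_tile_pipes) * pipes_per_db

    # Only 8x8 htiles are supported on evergreen/cayman.
    settings = {"HTILE_WxH": (8, 8), "FULL_CACHE": 1, "PRELOAD": 1}
    if max_pixels_per_db <= 512 * K:
        # Covers both the <=256k and <=512k rows, which are identical.
        settings.update(LINEAR=1, PREFETCH_WxH=(0, 0), PRELOAD_WINDOW=0)
    else:
        if width_per_db <= 512:
            prefetch = (16, 4)
        elif width_per_db <= 1024:
            prefetch = (16, 2)
        else:
            prefetch = (16, 0)
        settings.update(LINEAR=0, PREFETCH_WxH=prefetch, PRELOAD_WINDOW=1)
    return settings
```

For example, a 2048x2048 depth buffer on a part with 4 tile pipes and 4 DBs gives 1048576 pixels and a width of 512 per DB, which selects the ">512k, <=512" row.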

# 7.4 HiZ

HiZ requires an htile buffer and DB\_DEPTH\_INFO.TILE\_SURFACE\_ENABLE=1 (DB\_Z\_INFO.TILE\_SURFACE\_ENABLE=1 on evergreen+). Unless overridden in DB\_RENDER\_OVERRIDE, HiZ will be used by the hardware whenever possible.

# 7.5 HiS

HiS requires an htile buffer and DB\_DEPTH\_INFO.TILE\_SURFACE\_ENABLE=1 (DB\_Z\_INFO.TILE\_SURFACE\_ENABLE=1 on evergreen/cayman). Unless overridden in DB\_RENDER\_OVERRIDE, the hardware will perform HiS testing based on the enabled sets of HiS state as determined by DB\_SRESULTS\_COMPARE\_STATE0 and DB\_SRESULTS\_COMPARE\_STATE1. There are two sets so the driver can utilize one while updating the other as the stencil state changes. A full screen blit must be done before changing DB\_SRESULTS\_COMPARE\_STATE\* once the stencil buffer is bound.

# 8. CB Programming

The CB expects the driver to validate state and will expect the driver to catch certain invalid configurations. In many cases, if invalid state is programmed the CB will not hang, but the results are otherwise undefined.

## 8.1 Normal operation

#### 8.1.1 <u>Register state</u>

Under normal operation (CB\_COLOR\_CONTROL.MODE = CB\_NORMAL, not in dual-source blending or multiwrite mode) the following conditions apply:

- CB\_SHADER\_MASK must be programmed consistently with the actual shader outputs. If N exports are enabled from the shader, then N fields in CB\_SHADER\_MASK should have at least one bit set.
- The following additional registers must be programmed:
  - CB\_COLOR\_CONTROL
  - CB\_SHADER\_MASK
  - CB\_TARGET\_MASK.
- If the blend constant is used by any MRT, CB\_BLEND\_{RED,GREEN,BLUE,ALPHA} must also be programmed. The blend constant is in use if, for any MRT,
  - 1. The blender for the MRT is enabled and the surface is blendable,
  - 2. The blend equation specifies CONSTANT for any blend factor.
- If an MRT is not disabled, either with CB\_SHADER\_MASK or by setting CB\_COLOR\*\_INFO.FORMAT = COLOR\_INVALID, then the following registers must be programmed for the MRT:
  - CB\_COLOR\*\_INFO
  - CB\_COLOR\*\_ATTRIB
  - CB\_BLEND\*\_CONTROL
  - CB\_COLOR\*\_BASE
  - CB\_COLOR\*\_SIZE
  - CB\_COLOR\*\_VIEW
  - CB\_COLOR\*\_CMASK (if compression is enabled)
  - CB\_COLOR\*\_FMASK (if compression is enabled)
  - CB\_COLOR\*\_CMASK\_SLICE (if compression is enabled).
  - CB\_COLOR\*\_FMASK\_SLICE (if compression is enabled).

The following registers must also be configured:

- GB\_ADDR\_CONFIG
- PA\_SC\_AA\_CONFIG (evergreen)
- PA\_SC\_MULTI\_CHIP\_CNTL
- CP\_VMID (cayman)
- CP\_RINGID (cayman)
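Two of the register-state rules above lend themselves to a driver-side check before a draw is submitted. The following is a minimal sketch under stated assumptions: CB\_SHADER\_MASK is treated as eight 4-bit OUTPUTn\_ENABLE fields with field n at bits [4n+3:4n], and each MRT is described by a hypothetical dictionary. None of these helper names come from this document.

```python
def enabled_outputs(cb_shader_mask):
    """MRT indices whose 4-bit OUTPUTn_ENABLE field has any bit set."""
    # Assumed layout: field n occupies bits [4n+3:4n], eight fields total.
    return [n for n in range(8) if (cb_shader_mask >> (4 * n)) & 0xF]

def shader_mask_consistent(cb_shader_mask, num_shader_exports):
    """N shader exports should correspond to N non-zero mask fields."""
    return len(enabled_outputs(cb_shader_mask)) == num_shader_exports

def blend_constant_in_use(mrts):
    """True if CB_BLEND_{RED,GREEN,BLUE,ALPHA} must be programmed.

    Each entry is a hypothetical dict with boolean keys:
    blend_enable, blendable, uses_constant_factor.
    """
    return any(
        m["blend_enable"] and m["blendable"] and m["uses_constant_factor"]
        for m in mrts
    )
```

A driver could assert `shader_mask_consistent(mask, n)` when binding a pixel shader with n color exports, and skip the blend-constant writes whenever `blend_constant_in_use` returns false.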

#### 8.1.2 <u>Prohibited combinations</u>

The following combinations of configuration are prohibited in CB:

- Special rop3 modes cannot be used when *any* MRT is using the blender. If any MRT is using the blender, ROP3 must be set to the value 0xCC.
- AA surfaces cannot use 1D tiling modes (resolve target can though).
- MRTs that have COLOR<mrt>\_INFO.ARRAY\_MODE == ARRAY\_LINEAR\_GENERAL must use the COLOR<mrt>\_INFO.ENDIAN value ENDIAN\_NONE. (cayman)

#### 8.1.3 <u>Uncompressed MSAA surfaces</u>

MSAA surfaces may be either compressed or uncompressed. Typical usage is to create a compressed MSAA surface, however some applications may wish to render directly to an uncompressed color surface. To do so, set CB\_COLOR\*\_INFO.COMPRESSION = 0. In this mode, only a color surface needs to be allocated; cmask and fmask surfaces are completely ignored. Uncompressed MSAA surfaces can be rendered to normally, and can even be resolved with CB\_RESOLVE; however, an uncompressed surface cannot be decompressed with CB\_DECOMPRESS.

#### 8.2 Dual-source mode

Dual-source mode is enabled if SRC1 appears in the blend equation for MRT0 (MRT0 must enable its blender and be a blend-capable surface, the blend equation must specify addition or subtraction as the operator, and either a color or alpha blend factor must refer to SRC1). When in dual-source mode, the following restrictions apply:

- 1. Define MRT0 settings.
- 2. CB\_SHADER\_MASK.OUTPUT0\_ENABLE should be programmed based on the components that are exported for src0. CB\_SHADER\_MASK.OUTPUT1\_ENABLE is not checked and should be 0. All other fields in CB\_SHADER\_MASK should be zero, or bound to a RAT. Note that MRT1 is disabled in this mode (including RATs).
- 3. COLOR1\_INFO.SOURCE\_FORMAT must be programmed to match COLOR0\_INFO.SOURCE\_FORMAT so that the 2 quads come in the same format. (New in Evergreen).
- 4. Enable blending for MRT0; set CB\_BLEND<mrt>\_CONTROL.ENABLE = 1.
- 5. Configure a blend-capable format for MRT0.
- 6. Configure CB\_BLEND0\_CONTROL to an equation where SRC1 appears at least once in either the color or alpha blend equation

#### 8.3 Fast Clear

- Fast clear requires a cmask surface.
- Fast clear does not require a fmask surface, but the fmask surface must point to the color surface when a fmask surface does not exist. In particular, make sure to set CB\_COLOR\*\_FMASK.BASE\_256B = CB\_COLOR\*\_BASE.BASE\_256B and CB\_COLOR\*\_MASK.FMASK\_TILE\_MAX = CB\_COLOR\*\_SLICE.TILE\_MAX, and CB\_COLOR\*\_ATTRIB.FMASK\_BANK\_HEIGHT = CB\_COLOR\*\_ATTRIB.BANK\_HEIGHT.
- Fast clear is supported with point-sampled or compressed multisample surfaces. Fast clear with an uncompressed multisample surface is not supported.
- Fast clear is not supported for linear surfaces (array mode == LINEAR\_ALIGNED or LINEAR\_GENERAL).
- An eliminate fast clear operation must be done on the surface before another block can read it if some of the pixels in the surface have not been covered by drawing (see below).

To fast clear a surface, do the following:

- 1. Flush the cmask cache.
- 2. Clear the cmask surface to all zeroes. The color cache may be used to do this.
- 3. Wait for idle.

To use a fast cleared surface, do the following:

- 1. Initialize the cmask and fmask surface registers.
- 2. Set CB\_COLOR<mrt>\_INFO.FAST\_CLEAR = 1.
- 3. Set CB\_COLOR<mrt>\_CLEAR\_WORD\* to the clear color.
- 4. If CB\_COLOR<mrt>\_INFO.FORMAT == X24\_8\_32, the X24 component must be masked in the clear\_word registers. This means CB\_COLOR<mrt>\_CLEAR\_WORD1 &= 0x000000FF. (evergreen)
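Steps 2 to 4 can be sketched as a function that produces the register values for one MRT. This is an illustration only: the function name, the format string and the returned dictionary are hypothetical, and the number of clear words is taken from the caller rather than assumed.

```python
def fast_clear_registers(mrt, fmt, clear_words):
    """Build the fast-clear register values for one MRT (steps 2-4)."""
    words = list(clear_words)
    # Step 4: the X24 component must be masked out of CLEAR_WORD1.
    if fmt == "X24_8_32" and len(words) > 1:
        words[1] &= 0x000000FF
    regs = {f"CB_COLOR{mrt}_INFO.FAST_CLEAR": 1}
    for i, word in enumerate(words):
        regs[f"CB_COLOR{mrt}_CLEAR_WORD{i}"] = word
    return regs
```

Step 1 (initializing the cmask and fmask surface registers) is not covered by this sketch.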

## 8.4 Eliminate Fast Clear/Decompress/Fmask Decompress

#### 8.4.1 <u>Eliminate Fast Clear</u>

This eliminates the fast cleared tiles from the color surface and fills them in with the clear color. This allows the surface to be read by other clients like textures.

#### 8.4.2 <u>Decompress</u>

This decompresses the multisample surface so that it may be read without the Cmask or Fmask surfaces. The Cmask and Fmask surfaces will be updated to reflect a decompressed multisample surface, so it is possible to continue rendering with compression enabled after a decompress operation. Fast cleared tiles will be eliminated automatically, so an eliminate fast clear pass before this is unnecessary.

It is illegal to decompress a surface that does not have compression enabled. Decompress must be done with 8x8 pixel tile granularities.

#### 8.4.3 Fmask Decompress

This decompresses the fmask surface so that it may be read by other clients like textures. Fast cleared tiles will be eliminated automatically, so an additional eliminate fast clear pass before the shader reads the compressed AA surface is unnecessary.

Rendering with AA compression can be done even after the fmask has been decompressed.

#### 8.4.4 <u>Common programming procedure</u>

- 1. Flush the color cache if this is decompress. Flush the fmask cache if this is fmask decompress.
- 2. Fully enable MRT0 in CB\_SHADER\_MASK (OUTPUT0\_ENABLE is 0xF). All other MRTs must be disabled.
- 3. Set MRT0 to be the multi-sampled surface.
- 4. Disable blending.
- 5. Disable ROP3.
- 6. Set CB\_TARGET\_MASK.TARGET0\_ENABLE = 0xF.
- 7. Set CB\_COLOR\_CONTROL.MODE = {CB\_DECOMPRESS, CB\_ELIMINATE\_FAST\_CLEAR, CB\_FMASK\_DECOMPRESS}.
- 8. Draw over the regions that should be modified. Usually, this will just be a large rectangle the size of the surface. Do not draw the same pixel twice.
- 9. Flush the color cache for all the modes except when this is a fmask decompress without fast clear. Also, flush the fmask cache if this is fmask decompress.

Note that the rtindex clamping feature is not allowed.
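The cache flushes in steps 1 and 9 depend on the mode, which is easy to get wrong. The sketch below encodes those two steps as written. The function name and the `fast_clear` parameter (whether the surface uses fast clear) are hypothetical, and the strings "color" and "fmask" stand in for whatever flush mechanism the driver uses.

```python
MODES = ("CB_DECOMPRESS", "CB_ELIMINATE_FAST_CLEAR", "CB_FMASK_DECOMPRESS")

def flushes_for_pass(mode, fast_clear=False):
    """Return (before, after): the caches to flush around the draw."""
    if mode not in MODES:
        raise ValueError(f"unknown mode: {mode}")

    # Step 1: flushes issued before the pass.
    before = set()
    if mode == "CB_DECOMPRESS":
        before.add("color")
    if mode == "CB_FMASK_DECOMPRESS":
        before.add("fmask")

    # Step 9: flushes issued after the draw.
    after = set()
    fmask_without_fast_clear = mode == "CB_FMASK_DECOMPRESS" and not fast_clear
    if not fmask_without_fast_clear:
        after.add("color")
    if mode == "CB_FMASK_DECOMPRESS":
        after.add("fmask")
    return before, after
```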

## 8.5 Resolve

A single multi-sampled surface may be resolved into a point-sampled surface. The point-sampled surface must not be fast cleared. The multi-sampled surface is bound as MRT0, and the point-sampled surface is bound as MRT1. MRT1 must have the same format, number type, component swap and endianness as MRT0; the format must be a blend-capable format. The surfaces may have different tiling modes, but neither surface can use ARRAY\_LINEAR\_GENERAL or ARRAY\_LINEAR\_ALIGNED tiling. Fast cleared tiles will be eliminated automatically, so an eliminate fast clear pass before this is unnecessary. To resolve the multi-sampled surface, do the following:

- 1. Flush and invalidate color cache.
- 2. Fully enable MRT0 in CB\_SHADER\_MASK (OUTPUT0\_ENABLE is 0xF). All other MRTs must be disabled.
- 3. Set MRT0 to be the multi-sampled surface. MRT0 must be in a blend-capable format.
- 4. Set the base address of MRT1 to the resolve destination, but do not enable it in CB\_SHADER\_MASK. Configure other address-related registers for MRT1, including the tiling in CB\_COLOR1\_INFO. The following settings for MRT1 should agree with MRT0:
  - CB\_COLOR1\_INFO.ENDIAN
  - CB\_COLOR1\_INFO.FORMAT
  - CB\_COLOR1\_INFO.NUMBER\_TYPE
  - CB\_COLOR1\_INFO.COMP\_SWAP
  - Any field that is deterministic based on the surface format.
- 5. Disable Blending.
- 6. Disable ROP3.
- 7. Set CB\_TARGET\_MASK.TARGET0\_ENABLE = 0xF.
- 8. Set CB\_COLOR\_CONTROL.MODE = CB\_RESOLVE.
- 9. Draw over the regions that should be resolved. Usually, this will just be a large rectangle the size of the buffer. Do not draw the same pixel twice.
- 10. Flush and invalidate color cache.

Note that the rtindex clamping feature is not allowed in resolve mode.
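Since invalid CB state gives undefined results rather than an error, a driver may want to check the resolve constraints itself. The following is a minimal sketch; the function name and the surface dictionaries are hypothetical, and the blend-capable format requirement is not checked because this section does not list those formats.

```python
LINEAR_MODES = {"ARRAY_LINEAR_GENERAL", "ARRAY_LINEAR_ALIGNED"}
MUST_MATCH = ("FORMAT", "NUMBER_TYPE", "COMP_SWAP", "ENDIAN")

def resolve_problems(src, dst):
    """List violated resolve rules; src is MRT0, dst is MRT1.

    Each surface is a hypothetical dict keyed by CB_COLOR*_INFO field name.
    """
    problems = []
    for field in MUST_MATCH:
        if src[field] != dst[field]:
            problems.append(f"{field} differs between MRT0 and MRT1")
    # Tiling modes may differ, but neither surface may be linear.
    for name, surf in (("MRT0", src), ("MRT1", dst)):
        if surf["ARRAY_MODE"] in LINEAR_MODES:
            problems.append(f"{name} uses a linear tiling mode")
    if dst.get("FAST_CLEAR"):
        problems.append("MRT1 (resolve destination) is fast cleared")
    return problems
```

An empty list means none of the checked rules is violated; it does not prove the configuration is valid.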

#### 8.6 Compute Shader

Compute shaders can perform atomic writes ("device reduction operations") to memory via the CB. The order of execution of the operations is not guaranteed, only that they are atomic. These writes can include simple operations (min, max, add, and, or, exchange, compare-exchange) and can optionally return a value (pre-op) back to the shader.

The CF\_export adds two new opcodes for RAT exports: EXPORT\_RAT and EXPORT\_RAT\_CACHELESS.

If CB\_COLOR<mrt>\_INFO.RAT is programmed, the surface is treated as a Random Access Target and can only be drawn by Compute Shader operations. A set of MRTs can be configured for RATs and normal rendering. The only stipulation is that all RAT MRTs must be assigned to higher-numbered MRTs than normal rendering MRTs.

These are the major additional rules of the state of a RAT:

- 1. CB\_COLOR<mrt>\_ATTRIB must be programmed in accordance with the surface tile format.
- 2. CB\_COLOR<mrt>\_INFO.SOURCE\_FORMAT must be 4c\_32bpc (value of 0).
- 3. If CB\_COLOR<mrt>\_INFO.RESOURCE\_TYPE is 1D or BUFFER (or STRUCTUREDBUFFER on cayman):
  - a. CB\_COLOR<mrt>\_ATTRIB.NON\_DISP\_TILING\_ORDER must be 1
  - b. CB\_COLOR<mrt>\_INFO.ARRAY\_MODE must be ARRAY\_LINEAR\_ALIGNED
- 4. CB\_COLOR<mrt>\_INFO.FAST\_CLEAR is not permitted for RATs.
- 5. Set CB\_IMMED<mrt>\_BASE
- 6. Normally, CB\_COLOR<mrt>\_INFO.FORMAT must be COLOR\_32. Also, the NUMBER\_TYPE must be NUMBER\_UINT. For all other formats and number types, the only ops supported by Compute Shader quads are STORE and NOP.
- 7. For immediate ops, the offset specified in the green channel of each scatter export in the quad must be unique.

The driver allocates a region of video memory where atomic operations return data. This acts as a mailbox. The driver programs CB\_IMMED<mrt>\_BASE with the base address of the return-value memory. The shader export instructions then include the return address offset (per pixel) as part of the address export. The CB performs the atomic operation and also writes back the pre-op value to the return address specified. The shader must use write-with-acknowledge with these operations to know when the return data has been written to the return buffer. The driver should set up a vertex buffer constant to point to this return-value memory for reads.

CB\_IMMED<mrt>\_BASE must be programmed uniquely for each shader engine on multi-SE ASICs.
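Several of the RAT rules are mechanical and can be checked before submitting work. The sketch below covers the MRT ordering stipulation and rules 2, 3 and 4. The function name and the per-MRT dictionaries are hypothetical and are keyed by the field names used above.

```python
LINEAR_ONLY_TYPES = {"1D", "BUFFER", "STRUCTUREDBUFFER"}  # last one: cayman

def rat_problems(mrts):
    """List violated RAT rules; mrts is indexed by MRT number.

    Each entry is None (unused) or a hypothetical dict of field values.
    """
    problems = []
    normal = [i for i, m in enumerate(mrts) if m is not None and not m["RAT"]]
    rats = [i for i, m in enumerate(mrts) if m is not None and m["RAT"]]
    # All RATs must sit above every normal rendering MRT.
    if normal and rats and min(rats) < max(normal):
        problems.append("RAT MRTs must be numbered above all normal MRTs")
    for i in rats:
        m = mrts[i]
        if m["SOURCE_FORMAT"] != 0:  # rule 2: 4c_32bpc
            problems.append(f"MRT{i}: SOURCE_FORMAT must be 0")
        if m.get("FAST_CLEAR", 0):  # rule 4
            problems.append(f"MRT{i}: FAST_CLEAR is not permitted")
        if m["RESOURCE_TYPE"] in LINEAR_ONLY_TYPES:  # rule 3
            if m["NON_DISP_TILING_ORDER"] != 1:
                problems.append(f"MRT{i}: NON_DISP_TILING_ORDER must be 1")
            if m["ARRAY_MODE"] != "ARRAY_LINEAR_ALIGNED":
                problems.append(f"MRT{i}: ARRAY_MODE must be linear aligned")
    return problems
```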

# 9. PM4

### 9.1 Introduction

When programming in the PM4 mode, the driver does not write directly to the GPU registers to carry out drawing operations on the screen. Instead, it prepares data in the format of PM4 Command Packets in either system or video (a.k.a. local) memory, and lets the Micro Engine do the rest of the job.

Three types of PM4 command packets are currently defined. They are types 0, 2 and 3 as shown in the following figure. A PM4 command packet consists of a packet header, identified by field HEADER, and an information body, identified by IT\_BODY, that follows the header. The packet header defines the operations to be carried out by the PM4 micro-engine, and the information body contains the data to be used by the engine in carrying out the operation. In the following, we use brackets [.] to denote a 32-bit field (referred to as DWord) in a packet, and braces {.} to denote a size-varying field that may consist of a number of DWords. If a DWord consists of more than one field, the fields are separated by "|". The field that appears on the far left takes the most significant bits, and the field that appears on the far right takes the least significant bits. For example, DWord [HI\_WORD | LO\_WORD] denotes that HI\_WORD is defined on bits 16-31, and LO\_WORD on bits 0-15. A C-style notation of referencing an element of a structure is used to refer to a sub-field of a main field. For example, MAIN\_FIELD.SUBFIELD refers to the sub-field SUBFIELD of MAIN\_FIELD.

#### 9.1.1 <u>Type-0 Packet</u>

Write N DWords in the information body to the N consecutive registers, or to the register, pointed to by the BASE\_INDEX field of the packet header. This packet supports a register memory map up to 64K DWords (256K Bytes).

#### **Type-0 Packet Description**

The use of this packet requires a complete understanding of the registers to be written. The register address space is split into two areas: the first 32K bytes are system registers, and beyond that are graphics and multi-media registers. For graphics and multi-media registers there is an alternative, called SET\_\*. For the first 32KB of register space (system registers) there is no SET\_\* type packet and Type-0 packets should be used.

#### 9.1.2 <u>Type-1 Packet (not-supported)</u>

Type-1 packets are not supported by any R6xx+ families.

#### 9.1.3 <u>Type-2 Packet</u>

This is a filler packet. It has only the header, and its content is not important except for bits 30 and 31. It is used to fill up the trailing space left when the allocated buffer for a packet, or packets, is not fully filled. This allows the CP to skip the trailing space and to fetch the next packet.

| DW  | Field Name | Description                                                                      |
|-----|------------|----------------------------------------------------------------------------------|
| 1   | HEADER     | Header of the packet.                                                            |
|     |            | bit [29:0] Reserved; This field is undefined, and is set to zero by default.     |
|     |            | bit [31:30] TYPE; Packet identifier. It should be 2.                             |
| 2-N | Body       | bit [31:0] Body; The information body "IT_BODY" will be described extensively in |
|     |            | the following sections                                                           |

#### **Type-2 Packet Description**

#### 9.1.4 <u>Type-3 Packet</u>

Carry out the operation indicated by field IT\_OPCODE.

#### **Type-3 Packet Description**

| DW  | Field Name | Description                                                                          |
|-----|------------|--------------------------------------------------------------------------------------|
| 1   | HEADER     | Header of the packet.                                                                |
|     |            | bit [0] PREDICATE; Predicated version of packet when bit 0 is set.                   |
|     |            | bit [1] SHADER_TYPE; (0: Graphics, 1: Compute Shader)                                |
|     |            | bit [7:2] Reserved; This field is undefined, and is set to zero by default.          |
|     |            | bit [15:8] IT_OPCODE; Operation to be carried out.                                   |
|     |            | bit [29:16] COUNT; Number of DWords -1 in the information body. It is N-1 if the     |
|     |            | information body contains N DWords.                                                  |
|     |            | bit [31:30] TYPE; Packet identifier. It should be 3.                                 |
| 2-N | Body       | bit [31:0] Body; The information body "IT_BODY" will be described extensively in the |
|     |            | following sections                                                                   |
Type-3 packets have a common format for their headers. However, the size of their information body may vary depending on the value of field IT\_OPCODE. The size of the information body is indicated by field COUNT. If the size of the information is N DWords, the value of COUNT is N-1. In the following packet definitions, we will describe the field IT\_BODY for each packet with respect to a given IT\_OPCODE, and omit the header.
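The header layouts above map directly onto shifts and masks. The sketch below packs a Type-3 header and the Type-2 filler header from the bit positions given in the tables. The function names are hypothetical, and the opcode values used in the example are arbitrary placeholders rather than real packet opcodes.

```python
def pm4_type3_header(it_opcode, body_dwords, predicate=False, compute=False):
    """Pack a Type-3 header for a body of body_dwords DWords."""
    if not 0 <= it_opcode <= 0xFF:
        raise ValueError("IT_OPCODE is an 8-bit field")
    if not 1 <= body_dwords <= 0x4000:
        raise ValueError("COUNT is a 14-bit field holding N-1")
    count = body_dwords - 1                # bits [29:16]
    return ((3 << 30)                      # TYPE, bits [31:30]
            | (count << 16)
            | (it_opcode << 8)             # bits [15:8]
            | (int(compute) << 1)          # SHADER_TYPE, bit [1]
            | int(predicate))              # PREDICATE, bit [0]

def pm4_type2_header():
    """Filler packet: only TYPE=2 matters, reserved bits stay zero."""
    return 2 << 30

# 0x10 is an arbitrary placeholder opcode for illustration.
print(hex(pm4_type3_header(0x10, 1)))
print(hex(pm4_type2_header()))
```

With a one-DWord body COUNT is 0, so the first header is 0xc0001000; the filler header is 0x80000000.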

# 9.2 Initialization Packets

#### 9.2.1 <u>ME\_INITIALIZE - R7xx/Evergreen/Cayman</u>

The usage rules for the ME\_INITIALIZE packet are:

- The ME\_INITIALIZE packet should be sent to the CP immediately after loading the microcode and enabling the Micro Engine (ME).
- This Type-3 packet is used by the ME to initialize internal state information that is used by other packets.
- If the ME\_INITIALIZE packet changes the MAX\_CONTEXT value, then it needs to be followed by a CONTEXT\_CONTROL packet with a full load mask to force a reload of shadowed registers and constants.
- If the device supports more than one ring buffer for a single GPU (i.e., Cayman), only the primary ring (3D ring) should have this packet.

#### ME\_INITIALIZE Packet Description

| DW | Field Name            | Description                                                                |
|----|-----------------------|----------------------------------------------------------------------------|
| 1  | HEADER                | Header of the packet.                                                      |
| 2  | Default Reset Control | bit [0] - Default Reset Control; Resets specific areas of the Scratch      |
|    |                       | memory to known values and Resets the Current and Last contexts            |
| 3  | Reserved              | bit [31:0] Reserved; Program to zero.                                      |
| 4  | MAX_CONTEXT           | bit [2:0] MAX_CONTEXT; Maximum Context in Chip. Values are 1 to 7.         |
|    |                       | Max context of 0 is not valid since that context is now used for the clear |
|    |                       | state context. For example, 3 means the GPU uses contexts 0-3, i.e., it    |
|    |                       | utilizes 4 contexts.                                                       |
| 5  | DEV_ID                | bit [31:24] - Reserved                                                     |
|    | EXTERNAL_MEM_SWAP     | bit [23:16] - Device-ID                                                    |
|    |                       | bit [15:2] - Reserved                                                      |
|    |                       | bit [1:0] - Swap Code Used for the following transactions: Load_*, Set_*,  |
|    |                       | PM4 headers                                                                |
| 6  | Header_Dump_Base      | bit [31:4] - Header_Dump_Base : a 4 Kbyte aligned address, i.e. base       |
|    | Header_Dump_Swap      | memory address [39:12] of the external memory location where CP will       |
|    |                       | dump PM4 Headers.                                                          |
|    |                       | bit [3:2] - Reserved: should be set to zero.                               |
|    |                       | bit [1:0] - Header_Dump_Swap: the 2 bit Swap Code used when writing        |
|    |                       | headers to memory.                                                         |
| 7  | Header_Dump_Enable    | bit [31] - Header_Dump_Enable: Enable Writing PM4 Headers to               |
|    | Header_Dump_Size      | Memory for Debug                                                           |
|    |                       | bit [30] - Reserved.                                                       |
|    |                       | bit [29:0] - Header_Dump_Size: Size in DWords for the Header Dump          |
|    |                       | Ring in External Memory.                                                   |
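The body of ME\_INITIALIZE (DWords 2 to 7) can be assembled from the bit fields in the table. The following is a minimal sketch with a hypothetical function name and parameter names. The header is omitted because the opcode is not given in this section, and setting bit 0 of DWord 2 is assumed to request the default reset.

```python
def me_initialize_body(max_context, device_id=0, swap=0,
                       header_dump_base=0, header_dump_swap=0,
                       header_dump_enable=False, header_dump_size=0,
                       default_reset=True):
    """Return DWords 2..7 of ME_INITIALIZE as a list of six integers."""
    if not 1 <= max_context <= 7:
        raise ValueError("MAX_CONTEXT must be 1 to 7")
    if header_dump_base & 0xFFF or header_dump_base >> 40:
        raise ValueError("dump base must be 4 KB aligned and fit 40 bits")
    dw2 = int(default_reset)                          # bit [0]
    dw3 = 0                                           # reserved
    dw4 = max_context                                 # bits [2:0]
    dw5 = ((device_id & 0xFF) << 16) | (swap & 0x3)   # [23:16], [1:0]
    # Address bits [39:12] are stored in DWord bits [31:4].
    dw6 = ((header_dump_base >> 12) << 4) | (header_dump_swap & 0x3)
    dw7 = (int(header_dump_enable) << 31) | (header_dump_size & 0x3FFFFFFF)
    return [dw2, dw3, dw4, dw5, dw6, dw7]
```

Passing `max_context=3` corresponds to the example in the table, where the GPU uses contexts 0-3.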

#### 9.2.2 <u>PREAMBLE\_CNTL - R7xx/Evergreen/Cayman</u>

This packet has two purposes. (1) It indicates the packets that belong to each preamble in a command buffer, so the CP can record the location of the last preamble. If there is a context switch, the CP can then fetch the last completed preamble to reload the GPU state from when the process was preempted. The CP will then skip to the location in the command buffer where it left off when the process was switched out. There can be multiple preambles in an IB to initialize the current state, so the CP must update its internal values whenever the packets are received. (2) It indicates to the CP the packets that program the Clear State, versus other Set\_\* packets that go to the next available state.

The Driver will send the PREAMBLE\_CNTL packets before and after the Clear State programming. Initializing the Clear State is only supported immediately after the ME\_INITIALIZE packet. The Begin marker must always come first and the End marker must be of the same type as the previous "Begin". The driver initializes the internal GPU context with the help of the PREAMBLE\_CNTL packet by sending the "clear state" between two instances of this packet, one representing the begin marker and the second the end marker. Any register is allowed to be programmed within the begin and end markers; however, only multi-state GFXDEC registers will be programmed automatically by the CLEAR\_STATE packet. Config and Constant updates must be handled by the driver. See CLEAR\_STATE packet for special provision for very fast clearing of all constants.

| DW | Field Name | Description                                                                 |
|----|------------|-----------------------------------------------------------------------------|
| 1  | HEADER     | Header of the packet.                                                       |
| 2  | CMD        | Command                                                                     |
|    |            | bit [19:0] Reserved; Reserved for internal use by the CP                    |
|    |            | bit [27:20] Reserved;                                                       |
|    |            | bit [31:28] Command;                                                        |
|    |            | - 0000 : Begin Preamble, domain records the current IB offset for later use |
|    |            | - 0001 : End, domain records the current IB offset for later use            |
|    |            | - 0010 : Begin of Clear State initialization [Evergreen+]                   |
|    |            | - 0011 : End of Clear State initialization [Evergreen+]                     |
|    |            | - 1xxx, x1xx : Reserved                                                     |

#### PREAMBLE\_CONTROL Packet Description
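As a minimal sketch of the CMD encoding in the table above, the following C fragment places the command code in bits [31:28] of DW 2 and checks that an End marker matches the type of the preceding Begin. The enum and function names are hypothetical and chosen for illustration only; the packet HEADER is not built here.

```c
#include <stdint.h>

/* Command codes from the CMD table, carried in bits [31:28] of DW 2. */
enum preamble_cmd {
    PREAMBLE_BEGIN             = 0x0, /* 0000: Begin Preamble */
    PREAMBLE_END               = 0x1, /* 0001: End Preamble */
    PREAMBLE_BEGIN_CLEAR_STATE = 0x2, /* 0010: Begin Clear State init [Evergreen+] */
    PREAMBLE_END_CLEAR_STATE   = 0x3  /* 0011: End Clear State init [Evergreen+] */
};

/* Hypothetical helper: build DW 2, leaving the reserved bits [27:0] at zero. */
static uint32_t preamble_cntl_cmd_dword(enum preamble_cmd cmd)
{
    return ((uint32_t)cmd & 0xFu) << 28;
}

/* Hypothetical check: an End marker must match the type of the previous Begin. */
static int preamble_end_matches_begin(enum preamble_cmd begin, enum preamble_cmd end)
{
    return (begin == PREAMBLE_BEGIN && end == PREAMBLE_END) ||
           (begin == PREAMBLE_BEGIN_CLEAR_STATE && end == PREAMBLE_END_CLEAR_STATE);
}
```

A driver-side validator could call `preamble_end_matches_begin` while assembling a command buffer to catch mismatched markers before submission.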

# 9.3 Command Buffer Packets

#### 9.3.1 INDIRECT\_BUFFER - R7xx/Evergreen/Cayman

This packet is used for dispatching Indirect Buffers.

#### **INDIRECT\_BUFFER** Packet Description

| DW | Field Name | Description                                                                         |
|----|------------|-------------------------------------------------------------------------------------|
| 1  | HEADER     | Header of the packet                                                                |
| 2  | IB_BASE_LO | [31:2] - Indirect Buffer Base Address[31:2] - DW-Aligned                            |
|    |            | [1:0] - Swap function used for data write.                                          |
| 3  | IB_BASE_HI | [7:0] - Upper bits of Address [39:32]                                               |
| 4  | VMID       | [31:24] - VMID[7:0], Virtual Memory Domain ID for the command buffer. This field is |
|    |            | valid starting with Cayman.                                                         |
|    | IB_SIZE    | [19:0] - Indirect Buffer Size [19:0], size of the Indirect Buffer in DWORDs.        |
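The field layout above can be illustrated with a short C sketch that fills DWs 2 to 4 from a 40-bit GPU address, a swap function, a VMID, and a size in DWORDs. The struct and function names are hypothetical, the HEADER (DW 1) is not built here, and the range checks are an assumption about what a careful driver would verify rather than documented CP behavior.

```c
#include <stdint.h>

/* Body DWs 2..4 of INDIRECT_BUFFER (the HEADER, DW 1, is not built here). */
struct ib_packet_body {
    uint32_t ib_base_lo; /* [31:2] address bits 31:2, [1:0] swap function */
    uint32_t ib_base_hi; /* [7:0] address bits 39:32 */
    uint32_t vmid_size;  /* [31:24] VMID (Cayman), [19:0] IB size in DWORDs */
};

/* Returns 0 on success, -1 if an argument does not fit its field. */
static int ib_packet_body_fill(struct ib_packet_body *b, uint64_t gpu_addr,
                               uint32_t swap, uint32_t vmid, uint32_t size_dw)
{
    if ((gpu_addr & 0x3u) || (gpu_addr >> 40) || swap > 0x3u ||
        vmid > 0xFFu || size_dw > 0xFFFFFu)
        return -1;
    b->ib_base_lo = (uint32_t)(gpu_addr & 0xFFFFFFFCu) | swap;
    b->ib_base_hi = (uint32_t)(gpu_addr >> 32) & 0xFFu;
    b->vmid_size  = (vmid << 24) | size_dw;
    return 0;
}
```

On parts earlier than Cayman the VMID argument would simply be passed as zero, since the table states the field is valid starting with Cayman.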

#### 9.3.2 <u>DRAW\_INDEX - R7xx/Evergreen/Cayman</u>

DRAW\_INDEX draws a set of primitives using fetched indices. The SOURCE\_SELECT field in the DRAW\_INITIATOR indicates that the VGT will DMA the indices.

| DW | Field Name    | Description                                                                         |
|----|---------------|-------------------------------------------------------------------------------------|
| 1  | HEADER        | Header of the packet                                                                |
| 2  | INDEX_BASE_LO | [30:0] - Base Address [31:1] of Index Buffer (Word-Aligned). Written to the         |
|    |               | VGT_DMA_BASE register (No Context Supplied).                                        |
| 3  | INDEX_BASE_HI | [7:0] - Base Address Hi [39:32] of Index Buffer. Written to the VGT_DMA_BASE_HI     |
|    |               | register (No Context Supplied).                                                     |
| 4  | INDEX_COUNT   | [31:0] - INDEX_COUNT [31:0] - Number of indices in the Index Buffer. Written to the |
|    |               | VGT_DMA_SIZE register (No Context Supplied).                                        |
|    |               | Written to the VGT_NUM_INDICES register for the assigned context.                   |
| 5  | DRAW_INITIATOR | Draw Initiator Register. Written to the VGT_DRAW_INITIATOR register for the assigned context. |

#### **DRAW\_INDEX** Packet Description

#### 9.3.3 <u>DRAW\_INDEX\_2 - R7xx/Evergreen/Cayman</u>

Draws a set of primitives using fetched indices from a bounded index buffer. The SOURCE\_SELECT field in the DRAW\_INITIATOR indicates that the VGT will DMA the indices.

#### DRAW\_INDEX\_2 Packet Description

| DW | Field Name    | Description                                                                  |
|----|---------------|------------------------------------------------------------------------------|
| 1  | HEADER        | Header of the packet                                                         |
| 2  | MAX_SIZE      | MAX_SIZE [31:0] - VGT DMA maximum number of indices                          |
|    |               | until out of bound index buffer is accessed. Written to the VGT_DMA_MAX_SIZE |
|    |               | register (No Context Supplied).                                              |
| 3  | INDEX_BASE_LO | [30:0] - Base Address [31:1] of Index Buffer (Word-Aligned). Written to the VGT_DMA_BASE register (No Context Supplied). |
| 4  | INDEX_BASE_HI | [7:0] - Base Address Hi [39:32] of Index Buffer. Written to the VGT_DMA_BASE_HI register (No Context Supplied). |
| 5  | INDEX_COUNT   | INDEX_COUNT [31:0] - Number of indices in the Index Buffer. Written to the VGT_DMA_SIZE register (No Context Supplied). Written to the VGT_NUM_INDICES register for the assigned context. |
| 6  | DRAW_INITIATOR | Draw Initiator Register. Written to the VGT_DRAW_INITIATOR register for the assigned context. |

#### 9.3.4 DRAW\_INDEX\_AUTO - R7xx/Evergreen/Cayman

Draws a set of primitives using indices auto-generated by the VGT. The SOURCE\_SELECT field in the DRAW\_INITIATOR indicates that the VGT needs to auto-generate the indices.

| DW | Field Name  | Description                                                                                                           |
|----|-------------|-----------------------------------------------------------------------------------------------------------------------|
| 1  | HEADER      | Header of the packet                                                                                                  |
| 2  | INDEX_COUNT | INDEX_COUNT [31:0] - Number of indices to generate. Written to the VGT_NUM_INDICES register for the assigned context. |
| 3  | DRAW_INITIATOR | Primitive type and other control. Written to the VGT_DRAW_INITIATOR register for the assigned context. |

#### DRAW\_INDEX\_AUTO Packet Description

#### 9.3.5 DRAW\_INDEX\_IMMED - R7xx/Evergreen/Cayman

Draws a set of primitives using indices in the packet. The SOURCE\_SELECT field in the DRAW\_INITIATOR indicates that the VGT will use immediate data for the indices. This packet is generally used for draw packets with a small number of indices, where it is faster for the driver to put the indices directly into the command buffer than to copy them to a separate index buffer.

| DW       | Field Name                              | Description                                                                                                                                                                    |
|----------|-----------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 1        | HEADER                                  | Header of the packet                                                                                                                                                           |
| 2        | INDEX_COUNT                             | INDEX_COUNT [31:0] - Number of indices that will be written to the VGT_IMMED_DATA register. Written to the VGT_NUM_INDICES register for the assigned context.                  |
| 3        | DRAW_INITIATOR                          | Primitive type and other control. Written to the VGT_DRAW_INITIATOR register for the assigned context.                                                                         |
| 4 to End | [indx16 #1 indx16 #0] or [indx32 #0]    | Index Data. Written to the VGT_IMMED_DATA register for the assigned context. See the INDEX_TYPE packet for details on how to specify the 16 or 32-bit indices.                 |

#### DRAW\_INDEX\_IMMED Packet Description
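The index data layout can be sketched in C as follows. This is an illustration under one stated assumption: the notation `[indx16 #1 indx16 #0]` is read most-significant-first, so index #0 occupies bits [15:0] and index #1 occupies bits [31:16] of each DWORD. The function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/* Number of index-data DWs (DW 4 to End) needed for `count` 16-bit indices. */
static size_t immed16_dword_count(size_t count)
{
    return (count + 1) / 2;
}

/*
 * Pack 16-bit indices two per DWORD. Assumption: "[indx16 #1 indx16 #0]" is
 * written MSB-first, so index #0 sits in bits [15:0] and index #1 in [31:16].
 * An odd trailing index leaves the upper half zero. Returns DWs written.
 */
static size_t immed16_pack(const uint16_t *idx, size_t count, uint32_t *out)
{
    size_t n = immed16_dword_count(count);
    for (size_t i = 0; i < n; i++) {
        uint32_t lo = idx[2 * i];
        uint32_t hi = (2 * i + 1 < count) ? idx[2 * i + 1] : 0u;
        out[i] = (hi << 16) | lo;
    }
    return n;
}
```

With 32-bit indices no packing is needed, since each index occupies one DWORD of the packet.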

#### 9.3.6 <u>DRAW\_INDEX\_OFFSET - R7xx/Evergreen/Cayman</u>

The purpose of this feature is to reduce the number of addresses the driver must patch in the IB to only the first index buffer call, instead of every draw that uses that index buffer. The base of the index buffer, supplied in the INDEX\_BASE packet, and the index type (16 bit or 32 bit), supplied in the INDEX\_TYPE packet, must have already been sent when this packet arrives at the CP.

Draws a set of primitives using fetched indices with no patching required. The CP will shift the INDEX\_OFFSET by one or two bits depending on the value in INDEX\_TYPE and then add that offset to the Base Address previously supplied in the INDEX\_BASE packet.

The functionality is implemented using one current packet, INDEX\_TYPE, and two new packets, DRAW\_INDEX\_OFFSET and INDEX\_BASE. The driver sends the INDEX\_TYPE and INDEX\_BASE packets before the DRAW\_INDEX\_OFFSET packet.

| DW | Field Name     | Description                                                                                                                                                                                     |
|----|----------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 1  | HEADER         | Header of the packet                                                                                                                                                                            |
| 2  | INDEX_OFFSET   | Starting index number in the index buffer. INDEX_OFFSET of zero represents the first index, index one is the second index.                                                                      |
| 3  | INDEX_COUNT    | INDEX_COUNT [31:0] - Number of indices in the Index Buffer. Written to the<br>VGT_DMA_SIZE register (No Context Supplied). Written to the<br>VGT_NUM_INDICES register for the assigned context. |
| 4  | DRAW_INITIATOR | Draw Initiator Register. Written to the VGT_DRAW_INITIATOR register for the assigned context.                                                                                                   |

#### DRAW\_INDEX\_OFFSET Packet Description
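The address calculation described for this packet (shift INDEX\_OFFSET by one or two bits depending on INDEX\_TYPE, then add the base from INDEX\_BASE) can be sketched as below. This is a CPU-side model for illustration; the enum and function names are hypothetical.

```c
#include <stdint.h>

/* INDEX_TYPE bit 0: 0 = 16-bit indices, 1 = 32-bit indices. */
enum index_type { INDEX_TYPE_16 = 0, INDEX_TYPE_32 = 1 };

/*
 * Byte address of the first fetched index: INDEX_OFFSET is shifted by one bit
 * for 16-bit indices or two bits for 32-bit indices, then added to the base
 * address previously supplied by INDEX_BASE.
 */
static uint64_t index_fetch_address(uint64_t index_base, uint32_t index_offset,
                                    enum index_type type)
{
    unsigned shift = (type == INDEX_TYPE_32) ? 2u : 1u;
    return index_base + ((uint64_t)index_offset << shift);
}
```

Because only the base address is absolute, a driver that relocates the index buffer needs to patch the INDEX\_BASE packet alone; every offset-based draw then resolves against the new base.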

#### 9.3.7 DRAW\_INDEX\_OFFSET\_2 - R7xx/Evergreen/Cayman

ASICs Supported:

| Family     | Supported Members                                                    |
|------------|----------------------------------------------------------------------|
| 6xx        | All, except R600, which does not have the VGT_DMA_MAX_SIZE register. |
| 7xx        | All                                                                  |
| Evergreen+ | All                                                                  |

This packet, in conjunction with the INDEX\_TYPE and INDEX\_BASE packets, draws a set of primitives using fetched indices from a bounded index buffer while minimizing the amount of address patching that the driver must do (Vista BDM). The base of the index buffer, supplied in the INDEX\_BASE packet, and the index type (16 bit or 32 bit), supplied in the INDEX\_TYPE packet, must have already been sent when this packet arrives at the CP.

The CP will shift the INDEX\_OFFSET by one or two bits depending on the value in INDEX\_TYPE and then add that offset to the Base Address previously supplied in the INDEX\_BASE packet.

The functionality is implemented using one current packet, INDEX\_TYPE, and two new packets, DRAW\_INDEX\_OFFSET and INDEX\_BASE. The driver sends the INDEX\_TYPE and INDEX\_BASE packets before the DRAW\_INDEX\_OFFSET packet.

| DW | Field Name   | Description                                                                                                                                                                                     |
|----|--------------|-------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 1  | HEADER       | Header of the packet                                                                                                                                                                            |
| 2  | MAX_SIZE     | MAX_SIZE [31:0] - VGT DMA maximum number of indices until out of bound<br>index buffer is accessed. Written to the VGT_DMA_MAX_SIZE register (No<br>Context Supplied).                          |
| 3  | INDEX_OFFSET | Starting index number in the index buffer. INDEX_OFFSET of zero represents the first index, index one is the second index.                                                                      |
| 4  | INDEX_COUNT  | INDEX_COUNT [31:0] - Number of indices in the Index Buffer. Written to the<br>VGT_DMA_SIZE register (No Context Supplied). Written to the<br>VGT_NUM_INDICES register for the assigned context. |
| 5  | DRAW_INITIATOR | Draw Initiator Register. Written to the VGT_DRAW_INITIATOR register for the assigned context. |

#### DRAW\_INDEX\_OFFSET\_2 Packet Description

#### 9.3.8 <u>INDEX\_BASE - R7xx/Evergreen/Cayman</u>

The purpose of the INDEX\_BASE packet, in conjunction with the INDEX\_TYPE and DRAW\_INDEX\_OFFSET packets, is to minimize the amount of address patching that the driver must do. The driver only needs to send the INDEX\_BASE packet (and therefore patch the indirect buffer) once, when the index buffer call is made. Subsequent calls to draw with an index buffer offset (and the resulting DRAW\_INDEX\_OFFSET packets) need no patching.

#### **INDEX\_BASE** Packet Description

| DW | Field Name    | Description                                                  |
|----|---------------|--------------------------------------------------------------|
| 1  | HEADER        | Header of the packet                                         |
| 2  | INDEX_BASE_LO | [30:0] - Base Address [31:1] of Index Buffer (Word-Aligned). |
| 3  | INDEX_BASE_HI | Bits [7:0] Base Address Hi [39:32] of Index Buffer.          |

#### 9.3.9 <u>INDEX\_TYPE - R7xx/Evergreen/Cayman</u>

This packet is considered part of the draw packet sequence, so the VGT\_INDEX\_TYPE is not shadowed. If this packet is not sent before each draw then it will need to be in the preamble of each command buffer to ensure it gets set correctly before the first draw.

#### INDEX\_TYPE Packet Description

| DW | Field Name | Description                                   |
|----|------------|-----------------------------------------------|
| 1  | HEADER     | Header of the packet                          |
| 2  | INDEX_TYPE | [0] Index Type                                |
|    |            | - 0 = 16-bits;                                |
|    |            | - 1 = 32-bits.                                |
|    | SWAP_MODE  | [3:2] Swap Mode[1:0] - Byte swapping control. |
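A minimal sketch of assembling DW 2 from the two fields in the table: bit [0] selects the index size and bits [3:2] carry the swap mode. The function name is hypothetical, and the meaning of the individual swap mode values is not defined in this section, so the value is passed through as a raw 2-bit number.

```c
#include <stdint.h>

/* Build DW 2 of INDEX_TYPE: bit [0] index type, bits [3:2] swap mode. */
static uint32_t index_type_dword(int is_32bit, uint32_t swap_mode)
{
    return ((swap_mode & 0x3u) << 2) | (is_32bit ? 1u : 0u);
}
```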

#### 9.3.10 <u>NUM\_INSTANCES - R7xx/Evergreen/Cayman</u>

NUM\_INSTANCES is used to specify the number of instances for the subsequent draw command. This packet is considered part of the draw packet sequence. If this packet is not sent before each draw, then it will need to be in the preamble of each command buffer to ensure it gets set correctly before the first draw.

#### NUM\_INSTANCES Packet Description

| DW | Field Name    | Description                                                                                                                                      |
|----|---------------|--------------------------------------------------------------------------------------------------------------------------------------------------|
| 1  | HEADER        | Header of the packet                                                                                                                             |
| 2  | NUM_INSTANCES | [31:0] Number of Instances. Minimum value is one; if zero is programmed, it will be treated as one. This allows for a max of 2^32 - 1 instances. |

#### 9.3.11 <u>MPEG\_INDEX - R7xx/Evergreen/Cayman</u>

MPEG\_INDEX: Packed register writes for MPEG and Generation of Indices. The VGT\_PRIMITIVE\_TYPE:PRIM\_TYPE should be a DI\_PT\_RECTLIST.

| DW                               | Field Name     | Description                                                                                                                                                                                                         |
|----------------------------------|----------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 1                                | HEADER         | Header field of the packet.                                                                                                                                                                                         |
| 2                                | NUM_INDICES    | Number of indices the VGT will actually fetch, i.e. 3 * the number of base indices given at the end of this packet. Valid values are 0x0003 to 0x3FFF.                                                              |
| 3                                | DRAW_INITIATOR | Written unconditionally to the VGT_DRAW_INITIATOR register.                                                                                                                                                         |
| 4 to 4 + ((NUM_INDICES/3) - 1)   | 32-Bit INDEX   | First Index of Rect (0x00000000 to 0xFFFFFFD). For each First Index, the CP will generate the other 2 indices and output: FIRST_INDEX, FIRST_INDEX+1, FIRST_INDEX+2. All indices are written to the VGT_IMMED_DATA register. |

#### MPEG\_INDEX Packet Description
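The index generation described in the table can be modeled on the CPU as follows: each first index in the packet expands to three consecutive indices. This is a sketch of the documented output sequence, not of the CP microcode, and the function names are hypothetical.

```c
#include <stddef.h>
#include <stdint.h>

/*
 * Expand each "first index" of a rect into FIRST, FIRST+1, FIRST+2, mirroring
 * what the table says the CP writes to VGT_IMMED_DATA. `out` must hold
 * 3 * num_first entries. Returns the number of indices generated.
 */
static size_t mpeg_expand_indices(const uint32_t *first, size_t num_first,
                                  uint32_t *out)
{
    size_t n = 0;
    for (size_t i = 0; i < num_first; i++) {
        out[n++] = first[i];
        out[n++] = first[i] + 1u;
        out[n++] = first[i] + 2u;
    }
    return n;
}

/* Last packet DW holding a first index: 4 + ((num_indices / 3) - 1). */
static uint32_t mpeg_last_index_dw(uint32_t num_indices)
{
    return 4u + ((num_indices / 3u) - 1u);
}
```

For example, a packet carrying two first indices has a DW 2 value of 6, and its index data occupies DWs 4 and 5.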

#### 9.3.12 <u>DISPATCH\_DIRECT - Evergreen/Cayman</u>

Used for dispatching a compute thread with the array dimensions in the packet.

| DW | Field Name         | Description                                                                                                                                       |
|----|--------------------|---------------------------------------------------------------------------------------------------------------------------------------------------|
| 1  | HEADER             | Header of the packet. Shader_Type in bit 1 of the Header will be set to 1, since Dispatches are only used for Compute Shaders, see Type-3 Packet. |
| 2  | DIM_X              | Bits [31:0] - X dimension of the array of thread groups to be dispatched                                                                          |
| 3  | DIM_Y              | Bits [31:0] - Y dimension of the array of thread groups to be dispatched                                                                          |
| 4  | DIM_Z              | Bits [31:0] - Z dimension of the array of thread groups to be dispatched                                                                          |
| 5  | DISPATCH_INITIATOR | Dispatch Initiator Register. Written to the VGT_DISPATCH_INITIATOR register for the assigned context.                                             |

#### **DISPATCH\_DIRECT** Packet Description

#### 9.3.13 <u>DISPATCH\_INDIRECT - Evergreen/Cayman</u>

Used for dispatching a compute thread with the array dimensions fetched from memory.

    // At the specified offset, the following data members will be in this order.
    struct GroupDimensions
    {
        UINT DIM_X;
        UINT DIM_Y;
        UINT DIM_Z;
    };

| DW | Field Name         | Description                                                                                                                                       |
|----|--------------------|---------------------------------------------------------------------------------------------------------------------------------------------------|
| 1  | HEADER             | Header of the packet. Shader_Type in bit 1 of the Header will be set to 1, since Dispatches are only used for Compute Shaders, see Type-3 Packet. |
| 2  | DATA_OFFSET        | Bits [31:0] - Byte aligned offset where the required data structure starts. Bits 1:0 = 00.                                                        |
| 3  | DISPATCH_INITIATOR | Dispatch Initiator Register. Written to the VGT_DISPATCH_INITIATOR register for the assigned context.                                             |

#### **DISPATCH\_INDIRECT** Packet Description
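As an illustration of the DATA\_OFFSET rule (bits 1:0 must be 00) and the three-member layout fetched from memory, the sketch below reads the group dimensions from a CPU-visible buffer. The `UINT` typedef, the function name, and the bounds check are assumptions made for this example; on hardware the CP performs the fetch itself.

```c
#include <stdint.h>
#include <string.h>

typedef uint32_t UINT;

/* Layout the text says is found at DATA_OFFSET. */
struct GroupDimensions {
    UINT DIM_X;
    UINT DIM_Y;
    UINT DIM_Z;
};

/*
 * Hypothetical CPU-side model: read the dimensions from `buf` at `data_offset`.
 * Returns 0 on success, -1 if bits [1:0] of the offset are not 00 or the
 * structure would run past the end of the buffer.
 */
static int read_group_dimensions(const uint8_t *buf, size_t buf_size,
                                 uint32_t data_offset, struct GroupDimensions *out)
{
    if ((data_offset & 0x3u) != 0)
        return -1;
    if (buf_size < sizeof(*out) || data_offset > buf_size - sizeof(*out))
        return -1;
    memcpy(out, buf + data_offset, sizeof(*out));
    return 0;
}
```

A check of this kind is useful in a driver or simulator because the dimensions are typically produced by an earlier GPU pass, so the packet itself carries no size information to validate.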

# 9.4 State Management Packets

#### 9.4.1 <u>CLEAR\_STATE - Evergreen/Cayman</u>

The purpose of the CLEAR\_STATE packet is to reduce command buffer preamble setup time for all driver versions. Clear State is essentially everything off: all resources NULL, other values set to the API-defined default state, and nulling of the constant buffers and the constants programmed via the GRBM (Tex\_Resource, Tex\_Samplers, Boolean, Loop, Ctl). Clear State includes all multi-copy state (Gfx Decode) and the constant-nulling registers. It does not include single-copy configuration registers. The constants are cleared via five new context registers: SQ\_TEX\_SAMPLER\_CLEAR, SQ\_TEX\_RESOURCE\_CLEAR, SQ\_LOOP\_BOOL\_CLEAR, SQ\_VTX\_BASE\_VTX\_LOC, and SQ\_VTX\_START\_INST\_LOC. ALU constant buffers are managed in the GFXDEC (8-state) space, so the normal clear handles them by clearing the SQ\_PGM\_\* and SQ\_ALU\_CONST\_BUFFER\_SIZE\_\* registers.

#### **CLEAR\_STATE** Packet Description

| DW | Field Name | Description                                                                                                                 |
|----|------------|-----------------------------------------------------------------------------------------------------------------------------|
| 1  | HEADER     | Header of the packet. Shader_Type in bit 1 of the Header will correspond to the shader type of the Load, see Type-3 Packet. |
| 2  | DUMMY      | Dummy Data                                                                                                                  |

#### 9.4.2 <u>DEALLOC\_STATE - Cayman</u>

The purpose of the DEALLOC\_STATE packet is to free an allocated compute shader state back to the state-set pool. Shader type should always be '1', i.e., compute shader.

- It will be placed in the Ring Buffer by the kernel driver.
- The driver will guarantee that no compute shaders are active in the GPU when this packet is received.
- There may or may not have been any Dispatch packets between the CLEAR\_STATE packet that allocated the CS state and this packet that deallocates (frees) it.
- Though there will be a write to the GFX\_COPY\_STATE register sometime after the context done event, a significant number of other packets could be processed between them.

| DW | Field Name | Description                                                                                                                 |
|----|------------|-----------------------------------------------------------------------------------------------------------------------------|
| 1  | HEADER     | Header of the packet. Shader_Type in bit 1 of the Header will be '1', i.e., Compute Shader, see the Type-3 Packet section.  |
| 2  | DUMMY      | Dummy Data                                                                                                                  |

#### **DEALLOC\_STATE** Packet Description

#### 9.4.3 <u>MODE\_CONTROL - Evergreen/Cayman</u>

This generic packet is used to reset the Graphics chip out of DX9 ALU Constant Emulation mode back to DX10 Constant Buffer mode.

| DW | Field Name | Description                                                                                  |
|----|------------|----------------------------------------------------------------------------------------------|
| 1  | HEADER     | Header of the packet.                                                                        |
| 2  | CMD        | Command                                                                                      |
|    |            | [31:3] - Reserved                                                                            |
|    |            | [2:0] - CMD                                                                                  |
|    |            | 000: Reserved                                                                                |
|    |            | 001: Reset DX9 Constant Emulation Mode. That is, switch to DX10 style constant buffer mode.  |

#### MODE\_CONTROL Packet Description

#### 9.4.4 <u>CONTEXT\_CONTROL - R7xx/Evergreen/Cayman</u>

The CONTEXT\_CONTROL packet controls the processing of LOAD packets and shadowing for SET packets. The Load Control bits enable/disable the CP's processing of each type of LOAD packet. When the Load Control bits are set, the CP will process the LOAD packets indicated. Likewise, if the bits are cleared, the CP will discard the corresponding LOAD packets indicated. There is a bit for enabling/disabling each Load variation.

| DW | Field Name    | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                           |
|----|---------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 1  | HEADER        | Header of the packet                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                  |
| 2  | LOAD_CONTROL  | <ul> <li>Bit 31 - DW Enable - Load Enables will not be updated unless this bit is set</li> <li>Bits 30:13 - Not used</li> <li>Bit 12 - Enable Load CS Samplers</li> <li>Bit 11 - Enable Load CS Resources</li> <li>Bit 10 - Enable Load CS Loop Constants</li> <li>Bit 9 - Enable Load CS Boolean Constants</li> <li>Bit 8 - Enable Load CS Multi-Context Render State Registers</li> <li>Bit 7 - Enable Load Control Constants</li> <li>Bit 6 - Enable Load Resources</li> <li>Bit 4 - Enable Load Loop Constants</li> <li>Bit 3 - Enable Load Control Constants</li> <li>Bit 4 - Enable Load ALU Constants</li> <li>Bit 2 - Enable Load Multi-Context Render State Registers</li> <li>Bit 1 - Enable Load Loop Constants</li> <li>Bit 2 - Enable Load Multi-Context Render State Registers</li> </ul>                                                                                                                                                                                                               |
| 3  | SHADOW_ENABLE | <ul> <li>Bit 0 - Enable Load Single-Context Configuration Registers</li> <li>Bit 31 - DW Enable - Shadow Enables will not be updated unless this bit is set.</li> <li>Bits 30:13 - Not used</li> <li>Bit 12 - Enable Shadowing of CS Samplers</li> <li>Bit 11 - Enable Shadowing of CS Resources</li> <li>Bit 10 - Enable Shadowing of CS Loop Constants</li> <li>Bit 9 - Enable Shadowing of CS Boolean Constants</li> <li>Bit 8 - Enable Shadowing of CS Multi-Context Render State Registers</li> <li>Bit 7 - Enable Shadowing of Camplers</li> <li>Bit 6 - Enable Shadowing of Camplers</li> <li>Bit 5 - Enable Shadowing of Resources</li> <li>Bit 4 - Enable Shadowing of Loop Constants</li> <li>Bit 3 - Enable Shadowing of Boolean Constants</li> <li>Bit 4 - Enable Shadowing of ALU Constants</li> <li>Bit 2 - Enable Shadowing of Multi-Context Render State Registers</li> <li>Bit 1 - Enable Shadowing of Samplers</li> <li>Bit 1 - Enable Shadowing of Multi-Context Render State Registers</li> </ul> |

#### **CONTEXT\_CONTROL** Packet Description
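The role of the DW Enable bit can be sketched as a small model of how the CP might latch a control DWORD: the stored enables change only when bit 31 is set, and bits 30:13 are ignored. Only DW Enable and the CS enables in bits 12:8 are given names here; the remaining low bits are passed through as a raw mask. The macro and function names are hypothetical, and the same model is assumed to apply to both LOAD\_CONTROL and SHADOW\_ENABLE.

```c
#include <stdint.h>

#define CC_DW_ENABLE          (1u << 31) /* enables are ignored unless set */
#define CC_CS_SAMPLERS        (1u << 12)
#define CC_CS_RESOURCES       (1u << 11)
#define CC_CS_LOOP_CONSTS     (1u << 10)
#define CC_CS_BOOL_CONSTS     (1u << 9)
#define CC_CS_MULTI_CTX_STATE (1u << 8)

/* Hypothetical model of the CP's stored enables for one control DW. */
static uint32_t context_control_apply(uint32_t current, uint32_t dw)
{
    if (!(dw & CC_DW_ENABLE))
        return current;  /* DW Enable clear: keep the previous enables */
    return dw & 0x1FFFu; /* bits 30:13 are not used */
}
```

This shape lets a driver update only one of the two control DWORDs in a packet by leaving bit 31 clear in the other.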

#### 9.4.5 <u>LOAD\_LOOP\_CONST - R7xx/Evergreen/Cayman</u>

This packet provides the ability to have the CP:

- Initialize the LOOP\_CONST\_BASE (BASE\_ADDR\_\* fields) internally for later use when a SET\_LOOP\_CONST packet is processed and shadowing is enabled (via the CONTEXT\_CONTROL packet). For this case, there are 5 DWs in the packet, and DWs 4 and 5 (CONST\_OFFSET and NUM\_DWords) are programmed to zero.
- Fetch previously shadowed ALU constant data from external memory into the chip. For this case, there are 5 or more DWs and all are meaningful.

The CP computes the DWord-aligned external memory read address as follows:

- Mem\_Start\_Address[39:2] = LOOP\_CONST\_BASE[39:2] + CONST\_OFFSET

The CP writes the 'loaded' data to consecutive register addresses. The starting address is computed as shown below:

- Reg\_Start\_Address[17:2] = SQ\_LOOP\_COUNT\_CONST\_0[17:2] + CONST\_OFFSET

| DW  | Field Name   | Description                                                                                                                                                                           |
|-----|--------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 1   | HEADER       | Header of the packet. Shader_Type in bit 1 of the Header will correspond to the shader type of the Load, see Type-3 Packet.                                                           |
| 2   | BASE_ADDR_LO | [31:2] - LOOP_CONST_BASE (31:2) for the block in Memory from where the CP will fetch the constants.                                                                                   |
| 3   | BASE_ADDR_HI | [7:0] - LOOP_CONST_BASE (39:32) for the block in Memory from where the CP will fetch the constants.                                                                                   |
| 4   | CONST_OFFSET | [31:16] - Reserved<br>[15:0] - CONST_OFFSET[15:0] in DWords from the base alu constant address (SQ_LOOP_CONSTANT0_0) and base alu constant memory address (LOOP_CONST_BASE). |
| 5   | NUM_DWords   | [31:14] - Reserved<br>NUM_DWords[13:0] - Number of DWords that the CP will fetch and write into the chip. A value of zero will cause no constants to be loaded. |
| N   | CONST_OFFSET | [31:16] - Reserved<br>CONST_OFFSET[15:0] - Same Definition as Above. |
| N+1 | NUM_DWords   | [31:14] - Reserved<br>NUM_DWords[13:0] - Same Definition as Above. |

### LOAD\_LOOP\_CONST Packet Description
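
The two start-address formulas above can be sketched in C as follows. This is only an illustration of the arithmetic: the function names are hypothetical, addresses are treated as byte addresses whose bits [1:0] are zero, and the register base (the address of SQ\_LOOP\_COUNT\_CONST\_0) is passed in as a parameter because its value is not given in this section.

```c
#include <assert.h>
#include <stdint.h>

/* Mem_Start_Address is a 40-bit byte address; bits [39:2] form a DWord index. */
#define ADDR40_MASK 0xFFFFFFFFFFull

/* Byte address the CP reads from: BASE[39:2] + CONST_OFFSET (offset in DWords). */
uint64_t load_mem_start_byte_addr(uint64_t base_byte_addr, uint16_t const_offset)
{
    assert((base_byte_addr & 0x3u) == 0);   /* base is assumed DWord-aligned */
    uint64_t dw_index = (base_byte_addr >> 2) + const_offset;
    return (dw_index << 2) & ADDR40_MASK;
}

/* Byte address of the first register written: REG_BASE[17:2] + CONST_OFFSET. */
uint32_t load_reg_start_byte_addr(uint32_t reg_base_byte_addr, uint16_t const_offset)
{
    assert((reg_base_byte_addr & 0x3u) == 0);
    uint32_t dw_index = (reg_base_byte_addr >> 2) + const_offset;
    return (dw_index << 2) & 0x3FFFFu;      /* keep address bits [17:0] */
}
```

Because CONST\_OFFSET is counted in DWords, an offset of 4 moves both the memory and the register start address forward by 16 bytes.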

# 9.4.6 <u>LOAD\_ALU\_CONST - R7xx/Evergreen/Cayman</u>

The LOAD\_ALU\_CONST packet is used by the driver to specify the ALU constant buffer base address for the CP to use as the start address of the 8 Constant buffers it manages to emulate DX9 ALU data on HW that only supports constant buffers.

### LOAD\_ALU\_CONST Packet Description

| DW | Field Name     | Description |
|----|----------------|-------------|
| 1  | HEADER         | Header of the packet |
| 2  | ALU_CONST_BASE | [31:9] - Address [39:17]. 128 KByte boundary.<br>[8:1] - Reserved.<br>[0] - COMPLETE_UPDATE<br>0: CP manages incremental ALU CONST updates.<br>1: Driver includes a complete set of ALU CONST updates. |
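
As a rough illustration of the ALU\_CONST\_BASE layout, the sketch below packs a 40-bit byte address and the COMPLETE\_UPDATE flag into DW 2. The function and macro names are assumptions made for this example and do not come from any driver source.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define ALU_CONST_ALIGN (128u * 1024u)   /* base must sit on a 128 KByte boundary */

/* Build DW 2 of LOAD_ALU_CONST: Address[39:17] in bits [31:9], flag in bit 0. */
uint32_t pack_alu_const_base(uint64_t base_byte_addr, bool complete_update)
{
    assert(base_byte_addr % ALU_CONST_ALIGN == 0);
    assert(base_byte_addr < (1ull << 40));

    uint32_t dw = (uint32_t)(base_byte_addr >> 17) << 9;
    if (complete_update)
        dw |= 1u;                        /* [0] COMPLETE_UPDATE; [8:1] stay zero */
    return dw;
}
```

The 128 KByte alignment is what lets the low 17 address bits be dropped, leaving room for the reserved bits and the flag.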

# 9.4.7 <u>LOAD\_BOOL\_CONST - R7xx/Evergreen/Cayman</u>

This packet provides the ability to have the CP initialize the BOOL\_CONST\_BASE (BASE\_ADDR\_\* fields) internally for later use when a SET\_BOOL\_CONST packet is processed and shadowing is enabled (via the CONTEXT\_CONTROL packet). For this case, there are 5 DWs in the packet and DWs 4 and 5, CONST\_OFFSET and NUM\_DWordS are programmed to zero.

The CP computes the DWord-aligned external memory read address as follows:

• Mem\_Start\_Address[39:2] = BOOL\_CONST\_BASE[39:2] + CONST\_OFFSET

The CP writes the 'loaded' data to consecutive register addresses. The starting address is computed as shown below:

• Reg\_Start\_Address[17:2] = SQ\_BOOL\_CONST\_0[17:2] + CONST\_OFFSET

| DW  | Field Name   | Description |
|-----|--------------|-------------|
| 1   | HEADER       | Header of the packet. Shader_Type in bit 1 of the Header will correspond to the shader type of the Load, see Type-3 Packet. |
| 2   | BASE_ADDR_LO | [31:2] - BOOL_CONST_BASE (31:2) for the block in Memory from where the CP will fetch the constants. |
| 3   | BASE_ADDR_HI | [7:0] - BOOL_CONST_BASE (39:32) for the block in Memory from where the CP will fetch the constants. |
| 4   | CONST_OFFSET | [15:0] - CONST_OFFSET[15:0] in DWords from the base alu constant address (SQ_BOOL_CONSTANT0_0) and base alu constant memory address (BOOL_CONST_BASE). |
| 5   | NUM_DWordS   | NUM_DWordS[13:0] - Number of DWords that the CP will fetch and write into the chip. A value of zero will cause no constants to be loaded. |
| N   | CONST_OFFSET | [31:16] - Reserved<br>CONST_OFFSET[15:0] - Same Definition as Above. |
| N+1 | NUM_DWordS   | [31:14] - Reserved<br>NUM_DWordS[13:0] - Same Definition as Above. |

LOAD\_BOOL\_CONST Packet Description
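
A minimal sketch of the initialize-only form described above, in which the packet carries 5 DWs and DWs 4 and 5 are zero. Only DWs 2 to 5 are produced here; the Type-3 header (DW 1) is defined elsewhere in this document and is left to the caller. The function name is hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Fill DWs 2..5 of an initialize-only LOAD_BOOL_CONST packet. */
void build_load_init_body(uint64_t base_byte_addr, uint32_t body[4])
{
    assert((base_byte_addr & 0x3u) == 0);      /* only bits [39:2] are carried */
    assert(base_byte_addr < (1ull << 40));

    body[0] = (uint32_t)(base_byte_addr & 0xFFFFFFFCu);   /* DW 2: BASE_ADDR_LO [31:2] */
    body[1] = (uint32_t)(base_byte_addr >> 32) & 0xFFu;   /* DW 3: BASE_ADDR_HI [7:0]  */
    body[2] = 0;                                          /* DW 4: CONST_OFFSET = 0    */
    body[3] = 0;                                          /* DW 5: NUM_DWordS = 0      */
}
```

With NUM\_DWordS set to zero no constants are fetched, so the packet only records the base address for later SET\_BOOL\_CONST shadowing.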

# 9.4.8 <u>LOAD\_CONFIG\_REG - R7xx/Evergreen/Cayman</u>

This packet provides the ability to have the CP:

- Initialize the CONFIG\_REG\_BASE (BASE\_ADDR\_\* fields) internally for later use when a SET\_CONFIG\_REG packet is processed and shadowing is enabled (via the CONTEXT\_CONTROL packet). For this case, there are 5 DWs in the packet and DWs 4 and 5, REG\_OFFSET and NUM\_DWordS are programmed to zero.
- Fetch single-context-configuration register data from external memory into the chip that was previously shadowed. For this case, there are 5 or more DWs and all are meaningful.

The CP computes the DWord-aligned external memory read address as follows:

• Mem\_Start\_Address[39:2] = CONFIG\_REG\_BASE[39:2] + REG\_OFFSET

The CP writes the 'loaded' data to consecutive register addresses. The starting address is computed as shown below:

• Reg\_Start\_Address[17:2] = 0x2000 + REG\_OFFSET (Note: Byte Offset 0x8000 = DWord Offset 0x2000)

| DW  | Field Name                    | Description |
|-----|-------------------------------|-------------|
| 1   | HEADER                        | Header of the packet. Shader_Type in bit 1 of the Type-Header will always be zero since these are configuration registers. |
| 2   | BASE_ADDR_LO                  | [31:2] - CONFIG_REG_BASE (31:2) for the block in Memory from where the CP will fetch the state. |
| 3   | WAIT_FOR_IDLE<br>BASE_ADDR_HI | [31] - WAIT_FOR_IDLE: If set, the CP will wait for the graphics pipe to be idle by writing to the GRBM Wait Until register with "Wait for 3D idle".<br>[7:0] - CONFIG_REG_BASE (39:32) for the block in Memory from where the CP will fetch the state. |
| 4   | REG_OFFSET                    | [15:0] - REG_OFFSET[15:0] in DWords from the register base address (Fixed at 0x2000 in DWs) and memory base address (CONFIG_REG_BASE). |
| 5   | NUM_DWordS                    | NUM_DWordS[13:0] - Number of DWords that the CP will fetch and write into the chip. A value of zero will cause no constants to be loaded. |
| N   | REG_OFFSET                    | [31:16] - Reserved<br>REG_OFFSET[15:0] - Same Definition as Above. |
| N+1 | NUM_DWordS                    | [31:14] - Reserved<br>NUM_DWordS[13:0] - Same Definition as Above. |

# LOAD\_CONFIG\_REG Packet Description
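
DW 3 of LOAD\_CONFIG\_REG shares one DWord between WAIT\_FOR\_IDLE (bit 31) and BASE\_ADDR\_HI (bits [7:0]). The sketch below shows one way that packing might be written; the macro and function names are invented for this example.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define LCFG_WAIT_FOR_IDLE (1u << 31)   /* hypothetical name for DW 3 bit 31 */

/* Build DW 3: WAIT_FOR_IDLE in bit 31, CONFIG_REG_BASE[39:32] in bits [7:0]. */
uint32_t pack_load_config_reg_dw3(uint64_t base_byte_addr, bool wait_for_idle)
{
    assert(base_byte_addr < (1ull << 40));

    uint32_t dw = (uint32_t)(base_byte_addr >> 32) & 0xFFu;
    if (wait_for_idle)
        dw |= LCFG_WAIT_FOR_IDLE;
    return dw;
}
```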

# 9.4.9 <u>LOAD\_CONTEXT\_REG - R7xx/Evergreen/Cayman</u>

This packet provides the ability to have the CP:

- Initialize the CONTEXT\_REG\_BASE (BASE\_ADDR\_\* fields) internally for later use when a SET\_CONTEXT\_REG packet is processed and shadowing is enabled (via the CONTEXT\_CONTROL packet). For this case, there are 5 DWs in the packet and DWs 4 and 5, REG\_OFFSET and NUM\_DWordS are programmed to zero.
- Fetch eight-context-configuration register data from external memory into the chip that was previously shadowed. For this case, there are 5 or more DWs and all are meaningful.

The CP computes the DWord-aligned external memory read address as follows:

• Mem\_Start\_Address[39:2] = CONTEXT\_REG\_BASE[39:2] + REG\_OFFSET

The CP writes the 'loaded' data to consecutive register addresses. The starting address is computed as shown below:

• Reg\_Start\_Address[17:2] = 0xA000 + REG\_OFFSET (Note: Byte Offset 0x28000 = DWord Offset 0xA000)

| DW  | Field Name   | Description |
|-----|--------------|-------------|
| 1   | HEADER       | Header of the packet. Shader_Type in bit 1 of the Header will correspond to the shader type of the Load, see Type-3 Packet. |
| 2   | BASE_ADDR_LO | [31:2] - CONTEXT_REG_BASE (31:2) for the block in Memory from where the CP will fetch the state. |
| 3   | BASE_ADDR_HI | [7:0] - CONTEXT_REG_BASE (39:32) for the block in Memory from where the CP will fetch the state. |
| 4   | REG_OFFSET   | [15:0] - REG_OFFSET[15:0] in DWords from the register base address (Fixed at 0xA000 in DWs) and memory base address (CONTEXT_REG_BASE). |
| 5   | NUM_DWordS   | NUM_DWordS[13:0] - Number of DWords that the CP will fetch and write into the chip. A value of zero will cause no constants to be loaded. |
| N   | REG_OFFSET   | [31:16] - Reserved<br>REG_OFFSET[15:0] - Same Definition as Above. |
| N+1 | NUM_DWordS   | [31:14] - Reserved<br>NUM_DWordS[13:0] - Same Definition as Above. |

# LOAD\_CONTEXT\_REG Packet Description
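
The N / N+1 rows indicate that further (REG\_OFFSET, NUM\_DWordS) pairs may follow DW 5. A hedged sketch of a builder for such a multi-range body is shown below; `struct load_range` and the function name are assumptions for illustration, and the header DW is again left to the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* One (REG_OFFSET, NUM_DWordS) pair; hypothetical helper type. */
struct load_range {
    uint16_t reg_offset;   /* REG_OFFSET[15:0], in DWords */
    uint16_t num_dwords;   /* NUM_DWordS[13:0] */
};

/*
 * Write DWs 2..(3 + 2*n) of a LOAD_CONTEXT_REG packet into out.
 * Returns the number of DWs written, or 0 if n is zero or out is too small.
 */
size_t build_load_context_reg_body(uint64_t base_byte_addr,
                                   const struct load_range *ranges, size_t n,
                                   uint32_t *out, size_t out_cap)
{
    size_t need = 2 + 2 * n;
    if (n == 0 || out_cap < need)
        return 0;
    assert((base_byte_addr & 0x3u) == 0 && base_byte_addr < (1ull << 40));

    out[0] = (uint32_t)(base_byte_addr & 0xFFFFFFFCu);   /* BASE_ADDR_LO */
    out[1] = (uint32_t)(base_byte_addr >> 32) & 0xFFu;   /* BASE_ADDR_HI */
    for (size_t i = 0; i < n; i++) {
        assert(ranges[i].num_dwords <= 0x3FFFu);         /* 14-bit field */
        out[2 + 2 * i] = ranges[i].reg_offset;           /* [31:16] reserved, zero */
        out[3 + 2 * i] = ranges[i].num_dwords;           /* [31:14] reserved, zero */
    }
    return need;
}
```

Each pair addresses both the shadow memory and the register space with the same DWord offset, so disjoint register ranges can be restored with a single packet.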

# 9.4.10 LOAD\_CTL\_CONST - R7xx/Evergreen/Cayman

This packet provides the ability to have the CP:

- Initialize the CTL\_CONST\_BASE (BASE\_ADDR\_\* fields) internally for later use when a SET\_CTL\_CONST packet is processed and shadowing is enabled (via the CONTEXT\_CONTROL packet). For this case, there are 5 DWs in the packet and DWs 4 and 5, CONST\_OFFSET and NUM\_DWordS are programmed to zero.
- Fetch control constant data from external memory into the chip that was previously shadowed. For this case, there are 5 or more DWs and all are meaningful.

The CP computes the DWord-aligned external memory read address as follows:

• Mem\_Start\_Address[39:2] = CTL\_CONST\_BASE[39:2] + CONST\_OFFSET

The CP writes the 'loaded' data to consecutive register addresses. The starting address is computed as shown below:

• Reg\_Start\_Address[17:2] = mmSQ\_VTX\_BASE\_VTX\_LOC[17:2] + CONST\_OFFSET

| DW  | Field Name   | Description |
|-----|--------------|-------------|
| 1   | HEADER       | Header of the packet. Shader_Type in bit 1 of the Header will always be zero, since no CS constants are set with this packet, see Type-3 Packet. |
| 2   | BASE_ADDR_LO | [31:2] - Lower Base address bits (31:2) for the block in Memory from where the CP will fetch the constants. |
| 3   | BASE_ADDR_HI | [7:0] - Upper Base address bits (39:32) for the block in Memory from where the CP will fetch the constants. |
| 4   | CONST_OFFSET | CONST_OFFSET[15:0] - Offset in DWords from the base address. |
| 5   | NUM_DWords   | NUM_DWordS[13:0] - Number of DWords that the CP will fetch and write into the chip. A value of zero will cause no constants to be loaded. |
| N   | CONST_OFFSET | [31:16] - Reserved<br>CONST_OFFSET[15:0] - Same Definition as Above. |
| N+1 | NUM_DWords   | [31:14] - Reserved<br>NUM_DWordS[13:0] - Same Definition as Above. |

### LOAD\_CTL\_CONST Packet Description
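
When debugging command streams it can help to check a LOAD\_CTL\_CONST body against the field widths in the table. The following lint-style sketch does that. It is an assumption of this example, not a statement of the specification, that bits the table does not describe (BASE\_ADDR\_LO [1:0] and BASE\_ADDR\_HI [31:8]) should be zero; the function name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* body points at DW 2; body_len is the number of DWs following the header. */
bool load_body_is_well_formed(const uint32_t *body, size_t body_len)
{
    assert(body != NULL);

    /* BASE_ADDR_LO, BASE_ADDR_HI, then one or more (offset, count) pairs. */
    if (body_len < 4 || (body_len - 2) % 2 != 0)
        return false;
    if (body[0] & 0x3u)               /* assumed zero: BASE_ADDR_LO [1:0] */
        return false;
    if (body[1] & ~0xFFu)             /* assumed zero: BASE_ADDR_HI [31:8] */
        return false;
    for (size_t i = 2; i < body_len; i += 2) {
        if (body[i] & 0xFFFF0000u)        /* CONST_OFFSET [31:16] reserved */
            return false;
        if (body[i + 1] & 0xFFFFC000u)    /* NUM_DWordS [31:14] reserved */
            return false;
    }
    return true;
}
```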

# 9.4.11 <u>LOAD\_RESOURCE - R7xx/Evergreen/Cayman</u>

This packet provides the ability to have the CP:

- Initialize the RESOURCE\_CONST\_BASE (BASE\_ADDR\_\* fields) internally for later use when a SET\_RESOURCE packet is processed and shadowing is enabled (via the CONTEXT\_CONTROL packet). For this case, there are 5 DWs in the packet and DWs 4 and 5, CONST\_OFFSET and NUM\_DWordS are programmed to zero.
- Fetch resource data from external memory into the chip that was previously shadowed. For this case, there are 5 or more DWs and all are meaningful.

The CP computes the DWord-aligned external memory read address as follows:

• Mem\_Start\_Address[39:2] = RESOURCE\_CONST\_BASE[39:2] + CONST\_OFFSET

The CP writes the 'loaded' data to consecutive register addresses. The starting address is computed as shown below:

• Reg\_Start\_Address[17:2] = SQ\_TEX\_RESOURCE\_WORD0\_0[17:2] + CONST\_OFFSET

| DW  | Field Name   | Description |
|-----|--------------|-------------|
| 1   | HEADER       | Header of the packet. Shader_Type in bit 1 of the Header will correspond to the shader type of the Load, see Type-3 Packet. |
| 2   | BASE_ADDR_LO | [31:2] - RESOURCE_CONST_BASE (31:2) for the block in Memory from where the CP will fetch the constants. |
| 3   | BASE_ADDR_HI | [7:0] - RESOURCE_CONST_BASE (39:32) for the block in Memory from where the CP will fetch the constants. |
| 4   | CONST_OFFSET | [15:0] - CONST_OFFSET[15:0] in DWords from the base alu constant address (SQ_TEX_RESOURCE_WORD0_0) and base alu constant memory address (RESOURCE_CONST_BASE). |
| 5   | NUM_DWords   | NUM_DWordS[13:0] - Number of DWords that the CP will fetch and write into the chip. A value of zero will cause no constants to be loaded. |
| N   | CONST_OFFSET | [31:16] - Reserved<br>CONST_OFFSET[15:0] - Same Definition as Above. |
| N+1 | NUM_DWords   | [31:14] - Reserved<br>NUM_DWordS[13:0] - Same Definition as Above. |

#### LOAD\_RESOURCE Packet Description
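
The effect of one (CONST\_OFFSET, NUM\_DWordS) pair can be pictured with a small software model. This is a toy simulation, not CP microcode: `shadow_mem` stands for the memory block at RESOURCE\_CONST\_BASE and `regs` for the register space starting at SQ\_TEX\_RESOURCE\_WORD0\_0, both indexed in DWords, and all names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Copy num_dwords DWords from shadow memory to registers at the same offset. */
void model_load(const uint32_t *shadow_mem, size_t mem_dwords,
                uint32_t *regs, size_t reg_dwords,
                uint16_t const_offset, uint16_t num_dwords)
{
    assert((size_t)const_offset + num_dwords <= mem_dwords);
    assert((size_t)const_offset + num_dwords <= reg_dwords);

    /* The same DWord offset is applied to the memory and register bases. */
    for (uint16_t i = 0; i < num_dwords; i++)
        regs[const_offset + i] = shadow_mem[const_offset + i];
}
```

A NUM\_DWordS of zero leaves the registers untouched, which matches the "no constants loaded" behavior described in the table.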

# 9.4.12 <u>LOAD\_SAMPLER - R7xx/Evergreen/Cayman</u>

This packet provides the ability to have the CP initialize the SAMPLER\_CONST\_BASE (BASE\_ADDR\_\* fields) internally for later use when a SET\_SAMPLER packet is processed and shadowing is enabled (via the CONTEXT\_CONTROL packet). For this case, there are 5 DWs in the packet and DWs 4 and 5, CONST\_OFFSET and NUM\_DWordS are programmed to zero. The CP also uses this packet to fetch sampler data from external memory into the chip that was previously shadowed. For this case, there are 5 or more DWs and all are meaningful.

The CP computes the DWord-aligned external memory read address as follows:

• Mem\_Start\_Address[39:2] = SAMPLER\_CONST\_BASE[39:2] + CONST\_OFFSET

The CP writes the 'loaded' data to consecutive register addresses. The starting address is computed as shown below:

• Reg\_Start\_Address[17:2] = SQ\_TEX\_SAMPLER\_WORD0\_0[17:2] + CONST\_OFFSET

| DW  | Field Name   | Description |
|-----|--------------|-------------|
| 1   | HEADER       | Header of the packet. Shader_Type in bit 1 of the Header will correspond to the shader type of the Load, see Type-3 Packet. |
| 2   | BASE_ADDR_LO | [31:2] - SAMPLER_CONST_BASE (31:2) for the block in Memory from where the CP will fetch the constants. |
| 3   | BASE_ADDR_HI | [7:0] - SAMPLER_CONST_BASE (39:32) for the block in Memory from where the CP will fetch the constants. |
| 4   | CONST_OFFSET | [15:0] - CONST_OFFSET[15:0] in DWords from the base alu constant address (SQ_TEX_SAMPLER_WORD0_0) and base alu constant memory address (SAMPLER_CONST_BASE). |
| 5   | NUM_DWords   | NUM_DWordS[13:0] - Number of DWords that the CP will fetch and write into the chip. A value of zero will cause no constants to be loaded. |
| N   | CONST_OFFSET | [31:16] - Reserved<br>CONST_OFFSET[15:0] - Same Definition as Above. |
| N+1 | NUM_DWords   | [31:14] - Reserved<br>NUM_DWordS[13:0] - Same Definition as Above. |

### LOAD\_SAMPLER Packet Description

# 9.4.13 <u>SET\_BOOL\_CONST - R7xx/Evergreen/Cayman</u>

The SET\_BOOL\_CONST packet loads the constant Boolean data, which is embedded in the packet, into the chip. The CONST\_OFFSET field is a DWord-offset from the starting address. All the constant data in the packet is written to consecutive register addresses beginning at the starting address. The starting address for Boolean constants is computed as follows:

• Reg\_Start\_Address[17:2] = SQ\_BOOL\_CONST\_0[17:2] + CONST\_OFFSET

The CP will write the data to external memory if the corresponding shadow enable is set. This allows the Boolean constants to be reloaded into the chip after a context switch with the LOAD\_BOOL\_CONST (LBC) packet. The LBC packet sets the BOOL\_CONST\_BASE and the CONTEXT\_CONTROL packet enables/disables write shadowing to external memory (see these packets for more details). The starting external memory address that the constant data is written to is computed as follows:

• Mem\_Start\_Address[39:2] = BOOL\_CONST\_BASE[39:2] + CONST\_OFFSET

| DW     | Field Name | Description                                                                                                                                                                                                                 |
|--------|------------|-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|
| 1      | HEADER     | Header of the packet. Shader_Type in bit 1 of the Header will correspond to the shader type of the Load, see Type-3 Packet.                                                                                                 |
| 2      | CONST_OFFSET | <ul><li>[31:16] - Must be programmed to zero for legacy support</li><li>[15:0] - Offset in DWords from the BOOL const base address (DW address of SQ_BOOL_CONSTANT0_0) and memory base address (BOOL_CONST_BASE).</li></ul> |
| 3 to N | CONST_DATA | DWord Data for Constants.                                                                                                                                                                                                   |

# SET\_BOOL\_CONST Packet Description
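
Unlike the LOAD packets, a SET packet carries its data inline after the offset DW. A minimal sketch of assembling DWs 2 to N is given below; the function name is hypothetical and the Type-3 header is assumed to be emitted separately by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/*
 * Write DW 2 (CONST_OFFSET) followed by the constant data (DWs 3..N).
 * Returns the number of DWs written, or 0 if count is zero or out is too small.
 */
size_t build_set_const_body(uint16_t const_offset,
                            const uint32_t *data, size_t count,
                            uint32_t *out, size_t out_cap)
{
    assert(data != NULL && out != NULL);
    if (count == 0 || out_cap < count + 1)
        return 0;

    out[0] = const_offset;                              /* [31:16] stay zero */
    memcpy(&out[1], data, count * sizeof(uint32_t));    /* embedded CONST_DATA */
    return count + 1;
}
```

Taking the offset as a 16-bit value keeps bits [31:16] of DW 2 at zero, as the table requires for legacy support.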

# 9.4.14 <u>SET\_CONFIG\_REG - R7xx/Evergreen/Cayman</u>

The SET\_CONFIG\_REG packet loads the single-context-configuration register data, which is embedded in the packet, into the chip. The REG\_OFFSET field is a DWord-offset from the starting address. All the register data in the packet is written to consecutive register addresses beginning at the starting address. The starting address for register data is computed as follows:

• Reg\_Start\_Address[17:2] = 0x2000 + REG\_OFFSET (Note: Byte Offset 0x8000; DWord Offset 0x2000)

The CP will write the data to external memory if the corresponding shadow enable is set. This allows the register data to be reloaded into the chip after a context switch with the LOAD\_CONFIG\_REG (LCFG) packet. The LCFG packet sets the CONFIG\_REG\_BASE and the CONTEXT\_CONTROL packet enables/disables write shadowing to external memory (see these packets for more details). The starting external memory address that the register data is written to is computed as follows:

• Mem\_Start\_Address[39:2] = CONFIG\_REG\_BASE[39:2] + REG\_OFFSET

| DW     | Field Name | Description |
|--------|------------|-------------|
| 1      | HEADER     | Header of the packet. Shader_Type in bit 1 of the Header will correspond to the shader type of the Load, see Type-3 Packet. |
| 2      | REG_OFFSET | <ul><li>[31:16] - Must be programmed to zero for legacy support</li><li>[15:0] - Offset in DWords from the register base address (0x2000 in DWs) and memory base address (CONFIG_REG_BASE).</li></ul> |
| 3 to N | REG_DATA   | DWord Data for Registers. |

### SET\_CONFIG\_REG Packet Description
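
Driver code usually knows a register by its byte address, so REG\_OFFSET has to be derived from it. The sketch below applies the relationship stated above (byte offset 0x8000 equals DWord offset 0x2000); the macro and function names are made up for the example.

```c
#include <assert.h>
#include <stdint.h>

#define CONFIG_SPACE_START_BYTES 0x8000u   /* DWord offset 0x2000 */

/* Convert a configuration register byte address to a SET_CONFIG_REG REG_OFFSET. */
uint16_t config_reg_offset(uint32_t reg_byte_addr)
{
    assert((reg_byte_addr & 0x3u) == 0);               /* registers are DWord-sized */
    assert(reg_byte_addr >= CONFIG_SPACE_START_BYTES);

    uint32_t off = (reg_byte_addr - CONFIG_SPACE_START_BYTES) >> 2;
    assert(off <= 0xFFFFu);                            /* must fit REG_OFFSET[15:0] */
    return (uint16_t)off;
}
```

For example, a register at byte address 0x8010 sits four DWords past the start of the configuration space, so its REG\_OFFSET is 4.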

### 9.4.15 <u>SET\_CONTEXT\_REG - R7xx/Evergreen/Cayman</u>

This packet loads the eight-context-renderstate register data, which is embedded in the packet, into the chip. The REG\_OFFSET field is a DWord-offset from the starting address. All the render state data in the packet is written to consecutive register addresses beginning at the starting address. The starting address for register data is computed as follows:

• Reg\_Start\_Address[17:2] = 0xA000 + REG\_OFFSET (Note: Byte Offset 0x28000; DWord Offset 0xA000)

The CP will write the data to external memory if the corresponding shadow enable is set. This allows the register data to be reloaded into the chip after a context switch with the LOAD\_CONTEXT\_REG (LCTX) packet. The LCTX packet sets the CONTEXT\_REG\_BASE and the CONTEXT\_CONTROL packet enables/disables write shadowing to external memory (see these packets for more details). The starting external memory address that the render state data is written to is computed as follows:

• Mem\_Start\_Address[39:2] = CONTEXT\_REG\_BASE[39:2] + REG\_OFFSET

| DW     | Field Name | Description |
|--------|------------|-------------|
| 1      | HEADER     | Header of the packet. Shader_Type in bit 1 of the Header will correspond to the shader type of the Load, see Type-3 Packet. |
| 2      | REG_OFFSET | [15:0] - Offset in DWords from the register base address (0xA000 in DWs) and memory base address (CONTEXT_REG_BASE). |
| 3 to N | REG_DATA   | DWord Data for Registers or DW Offset into the Patch Table. |

### SET\_CONTEXT\_REG Packet Description
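
The interaction between a SET packet and write shadowing can be sketched as a small software model. This is illustrative only: `struct ctx_model`, its 64-entry arrays, and the single `shadow_enable` flag are assumptions standing in for the context register space, the memory block at CONTEXT\_REG\_BASE, and the enable programmed through CONTEXT\_CONTROL.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define MODEL_DWORDS 64   /* arbitrary size for the toy model */

struct ctx_model {
    uint32_t regs[MODEL_DWORDS];     /* context registers, indexed from DWord 0xA000 */
    uint32_t shadow[MODEL_DWORDS];   /* external memory, indexed from CONTEXT_REG_BASE */
    bool shadow_enable;              /* stands in for the CONTEXT_CONTROL enable */
};

/* Apply a SET_CONTEXT_REG: always write registers, mirror to memory if enabled. */
void model_set_context_reg(struct ctx_model *m, uint16_t reg_offset,
                           const uint32_t *data, size_t count)
{
    assert((size_t)reg_offset + count <= MODEL_DWORDS);

    for (size_t i = 0; i < count; i++) {
        m->regs[reg_offset + i] = data[i];
        if (m->shadow_enable)
            m->shadow[reg_offset + i] = data[i];   /* same DWord offset in memory */
    }
}
```

With shadowing enabled, the memory image stays in step with the registers, which is what allows a later LOAD\_CONTEXT\_REG to restore the state after a context switch.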

### 9.4.16 <u>SET\_CTL\_CONST - R7xx/Evergreen/Cayman</u>

This packet loads the constant Control data, which is embedded in the packet, into the chip. The CONST\_OFFSET field is a DWord-offset from the starting address. All the constant data in the packet is written to consecutive register addresses beginning at the starting address. The starting address for Control constants is computed as follows:

• Reg\_Start\_Address[17:2] = mmSQ\_VTX\_BASE\_VTX\_LOC[17:2] + CONST\_OFFSET

The CP will write the data to external memory if the corresponding shadow enable is set. This allows the Control constants to be reloaded into the chip after a context switch with the LOAD\_CTL\_CONST (LCC) packet. The LCC packet sets the CONTROL\_CONST\_BASE and the CONTEXT\_CONTROL packet enables/disables write shadowing to external memory (see these packets for more details). The starting external memory address that the constant data is written to is computed as follows:

• Mem\_Start\_Address[39:2] = CONTROL\_CONST\_BASE[39:2] + CONST\_OFFSET

### SET\_CTL\_CONST Packet Description

| DW     | Field Name   | Description |
|--------|--------------|-------------|
| 1      | HEADER       | Header of the packet. Shader_Type in bit 1 of the Header will always be zero, since no CS constants are set with this packet, see Type-3 Packet. |
| 2      | CONST_OFFSET | [31:16] - Must be programmed to zero for legacy support<br>[15:0] - Offset in DWords from the CTL const base address (DW address of mmSQ_VTX_BASE_VTX_LOC) and memory base address (CONTROL_CONST_BASE). |
| 3 to N | CONST_DATA   | DWord Data for Constants. |

### 9.4.17 <u>SET\_LOOP\_CONST - R7xx/Evergreen/Cayman</u>

This packet loads the constant Loop data, which is embedded in the packet, into the chip. The CONST\_OFFSET field is a DWord-offset from the starting address. All the constant data in the packet is written to consecutive register addresses beginning at the starting address. The starting address for Loop constants is computed as follows:

• Reg\_Start\_Address[17:2] = SQ\_LOOP\_COUNT\_CONST\_0[17:2] + CONST\_OFFSET

The CP will write the data to external memory if the corresponding shadow enable is set. This allows the Loop constants to be reloaded into the chip after a context switch with the LOAD\_LOOP\_CONST (LLC) packet. The LLC packet sets the LOOP\_CONST\_BASE and the CONTEXT\_CONTROL packet enables/disables write shadowing to external memory (see these packets for more details). The starting external memory address that the constant data is written to is computed as follows:

• Mem\_Start\_Address[39:2] = LOOP\_CONST\_BASE[39:2] + CONST\_OFFSET

| DW     | Field Name   | Description |
|--------|--------------|-------------|
| 1      | HEADER       | Header of the packet. Shader_Type in bit 1 of the Header will correspond to the shader type of the Load, see Type-3 Packet. |
| 2      | CONST_OFFSET | [31:16] - Must be programmed to zero for legacy support<br>[15:0] - Offset in DWords from the LOOP const base address (DW address of SQ_LOOP_COUNT_CONSTANT0_0) and memory base address (LOOP_CONST_BASE). |
| 3 to N | CONST_DATA   | DWord Data for Constants. |

# SET\_LOOP\_CONST Packet Description

### 9.4.18 <u>SET\_RESOURCE - R7xx/Evergreen/Cayman</u>

This packet loads the Resource data, which is embedded in the packet, into the chip. The CONST\_OFFSET field is a DWord-offset from the starting address. All the Resource data in the packet is written to consecutive register addresses beginning at the starting address. The starting address for Resources is computed as follows:

• Reg\_Start\_Address[17:2] = SQ\_TEX\_RESOURCE\_WORD0\_0[17:2] + CONST\_OFFSET

The CP will write the data to external memory if the corresponding shadow enable is set. This allows the resource data to be reloaded into the chip after a context switch with the LOAD\_RESOURCE (LRS) packet. The LRS packet sets the RESOURCE\_CONST\_BASE and the CONTEXT\_CONTROL packet enables/disables write shadowing to external memory (see these packets for more details). The starting external memory address that the Resource data is written to is computed as follows:

• Mem\_Start\_Address[39:2] = RESOURCE\_CONST\_BASE[39:2] + CONST\_OFFSET

| DW     | Field Name   | Description |
|--------|--------------|-------------|
| 1      | HEADER       | Header of the packet. Shader_Type in bit 1 of the Header will correspond to the shader type of the Load, see Type-3 Packet. |
| 2      | CONST_OFFSET | [15:0] - CONST_OFFSET in DWords from the RESOURCE const base address (DW address of SQ_TEX_RESOURCE_WORD0_0) and memory base address (RESOURCE_CONST_BASE). |
| 3 to N | CONST_DATA   | DWord Data for Constants. |

# SET\_RESOURCE Packet Description
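The two start-address formulas above can be sketched as follows. This is a minimal illustration, not driver code: the function name and the example base addresses are hypothetical, and the real DWord address of SQ\_TEX\_RESOURCE\_WORD0\_0 and the value of RESOURCE\_CONST\_BASE must come from the register specification and the LOAD\_RESOURCE packet. The same arithmetic applies to any packet that uses a DWord CONST\_OFFSET against a register base and a memory base.

```python
def start_addresses(reg_base_byte_addr, mem_base_byte_addr, const_offset):
    """Return (register, memory) start byte addresses for a given CONST_OFFSET.

    Both bases are byte addresses; bits [1:0] must be zero because the
    formulas operate on the DWord index held in bits [17:2] and [39:2].
    """
    if not 0 <= const_offset <= 0xFFFF:
        raise ValueError("CONST_OFFSET is a 16-bit field")
    if reg_base_byte_addr & 0x3 or mem_base_byte_addr & 0x3:
        raise ValueError("base addresses must be DWord aligned")
    # Convert each base to a DWord index, add the DWord offset, convert back.
    reg_dw = (reg_base_byte_addr >> 2) + const_offset
    mem_dw = (mem_base_byte_addr >> 2) + const_offset
    return reg_dw << 2, mem_dw << 2


# Hypothetical bases, chosen only to show the arithmetic.
reg_start, mem_start = start_addresses(0x1000, 0x200000, 8)
```

An offset of 8 DWords moves both start addresses forward by 32 bytes, so the register write and the shadow write stay in step.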

### 9.4.19 <u>SET\_SAMPLER - R7xx/Evergreen/Cayman</u>

This packet loads the Sampler data, which is embedded in the packet, into the chip. The CONST\_OFFSET field is a DWord-offset from the starting address. All the Sampler data in the packet is written to consecutive register addresses beginning at the starting address. The starting address for Samplers is computed as follows:

• Reg\_Start\_Address[17:2] = SQ\_TEX\_SAMPLER\_WORD0\_0[17:2] + CONST\_OFFSET

The CP will write the data to external memory if the corresponding shadow enable is set. This allows the sampler data to be reloaded into the chip after a context switch with the LOAD\_SAMPLER (LSP) packet. The LSP packet sets the SAMPLER\_CONST\_BASE and the CONTEXT\_CONTROL packet enables/disables write shadowing to external memory (see these packets for more details). The starting external memory address that the sampler data is written to is computed as follows:

• Mem\_Start\_Address[39:2] = SAMPLER\_CONST\_BASE[39:2] + CONST\_OFFSET

| DW     | Field Name   | Description |
|--------|--------------|-------------|
| 1      | HEADER       | Header of the packet. Shader_Type in bit 1 of the Header will correspond to the shader type of the Load, see Type-3 Packet. |
| 2      | CONST_OFFSET | [15:0] - Offset in DWords from the SAMPLER const base address (DW address of SQ_TEX_SAMPLER_WORD0_0) and memory base address (SAMPLER_CONST_BASE). |
| 3 to N | CONST_DATA   | DWord Data for Constants. |

### **SET\_SAMPLER** Packet Description

# 9.5 Command Predication Packets

### 9.5.1 <u>COND\_EXEC - R7xx/Evergreen/Cayman</u>

Perform a conditional execution of a sequence of packets (type 0, 2, and type 3) based on a Boolean value stored in memory.

Note: Care must be taken to make certain that EXEC\_COUNT contains the exact number of DWords for the subsequent packets that are to be predicated if the Boolean value is zero. The CP.PFP will start parsing the DWord immediately following EXEC\_COUNT DWords.

#### **COND\_EXEC** Packet Description

| DW | Field Name   | Description |
|----|--------------|-------------|
| 1  | HEADER       | Header of the packet |
| 2  | BOOL_ADDR_LO | Bits [31:2] - Boolean Address bits [31:2]. |
| 3  | BOOL_ADDR_HI | Bits [7:0] - Boolean Address bits [39:32]. |
| 4  | EXEC_COUNT   | EXEC_COUNT: [13:0] - total number of DWords of the subsequent predicated packets.<br>This count wraps the packets that will be predicated by the device select. |
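As an illustration of the field layout in the table above, the following sketch packs DW 2 through DW 4 of a COND\_EXEC packet. The function name is hypothetical, the Type-3 header (DW 1) is deliberately left out because its encoding is described elsewhere, and bits [1:0] of DW 2 are assumed to be zero since the table does not define them.

```python
def cond_exec_body(bool_addr, exec_count):
    """Pack DW 2..4 of COND_EXEC (header DW 1 is not built here)."""
    if not 0 <= bool_addr < (1 << 40):
        raise ValueError("Boolean address must fit in 40 bits")
    if bool_addr & 0x3:
        raise ValueError("Boolean address must be DWord aligned")
    if not 0 <= exec_count <= 0x3FFF:
        raise ValueError("EXEC_COUNT is a 14-bit field")
    bool_addr_lo = bool_addr & 0xFFFFFFFC    # address bits [31:2]
    bool_addr_hi = (bool_addr >> 32) & 0xFF  # address bits [39:32]
    return [bool_addr_lo, bool_addr_hi, exec_count]


# EXEC_COUNT must equal the total DWord count of the predicated packets.
body = cond_exec_body(0x123456780, 10)
```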

### 9.5.2 <u>COND\_WRITE - R7xx/Evergreen/Cayman</u>

The CP reads either a memory or a register location (indicated by POLL\_SPACE) and tests the polled value with the reference value provided in the command packet. The test is qualified by both the specified function and mask. If the test passes, the write occurs to either a register or memory depending on WRITE\_SPACE. If the test fails, the CP skips the write. In either case, the CP then continues parsing the command stream.

| DW | Field Name       | Description |
|----|------------------|-------------|
| 1  | HEADER           | Header of the packet |
| 2  | WRITE_SPACE      | [31:9] - Reserved<br>[8] - WRITE_SPACE<br>- 0 = Register<br>- 1 = Memory |
|    | POLL_SPACE       | [7:5] - Reserved<br>[4] - POLL_SPACE<br>- 0 = Register<br>- 1 = Memory |
|    | FUNCTION         | [2:0] - FUNCTION<br>- 000 - Always (Compare Passes). Still does read operation and waits for data to return.<br>- 001 - Less Than (<) the Reference Value.<br>- 010 - Less Than or Equal (<=) to the Reference Value.<br>- 011 - Equal (==) to the Reference Value.<br>- 100 - Not Equal (!=) to the Reference Value.<br>- 101 - Greater Than or Equal (>=) to the Reference Value.<br>- 110 - Greater Than (>) the Reference Value.<br>- 111 - Reserved |
| 3  | POLL_ADDRESS_LO  | Lower portion of the address to poll. If the address is a memory location, then bits [31:2] specify the lower bits of the address and [1:0] is the swap code to be used. If the address is a memory-mapped register, then bits [15:0] is the DWord memory-mapped register address that the CP will read. |
| 4  | POLL_ADDRESS_HI  | Higher portion of the address to poll. If the address is a memory location, then bits [7:0] specify bits [39:32] of the address. If the address is a memory-mapped register, then this DW is a don't care. |
| 5  | REFERENCE        | Reference Value [31:0]. |
| 6  | MASK             | Mask for Comparison [31:0]. |
| 7  | WRITE_ADDRESS_LO | If WRITE_SPACE = Register: WRITE_ADDRESS[15:0] - DWord memory-mapped register address that will be written.<br>Else if WRITE_SPACE = Memory: WRITE_ADDRESS[31:2] - DWord-Aligned Address of destination memory location. WRITE_ADDRESS[1:0] - SWAP Used for Memory Write. |
| 8  | WRITE_ADDRESS_HI | If WRITE_SPACE = Register: This DWord is a don't care.<br>If WRITE_SPACE = Memory: Bits [7:0] - WRITE_ADDRESS[39:32]. |
| 9  | WRITE_DATA       | Write Data[31:0] that will be conditionally written to the ADDRESS. |

#### **COND\_WRITE** Packet Description
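The test that gates the write can be modeled in a few lines. The sketch below is a software model only, with hypothetical function names. It assumes the mask is ANDed with the polled value before the comparison against the reference value, which is a common reading of "qualified by the mask" but is not spelled out in the packet description. It also packs DW 2 from the bit positions in the table.

```python
import operator

# FUNCTION[2:0] encodings; 0b111 is reserved and therefore absent.
_FUNCTIONS = {
    0b000: lambda value, ref: True,  # Always
    0b001: operator.lt,              # Less Than
    0b010: operator.le,              # Less Than or Equal
    0b011: operator.eq,              # Equal
    0b100: operator.ne,              # Not Equal
    0b101: operator.ge,              # Greater Than or Equal
    0b110: operator.gt,              # Greater Than
}


def cond_write_passes(function, polled, reference, mask):
    """Model the COND_WRITE test (assumes the mask is applied to the polled value)."""
    if function not in _FUNCTIONS:
        raise ValueError("reserved FUNCTION encoding")
    masked = polled & mask & 0xFFFFFFFF
    return bool(_FUNCTIONS[function](masked, reference & 0xFFFFFFFF))


def cond_write_dw2(write_to_memory, poll_memory, function):
    """Pack DW 2: WRITE_SPACE in bit 8, POLL_SPACE in bit 4, FUNCTION in bits [2:0]."""
    if function not in _FUNCTIONS:
        raise ValueError("reserved FUNCTION encoding")
    return (int(bool(write_to_memory)) << 8) | (int(bool(poll_memory)) << 4) | function
```

For example, `cond_write_passes(0b011, 0x12F4, 0x00F0, 0x00F0)` passes because only the masked bits take part in the equality test.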

### 9.5.3 <u>SET\_PREDICATION - R7xx/Evergreen/Cayman</u>

The SET\_PREDICATION packet provides a single flexible packet for the driver to specify the type of predication check for previous events: ZPASS, PRIMCOUNT, etc.

| DW | Field Name          | Description |
|----|---------------------|-------------|
| 1  | HEADER              | Header of the packet |
| 2  | START_ADDR_LO       | Bits [31:4] - Start address bits [31:4]. Supports a 16-byte aligned address for DB0 count. |
| 3  | CONTINUE            | Bit [31] - Continue set predication (valid only for use with ZPASS). This field is used to allow accumulation of ZPASS count data across command buffer boundaries.<br>- 0: This SET_PREDICATION packet is a unique packet or the first of a series of SET_PREDICATION packets.<br>- 1: This SET_PREDICATION packet is a continuation of the previous one. |
|    | PRED_OP             | Bits [18:16] - Pred_Op<br>- 000: Clear Predicate<br>- 001: Set Zpass Predicate<br>- 010: Set PrimCount Predicate<br>- 011: Reserved<br>- 1xx: Reserved |
|    | HINT                | Bit [12] - Hint (only valid for Zpass/Occlusion Predicate)<br>- 0: CP must wait until final ZPass counts have been written by all DBs.<br>- 1: CP should read the results once; if all DBs have not written the results to memory, then draw. |
|    | PREDICATION_BOOLEAN | Bit [8] - Predication Boolean (valid for both ops)<br>- 0: Draw if not visible/overflow<br>- 1: Draw if visible/no overflow |
|    | START_ADDR_HI       | Bits [7:0] - Start Address bits [39:32] |

# **SET\_PREDICATION** Packet Description
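A short sketch of how the DW 2 and DW 3 fields of SET\_PREDICATION combine is given below. It assumes that all fields listed after START\_ADDR\_LO share DW 3, which is how the bit positions (31, 18:16, 12, 8, 7:0) read in the table. The function and constant names are hypothetical, and the header DWord is not built.

```python
PRED_OP_CLEAR = 0b000
PRED_OP_ZPASS = 0b001
PRED_OP_PRIMCOUNT = 0b010


def set_predication_body(start_addr, pred_op, draw_if_visible,
                         hint=False, cont=False):
    """Pack DW 2 and DW 3 of SET_PREDICATION (header DW 1 is not built here)."""
    if not 0 <= start_addr < (1 << 40):
        raise ValueError("start address must fit in 40 bits")
    if start_addr & 0xF:
        raise ValueError("start address must be 16-byte aligned")
    if pred_op not in (PRED_OP_CLEAR, PRED_OP_ZPASS, PRED_OP_PRIMCOUNT):
        raise ValueError("reserved PRED_OP encoding")
    if (hint or cont) and pred_op != PRED_OP_ZPASS:
        raise ValueError("HINT and CONTINUE are only valid with the Zpass predicate")
    dw2 = start_addr & 0xFFFFFFF0              # START_ADDR_LO, bits [31:4]
    dw3 = ((int(bool(cont)) << 31)             # CONTINUE
           | (pred_op << 16)                   # PRED_OP
           | (int(bool(hint)) << 12)           # HINT
           | (int(bool(draw_if_visible)) << 8) # PREDICATION_BOOLEAN
           | ((start_addr >> 32) & 0xFF))      # START_ADDR_HI
    return [dw2, dw3]


# Zpass predicate, draw if visible, read the results once (hint set).
body = set_predication_body(0x100000040, PRED_OP_ZPASS, True, hint=True)
```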

### 9.5.4 <u>PRED\_EXEC - R7xx/Evergreen/Cayman</u>

Performs a predicated execution of a sequence of packets (type 0, 2, and type 3) on select devices.

Notes: The ME\_INITIALIZE packet includes a GPU unique Device ID. Care must be taken to make certain that EXEC\_COUNT contains the exact number of DWords for the subsequent packets that are to be predicated. The CP will start parsing the DWord immediately following EXEC\_COUNT DWords.

#### PRED\_EXEC Packet Description

| DW | Field Name                  | Description |
|----|-----------------------------|-------------|
| 1  | HEADER                      | Header of the packet |
| 2  | DEVICE_SELECT<br>EXEC_COUNT | Bits [31:24] - device_select: selects one or more device IDs upon which the subsequent predicated packets will be executed.<br>Bits [13:0] - exec_count: total number of DWords of the subsequent predicated packets. This count wraps the packets that will be predicated by the device select. |
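Because EXEC\_COUNT must match the predicated packets exactly, it is safest to derive it from the packets themselves rather than hard-code it. The sketch below illustrates this under simple assumptions: packets are modeled as Python lists of DWords, both function names are hypothetical, and the device\_select value is passed through as an opaque 8-bit field.

```python
def predicated_dword_count(packets):
    """Total DWords of the packets that follow PRED_EXEC (headers included)."""
    return sum(len(packet) for packet in packets)


def pred_exec_dw2(device_select, exec_count):
    """Pack DW 2: device_select in bits [31:24], exec_count in bits [13:0]."""
    if not 0 <= device_select <= 0xFF:
        raise ValueError("device_select is an 8-bit field")
    if not 0 <= exec_count <= 0x3FFF:
        raise ValueError("exec_count is a 14-bit field")
    return (device_select << 24) | exec_count


# Two placeholder packets of 3 and 5 DWords are predicated together.
packets = [[0] * 3, [0] * 5]
dw2 = pred_exec_dw2(0x03, predicated_dword_count(packets))
```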

# **9.6 Synchronization Packets**

### 9.6.1 <u>EVENT\_WRITE - R7xx/Evergreen/Cayman</u>

This packet is used when the driver wants to create a non-TimeStamp/Fence event. See EVENT\_WRITE\_EOP to send timestamps and fences. The EVENT\_WRITE supports two categories of events. Those are:

- 4-DWord events, where special handling is required: ZPASS, SAMPLE\_PIPELINESTATS, SAMPLE\_STREAMOUTSTATS[,1,2,3].
- 2-DWord events, where no special handling is required. The CP just writes EVENT\_TYPE (bits [5:0] of DW 2 from the packet) into the VGT\_EVENT\_INITIATOR register, and DWs 3 and 4 do not exist, i.e., the packet is only 2 DWs for these events. These include all other events.
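The two categories can be expressed as a small lookup on EVENT\_INDEX. This sketch takes the index values from the EVENT\_WRITE packet description (0001 for ZPASS\_DONE, 0010 for SAMPLE\_PIPELINESTAT, 0011 for the SAMPLE\_STREAMOUTSTATS events); the function names and the example EVENT\_TYPE value are hypothetical.

```python
# EVENT_INDEX values whose packets carry the two extra address DWords.
FOUR_DWORD_EVENT_INDICES = {0b0001, 0b0010, 0b0011}


def event_write_length(event_index):
    """Return the EVENT_WRITE packet length in DWords for an EVENT_INDEX."""
    if not 0 <= event_index <= 0b0100:
        raise ValueError("EVENT_INDEX is not an EVENT_WRITE event index")
    return 4 if event_index in FOUR_DWORD_EVENT_INDICES else 2


def event_write_dw2(event_index, event_type):
    """Pack DW 2: EVENT_INDEX in bits [11:8], EVENT_TYPE in bits [5:0]."""
    if not 0 <= event_index <= 0xF:
        raise ValueError("EVENT_INDEX is a 4-bit field")
    if not 0 <= event_type <= 0x3F:
        raise ValueError("EVENT_TYPE is a 6-bit field")
    return (event_index << 8) | event_type


# 0x15 is a placeholder EVENT_TYPE, not a value taken from the register spec.
dw2 = event_write_dw2(0b0001, 0x15)
```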

# EVENT\_WRITE Packet Description

| DW | Field Name  | Description |
|----|-------------|-------------|
| 1  | HEADER      | Header of the packet |
| 2  | EVENT_INDEX | EVENT_INDEX[11:8] - Event Index<br>- 0000: Any non-Time Stamp/non-Fence/non-Trap EVENT_TYPE not listed.<br>- 0001: ZPASS_DONE<br>- 0010: SAMPLE_PIPELINESTAT<br>- 0011: SAMPLE_STREAMOUTSTATS, SAMPLE_STREAMOUTSTATS1, SAMPLE_STREAMOUTSTATS2, SAMPLE_STREAMOUTSTATS3<br>- 0100: CS_PARTIAL_FLUSH, VS_PARTIAL_FLUSH, PS_PARTIAL_FLUSH<br>- 0101: Reserved for EVENT_WRITE_EOP time stamp/fence event types<br>- 0111 - 1111: Reserved for future use. |
| 3  |             | Bits [7:0] - Upper bits of Address [39:32]. Driver should only supply this DW for Sample_PipelineStats, Sample_StreamoutStats, and Zpass (Occlusion). |

### 9.6.2 <u>EVENT\_WRITE\_EOP - R7xx/Evergreen/Cayman</u>

The EVENT\_WRITE\_EOP packet is used when the driver wants to create any end-of-pipe event. The abbreviation TS used below is historical; it indicates that fence data, a trap, or an actual time stamp will be written back.

### Supported Events are:

- Cache Flush TS: provides the driver with a pipelined fence/time stamp indicating that the CBs, DBs, and SX have completed flushing their caches.
- Cache Flush And Inval TS: same as above, but the CBs, DBs, and SX also invalidate their caches before sending the pulse back to the CP.
- Bottom Of Pipe TS: provides the driver with a pipelined time stamp indicating that the CBs, DBs, and SX have completed all work before the time stamp. This can be considered a read EOP event in that all reads have occurred but the CBs/DBs/SX have not written out all the data in their caches.

Use the EVENT\_WRITE packet for all others. Supported actions when the requested event has completed are:

- Timestamps: 64-bit global GPU clock counter value or CP\_PERFCOUNTER\_HI/LO, either with an optional interrupt.
- Fences: 32- or 64-bit embedded data in the packet, with an optional interrupt.

| DW | Field Name                | Description |
|----|---------------------------|-------------|
| 1  | HEADER                    | Header of the packet |
| 2  | EVENT_INDEX<br>EVENT_TYPE | EVENT_INDEX[11:8] - Event Index<br>- 0000: Any non-Time Stamp/non-Fence/non-Trap EVENT_TYPE not listed.<br>- 0001: ZPASS_DONE<br>- 0010: SAMPLE_PIPELINESTAT<br>- 0011: SAMPLE_STREAMOUTSTATS, SAMPLE_STREAMOUTSTATS1, SAMPLE_STREAMOUTSTATS2, SAMPLE_STREAMOUTSTATS3<br>- 0100: CS_PARTIAL_FLUSH, VS_PARTIAL_FLUSH, PS_PARTIAL_FLUSH<br>- 0101: Reserved for EVENT_WRITE_EOP time stamp/fence event types<br>- 0111 - 1111: Reserved for future use.<br>EVENT_TYPE[5:0] - The CP writes this value to the VGT_EVENT_INITIATOR register for the assigned context. |
| 3  | ADDRESS_LO                | [31:2] - Lower bits of DWord-Aligned Address if DATA_SEL = 001.<br>[31:3] - Lower bits of QWord-Aligned Address if DATA_SEL = 010 or 011. Else don't care.<br>Bits [1:0] - Reserved & must be programmed to zero. |
| 4  | DATA_SEL                  | [31:29] - DATA_SEL - Selects Source of Data to be written for an End-of-Pipe Done event.<br>- 000 - None, i.e., Discard Data. Used when only an interrupt is needed. Program INT_SEL to 01.<br>- 001 - Send 32-bit Data Low (Discard Data High).<br>- 010 - Send 64-bit Data.<br>- 011 - Send current value of the 64-bit global GPU clock counter.<br>- 100 - Send current value of the CP_PERFCOUNTER_HI/LO. The intent is for the driver to have already selected the always count sclks option (0x0) for CP_PERFCOUNTER_SELECT and requested the CP to start counting via the CP_PERFMON_CNTL register.<br>- 101 - 111: Reserved for future use. |
|    | INT_SEL                   | [25:24] - INT_SEL - Selects interrupt action for End-of-Pipe Done event.<br>- 00 - None (Do not send an interrupt).<br>- 01 - Send Interrupt Only. Program DATA_SEL to 000.<br>- 10 - Send Interrupt When Write Confirm is received from the MC. |
|    | ADDRESS_HI                | [7:0] - ADDR_HI, address bits [39:32]. External memory address written for an End-of-Pipe Done event. Read returns last value written to memory. |
| 5  | DATA_LO                   | Data [31:0] value that will be written to memory when event occurs. Driver should always supply this DW. |
| 6  | DATA_HI                   | Data [63:32] value that will be written to memory when event occurs. Driver should always supply this DW. |

# EVENT\_WRITE\_EOP Packet Description
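The interaction between DATA\_SEL, INT\_SEL, and the address alignment rules is easy to get wrong, so a sketch of the DW 2 through DW 6 packing follows. It is an illustration under stated assumptions: the function name and the example EVENT\_TYPE value are hypothetical, the header DWord is not built, and EVENT\_INDEX is fixed to 0101, the index the table reserves for EVENT\_WRITE\_EOP event types. Alignment is checked only for the DATA\_SEL values for which the table states a requirement.

```python
EOP_EVENT_INDEX = 0b0101  # index reserved for EVENT_WRITE_EOP event types


def event_write_eop_body(event_type, address, data_sel, int_sel, data=0):
    """Pack DW 2..6 of EVENT_WRITE_EOP (header DW 1 is not built here)."""
    if not 0 <= event_type <= 0x3F:
        raise ValueError("EVENT_TYPE is a 6-bit field")
    if not 0 <= address < (1 << 40):
        raise ValueError("address must fit in 40 bits")
    if data_sel not in range(0b101):
        raise ValueError("reserved DATA_SEL encoding")
    if int_sel not in (0b00, 0b01, 0b10):
        raise ValueError("reserved INT_SEL encoding")
    if int_sel == 0b01 and data_sel != 0b000:
        raise ValueError("interrupt-only requires DATA_SEL = 000")
    if data_sel == 0b001 and address % 4:
        raise ValueError("32-bit data needs a DWord-aligned address")
    if data_sel in (0b010, 0b011) and address % 8:
        raise ValueError("64-bit data needs a QWord-aligned address")
    dw2 = (EOP_EVENT_INDEX << 8) | event_type
    dw3 = address & 0xFFFFFFFC  # bits [1:0] are reserved and kept at zero
    dw4 = (data_sel << 29) | (int_sel << 24) | ((address >> 32) & 0xFF)
    dw5 = data & 0xFFFFFFFF          # DATA_LO, always supplied
    dw6 = (data >> 32) & 0xFFFFFFFF  # DATA_HI, always supplied
    return [dw2, dw3, dw4, dw5, dw6]


# 64-bit fence with an interrupt on write confirm; 0x14 is a placeholder EVENT_TYPE.
body = event_write_eop_body(0x14, 0x100001008, data_sel=0b010, int_sel=0b10,
                            data=0x1122334455667788)
```

Both data DWords are emitted even when DATA\_SEL discards them, matching the note that the driver should always supply DW 5 and DW 6.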

### 9.6.3 <u>EVENT\_WRITE\_EOS - Evergreen/Cayman</u>

The EVENT\_WRITE\_EOS packet is used when the driver wants to create any end-of-shader event (end of CS or end of PS). Supported events are CS Done and PS Done. The CP will generate the end-of-shader event given in the packet by writing to the VGT\_EVENT\_INITIATOR register.

| DW                                                                                                                                                                                                                                                                                                                                 | Field Name                | Description                                                                                                                                                                                                        |  |
|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|
| 1 | HEADER | Header of the packet |  |
| 2 | EVENT_INDEX<br>EVENT_TYPE | EVENT_INDEX[11:8] - Event Index<br>- 0000 - 0100: Reserved for EVENT_WRITE event types<br>- 0101: Reserved for EVENT_WRITE_EOP event types<br>- 0110: CS Done, PS Done<br>- Others up to 1111: Reserved for future use.<br>EVENT_TYPE[5:0] - The CP writes this value to the VGT_EVENT_INITIATOR register for the assigned context. |  |
| 3 | ADDRESS_LO | [31:2] - Lower bits of DWord-Aligned Address.<br>[1:0] - Reserved and must be programmed to zero. |  |
| 4 | CMD<br>ADDRESS_HI | [31:29] - CMD<br>- 000: Store Append Count to memory<br>- 001: Store GDS Data to memory<br>- 010: Store 32-bit "DATA" from this packet to memory<br>- Others: Reserved<br>[7:0] - ADDR_HI, address bits [39:32]. External memory address written for an End-of-Shader Done event. Read returns last value written to memory. |  |
| 5 | SIZE, REG_ADDR<br>or DATA | [30:16] - SIZE: Number of DWs to read from the GDS. Currently s… 16KDWs. A value of zero is not supported when CMD = 001.<br>[15:0] - REG_ADDR: Register address of the register to read, not an offset into a specific register type sub-space.<br>OR<br>[31:0] - DATA: EOS fence value that will be written to memory. |  |

### **EVENT\_WRITE\_EOS Packet Description**
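As a rough illustration of the layout above, the following sketch packs the four body DWords of an EVENT\_WRITE\_EOS packet. The function name, its argument names, and the example values are hypothetical and do not come from this document; the HEADER DWord is deliberately not built here.

```python
def event_write_eos_body(event_type, event_index, address, cmd, dw5):
    """Pack DW2..DW5 of EVENT_WRITE_EOS; the HEADER DWord is not built here."""
    if address < 0 or address >> 40:
        raise ValueError("address must fit in 40 bits")
    if address & 0x3:
        raise ValueError("address must be DWord-aligned (bits [1:0] zero)")
    if cmd not in (0b000, 0b001, 0b010):
        raise ValueError("CMD values above 010 are reserved")
    # DW2: EVENT_INDEX in [11:8], EVENT_TYPE in [5:0]
    dw2 = ((event_index & 0xF) << 8) | (event_type & 0x3F)
    # DW3: ADDRESS_LO, bits [1:0] programmed to zero
    dw3 = address & 0xFFFFFFFC
    # DW4: CMD in [31:29], address bits [39:32] in [7:0]
    dw4 = (cmd << 29) | ((address >> 32) & 0xFF)
    # DW5: SIZE/REG_ADDR or DATA, passed through as a raw 32-bit value
    return [dw2, dw3, dw4, dw5 & 0xFFFFFFFF]


# Illustrative values only: the event type and the address are made up.
body = event_write_eos_body(event_type=0x2F, event_index=0b0110,
                            address=0x12_3456_7890, cmd=0b010, dw5=0xCAFE)
print([hex(dw) for dw in body])
```

With CMD = 010 the fifth DWord is the fence value itself; for CMD = 001 the caller would instead place SIZE in bits [30:16] and REG\_ADDR in bits [15:0] of `dw5`.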

### 9.6.4 <u>MEM\_SEMAPHORE - R7xx/Evergreen/Cayman</u>

The MEM\_SEMAPHORE packet supports Signal and Wait Semaphores. Wait Semaphores are executed at the top of pipe (CP), and Signal Semaphores are executed at the bottom of pipe (after all work before them has completed).

### MEM\_SEMAPHORE Packet Description

| DW | Field Name | Description |
|----|------------|-------------|
| 1 | HEADER | Header of the packet |
| 2 | ADDRESS_LO | [31:3] - Lower bits of QWORD-Aligned Address. |
| 3 | SEM_SEL | [31:29] - SEM_SEL - Select either Wait or Signal. This is a multi-bit field to be DW compatible with EVENT_WRITE_EOP.<br>- 110: Signal Semaphore.<br>- 111: Wait Semaphore. |
|   | CLIENT_CODE | [25:24] - CLIENT_CODE - Client Code<br>- 00: CP<br>- 01: CB<br>- 10: DB<br>- 11: SX |
|   | SIGNAL_TYPE | [20] - SIGNAL_TYPE - Signal Type<br>- 0: SEM_SEL = Signal Semaphore and signal type is increment, or SEM_SEL = Wait Semaphore<br>- 1: SEM_SEL = Signal Semaphore and signal type is write '1'. |
|   | USE_MAILBOX | [16] - USE_MAILBOX<br>- 0: Signal Semaphore will not wait for mailbox to be written<br>- 1: Signal Semaphore will wait for mailbox to be written |
|   | WAIT_ON_SIGNAL | [12] - WAIT_ON_SIGNAL - This field should be set on Evergreen, but on Cayman it is reserved and should be set to zero. If set, the Wait_Semaphore will wait until all outstanding End of Pipe (and therefore Signal_Semaphores) have completed before being issued.<br>- 0: Don't wait for all Signal Semaphores to complete.<br>- 1: Wait for all Signal Semaphores to complete. |
|   | ADDRESS_HI | [7:0] - ADDRESS_HI - Upper bits (39:32) of Address |
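The bit positions listed for MEM\_SEMAPHORE can be combined as in the sketch below. It is only an illustration: `mem_semaphore_body` and its keyword names are hypothetical, the addresses are made up, and the HEADER DWord is left out.

```python
SEM_SIGNAL = 0b110  # SEM_SEL encoding for Signal Semaphore
SEM_WAIT = 0b111    # SEM_SEL encoding for Wait Semaphore


def mem_semaphore_body(address, sem_sel, client_code=0, signal_type=0,
                       use_mailbox=0, wait_on_signal=0):
    """Pack DW2 and DW3 of MEM_SEMAPHORE; the HEADER DWord is not built here."""
    if address < 0 or address >> 40:
        raise ValueError("address must fit in 40 bits")
    if address & 0x7:
        raise ValueError("address must be QWORD-aligned")
    if sem_sel not in (SEM_SIGNAL, SEM_WAIT):
        raise ValueError("SEM_SEL must be 110 (signal) or 111 (wait)")
    dw2 = address & 0xFFFFFFF8
    dw3 = ((sem_sel << 29)
           | ((client_code & 0x3) << 24)
           | ((signal_type & 0x1) << 20)
           | ((use_mailbox & 0x1) << 16)
           | ((wait_on_signal & 0x1) << 12)
           | ((address >> 32) & 0xFF))
    return [dw2, dw3]


# A wait with WAIT_ON_SIGNAL set (the Evergreen setting), then a
# "write 1" signal on a semaphore at a made-up address.
print([hex(dw) for dw in mem_semaphore_body(0x01_0000_1000, SEM_WAIT, wait_on_signal=1)])
print([hex(dw) for dw in mem_semaphore_body(0x1000, SEM_SIGNAL, signal_type=1)])
```

On Cayman the caller would leave `wait_on_signal` at zero, since that bit is reserved there.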

# 9.6.5 <u>PFP\_SYNC\_ME - R7xx/Evergreen/Cayman</u>

This packet is inserted by the driver when it needs the PFP to stall until the ME is synced up to the PFP.

### **PFP\_SYNC\_ME** Packet Description

| DW | Field Name | Description          |
|----|------------|----------------------|
| 1  | HEADER     | Header of the packet |
| 2  | DUMMY      | Dummy Data           |

# 9.6.6 <u>STRMOUT\_BUFFER\_UPDATE - R7xx/Evergreen/Cayman</u>

### STRMOUT\_BUFFER\_UPDATE Packet Description

| DW | Field Name | Description |
|----|------------|-------------|
| 1 | HEADER | Header of the packet |
| 2 | CONTROL | Bits [31:10] - Reserved<br>Bits [9:8] - Buffer Select, indicates the stream out being updated<br>- 00: Stream out buffer 0<br>- 01: Stream out buffer 1<br>- 10: Stream out buffer 2<br>- 11: Stream out buffer 3<br>Bits [7:3] - Reserved<br>Bits [2:1] - Source_Select: to write into VGT_STRMOUT_BUFFER_OFFSET<br>- 00: Use BUFFER_OFFSET in this packet<br>- 01: Read VGT_STRMOUT_BUFFER_FILLED_SIZE<br>- 10: Read data from SRC_ADDRESS<br>- 11: None<br>Bit [0] - Update_Memory: Store BufferFilledSize to memory<br>- 0: Don't update memory; DST_ADDRESS_LO/HI are don't care, but must be provided<br>- 1: Update memory at DST_ADDRESS with VGT_STRMOUT_BUFFER_FILLED_SIZE |
| 3 | DST_ADDRESS_LO | Bits [31:2] - Lower bits of DWord-Aligned Destination Address [31:2]. Valid only if Update_Memory is "1".<br>Bits [1:0] - Swap [1:0] function used for data write. |
| 4 | DST_ADDRESS_HI | Bits [7:0] - Upper bits of Destination Address [39:32]. DW valid only if Store BufferFilledSize is "01". |
| 5 | BUFFER_OFFSET<br>or SRC_ADDRESS_LO | If Source_Select = "00", bits [31:0] hold the BUFFER_OFFSET[31:0] in DWs to write to VGT_STRMOUT_BUFFER_OFFSET.<br>If Source_Select = "01", this ordinal is don't care.<br>If Source_Select = "10":<br>- bits [31:2] are SRC_ADDRESS_LO (the lower bits of DWord-Aligned Source Address [31:2]).<br>- bits [1:0] - Swap [1:0] function used for data read. DW valid only if Source_Select is "10". |
| 6 | SRC_ADDRESS_HI | Bits [7:0] - Upper bits of Source Address [39:32]. DW valid only if Source_Select is "10". |
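To make the CONTROL encoding and the Source\_Select cases concrete, here is a minimal sketch that builds the five body DWords of STRMOUT\_BUFFER\_UPDATE. The function names and parameters are hypothetical, the swap bits are assumed to be zero, and the "don't care" DWords are filled with zero purely for readability.

```python
def strmout_control(buffer_select, source_select, update_memory):
    """Pack the CONTROL DWord: Buffer Select [9:8], Source_Select [2:1], Update_Memory [0]."""
    if not 0 <= buffer_select <= 3:
        raise ValueError("buffer_select must be 0..3")
    if not 0 <= source_select <= 3:
        raise ValueError("source_select must be 0..3")
    return (buffer_select << 8) | (source_select << 1) | (1 if update_memory else 0)


def strmout_buffer_update_body(buffer_select, source_select, update_memory,
                               dst_address=0, buffer_offset=0, src_address=0):
    """Pack DW2..DW6; HEADER is not built here and swap bits are assumed to be 0."""
    if (dst_address & 0x3) or (src_address & 0x3):
        raise ValueError("addresses must be DWord-aligned")
    dw2 = strmout_control(buffer_select, source_select, update_memory)
    dw3 = dst_address & 0xFFFFFFFC        # DST_ADDRESS_LO (must be provided even if unused)
    dw4 = (dst_address >> 32) & 0xFF      # DST_ADDRESS_HI
    if source_select == 0b00:             # use BUFFER_OFFSET from this packet
        dw5, dw6 = buffer_offset & 0xFFFFFFFF, 0
    elif source_select == 0b10:           # read the offset from SRC_ADDRESS
        dw5, dw6 = src_address & 0xFFFFFFFC, (src_address >> 32) & 0xFF
    else:                                 # 01 / 11: ordinals are don't care
        dw5, dw6 = 0, 0
    return [dw2, dw3, dw4, dw5, dw6]


# Buffer 1, offset taken from the packet, filled size stored to a made-up address.
print([hex(dw) for dw in strmout_buffer_update_body(
    1, 0b00, True, dst_address=0x2_0000_0100, buffer_offset=64)])
```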

# 9.6.7 <u>SURFACE\_SYNC - R7xx/Evergreen/Cayman</u>

The SURFACE\_SYNC packet lets the driver issue the surface sync commands as one atomic packet and send the same COHER\_CNTL value regardless of the ASIC.

### SURFACE\_SYNC Packet Description

| DW | Field Name | Description |
|----|------------|-------------|
| 1 | HEADER | Header of the packet |
| 2 | ENGINE<br>COHER_CNTL | [31] - ENGINE: 0=PFP, 1=ME. Perform Surface Synchronization at CP.PFP (so index DMA requests are not sent to VGT until the surface is coherent) or at the CP.ME as done in previous ASICs.<br>[28:0] - COHER_CNTL: See the CP_COHER_CNTL register for the definition. |
| 3 | COHER_SIZE | Coherency Surface Size has a granularity of 256 Bytes. |
| 4 | COHER_BASE | CP_COHER_BASE[31:0] - Virtual memory address [39:8]. This value times 256 is the byte address of the start of the surface to be synchronized (to create the high 32-bits of a 40-bit virtual device address). |
| 5 | VMID<br>POLL_INTERVAL | [31:24] - VMID[7:0]: Virtual Memory ID to be synchronized. (Cayman)<br>[15:0] - Poll_Interval[15:0]: Interval to wait between the time an unsuccessful polling result is returned and a new poll is issued. Time between these is 16*Poll_Interval clocks. The minimum value is 0x04. A value less than 0x04 will be forced to 0x04. |
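The 256-byte granularity of COHER\_SIZE and COHER\_BASE and the Poll\_Interval floor can be worked through with a short sketch. Everything named here is hypothetical: `surface_sync_body` is not a driver function, rounding the size up to the next 256-byte unit is an assumption, and the clamp simply mirrors in software what the description says the hardware does with values below 0x04. The body DWords are emitted in the order the fields are listed above, without the HEADER.

```python
def surface_sync_body(coher_cntl, base_addr, size_bytes,
                      poll_interval=10, engine_me=True, vmid=0):
    """Pack the body DWords of SURFACE_SYNC; the HEADER DWord is not built here."""
    if base_addr < 0 or base_addr >> 40:
        raise ValueError("base address must fit in 40 bits")
    if base_addr & 0xFF:
        raise ValueError("base address must be 256-byte aligned")
    # ENGINE in bit [31], COHER_CNTL in bits [28:0]
    dw_cntl = ((1 if engine_me else 0) << 31) | (coher_cntl & 0x1FFFFFFF)
    # Size in 256-byte units (assumption: round a partial unit up)
    coher_size = (size_bytes + 255) >> 8
    # Base is address bits [39:8], i.e. the byte address divided by 256
    coher_base = base_addr >> 8
    # Values below 0x04 are forced to 0x04
    poll = max(poll_interval & 0xFFFF, 0x04)
    dw_poll = ((vmid & 0xFF) << 24) | poll
    return [dw_cntl, coher_size & 0xFFFFFFFF, coher_base, dw_poll]


def poll_wait_clocks(poll_interval):
    """Clocks between polls: 16 * Poll_Interval after applying the 0x04 floor."""
    return 16 * max(poll_interval & 0xFFFF, 0x04)


# Made-up COHER_CNTL value and surface: 1000 bytes at 4 GiB, polled every 64 clocks.
print([hex(dw) for dw in surface_sync_body(0x1234, 0x1_0000_0000, 1000, poll_interval=2)])
print(poll_wait_clocks(2))
```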

# 9.6.8 <u>WAIT\_REG\_MEM - R7xx/Evergreen/Cayman</u>

The WAIT\_REG\_MEM packet can be processed by either the CP.PFP or the CP.ME, as indicated by the ENGINE field.

### WAIT\_REG\_MEM Packet Description

| DW | Field Name      | Description                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                   |  |
|----|-----------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|--|
| 1  | HEADER          | Header of the packet                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                          |  |
| 2  | ENGINE<br>MEM_SPACE<br>FUNCTION | [8] - ENGINE:<br>- 0=ME<br>- 1=PFP<br>[7:5] - Reserved<br>[4] - MEM_SPACE:<br>- 0=Register<br>- 1=Memory. If ENGINE = PFP, only Memory is valid.<br>[3] - Reserved<br>[2:0] - FUNCTION:<br>- 000 - Always (Compare Passes). Still does read operation and waits for read results to come back.<br>- 001 - Less Than (&lt;) the Reference Value.<br>- 010 - Less Than or Equal (&lt;=) to the Reference Value.<br>- 011 - Equal (=) to the Reference Value.<br>- 100 - Not Equal (!=) to the Reference Value.<br>- 101 - Greater Than or Equal (&gt;=) to the Reference Value.<br>- 110 - Greater Than (&gt;) the Reference Value.<br>- 111 - Reserved.<br>If PFP, only 101/Greater Than or Equal is valid. |  |
| 3  | POLL_ADDRESS_LO | Lower portion of the address to poll. If the address is a memory location, then bits [31:2] specify the lower bits of the address and bits [1:0] specify the SWAP used for the memory read. If the address is a memory-mapped register, then bits [15:0] are the DWord memory-mapped register address that the CP will read. |  |
| 4  | POLL_ADDRESS_HI | Higher portion of the address to poll. If the address is a memory location, then bits [7:0] specify bits 39:32 of the address. If the address is a memory-mapped register, then this DW is a don't care. |  |
| 5  | REFERENCE       | [31:0] - Reference Value.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                     |  |
| 6  | MASK            | [31:0] - Mask for Comparison.                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                                 |  |
| 7  | POLL_INTERVAL   | [15:0] - Poll_Interval: Interval to wait between the time an unsuccessful polling result is returned and a new poll is issued. Time between these is 16*Poll_Interval clocks. The minimum value is 0x04. A value less than 0x04 will be forced to 0x04.                                                                                                                                                                                                                                                                                                                                                                                                       |  |
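The FUNCTION encodings and the PFP restrictions can be modelled in a few lines. This is a sketch under stated assumptions rather than a description of the CP microcode: the helper names are hypothetical, and it assumes the comparison is applied to the polled value ANDed with MASK while REFERENCE is used unmasked, which the table does not spell out.

```python
import operator

# FUNCTION encodings [2:0]; 111 is reserved and intentionally absent.
COMPARE = {
    0b000: lambda value, ref: True,  # Always (compare passes)
    0b001: operator.lt,
    0b010: operator.le,
    0b011: operator.eq,
    0b100: operator.ne,
    0b101: operator.ge,
    0b110: operator.gt,
}


def wait_reg_mem_dw2(function, mem_space, engine_pfp):
    """Pack DW2: ENGINE [8], MEM_SPACE [4], FUNCTION [2:0]."""
    if function not in COMPARE:
        raise ValueError("FUNCTION 111 is reserved")
    if engine_pfp and (mem_space != 1 or function != 0b101):
        raise ValueError("PFP supports only Memory with Greater Than or Equal")
    return ((1 if engine_pfp else 0) << 8) | ((mem_space & 0x1) << 4) | function


def wait_condition_met(function, polled_value, reference, mask):
    """Model one poll (assumption: only the polled value is masked)."""
    return bool(COMPARE[function](polled_value & mask, reference))


# Wait at the ME until the low byte of a memory location equals 1.
print(hex(wait_reg_mem_dw2(0b011, mem_space=1, engine_pfp=False)))
print(wait_condition_met(0b011, 0xABCD0001, 0x1, 0xFF))
```

A driver-side model like this is mainly useful for checking that a REFERENCE/MASK pair can ever be satisfied before the packet is submitted.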

# 9.7 Misc Packets

# 9.7.1 <u>MEM\_WRITE - R7xx/Evergreen/Cayman</u>

The MEM\_WRITE packet provides the opportunity to write two DWords to a QWORD-aligned memory location at the top of the graphics pipe.

### MEM\_WRITE Packet Description

| DW | Field Name | Description |
|----|------------|-------------|
| 1 | HEADER | Header of the packet |
| 2 | ADDRESS_LO<br>SWAP | [31:3] - Lower bits of QWORD-Aligned Address.<br>[1:0] - Swap function used for data write. |
| 3 | DATA32<br>WR_CONFIRM<br>CNTR_SEL<br>CNTR64_SEL<br>ADDRESS_HI | [18] - DATA32: Writes only the lower 32 bits to memory when set to '1'.<br>[17] - WR_CONFIRM: Write Confirm is requested for this write when set.<br>[16] - CNTR_SEL: Selects sending<br>- 0: embedded packet Data<br>- 1: a copy of the 64-bit counter indicated by the CNTR64_SEL field<br>[14] - CNTR64_SEL: Selects the 64-bit counter to send if bit 16 (CNTR_SEL) is programmed to "1".<br>- 0: Send the current value of CP's copy of the 64-bit GPU counter<br>- 1: Send current value of the CP_PERFCOUNTER_HI/LO. The intent is for the driver to have already selected the always count option for CP_PERFCOUNTER_SELECT and requested the CP to start counting via the CP_PERFMON_CNTL register.<br>[7:0] - ADDRESS_HI: Upper bits (39:32) of Address |
| 4 | DATA_LO | [31:0] - Data |
| 5 | DATA_HI | [31:0] - Data |
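A minimal sketch of how the MEM\_WRITE body might be assembled from a 40-bit address and a 64-bit value follows. The function and parameter names are hypothetical, the swap bits are assumed to be zero, and the HEADER DWord is not built.

```python
def mem_write_body(address, data, data32=False, wr_confirm=False,
                   cntr_sel=False, cntr64_sel=0):
    """Pack DW2..DW5 of MEM_WRITE; HEADER is not built and swap bits are assumed 0."""
    if address < 0 or address >> 40:
        raise ValueError("address must fit in 40 bits")
    if address & 0x7:
        raise ValueError("address must be QWORD-aligned")
    dw2 = address & 0xFFFFFFF8
    dw3 = ((int(bool(data32)) << 18)        # DATA32
           | (int(bool(wr_confirm)) << 17)  # WR_CONFIRM
           | (int(bool(cntr_sel)) << 16)    # CNTR_SEL
           | ((cntr64_sel & 0x1) << 14)     # CNTR64_SEL
           | ((address >> 32) & 0xFF))      # ADDRESS_HI
    data_lo = data & 0xFFFFFFFF
    data_hi = (data >> 32) & 0xFFFFFFFF
    return [dw2, dw3, data_lo, data_hi]


# Write a made-up 64-bit value with write confirm requested.
print([hex(dw) for dw in mem_write_body(0x3_0000_0008, 0x1122334455667788,
                                        wr_confirm=True)])
```

When `cntr_sel` is set, the two data DWords are still present in the packet, but the CP sends the selected counter instead of the embedded data.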

# 9.7.2 <u>NOP - R7xx/Evergreen/Cayman</u>

Skip a number of DWords to get to the next packet.

### NOP Packet Description

| DW | Field Name   | Description           |
|----|--------------|-----------------------|
| 1  | HEADER       | Header of the packet. |
| 2  | {DATA_BLOCK} | DATA_BLOCK            |

The DATA\_BLOCK field may consist of a number of DWords, and the content may be anything.