Sunday, April 22, 2012

TLPs with Data Payloads - Rules

Length is specified as an integral number of DW.
Length[9:0] is reserved for all Messages except those which explicitly refer to a Data Length
• Refer to the Message Code tables in Section 2.2.8.
The Transmitter of a TLP with a data payload must not allow the data payload length as given
by the TLP’s Length [ ] field to exceed the length specified by the value in the
Max_Payload_Size field of the Transmitter’s Device Control register taken as an integral number
of DW (see Section 7.8.4).
• For ARI Devices, the Max_Payload_Size is determined solely by the setting in Function 0.
The Max_Payload_Size settings in other Functions are ignored.
• For an Upstream Port associated with a non-ARI multi-Function device whose
Max_Payload_Size settings are identical across all Functions, a transmitted TLP’s data
payload must not exceed the common Max_Payload_Size setting.
• For an Upstream Port associated with a non-ARI multi-Function device whose
Max_Payload_Size settings are not identical across all Functions, a transmitted TLP’s data
payload must not exceed a Max_Payload_Size setting whose determination is
implementation specific.
♦ Transmitter implementations are encouraged to use the Max_Payload_Size setting from
the Function that generated the transaction, or else the smallest Max_Payload_Size
setting across all Functions.
♦ Software should not set the Max_Payload_Size in different Functions to different values
unless software is aware of the specific implementation.
• Note: Max_Payload_Size applies only to TLPs with data payloads; Memory Read Requests
are not restricted in length by Max_Payload_Size. The size of the Memory Read Request is
controlled by the Length field.
The size of the data payload of a Received TLP as given by the TLP’s Length [ ] field must not
exceed the length specified by the value in the Max_Payload_Size field of the Receiver’s Device
Control register taken as an integral number of DW (see Section 7.8.4).
• Receivers must check for violations of this rule. If a Receiver determines that a TLP violates
this rule, the TLP is a Malformed TLP.
♦ This is a reported error associated with the Receiving Port (see Section 6.2).
• For ARI Devices, the Max_Payload_Size is determined solely by the setting in Function 0.
The Max_Payload_Size settings in other Functions are ignored.
• For an Upstream Port associated with a non-ARI multi-Function device whose
Max_Payload_Size settings are identical across all Functions, the Receiver is required to
check the TLP’s data payload size against the common Max_Payload_Size setting.
• For an Upstream Port associated with a non-ARI multi-Function device whose
Max_Payload_Size settings are not identical across all Functions, the Receiver is required to
check the TLP’s data payload against a Max_Payload_Size setting whose determination is
implementation specific.
♦ Receiver implementations are encouraged to use the Max_Payload_Size setting from the
Function targeted by the transaction, or else the largest Max_Payload_Size setting across all Functions.
♦ Software should not set the Max_Payload_Size in different Functions to different values
unless software is aware of the specific implementation.
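The transmit and receive checks above can be sketched in a few lines of Python. This is only an illustration, not an implementation taken from the specification: the function names, the `role` parameter, and the list-of-Functions representation are hypothetical. The field encodings (a 3-bit Max_Payload_Size value where 000b means 128 bytes and each step doubles up to 4096 bytes, and a 10-bit Length where 0 means 1024 DW) are taken from the specification's register and header definitions rather than from the text above. For the mixed-settings case, the sketch assumes the fallback choices the text encourages (smallest on transmit, largest on receive); a real design may instead use the setting of the Function that generated or is targeted by the transaction.

```python
from typing import Sequence


def max_payload_bytes(mps_field: int) -> int:
    """Decode the 3-bit Max_Payload_Size field (000b = 128 B ... 101b = 4096 B)."""
    if not 0 <= mps_field <= 5:
        raise ValueError("reserved Max_Payload_Size encoding")
    return 128 << mps_field


def payload_dw(length_field: int) -> int:
    """Decode the 10-bit Length field; the all-zero encoding means 1024 DW."""
    if not 0 <= length_field <= 0x3FF:
        raise ValueError("Length is a 10-bit field")
    return 1024 if length_field == 0 else length_field


def effective_mps_field(settings: Sequence[int], ari: bool, role: str) -> int:
    """Pick the Max_Payload_Size field to enforce for one Upstream Port.

    settings[i] is the field value programmed in Function i (hypothetical layout).
    role is "tx" for the Transmitter check or "rx" for the Receiver check.
    """
    if ari or len(set(settings)) == 1:
        # ARI: Function 0 alone decides. Identical settings: the common value.
        return settings[0]
    # Differing settings are implementation specific; this sketch assumes the
    # encouraged fallback: smallest for transmit, largest for receive.
    return min(settings) if role == "tx" else max(settings)


def violates_max_payload(length_field: int, settings: Sequence[int],
                         ari: bool, role: str) -> bool:
    """True if the payload exceeds the limit (a Malformed TLP on receive)."""
    limit = max_payload_bytes(effective_mps_field(settings, ari, role))
    return payload_dw(length_field) * 4 > limit
```

With Function settings `[0, 2]` (128 and 512 bytes) on a non-ARI device, a 64 DW payload is rejected by the transmit check but accepted by the receive check, which is why the text warns software against programming different values.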
For TLPs that include data, the value in the Length field and the actual amount of data included
in the TLP must match.
• Receivers must check for violations of this rule. If a Receiver determines that a TLP violates
this rule, the TLP is a Malformed TLP.
♦ This is a reported error associated with the Receiving Port (see Section 6.2).
The value in the Length field applies only to data; the TLP Digest is not included in the Length.
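As a rough illustration of the Length-match rule, a Receiver could compare the Length field with the number of DW left once the header and any TLP Digest are set aside. The function and parameter names below are hypothetical. The sketch also assumes the TLP has already been measured in DW, that the header is 3 or 4 DW, and that the digest, when present, occupies one DW.

```python
def length_matches(total_dw: int, header_dw: int, has_digest: bool,
                   length_field: int) -> bool:
    """Check that the Length field equals the data actually carried.

    total_dw     -- DW in the whole TLP (header + data + optional digest)
    header_dw    -- 3 or 4, depending on the header format
    has_digest   -- True if a TLP Digest is appended (assumed to be 1 DW)
    length_field -- raw 10-bit Length value; 0 encodes 1024 DW
    """
    if header_dw not in (3, 4):
        raise ValueError("header is assumed to be 3 or 4 DW")
    expected = 1024 if length_field == 0 else length_field
    digest_dw = 1 if has_digest else 0
    # The digest is not data, so it is excluded before comparing.
    actual = total_dw - header_dw - digest_dw
    return actual == expected
```

An 8 DW TLP with a 3 DW header and a digest carries 4 DW of data, so it passes with Length = 4; the same 8 DW without a digest carries 5 DW and would be treated as a Malformed TLP.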
When a data payload is included in a TLP other than an AtomicOp Request or an AtomicOp
Completion, the first byte of data following the header corresponds to the byte address closest
to zero and the succeeding bytes are in increasing byte address sequence.
• Example: For a 16-byte write to location 100h, the first byte following the header would be
the byte to be written to location 100h, and the second byte would be written to location
101h, and so on, with the final byte written to location 10Fh.
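The byte ordering in this example can be written out as a small sketch. The helper name is hypothetical; it simply pairs each payload byte, in the order it follows the header, with the address it is written to.

```python
def payload_byte_addresses(start_address: int, payload: bytes):
    """Pair each payload byte with its target address, lowest address first."""
    return [(start_address + offset, value) for offset, value in enumerate(payload)]


# A 16-byte write to location 100h, using the byte values 0..15 as sample data.
pairs = payload_byte_addresses(0x100, bytes(range(16)))
first_address, first_byte = pairs[0]    # 0x100, first byte after the header
last_address, last_byte = pairs[-1]     # 0x10F, final byte of the payload
```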
The data payload in AtomicOp Requests and AtomicOp Completions must be formatted such
that the first byte of data following the TLP header is the least significant byte of the first data
value, and subsequent bytes of data are strictly increasing in significance. With CAS Requests,
the second data value immediately follows the first data value, and must be in the same format.
• The endian format used by AtomicOp Completers to read and write data at the target
location is implementation specific, and is permitted to be whatever the Completer
determines is appropriate for the target memory (e.g., little endian or big endian). Endian
format capability reporting and controls for AtomicOp Completers are outside the scope of
this specification.
• Little endian example: For a 64-bit (8-byte) Swap Request targeting location 100h with the
target memory in little endian format, the first byte following the header is written to
location 100h, the second byte is written to location 101h, and so on, with the final byte
written to location 107h. Note that before performing the writes, the Completer first reads
the target memory locations so it can return the original value in the Completion. The byte
address correspondence to the data in the Completion is identical to that in the Request.
• Big endian example: For a 64-bit (8-byte) Swap Request targeting location 100h with the
target memory in big endian format, the first byte following the header is written to location
107h, the second byte is written to location 106h, and so on, with the final byte written to
location 100h. Note that before performing the writes, the Completer first reads the target
memory locations so it can return the original value in the Completion. The byte address
correspondence to the data in the Completion is identical to that in the Request.
• Figure 2-6 shows little endian and big endian examples of Completer target memory access
for a 64-bit (8-byte) FetchAdd. The bytes in the operands and results are numbered 0-7,
with byte 0 being least significant and byte 7 being most significant. In each case, the
Completer fetches the target memory operand using the appropriate endian format. Next,
AtomicOp compute logic in the Completer performs the FetchAdd operation using the
original target memory value and the “add” value from the FetchAdd Request. Finally, the
Completer stores the FetchAdd result back to target memory using the same endian format
used for the fetch.
[Figure 2-6 has two panels: "FetchAdd example with target memory in little endian format" and
"FetchAdd example with target memory in big endian format". Each panel shows target memory
locations 100h through 107h, with the original value, the "add" value, and the FetchAdd result
(bytes numbered 7 down to 0) passing through the AtomicOp compute logic.]
Figure 2-6: Examples of Completer Target Memory Access for FetchAdd
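The fetch, compute, and store sequence shown in the figure can be modeled with a short sketch. This is an illustration under assumptions, not a Completer design: the function name and the use of a `bytearray` as target memory are hypothetical, and the sketch assumes the sum wraps at the operand width with any carry discarded. The operand and the returned Completion data are in the TLP format described above (least significant byte first), while memory is read and written in whichever endian format the Completer uses.

```python
def fetch_add(memory: bytearray, address: int, add_payload: bytes,
              target_endian: str) -> bytes:
    """Model a FetchAdd Completer; returns the Completion data payload.

    add_payload   -- the "add" value as carried in the Request (LSB first)
    target_endian -- "little" or "big", the format of the target memory
    """
    size = len(add_payload)
    add_value = int.from_bytes(add_payload, "little")
    # Fetch the original value using the target memory's endian format.
    original = int.from_bytes(memory[address:address + size], target_endian)
    # Compute; assumed here to wrap at the operand width.
    result = (original + add_value) % (1 << (8 * size))
    # Store the result back with the same endian format used for the fetch.
    memory[address:address + size] = result.to_bytes(size, target_endian)
    # The Completion carries the original value, LSB first, like the Request.
    return original.to_bytes(size, "little")


# 64-bit FetchAdd of 1 at location 100h with big endian target memory.
mem = bytearray(0x108)
mem[0x100:0x108] = (0x0102030405060708).to_bytes(8, "big")
completion = fetch_add(mem, 0x100, (1).to_bytes(8, "little"), "big")
```

In the big endian case the least significant result byte ends up at 107h, whereas with `"little"` it would land at 100h; the Request and Completion payloads are the same in both cases.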
IMPLEMENTATION NOTE
Endian Format Support by RC AtomicOp Completers
One key reason for permitting an AtomicOp Completer to access target memory using an endian
format of its choice is so that PCI Express devices targeting host memory with AtomicOps can
interoperate with host software that uses atomic operation instructions (or instruction sequences).
Some host environments have limited endian format support with atomic operations, and by
supporting the “right” endian format(s), an RC AtomicOp Completer may significantly improve
interoperability.
For an RC with AtomicOp Completer capability on a platform supporting little-endian-only
processors, there is little envisioned benefit for the RC AtomicOp Completer to support any endian
format other than little endian. For an RC with AtomicOp Completer capability on a platform
supporting bi-endian processors, there may be benefit in supporting both big endian and little
endian formats, and perhaps having the endian format configurable for different regions of host
memory.
There is no PCI Express requirement that an RC AtomicOp Completer support the host processor’s
“native” format (if there is one), nor is there necessarily significant benefit to doing so. For
example, some processors can use load-link/store-conditional or similar instruction sequences to do
atomic operations in non-native endian formats and thus not need the RC AtomicOp Completer to
support alternative endian formats.
IMPLEMENTATION NOTE
Maintaining Alignment in Data Payloads
Section 2.3.1.1 discusses rules for forming Read Completions respecting certain natural address
boundaries. Memory Write performance can be significantly improved by respecting similar address
boundaries in the formation of the Write Request. Specifically, forming Write Requests such that
natural address boundaries of 64 or 128 bytes are respected will help to improve system
performance.
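One possible way to form Write Requests along these lines is sketched below. The note does not prescribe a splitting algorithm, so this is only one interpretation: an unaligned start is trimmed with a short first request that ends on the next boundary, and every later request starts on a boundary. The function name and the default values are hypothetical, sizes are in bytes, and the sketch assumes the maximum payload is a multiple of the boundary.

```python
def split_write(address: int, length: int, boundary: int = 64,
                max_payload: int = 128):
    """Split a write into (address, size) requests that respect a boundary."""
    if boundary <= 0 or max_payload % boundary != 0:
        raise ValueError("max_payload is assumed to be a multiple of boundary")
    chunks = []
    end = address + length
    while address < end:
        misalignment = address % boundary
        if misalignment:
            # Unaligned start: stop at the next natural boundary.
            limit = boundary - misalignment
        else:
            # Aligned start: send as much as the payload limit allows.
            limit = max_payload
        size = min(limit, end - address)
        chunks.append((address, size))
        address += size
    return chunks


# A 200-byte write starting at 130h with 64-byte boundaries.
requests = split_write(0x130, 200)
```

Here the write becomes a 16-byte request up to 140h, a 128-byte request starting on that boundary, and a 56-byte tail starting at 1C0h.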
