[EXT_EP-10662] C7000 compiler doesn't enforce rate-limit of MMA bias/scale/shift register loading - Software Issue Report (SIR)

Type: Bug
Resolution: Unresolved
Priority: Low

Product:
Code Generation Tools
Internal ID:
CODEGEN-9439
Found In Release:

C7000_2.1.2.LTS
C7000_2.0.1.STS
C7000_2.1.0.LTS
C7000_5.0.0.LTS
C7000_2.0.0.STS
C7000_3.0.0.STS
C7000_3.1.0.LTS
C7000_6.0.0.LTS*
C7000_4.0.0.STS
C7000_4.1.0.LTS

Affected Platform/Device:
default
Workaround:

The programmer can ensure that 24 cycles elapse between two loads to the same MMA bias/scale/shift register pair by placing the following C code between those loads:

__asm(" NOP 0x8 ; rate-limit MMA load bias/scale/shift pairs (8) ");
__asm(" NOP 0x8 ; rate-limit MMA load bias/scale/shift pairs (16)");
__asm(" NOP 0x8 ; rate-limit MMA load bias/scale/shift pairs (24)");

This technique may have undesirable performance effects.


This issue only applies to users who are manually programming the Matrix Multiply Accelerator (MMA) and are using the __HWA_LOAD_2REG intrinsic.

The Matrix Multiply Accelerator (MMA) paired with the C7120 CPU lets the user load values into the MMA's bias, scale, and shift registers, which affect how the MMA operates.

The MMA raises a hardware exception when more than one load to the same bias, scale, or shift register pair is issued within a 24-cycle period.
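The rule above can be sketched as a small host-side check. This is only an illustrative model, not TI code: the event structure, the function name, and the count of four register-pair slots are all hypothetical, and the exact boundary (whether a gap of exactly 24 cycles is accepted) is an assumption rather than something this report specifies.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumption: minimum spacing, in cycles, between two loads to the same
 * register pair, taken from the 24-cycle period described in this report. */
#define MMA_LOAD_MIN_GAP 24

/* Hypothetical number of register-pair slots tracked by this model. */
enum { MMA_NUM_PAIRS = 4 };

/* One load to a bias/scale/shift register pair (hypothetical model type). */
typedef struct {
    int  pair;   /* index of the register pair the load targets */
    long cycle;  /* cycle at which the load issues */
} mma_load_event;

/* Returns the index of the first load that follows an earlier load to the
 * same pair by fewer than MMA_LOAD_MIN_GAP cycles, or -1 if there is none.
 * Events must be ordered by increasing cycle. */
static long first_rate_limit_violation(const mma_load_event *ev, size_t n)
{
    long last[MMA_NUM_PAIRS] = { 0 };
    bool seen[MMA_NUM_PAIRS] = { false };

    for (size_t i = 0; i < n; ++i) {
        int p = ev[i].pair;
        assert(p >= 0 && p < MMA_NUM_PAIRS);
        if (seen[p] && ev[i].cycle - last[p] < MMA_LOAD_MIN_GAP) {
            return (long)i;
        }
        seen[p] = true;
        last[p] = ev[i].cycle;
    }
    return -1;
}
```

In this model, loads to different pairs never conflict with each other; only repeated loads to the same pair are checked.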

A programmer who wants to load a value into the bias, scale, or shift registers uses the __HWA_LOAD_2REG intrinsic in C/C++ code. This intrinsic produces an HWAOPEN instruction with a special immediate operand (0x8, 0x9, 0xa, or 0xb) in the compiler-generated assembly.

The C7000 compiler does not guarantee that two loads to the same MMA register pair are separated by at least 24 cycles. Therefore, if the source code contains two loads to the same MMA register pair, the compiler may produce code that triggers the exception described above. The same can happen when a single load to an MMA register pair appears in a loop, because successive iterations may issue it fewer than 24 cycles apart.

There are no plans to address this issue in the compiler.

The MMALIB software package that is delivered with the PSDK is tested to ensure that this condition does not occur.

Potential workaround:

The programmer can ensure that 24 cycles elapse between two loads to the same MMA bias/scale/shift register pair by placing the following C code between those loads:

__asm(" NOP 0x8 ; rate-limit MMA load bias/scale/shift pairs (8) ");
__asm(" NOP 0x8 ; rate-limit MMA load bias/scale/shift pairs (16)");
__asm(" NOP 0x8 ; rate-limit MMA load bias/scale/shift pairs (24)");

This technique may have undesirable performance effects.
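One way to keep the padding in a single place is to wrap the three NOP statements in a macro. The sketch below is illustrative only: the names MMA_RATE_LIMIT_PAD, mma_pad_cycles, issue_pair_load, and reload_pair are hypothetical, and it assumes that the compiler predefines __C7000__ when building for the target. The host branch exists only so the code can be compiled and exercised off-target; the arguments of __HWA_LOAD_2REG are not shown in this report, so the load itself is represented by a stub.

```c
#include <assert.h>

#if defined(__C7000__)
/* Target build: three NOP 0x8 statements, 24 cycles in total, exactly as in
 * the workaround above. */
#define MMA_RATE_LIMIT_PAD()                                                  \
    do {                                                                      \
        __asm(" NOP 0x8 ; rate-limit MMA load bias/scale/shift pairs (8) ");  \
        __asm(" NOP 0x8 ; rate-limit MMA load bias/scale/shift pairs (16)");  \
        __asm(" NOP 0x8 ; rate-limit MMA load bias/scale/shift pairs (24)");  \
    } while (0)
#else
/* Host build (assumption: no MMA present): only count the padded cycles. */
static unsigned long mma_pad_cycles;
#define MMA_RATE_LIMIT_PAD() do { mma_pad_cycles += 24; } while (0)
#endif

/* Hypothetical stand-in for the user's code that issues one load through
 * __HWA_LOAD_2REG; replace the body with the real intrinsic call. */
static void issue_pair_load(int step)
{
    (void)step;
}

/* Loads the same register pair 'count' times. The padding follows every
 * load because the next loop iteration issues the next load to that pair. */
static void reload_pair(int count)
{
    assert(count >= 0);
    for (int i = 0; i < count; ++i) {
        issue_pair_load(i);
        MMA_RATE_LIMIT_PAD();
    }
}
```

Padding after every load costs 24 cycles per iteration, which is the performance effect noted above; code that already has at least 24 cycles of other work between loads to the same pair would not need the padding.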

Assignee: TI User

Reporter: TI User

Votes: 0

Watchers: 2

Created: 07/Dec/21 12:48 PM

Updated: 08/Jan/25 2:00 PM
