[EXT_EP-10662] C7000 compiler doesn't enforce rate-limit of MMA bias/scale/shift register loading Created: 07/Dec/21  Updated: 08/Jan/25

Status: New
Project: Embedded Software & Tools
Component/s: None
Affects Version/s: None
Fix Version/s: None

Type: Bug Priority: Low
Reporter: TI User Assignee: TI User
Resolution: Unresolved Votes: 0
Remaining Estimate: Not Specified
Time Spent: Not Specified
Original Estimate: Not Specified

Product: Code Generation Tools
Internal ID: CODEGEN-9439
Found In Release: C7000_2.1.2.LTS
C7000_2.0.1.STS
C7000_2.1.0.LTS
C7000_5.0.0.LTS
C7000_2.0.0.STS
C7000_3.0.0.STS
C7000_3.1.0.LTS
C7000_6.0.0.LTS*
C7000_4.0.0.STS
C7000_4.1.0.LTS
Affected Platform/Device: default
Workaround: The programmer can ensure that 24 cycles elapse in-between two loads to the same MMA register pair by placing the following C code in-between loads of the same MMA bias/scale/shift register pair:
  
__asm(" NOP 0x8 ; rate-limit MMA load bias/scale/shift pairs (8) ");
__asm(" NOP 0x8 ; rate-limit MMA load bias/scale/shift pairs (16)");
__asm(" NOP 0x8 ; rate-limit MMA load bias/scale/shift pairs (24)");

This technique may have undesirable performance effects.

 Description   

This issue only applies to users who are manually programming the Matrix Multiply Accelerator (MMA) and are using the __HWA_LOAD_2REG intrinsic.

The Matrix Multiply Accelerator (MMA) paired with the C7120 CPU allows the user to send values into bias, scale, and shift registers within the MMA that affect the operation of the MMA.

The MMA will issue a hardware exception when more than one load of each of a bias, scale, or shift register pair is issued in a 24-cycle period.

A programmer who wants to load a value into the bias, scale or shift registers will use the __HWA_LOAD_2REG intrinsics in C/C++ code. The use of this intrinsic results in an HWAOPEN instruction with a special immediate operand (0x8, 0x9, 0xa, or 0xb) in the compiler-generated assembly.

The C7000 compiler does not ensure that any two loads to the same MMA register pair do not execute within 24 cycles. Therefore, if the source code has two loads to the same MMA register pair, the compiler may produce code that results in the exception described above. This could also occur if a single load to an MMA register appears in a loop.

There are no plans to address this issue in the compiler.

The MMALIB software package that is delivered with the PSDK is tested to ensure that this condition does not occur.

Potential workaround:

The programmer can ensure that 24 cycles elapse in-between two loads to the same MMA register pair by placing the following C code in-between loads of the same MMA bias/scale/shift register pair:

__asm(" NOP 0x8 ; rate-limit MMA load bias/scale/shift pairs (8) ");
__asm(" NOP 0x8 ; rate-limit MMA load bias/scale/shift pairs (16)");
__asm(" NOP 0x8 ; rate-limit MMA load bias/scale/shift pairs (24)");

This technique may have undesirable performance effects.


Generated at Thu Apr 17 18:39:13 CDT 2025 using Jira 9.12.17#9120017-sha1:aba4002bcd633f188b6a4bb5dd8a0e1f20b79ee4.