Bug
Resolution: Unresolved
Low
Code Generation Tools
CODEGEN-9439
default
This issue only applies to users who are manually programming the Matrix Multiply Accelerator (MMA) and are using the __HWA_LOAD_2REG intrinsic.
The Matrix Multiply Accelerator (MMA) paired with the C7120 CPU allows the user to send values into bias, scale, and shift registers within the MMA that affect the operation of the MMA.
The MMA raises a hardware exception when more than one load to the same bias, scale, or shift register pair is issued within a 24-cycle period.
A programmer who wants to load a value into the bias, scale, or shift registers uses the __HWA_LOAD_2REG intrinsic in C/C++ code. Each use of this intrinsic results in an HWAOPEN instruction with a special immediate operand (0x8, 0x9, 0xa, or 0xb) in the compiler-generated assembly.
The C7000 compiler does not guarantee that two loads to the same MMA register pair are separated by at least 24 cycles. Therefore, if the source code contains two loads to the same MMA register pair, the compiler may produce code that triggers the exception described above. The same can happen when a single load to an MMA register pair appears in a loop, because successive iterations may issue that load fewer than 24 cycles apart.
There are no plans to address this issue in the compiler.
The MMALIB software package that is delivered with the PSDK is tested to ensure that this condition does not occur.
Potential workaround:
The programmer can ensure that 24 cycles elapse between two loads to the same MMA register pair by placing the following C code between loads of the same MMA bias/scale/shift register pair:
__asm(" NOP 0x8 ; rate-limit MMA load bias/scale/shift pairs (8) ");
__asm(" NOP 0x8 ; rate-limit MMA load bias/scale/shift pairs (16)");
__asm(" NOP 0x8 ; rate-limit MMA load bias/scale/shift pairs (24)");
This technique may have undesirable performance effects.
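As an illustration of where the NOP statements would go, the sketch below wraps them in a macro and places it between two loads and inside a loop body. The macro and function names are hypothetical and not part of any TI header. The __HWA_LOAD_2REG calls are shown only as comments because their arguments depend on the application. The sketch assumes that the compiler predefines __C7000__ when targeting the C7000, so that the macro expands to nothing on other targets, and it takes the 8-cycle length of each NOP 0x8 from the running totals in the comments above.

```c
/* Sketch only: all names here are hypothetical, not part of any TI header. */

#define MMA_PAIR_LOAD_MIN_GAP_CYCLES 24 /* spacing required by the MMA        */
#define MMA_NOP_CYCLES               8  /* cycles covered by one "NOP 0x8",
                                           per the running totals above       */

#if defined(__C7000__) /* assumption: predefined by the C7000 compiler */
#define MMA_RATE_LIMIT_PAIR_LOAD()                                            \
    do {                                                                      \
        __asm(" NOP 0x8 ; rate-limit MMA load bias/scale/shift pairs (8) ");  \
        __asm(" NOP 0x8 ; rate-limit MMA load bias/scale/shift pairs (16)");  \
        __asm(" NOP 0x8 ; rate-limit MMA load bias/scale/shift pairs (24)");  \
    } while (0)
#else
#define MMA_RATE_LIMIT_PAIR_LOAD() do { } while (0) /* no-op off target */
#endif

/* Number of "NOP 0x8" statements needed to cover the required gap. */
static int mma_nops_needed(void)
{
    return (MMA_PAIR_LOAD_MIN_GAP_CYCLES + MMA_NOP_CYCLES - 1) / MMA_NOP_CYCLES;
}

/* Case 1: two loads to the same register pair in straight-line code. */
void load_bias_pair_twice(void)
{
    /* first __HWA_LOAD_2REG(...) to the bias pair goes here */
    MMA_RATE_LIMIT_PAIR_LOAD();
    /* second __HWA_LOAD_2REG(...) to the bias pair goes here */
}

/* Case 2: a single load inside a loop, repeated on every iteration. */
void reload_scale_pair_each_iteration(int iterations)
{
    int i;
    for (i = 0; i < iterations; i++) {
        /* __HWA_LOAD_2REG(...) to the scale pair goes here */
        MMA_RATE_LIMIT_PAIR_LOAD();
        /* remaining per-iteration work goes here */
    }
}
```

In the loop case the delay is paid on every iteration, which is where the performance cost is most visible. Moving the load out of the loop, where the algorithm allows it, avoids both the exception and the added cycles.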