EXT_EP-10662

C7000 compiler doesn't enforce rate-limit of MMA bias/scale/shift register loading


    • Type: Bug
    • Resolution: Unresolved
    • Priority: Low
    • Code Generation Tools
    • CODEGEN-9439
      C7000_2.1.2.LTS
      C7000_2.0.1.STS
      C7000_5.0.0.STS*
      C7000_2.1.0.LTS
      C7000_2.0.0.STS
      C7000_3.0.0.STS
      C7000_3.1.0.LTS
      C7000_4.0.0.STS
      C7000_4.1.0.LTS
    • default

      This issue only applies to users who are manually programming the Matrix Multiply Accelerator (MMA) and are using the __HWA_LOAD_2REG intrinsic.

      The MMA paired with the C7120 CPU lets the user load values into its bias, scale, and shift registers, which affect how the MMA operates.

      The MMA issues a hardware exception when more than one load to the same bias, scale, or shift register pair is issued within a 24-cycle period.

      A programmer who wants to load a value into the bias, scale, or shift registers uses the __HWA_LOAD_2REG intrinsic in C/C++ code. Each use of this intrinsic produces an HWAOPEN instruction with a special immediate operand (0x8, 0x9, 0xa, or 0xb) in the compiler-generated assembly.

      The C7000 compiler does not guarantee that two loads to the same MMA register pair are separated by at least 24 cycles. Therefore, if the source code contains two loads to the same MMA register pair, the compiler may produce code that triggers the exception described above. The same can happen when a single load to an MMA register pair appears in a loop, because successive iterations may issue that load fewer than 24 cycles apart.
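The rule above can be illustrated with a small host-side model. This is only a sketch for reasoning about schedules, not part of the compiler or of MMALIB: the `mma_load_event` type, the integer register-pair IDs, and the function name are hypothetical, and it assumes that two loads to the same pair conflict when their issue cycles differ by fewer than 24.

```c
#include <assert.h>
#include <stddef.h>

/* Minimum spacing, in cycles, between two loads to the same MMA
 * bias/scale/shift register pair (taken from the issue description). */
#define MMA_LOAD_MIN_GAP_CYCLES 24

/* Hypothetical record of one load to an MMA register pair. */
typedef struct {
    int           reg_pair; /* illustrative ID of the register pair     */
    unsigned long cycle;    /* cycle at which the load was issued       */
} mma_load_event;

/* Returns the index of the first load issued fewer than 24 cycles after
 * an earlier load to the same register pair, or -1 if there is none.
 * The events must be sorted by ascending cycle. */
static int find_rate_limit_violation(const mma_load_event *ev, size_t n)
{
    assert(ev != NULL || n == 0);
    for (size_t i = 1; i < n; ++i) {
        /* Walk backwards through earlier loads inside the 24-cycle window. */
        for (size_t j = i; j-- > 0; ) {
            if (ev[i].cycle - ev[j].cycle >= MMA_LOAD_MIN_GAP_CYCLES) {
                break; /* all remaining earlier loads are even older */
            }
            if (ev[j].reg_pair == ev[i].reg_pair) {
                return (int)i;
            }
        }
    }
    return -1;
}
```

In this model, loads to different register pairs never conflict with each other; only repeated loads to the same pair are checked.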

      There are no plans to address this issue in the compiler.

      The MMALIB software package that is delivered with the PSDK is tested to ensure that this condition does not occur.

      Potential workaround:

      The programmer can ensure that 24 cycles elapse between two loads to the same MMA register pair by placing the following C code between loads of the same MMA bias/scale/shift register pair:

      __asm(" NOP 0x8 ; rate-limit MMA load bias/scale/shift pairs (8) ");
      __asm(" NOP 0x8 ; rate-limit MMA load bias/scale/shift pairs (16)");
      __asm(" NOP 0x8 ; rate-limit MMA load bias/scale/shift pairs (24)");

      Because the inserted NOPs add 24 cycles after each rate-limited load, this technique may have undesirable performance effects.
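For the loop case, the three NOP statements can be wrapped in a macro and placed after each load. The following is a minimal sketch, not a tested fix: `MMA_RATE_LIMIT_DELAY`, `mma_load_pair_placeholder`, and the simulated cycle counter are hypothetical names, and the placeholder stands in for the real `__HWA_LOAD_2REG` call, whose arguments are not shown in this issue. It also assumes the compiler keeps the `__asm` statements next to the load; the generated assembly should be inspected to confirm that.

```c
#include <assert.h>

#define MMA_LOAD_MIN_GAP_CYCLES 24 /* required spacing from the issue text */
#define MMA_NOP_CYCLES           8 /* cycles covered by one " NOP 0x8"     */

#if defined(__C7000__)
/* Target build: emit the three NOPs from the workaround. */
#define MMA_RATE_LIMIT_DELAY()                                              \
    do {                                                                    \
        __asm(" NOP 0x8 ; rate-limit MMA load bias/scale/shift pairs (8) "); \
        __asm(" NOP 0x8 ; rate-limit MMA load bias/scale/shift pairs (16)"); \
        __asm(" NOP 0x8 ; rate-limit MMA load bias/scale/shift pairs (24)"); \
    } while (0)
#else
/* Host build: only count the cycles the NOPs would add (illustration). */
static unsigned long mma_delay_cycles_simulated;
#define MMA_RATE_LIMIT_DELAY()                                              \
    do { mma_delay_cycles_simulated += 3 * MMA_NOP_CYCLES; } while (0)
#endif

/* Hypothetical stand-in for the code that calls __HWA_LOAD_2REG. */
static void mma_load_pair_placeholder(void)
{
    /* The real load of one bias/scale/shift register pair goes here. */
}

/* Issues one load per iteration, each followed by the 24-cycle delay,
 * so loads from successive iterations are kept apart. */
static void load_pair_rate_limited(int iterations)
{
    assert(iterations >= 0);
    for (int i = 0; i < iterations; ++i) {
        mma_load_pair_placeholder();
        MMA_RATE_LIMIT_DELAY();
    }
}
```

Placing the delay after every load is the simplest arrangement; if other useful work of known length already separates the loads, fewer NOPs may be enough, but the compiler will not verify that.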
