Compiler fails to add extra delay slot between DIVF32 at end of function and 'MOV32 ACC,R0H' at the start of a called function


    • Type: Bug
    • Resolution: Fixed
    • Priority: Medium
    • Code Generation Tools
    • CODEGEN-15061
    • https://e2e.ti.com/support/microcontrollers/c2000-microcontrollers-group/c2000/f/c2000-microcontrollers-forum/1609337/tms320f28p650dk-float-division-with-subsequent-conversion-to-integer-gives-wrong-result
    • C2000_16.9.0.LTS
      C2000_18.1.0.LTS
      C2000_22.6.0.LTS
      C2000_15.12.0.LTS
      C2000_21.6.0.LTS
      C2000_18.12.0.LTS
      C2000_20.2.0.LTS
    • C2000_NEXT*
      C2000_25.11.1.LTS*
      C2000_22.6.4.LTS*
    • default
    • Avoid the issue in one of two ways:

      Either do not use TMU instructions in the calling function, by building it with:
         --tmu_support=none

      Or keep the generated assembly (-k), build from that assembly file instead of
      the C source, and manually add a NOP after the 5p TMU instruction.

      For the sequence of instructions below, the C28 compiler fails to add the extra
      delay slot required by the instruction 'MOV32 ACC,R0H' when that instruction is
      placed at the start of a called function. The issue occurs only when the calling
      function issues a 5p instruction (e.g. DIVF32) immediately before the call, so
      that the 4 delay slots of the 5p instruction already use up all 4 cycles of the
      subsequent LCR call.

      DIVF32  R0H,R1H,R0H      ; 5p TMU instr requires 4 delay slot cycles
      LCR     __c28xabi_ftoll  ; 4 cycle branch
      ; the LCR above branches to the MOV32 below
      MOV32   @ACC, R0H        ; FPU to CPU requires inserting a prior NOP

      Per TRM spruhs1c Section 1.4.2, a MOV32 from an FPU register to a C28x register,
      such as the one above, requires an extra delay slot before the MOV32 instruction.

      The issue is avoided by inserting a NOP (or another legal instruction)
      either before the LCR instruction or before the MOV32 instruction.

      This issue can affect any of the 5p TMU instructions listed below (each requires
      4 delay slots) when the instruction is the last one before an LCR (or LC) call
      to a function that starts with a MOV32 from an FPU register to a C28x register:
       DIVF32
       SQRTF32
       QUADF32
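
      The condition described above can be sketched as a small host-side check.
      This is only an illustrative model, not part of the compiler or of any TI
      tool: the type op_kind, the constants, and both function names are
      hypothetical, and the slot counts are taken from this report (4 delay slots
      for a 5p TMU instruction, 4 cycles for LCR, 1 extra slot before an FPU to
      CPU MOV32). Treating LC as also taking 4 cycles is an assumption.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified instruction kinds; not a compiler or TI API. */
typedef enum {
    OP_OTHER,            /* any instruction not relevant here, e.g. NOP */
    OP_TMU_5P,           /* DIVF32, SQRTF32 or QUADF32 */
    OP_CALL,             /* LCR or LC */
    OP_MOV32_FPU_TO_CPU  /* e.g. MOV32 ACC,R0H */
} op_kind;

/* Slot counts as stated in this report; the LC value is an assumption. */
enum {
    TMU_5P_DELAY_SLOTS     = 4, /* delay slots needed after a 5p TMU instr */
    CALL_CYCLES            = 4, /* cycles supplied by the LCR (assumed for LC) */
    FPU_TO_CPU_EXTRA_SLOTS = 1  /* extra slot needed before the MOV32 */
};

/* Number of delay slots missing when `before_call` is immediately followed
 * by `call`, and the called function starts with `callee_first`.
 * Returns 0 when the pattern from the report is not present. */
int missing_delay_slots(op_kind before_call, op_kind call, op_kind callee_first)
{
    if (before_call != OP_TMU_5P || call != OP_CALL ||
        callee_first != OP_MOV32_FPU_TO_CPU) {
        return 0;
    }
    int required = TMU_5P_DELAY_SLOTS + FPU_TO_CPU_EXTRA_SLOTS;
    int provided = CALL_CYCLES;
    return required - provided;
}

/* True when the call at caller[call_index] matches the affected pattern.
 * Any instruction between the 5p TMU instruction and the call (such as the
 * NOP from the workaround) breaks the pattern. */
bool call_site_is_affected(const op_kind *caller, size_t len,
                           size_t call_index, op_kind callee_first)
{
    assert(caller != NULL);
    if (call_index == 0 || call_index >= len) {
        return false; /* no instruction precedes the call, or bad index */
    }
    return missing_delay_slots(caller[call_index - 1],
                               caller[call_index],
                               callee_first) > 0;
}
```

      In this model, a caller sequence of OP_TMU_5P followed directly by OP_CALL
      into a function that starts with OP_MOV32_FPU_TO_CPU is reported as affected
      with one slot missing, while the same sequence with an OP_OTHER (a NOP)
      between the two is not.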

      Details of the original test case:
      The commands below show the source file and then build it.

      C:\examples>type file.c
      #include <stdint.h>
      
      int64_t fxn()
      {
         float x = -10.0;
         return x / 0.1f;
      }
      
      C:\examples>cl2000 -v28 -ml -mt --cla_support=cla2 --float_support=fpu32 --tmu_support=tmu1 --vcu_support=vcrc -Ooff --fp_mode=relaxed --fp_reassoc=on --abi=eabi --src_interlist file.c
      

      Here are the 2 key lines from the compiler-generated file.asm.

              DIVF32    R0H,R1H,R0H           ; [CPU_FPU] |6| 
              LCR       #||__c28xabi_ftoll||  ; [CPU_ALU] |6| 
      

      __c28xabi_ftoll is a function from the compiler RTS library rts2800_fpu32_eabi.lib. Here is the beginning of the disassembly of this function.

      00000000        __c28xabi_ftoll:
      00000000   bfa9   MOV32        ACC, R0H
      

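      MOV32 reads R0H, the same register written by DIVF32. The 4 cycles of the LCR cover only the 4 delay slots that DIVF32 requires; the extra delay slot required before the FPU to CPU MOV32 is missing, so R0H is read before enough cycles have executed.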

            Assignee:
            TI User
            Reporter:
            TI User
