Building with LTO results in two 16-bit Thumb NOP instructions being issued in 32-bit Arm mode

    • Type: Bug
    • Resolution: Unresolved
    • Priority: Medium

      Extract the attached test case to an empty directory. These commands link it, then create the disassembly.

      % tiarmclang @options.txt
      % tiarmobjdump --demangle --disassemble adc5_ippu.elf > dis.txt
      

      Inspect the disassembly to see the following ...

      b6e02e20 <std::__1::__next_prime(unsigned int)>:
      b6e02e20: e92d4ff0     	push	{r4, r5, r6, r7, r8, r9, r10, r11, lr}
      b6e02e24: e24dd004     	sub	sp, sp, #4
      b6e02e28: e35000d3     	cmp	r0, #211
      b6e02e2c: 8a000011     	bhi	0xb6e02e78 <std::__1::__next_prime(unsigned int)+0x58> @ imm = #68
      b6e02e30: e3091c34     	movw	r1, #39988
      b6e02e34: e3a07030     	mov	r7, #48
      b6e02e38: e34a1f42     	movt	r1, #44866
      b6e02e3c: bf00bf00     	svclt	#48896
      

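      Notice the svclt instruction at address b6e02e3c, which is obviously incorrect. The word bf00bf00 is two 16-bit Thumb NOP encodings (bf00 each), but this function is 32-bit Arm code, so the word is decoded as a single conditional supervisor call, svclt #48896 (48896 = 0xbf00). A NOP was apparently intended. This pattern appears 11 times in the disassembly.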
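      The following short Python sketch makes the decoding concrete. It is illustrative only. It recognizes just the A32 SVC and NOP encodings and the 16-bit Thumb NOP (0xbf00), not the full instruction set, and the helper names are hypothetical. It shows that the same 32-bit word reads as two Thumb NOPs but as svclt #48896 in Arm mode.

```python
# Illustrative decoder: recognizes only the encodings discussed in this report.
ARM_COND = ["eq", "ne", "hs", "lo", "mi", "pl", "vs", "vc",
            "hi", "ls", "ge", "lt", "gt", "le", ""]  # index 14 = AL (no suffix)


def decode_arm_word(word: int) -> str:
    """Decode a 32-bit A32 (Arm-mode) word as NOP, SVC, or 'other'."""
    if word == 0xE320F000:  # A32 NOP hint encoding
        return "nop"
    cond = (word >> 28) & 0xF
    # SVC layout: cond (bits 31-28) | 1111 (bits 27-24) | imm24
    if cond != 0xF and ((word >> 24) & 0xF) == 0xF:
        return f"svc{ARM_COND[cond]} #{word & 0xFFFFFF}"
    return f"<other 0x{word:08x}>"


def decode_thumb_halfwords(word: int) -> list[str]:
    """Split the same 32-bit word into two T32 halfwords.

    Little-endian byte order is assumed; for 0xBF00BF00 both halves are
    0xBF00 regardless of byte order.
    """
    raw = word.to_bytes(4, "little")
    halves = [int.from_bytes(raw[i:i + 2], "little") for i in (0, 2)]
    return ["nop" if h == 0xBF00 else f"<other 0x{h:04x}>" for h in halves]


if __name__ == "__main__":
    word = 0xBF00BF00  # the word at b6e02e3c in the disassembly
    print("as Arm:  ", decode_arm_word(word))         # as Arm:   svclt #48896
    print("as Thumb:", decode_thumb_halfwords(word))  # as Thumb: ['nop', 'nop']
```

      The condition field (top nibble 0xb) is LT, so the bogus SVC fires only when the N and V flags differ. This is why the failure can depend on the data being processed.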

      Who is impacted
      ===============
      Impacted projects are C++ projects that include code built in Thumb2 mode, rely on the compiler runtime C++ libraries (including libc++.a), and enable Link-Time Optimization (LTO) via -flto at the link step.
       
      What's happening
      ================
      The compiler inserts NOPs when building code for the Cortex-R5 to align branch targets, which improves instruction fetch buffer performance. However, when a project is built with Link-Time Optimization (LTO) enabled (using -flto at the link step) and is also linked with code built for Thumb2, these NOPs are emitted as two 16-bit Thumb2 NOP instructions inside 32-bit Arm code. In Arm mode the pair is decoded as one 32-bit predicated SVC instruction ("svclt #48896"). This instruction is incorrect. If it is executed while the LT condition is true, the processor takes an unintended supervisor call (SVC) exception instead of doing nothing.
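      The padding arithmetic can be sketched as follows. This is not the compiler's implementation, and the function names are hypothetical. A 16-byte alignment target is assumed: the address after the padding in the example, b6e02e40, is a multiple of 16, but this report does not state the alignment the compiler actually uses. The sketch shows that the 4-byte gap in Arm code should be filled with one A32 NOP, whereas the failing build fills it with the Thumb encoding.

```python
A32_NOP = 0xE320F000  # Arm-mode NOP hint (Armv6K/Armv6T2 and later; Cortex-R5 is Armv7-R)
T32_NOP = 0xBF00      # 16-bit Thumb NOP


def padding_bytes(addr: int, align: int) -> int:
    """Bytes needed to advance addr to the next multiple of align (a power of two)."""
    return (-addr) % align


def nop_padding(addr: int, align: int, thumb: bool) -> list[int]:
    """NOP encodings that fill the gap, using the encoding of the code's own mode."""
    pad = padding_bytes(addr, align)
    if thumb:
        if pad % 2:
            raise ValueError("Thumb code is 2-byte aligned")
        return [T32_NOP] * (pad // 2)
    if pad % 4:
        raise ValueError("Arm code is 4-byte aligned")
    return [A32_NOP] * (pad // 4)


if __name__ == "__main__":
    addr = 0xB6E02E3C  # padding slot in __next_prime (16-byte alignment assumed)
    print([hex(w) for w in nop_padding(addr, 16, thumb=False)])  # ['0xe320f000']
    print([hex(w) for w in nop_padding(addr, 16, thumb=True)])   # ['0xbf00', '0xbf00']
```

      The second call reproduces the bad word: two 0xbf00 halfwords occupy the same 4 bytes as bf00bf00. The choice of encoding must follow the mode of the function being padded, not the mode of other objects in the LTO link.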
       
      This only impacts C++ projects that rely on the compiler runtime C++ libraries, including libc++.a, which is built in 32-bit Arm mode.
       
      On v4.0.x.LTS versions of the compiler, this problem is not observed in NOP branch-target padding. The same pair of 16-bit Thumb2 NOP instructions may still appear at the end of code sections as padding to the section length (i.e., after the function return), but that code is never executed and is benign.
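      To triage a disassembly such as dis.txt, a heuristic Python scanner can list every bf00bf00 word. For each one, it notes whether the word follows a return (bx lr, or pop/ldm into pc), which matches the benign end-of-section padding described above, or sits mid-function, where it may be executed. The script and its return heuristic are assumptions for illustration. It only parses 32-bit Arm-mode lines in the tiarmobjdump format shown in this report. Padding after an unconditional branch is also unreachable, but the script still flags it for manual review.

```python
import re
import sys

# "b6e02e3c: bf00bf00     svclt #48896" -> address, 32-bit word, mnemonic, operands
INSN_RE = re.compile(r"^\s*([0-9a-f]+):\s+([0-9a-f]{8})\s+(\S+)\s*(.*)$")
# "b6e02e20 <std::__1::__next_prime(unsigned int)>:" -> address, symbol
FUNC_RE = re.compile(r"^\s*([0-9a-f]+) <(.+)>:\s*$")


def is_return(mnemonic: str, operands: str) -> bool:
    """Heuristic: recognize common Arm-mode return idioms only."""
    if mnemonic == "bx" and operands.strip() == "lr":
        return True
    return mnemonic in ("pop", "ldm", "ldmia") and "pc" in operands


def find_thumb_nop_pairs(lines):
    """Yield (address, function, after_return) for each bf00bf00 word."""
    func = None
    prev_was_return = False
    for line in lines:
        m = FUNC_RE.match(line)
        if m:
            func, prev_was_return = m.group(2), False
            continue
        m = INSN_RE.match(line)
        if not m:
            continue  # headers, blank lines, 16-bit Thumb lines, etc.
        addr, word, mnemonic, operands = m.groups()
        if word == "bf00bf00":
            yield int(addr, 16), func, prev_was_return
            continue  # a run of padding after a return stays "after return"
        prev_was_return = is_return(mnemonic, operands)


def main(path: str) -> None:
    with open(path, encoding="utf-8") as f:
        hits = list(find_thumb_nop_pairs(f))
    for addr, func, after_return in hits:
        kind = "after return (likely benign)" if after_return else "mid-function, review"
        print(f"{addr:08x} {func}: {kind}")
    print(f"{len(hits)} occurrence(s) of bf00bf00")


if __name__ == "__main__" and len(sys.argv) > 1:
    main(sys.argv[1])
```

      Usage (the script name is hypothetical): python scan_padding.py dis.txt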
       
      Workaround
      ==========
      If your project relies on C++, you may be impacted if you also use LTO and build all or part of your project in Thumb mode. The suggested workaround is either to disable LTO at the link step (i.e., don't use -flto) or to build your entire project in 32-bit Arm mode. You may also move to the 4.x.LTS release stream or a newer release.

            Assignee:
            TI User
            Reporter:
            TI User
            Votes:
            0
            Watchers:
            2

                Connection: Intermediate to External PROD System
                EXTSYNC-5126 - Building with LTO results in two 16...