Compiler misses opportunity to use VMLA.F32 and VMLS.F32 instructions


    • Type: Enhancement
    • Resolution: Unresolved
    • Priority: Medium
    • Code Generation Tools
    • CODEGEN-14589
    • ARMCLANG_4.0.4.LTS
    • ARMCLANG_NEXT
    • default

      The attached code contains these lines ...

          biquad->output = biquad->state.coeffs.k[0] * biquad->input +
                           biquad->state.coeffs.k[1] * biquad->state.x_n1 +
                           biquad->state.coeffs.k[2] * biquad->state.x_n2 -
                           biquad->state.coeffs.j[1] * biquad->state.y_n1 -
                           biquad->state.coeffs.j[2] * biquad->state.y_n2;
      

      Build it ...

      tiarmclang @options.txt file.c
      

      Inspect the resulting assembly in file.s. The key lines are ...

              vldr    s0, [r0, #8]
              vldr    s8, [r0, #24]
              vldr    s10, [r0, #28]
              vldr    s12, [r0]
              vldr    s2, [r0, #12]
              vldr    s4, [r0, #16]
              vldr    s6, [r0, #20]
              vmul.f32        s10, s0, s10
              vldr    s14, [r0, #32]
              vmul.f32        s8, s12, s8
              vldr    s1, [r0, #40]
              vldr    s3, [r0, #44]
              vadd.f32        s8, s10, s8
              vstr    s12, [r0, #8]
              vmul.f32        s2, s2, s14
              vstr    s0, [r0, #12]
              vmul.f32        s14, s4, s1
              vstr    s4, [r0, #20]
              vmul.f32        s6, s6, s3
              vadd.f32        s2, s8, s2
              vadd.f32        s6, s6, s14
              vsub.f32        s2, s2, s6
              vstr    s2, [r0, #4]
              vstr    s2, [r0, #16]
      

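      The customer's concern is that the generated code could be faster and smaller. Each vmul.f32 followed by a vadd.f32 could be combined into a single vmla.f32, and each vmul.f32 followed by a vsub.f32 could be combined into a single vmls.f32. That would reduce the nine arithmetic instructions shown above (five vmul.f32, three vadd.f32, one vsub.f32) to five (one vmul.f32, two vmla.f32, two vmls.f32). This example from godbolt shows output from an ARM GCC compiler which does that.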
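
      To make the requested combination concrete, the expression can be written as a chain of accumulate steps, each of which corresponds to one candidate instruction. This is a hand-written sketch, not compiler output: the function name biquad_output_chained and its parameter list are assumptions, and the instruction mnemonics in the comments only indicate which operation each line would map to. Whether a given compiler actually selects vmla.f32 and vmls.f32 here depends on the target and the options used.

```c
/* Hypothetical helper: same arithmetic as the statement in the report,
   split into one multiply followed by four multiply-accumulate steps. */
float biquad_output_chained(const float k[3], const float j[3],
                            float in, float x_n1, float x_n2,
                            float y_n1, float y_n2)
{
    float acc = k[0] * in;   /* vmul.f32: acc = k0 * in          */
    acc += k[1] * x_n1;      /* vmla.f32: acc = acc + k1 * x_n1  */
    acc += k[2] * x_n2;      /* vmla.f32: acc = acc + k2 * x_n2  */
    acc -= j[1] * y_n1;      /* vmls.f32: acc = acc - j1 * y_n1  */
    acc -= j[2] * y_n2;      /* vmls.f32: acc = acc - j2 * y_n2  */
    return acc;
}
```

      The accumulate order matches the left-to-right evaluation of the original C expression, so the chain does not reorder the source arithmetic. Note that vmla.f32 and vmls.f32 round the product before accumulating, unlike the fused vfma.f32 and vfms.f32, so using them should not change results relative to separate multiply and add instructions.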

            Assignee:
            TI User
            Reporter:
            TI User
                Connection: Intermediate to External PROD System
                EXTSYNC-6007 - Compiler misses opportunity to use ...
                SYNCHRONIZED