- Type: Enhancement
- Resolution: Unresolved
- Priority: Medium
- Code Generation Tools
- CODEGEN-14589
- ARMCLANG_4.0.4.LTS
- ARMCLANG_NEXT
- default
The attached code contains these lines ...
biquad->output = biquad->state.coeffs.k[0] * biquad->input +
biquad->state.coeffs.k[1] * biquad->state.x_n1 +
biquad->state.coeffs.k[2] * biquad->state.x_n2 -
biquad->state.coeffs.j[1] * biquad->state.y_n1 -
biquad->state.coeffs.j[2] * biquad->state.y_n2;
Build it ...
tiarmclang @options.txt file.c
Inspect the resulting assembly in file.s. The key lines are ...
vldr s0, [r0, #8]
vldr s8, [r0, #24]
vldr s10, [r0, #28]
vldr s12, [r0]
vldr s2, [r0, #12]
vldr s4, [r0, #16]
vldr s6, [r0, #20]
vmul.f32 s10, s0, s10
vldr s14, [r0, #32]
vmul.f32 s8, s12, s8
vldr s1, [r0, #40]
vldr s3, [r0, #44]
vadd.f32 s8, s10, s8
vstr s12, [r0, #8]
vmul.f32 s2, s2, s14
vstr s0, [r0, #12]
vmul.f32 s14, s4, s1
vstr s4, [r0, #20]
vmul.f32 s6, s6, s3
vadd.f32 s2, s8, s2
vadd.f32 s6, s6, s14
vsub.f32 s2, s2, s6
vstr s2, [r0, #4]
vstr s2, [r0, #16]
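As an aid to reading the listing, here is a minimal sketch of what the surrounding source might look like. The attached file is not reproduced in this ticket, so the type names, the field order, and the function name biquad_step are assumptions inferred only from the load and store offsets above (input at #0, output at #4, x_n1 at #8, x_n2 at #12, y_n1 at #16, y_n2 at #20, k[] at #24, j[] at #36). The real declarations may differ.

```c
#include <assert.h>  /* static_assert (C11) */
#include <stddef.h>  /* offsetof */

/* Hypothetical layout, inferred from the offsets in the assembly listing.
   The attached source may declare these types differently. */
typedef struct {
    float k[3];  /* feed-forward coefficients */
    float j[3];  /* feedback coefficients; j[0] is not used by the expression */
} biquad_coeffs_t;

typedef struct {
    float x_n1, x_n2;        /* previous two inputs  */
    float y_n1, y_n2;        /* previous two outputs */
    biquad_coeffs_t coeffs;
} biquad_state_t;

typedef struct {
    float input;
    float output;
    biquad_state_t state;
} biquad_t;

/* Check that the assumed layout agrees with the offsets in the listing. */
static_assert(offsetof(biquad_t, state.x_n1) == 8, "x_n1 at [r0, #8]");
static_assert(offsetof(biquad_t, state.y_n1) == 16, "y_n1 at [r0, #16]");
static_assert(offsetof(biquad_t, state.coeffs.k) == 24, "k[0] at [r0, #24]");
static_assert(offsetof(biquad_t, state.coeffs.j) == 36, "j[0] at [r0, #36]");

/* Hypothetical function name. Computes one output sample, then shifts the
   delay line in the same way as the vstr instructions in the listing. */
void biquad_step(biquad_t *biquad)
{
    biquad->output = biquad->state.coeffs.k[0] * biquad->input +
                     biquad->state.coeffs.k[1] * biquad->state.x_n1 +
                     biquad->state.coeffs.k[2] * biquad->state.x_n2 -
                     biquad->state.coeffs.j[1] * biquad->state.y_n1 -
                     biquad->state.coeffs.j[2] * biquad->state.y_n2;

    biquad->state.x_n2 = biquad->state.x_n1;
    biquad->state.x_n1 = biquad->input;
    biquad->state.y_n2 = biquad->state.y_n1;
    biquad->state.y_n1 = biquad->output;
}
```

With this layout, the five vmul.f32 instructions in the listing correspond one-to-one to the five products in the expression.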
The customer's concern is that the generated code could be faster and smaller. Each vmul.f32 followed by a vadd.f32 could be combined into a single vmla.f32, and each vmul.f32 followed by a vsub.f32 could be combined into a single vmls.f32. This example from godbolt shows output from an ARM GCC compiler that does so.
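To make the requested transformation concrete, the expression can be written as a running accumulation, one product per step. This is only an illustrative sketch: the helper name biquad_output_mac and its parameters are hypothetical and do not come from the attached source, and whether a given compiler actually selects vmla.f32 or vmls.f32 for these steps depends on the target and on the options in options.txt, which are not shown here.

```c
/* Hypothetical helper, not taken from the attached source.
   Each step after the first is a multiply followed by an add or subtract
   into the same accumulator, which is the shape that a multiply-accumulate
   instruction implements. */
float biquad_output_mac(const float k[3], const float j[3],
                        float x_n, float x_n1, float x_n2,
                        float y_n1, float y_n2)
{
    float acc = k[0] * x_n;  /* plain multiply (vmul.f32)            */
    acc += k[1] * x_n1;      /* multiply-add, candidate for vmla.f32 */
    acc += k[2] * x_n2;      /* multiply-add, candidate for vmla.f32 */
    acc -= j[1] * y_n1;      /* multiply-sub, candidate for vmls.f32 */
    acc -= j[2] * y_n2;      /* multiply-sub, candidate for vmls.f32 */
    return acc;
}
```

If every step mapped to a single instruction, the arithmetic would take five instructions (one vmul.f32, two vmla.f32, two vmls.f32) rather than the nine in the listing (five vmul.f32, three vadd.f32, one vsub.f32).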