- Enhancement
- Resolution: Won't Implement
- Not Prioritized
- Code Generation Tools
- CODEGEN-9986
- ARMCLANG_1.3.1.LTS
- default
There already exist means to override memset, but a variant is chosen automatically based on the optimization level.
The attached files form a simple program that calls memset ...
void fxn(long long *ptr, int length) { memset(ptr, 0, length); }
Build it, then produce the disassembly with the C code interlisted ...
% tiarmclang @options.txt -Oz main.c file.c -Wl,-c -o main.out -Wl,-m=main.map
% tiarmobjdump -dS main.out > main_dis.txt
tiarmobjdump.exe: warning: 'main.out': failed to find source e:\cvs\jenkins\workspace\buildandvalidate_worker\llvm_cgt\arm-llvm\release\libc\src\boot_cortex_m.c
tiarmobjdump.exe: warning: 'main.out': failed to find source E:/cvs/jenkins/workspace/BuildAndValidate_Worker/llvm_cgt/llvm-project/compiler-rt/lib/builtins/arm\aeabi_memset.S
tiarmobjdump.exe: warning: 'main.out': failed to find source e:\cvs\jenkins\workspace\buildandvalidate_worker\llvm_cgt\arm-llvm\release\libc\src\pre_init.c
tiarmobjdump.exe: warning: 'main.out': failed to find source e:\cvs\jenkins\workspace\buildandvalidate_worker\llvm_cgt\arm-llvm\release\libc\src\exit.c
Inspect the disassembly to see the loop that sets memory to all zeros is ...
00000062 <_loop>:
      62: 9a 42        cmp     r2, r3
      64: 08 bf        it      eq
      66: 70 47        bxeq    lr
      68: c1 54        strb    r1, [r0, r3]
      6a: 01 33        adds    r3, #1
      6c: f9 e7        b       0x62 <_loop> @ imm = #-14
This writes only 8 bits at a time: each pass through the loop executes one strb, which stores a single byte.
See the related forum thread for an alternate implementation of memset that writes 64 bits at a time. The customer's code uses memset to initialize 64 MB of memory. The faster memset takes about 0.5 seconds, versus about 14 seconds for the slower memset.
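For illustration only, a 64-bit-wide fill can be sketched roughly as below. This is not the implementation from the forum thread. The function name memset64_sketch is hypothetical, and the code assumes the toolchain tolerates the type-punned 64-bit store into the caller's buffer, which is common in library-level code but worth checking against the aliasing rules in use.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical name; a sketch, not the forum thread's memset. */
void *memset64_sketch(void *dst, int value, size_t length)
{
    unsigned char *p = (unsigned char *)dst;
    unsigned char byte = (unsigned char)value;

    /* Head: byte stores until p is 8-byte aligned or the data runs out. */
    while (length > 0 && ((uintptr_t)p & 7u) != 0) {
        *p++ = byte;
        length--;
    }

    /* Body: replicate the byte into all 8 lanes, then store 64 bits per pass.
       Assumption: the type-punned store is acceptable to the compiler. */
    uint64_t pattern = (uint64_t)byte * 0x0101010101010101ULL;
    uint64_t *p64 = (uint64_t *)p;
    while (length >= 8) {
        *p64++ = pattern;
        length -= 8;
    }

    /* Tail: the remaining 0 to 7 bytes. */
    p = (unsigned char *)p64;
    while (length > 0) {
        *p++ = byte;
        length--;
    }

    return dst;
}
```

Compared with the byte loop in the disassembly above, the body loop needs one eighth as many store iterations for the same length. If a routine like this were used to replace memset itself, the compiler might recognize the fill loops and turn them back into memset calls, so the generated code should be inspected in the same way as above.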