  1. Embedded Software & Tools
  2. EXT_EP-10824

memset on an 8-byte-aligned memory location inefficiently writes only 8 bits at a time


    • Type: Enhancement
    • Resolution: Won't Implement
    • Priority: Not Prioritized

      The attached files form a simple program that calls memset ...

      void fxn(long long *ptr, int length)
      {
         memset(ptr, 0, length);
      }
      

      Build it, then produce the disassembly with the C code interlisted ...

      % tiarmclang @options.txt -Oz  main.c file.c -Wl,-c -o main.out -Wl,-m=main.map
      
      % tiarmobjdump -dS main.out > main_dis.txt
      tiarmobjdump.exe: warning: 'main.out': failed to find source e:\cvs\jenkins\workspace\buildandvalidate_worker\llvm_cgt\arm-llvm\release\libc\src\boot_cortex_m.c
      tiarmobjdump.exe: warning: 'main.out': failed to find source E:/cvs/jenkins/workspace/BuildAndValidate_Worker/llvm_cgt/llvm-project/compiler-rt/lib/builtins/arm\aeabi_memset.S
      tiarmobjdump.exe: warning: 'main.out': failed to find source e:\cvs\jenkins\workspace\buildandvalidate_worker\llvm_cgt\arm-llvm\release\libc\src\pre_init.c
      tiarmobjdump.exe: warning: 'main.out': failed to find source e:\cvs\jenkins\workspace\buildandvalidate_worker\llvm_cgt\arm-llvm\release\libc\src\exit.c
      

      Inspect the disassembly to see that the loop that sets memory to all zeros is ...

      00000062 <_loop>:
            62: 9a 42        	cmp	r2, r3
            64: 08 bf        	it	eq
            66: 70 47        	bxeq	lr
            68: c1 54        	strb	r1, [r0, r3]
            6a: 01 33        	adds	r3, #1
            6c: f9 e7        	b	0x62 <_loop>            @ imm = #-14
      

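      This loop writes only 8 bits at a time: each iteration performs a single byte store (strb), even though the destination is 8-byte aligned.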
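For readers less familiar with Thumb assembly, the loop can be read as the following C. This is only an illustrative rendering: the name byte_fill is hypothetical, and the register roles in the comments (r0 = destination, r1 = fill value, r2 = count, r3 = index) are inferred from the listing, not taken from the library source.

```c
#include <stddef.h>

/* Hypothetical C rendering of the _loop disassembly.
   Assumed register roles: r0 = dst, r1 = value, r2 = count, r3 = index. */
void *byte_fill(void *dst, int value, size_t count)
{
    unsigned char *p = dst;

    for (size_t i = 0; i != count; ++i) {   /* cmp r2, r3 ; it eq ; bxeq lr */
        p[i] = (unsigned char)value;        /* strb r1, [r0, r3]            */
    }                                       /* adds r3, #1 ; b _loop        */

    return dst;
}
```

Every pass through the loop stores one byte, so filling N bytes costs N stores plus N compare-and-branch sequences.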

      See the related forum thread for an alternate implementation of memset that writes 64 bits at a time. The customer's code uses memset to initialize 64 MB of memory. The faster memset takes about 0.5 seconds, versus about 14 seconds for the slower one, a difference of roughly 28x.
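To illustrate what writing 64 bits at a time can look like, here is a minimal sketch. It is not the implementation from the forum thread: the name fill_words64 is hypothetical, and the code assumes the destination is 8-byte aligned and refers to 64-bit integer objects, as with the long long pointer in fxn.

```c
#include <stddef.h>
#include <stdint.h>

/* Sketch only: fill_words64 is a hypothetical name.
   Assumes dst is 8-byte aligned and points at 64-bit integer objects. */
void *fill_words64(void *dst, int value, size_t count)
{
    /* Replicate the fill byte into all eight byte lanes of a 64-bit word. */
    uint64_t pattern = (uint64_t)(unsigned char)value * 0x0101010101010101ULL;

    uint64_t *w = dst;
    size_t words = count / 8;
    for (size_t i = 0; i < words; ++i) {
        w[i] = pattern;                 /* one 64-bit store per iteration */
    }

    /* Finish the trailing bytes when count is not a multiple of 8. */
    unsigned char *tail = (unsigned char *)(w + words);
    for (size_t i = 0; i < count % 8; ++i) {
        tail[i] = (unsigned char)value;
    }

    return dst;
}
```

In the test case, fxn could call fill_words64(ptr, 0, length) in place of memset. Note that an optimizing compiler may recognize a fill loop like this and turn it back into a call to memset, so the generated code should be checked in the disassembly.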

            syncuser TI User