-
Enhancement
-
Resolution: Unresolved
-
Low
-
Code Generation Tools
-
CODEGEN-8380
-
C6000_8.3.8
-
default
Build the attached file ...
% cl6x -mv6600 -o3 -s -mw sample.cpp
Inspect the resulting sample.asm. It contains one software-pipelined loop. Here is the single scheduled iteration from the loop comment block ...
;*        SINGLE SCHEDULED ITERATION
;*
;*        $C$C120:
;*   0              LDNDW   .D1T1   *A16++(8),A5:A4          ; [A_D64P] |70|
;*   1              NOP             4                        ; [A_L66]
;*   5              DMPYU4  .M1     A5:A4,A9:A8,A7:A6:A5:A4  ; [A_M66] |70|  ^
;*   6              NOP             3                        ; [A_L66]
;*   9              STDW    .D2T1   A7:A6,*SP(40)            ; [B_D64P] |94|
;*  10              DADD    .L2X    0,A5:A4,B5:B4            ; [B_L66] |70|  ^ Define a twin register
;*     ||           STDW    .D2T1   A5:A4,*SP(16)            ; [B_D64P] |102|
;*  11              STDW    .D2T1   A7:A6,*SP(24)            ; [B_D64P] |102|
;*  12              NOP             1                        ; [A_L66]
;*  13              STDW    .D2T2   B5:B4,*SP(32)            ; [B_D64P] |94|
;*  14              STNDW   .D1T2   B5:B4,*A3++(8)           ; [A_D64P] |91|
;*     ||           SPBR            $C$C120                  ; []
;*  15              NOP             3                        ; [A_L66]
;*  18              ; BRANCHCC OCCURS {$C$C120}              ; [] |144|
Those writes to *SP(offset) are not needed, because the stored values are never read back. Here is one way to see it ...
% findstr /c:"*SP(40)" sample.asm
;*   9              STDW    .D2T1   A7:A6,*SP(40)            ; [B_D64P] |94|
           STDW    .D2T1   A7:A6,*SP(40)            ; [B_D64P] |94| (P) <0,9>
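The findstr check above can be repeated for every stack offset at once. The following is only a rough sketch, not part of the original report: the function name `unread_sp_offsets` and the regular expression are hypothetical, and it assumes that store mnemonics begin with ST, load mnemonics begin with LD, and stack operands are written as `*SP(n)` or `*+SP(n)`. It compares exact offsets only, so it would miss overlapping accesses, bracketed (scaled) offsets, and accesses made through a copy of SP.

```python
import re

# Hypothetical pattern: a load/store mnemonic followed on the same line by an
# SP-relative operand written as *SP(n) or *+SP(n).
SP_ACCESS = re.compile(r"\b(ST|LD)[A-Z]*\b[^;]*?\*\+?SP\((\d+)\)")


def unread_sp_offsets(asm_text):
    """Return the sorted SP offsets that are stored to but never loaded from.

    Heuristic only: exact-offset matching on the text of the assembly file.
    """
    stores, loads = set(), set()
    for line in asm_text.splitlines():
        match = SP_ACCESS.search(line)
        if match is None:
            continue
        kind, offset = match.group(1), int(match.group(2))
        (stores if kind == "ST" else loads).add(offset)
    return sorted(stores - loads)


if __name__ == "__main__":
    # Assumes sample.asm from the build step is in the current directory.
    with open("sample.asm") as asm_file:
        print(unread_sp_offsets(asm_file.read()))
```

Offsets printed by this script are candidates for dead stores; each one is still worth confirming by hand in the assembly file.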
The same is true of all the other writes to *SP(offset). Without those writes, the loop would schedule at a lower initiation interval (ii), and thus perform better.
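To illustrate why the extra stores matter, the sketch below counts how many instructions in the single scheduled iteration are assigned to each functional unit. A unit issues at most one instruction per cycle, so the busiest unit gives a resource-only lower bound on ii for that unit assignment. This is an illustration, not the compiler's algorithm: the function name `resource_bound` is hypothetical, the instruction strings are copied from the listing, and the actual ii also depends on dependences and other constraints the listing does not show.

```python
import re
from collections import Counter

# Functional unit names as they appear in the listing, e.g. .D1T1, .M1, .L2X.
UNIT = re.compile(r"\.([LSMD][12])")

# Instructions from the single scheduled iteration (NOP and SPBR omitted,
# since they carry no functional unit in the listing).
ITERATION = [
    "LDNDW  .D1T1 *A16++(8),A5:A4",
    "DMPYU4 .M1   A5:A4,A9:A8,A7:A6:A5:A4",
    "STDW   .D2T1 A7:A6,*SP(40)",
    "DADD   .L2X  0,A5:A4,B5:B4",
    "STDW   .D2T1 A5:A4,*SP(16)",
    "STDW   .D2T1 A7:A6,*SP(24)",
    "STDW   .D2T2 B5:B4,*SP(32)",
    "STNDW  .D1T2 B5:B4,*A3++(8)",
]


def resource_bound(instructions):
    """Return the largest number of instructions assigned to any one unit."""
    counts = Counter()
    for instruction in instructions:
        match = UNIT.search(instruction)
        if match:
            counts[match.group(1)] += 1
    return max(counts.values(), default=1)


if __name__ == "__main__":
    without_sp_stores = [i for i in ITERATION if "*SP(" not in i]
    print("bound with stack stores:   ", resource_bound(ITERATION))
    print("bound without stack stores:", resource_bound(without_sp_stores))
```

With the unit assignment shown in the listing, the four stack stores all occupy .D2, so they alone force a bound of 4. Once they are removed, the busiest unit is .D1 with two instructions.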