
Re: updated P4 timings

Greetings!  Looks like the new binutils work.  Just a straightforward
(dumb) port of the SSE1 sgemm to an SSE2 dgemm gives the following on
torc19 as a function of nb:

4 483.36
8 937.42
12 1214.86
16 1573.87
20 1752.28
24 1857.47
28 1933.42
32 1895.4
36 1752.19
40 2061.65
44 1903.98
48 2061.43
52 2019.29
56 2158.68
60 2062.08
64 2160.4
68 1891.99
72 2158.44
76 2109.31
80 2159.82
84 1959.44
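
For readers following along, here is a rough sketch of what a "straightforward port" from SSE1 single precision to SSE2 double precision can look like. It is not the ATLAS kernel that was timed above; it is a hypothetical dot product (the names dot_ps and dot_pd are made up for illustration) written with compiler intrinsics rather than assembler. Each packed-single operation is swapped for its packed-double twin, and the loop stride drops from 4 to 2 because a 128-bit register holds two doubles instead of four floats. A plain scalar loop is used when SSE2 is not available.

```c
#include <assert.h>
#include <stddef.h>

#if defined(__SSE2__)
#include <emmintrin.h>   /* SSE2 intrinsics; also provides the SSE1 ones */
#endif

/* Hypothetical single-precision dot product: 4 floats per register. */
float dot_ps(const float *x, const float *y, size_t n)
{
    size_t i;
    assert(n % 4 == 0);              /* sketch only: no remainder loop */
#if defined(__SSE2__)
    {
        __m128 acc = _mm_setzero_ps();
        float lane[4];
        for (i = 0; i < n; i += 4)   /* mulps + addps */
            acc = _mm_add_ps(acc, _mm_mul_ps(_mm_loadu_ps(x + i),
                                             _mm_loadu_ps(y + i)));
        _mm_storeu_ps(lane, acc);
        return lane[0] + lane[1] + lane[2] + lane[3];
    }
#else
    {
        float sum = 0.0f;            /* scalar fallback */
        for (i = 0; i < n; i++)
            sum += x[i] * y[i];
        return sum;
    }
#endif
}

/* The same routine ported to doubles: 2 doubles per register. */
double dot_pd(const double *x, const double *y, size_t n)
{
    size_t i;
    assert(n % 2 == 0);              /* sketch only: no remainder loop */
#if defined(__SSE2__)
    {
        __m128d acc = _mm_setzero_pd();
        double lane[2];
        for (i = 0; i < n; i += 2)   /* mulpd + addpd */
            acc = _mm_add_pd(acc, _mm_mul_pd(_mm_loadu_pd(x + i),
                                             _mm_loadu_pd(y + i)));
        _mm_storeu_pd(lane, acc);
        return lane[0] + lane[1];
    }
#else
    {
        double sum = 0.0;            /* scalar fallback */
        for (i = 0; i < n; i++)
            sum += x[i] * y[i];
        return sum;
    }
#endif
}
```

Since each register carries half as many elements, a port like this does half the arithmetic per instruction, so the blocking factor that works best for single precision need not be the best one for double precision.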

Take care,

R Clint Whaley <[email protected]> writes:

> Camm,
> >Greetings!  Just an update on the P4 SSE2.  Downloaded the intel specs
> >today, and it seems as though all the instructions are the same with
> >the trailing 's' replaced by a 'd', i.e. addps -> addpd, etc.  Anyway,
> >it seems as though the assembler has not yet caught up with this:
> >Guess we have to wait for a new assembler update.
> First, thanks for scoping this out.  Second, I went to the binutils directory
> on www.gnu.org, and found some comments indicating the newest stuff has
> support for SSE2.  However, I couldn't figure out much more than that.  So,
> I grabbed last night's snapshot, and installed it on torc19.  If you put
> /home/rwhaley/local/P4/bin as the first entry in your path, I think gcc
> will use the one I installed.  Can you see if that guy will compile your
> routine?  If not, maybe you can post a very simple SSE2 file, so we can
> iterate until we get an assembler and/or gcc that can handle the SSE2
> stuff, without having to have you test each one . . .
> Thanks,
> Clint
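
A "very simple SSE2 file" of the kind Clint asks for could look roughly like the sketch below. It is not the file that was actually exchanged. It assumes a gcc-compatible compiler with extended inline asm and the default AT&T operand order, and the function name sse2_smoke is made up for illustration. It pushes one addpd through the assembler, so an assembler without SSE2 support should reject it at build time. On other compilers or targets it falls back to plain C so the file still builds.

```c
#if defined(__GNUC__) && defined(__SSE2__)
#include <emmintrin.h>

/* Hypothetical smoke test: adds two packed-double vectors with a
 * literal addpd, so the assembler has to know the mnemonic. */
double sse2_smoke(void)
{
    __m128d a = _mm_set_pd(2.0, 1.0);    /* lanes: low = 1.0,  high = 2.0  */
    __m128d b = _mm_set_pd(20.0, 10.0);  /* lanes: low = 10.0, high = 20.0 */
    double out[2];

    /* AT&T syntax: addpd src, dst  =>  a = a + b, lane by lane */
    __asm__ ("addpd %1, %0" : "+x" (a) : "x" (b));

    _mm_storeu_pd(out, a);
    return out[0] + out[1];              /* 11.0 + 22.0 */
}
#else
/* Fallback for non-SSE2 builds: same arithmetic in plain C. */
double sse2_smoke(void)
{
    return (1.0 + 10.0) + (2.0 + 20.0);
}
#endif
```

If this compiles and the function returns 33.0, both the compiler and the assembler it calls accept at least this one SSE2 instruction.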

Camm Maguire			     			[email protected]
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah