
single precision on Sparcs



Hi Clint,

  >>  From [email protected] Tue Jun 26 15:24:54 2001
...
  >>  QUESTION 2:
  >>     Does anyone know why gcc produces *much* slower code for single precision
  >>     than double precision on UltraSparcs (see below for more detail)?
  >>  
...
  >>  We use the same C code for both single and double precision.  I would expect
  >>  performance to be similar (though you might need to vary NB), but what I
  >>  see is that double precision runs roughly 25% faster.  Obviously, if one

I'm afraid I can't be a lot of help here: since floating point runs at
the same speed for double as for single on the UltraSparcs, I never
bothered writing special code for single precision.  Some years ago I
did the same thing with my own code: I changed double to single in code
written for double, and it did indeed run slower.

The only thing I can say is that I don't think it's gcc's fault: I
inspected the assembler output for my code and could not fault it.

  >>  is to be faster, it should be single.  The only thing I can think of is that
  >>  it has something to do with the load instruction used; I know double precision
  >>  performance takes a beating if you don't assume it is 8-byte aligned.  Perhaps

This is because, without that assumption, you will be doing two 4-byte
loads for each double, which doubles the total number of loads (and
stores) and slows you down for that reason.
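
A minimal C sketch of the two cases, for illustration only.  The
function names are hypothetical, and it assumes an 8-byte double and a
4-byte uint32_t.  The first routine spells out what the compiler has to
do when it may only assume 4-byte alignment; the second is the ordinary
case, where a single 8-byte load is enough.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: fetch a double from storage that is only
   known to be 4-byte aligned.  The value is assembled from two
   separate 4-byte loads, so every double costs two load operations. */
static double load_double_4byte_aligned(const uint32_t *p)
{
    uint32_t halves[2];
    double d;

    halves[0] = p[0];              /* first 4-byte load  */
    halves[1] = p[1];              /* second 4-byte load */
    memcpy(&d, halves, sizeof d);  /* reassemble the 8 bytes in memory order */
    return d;
}

/* Hypothetical helper: when 8-byte alignment can be assumed, the
   same value can be fetched with one 8-byte load. */
static double load_double_8byte_aligned(const double *p)
{
    return *p;
}
```

Both routines return the same value; only the number of memory
operations differs.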

There is nothing in the UltraSparc specifications that I know of that
suggests that a single floating point load is slower than a double
floating point load.  The only thing I can think of is that for single,
with the same degree of unrolling, you might be increasing your chances
of direct-mapped conflicts in the top-level cache.  Still, that would
not seem to account for so large a difference.

  >>  this is the problem with single precision?  Does anyone know of any reason
  >>  for single to be slower than double on UltraSparcs, in particular with gcc?
  >> 

I think that to get good performance for single, you need to unroll
more than for double and make full use of the register file.  Another
trick which I experimented with on the V7 and V8 sparcs (long ago :)
was to use a load double instruction to load two consecutive singles.
You had to assume that the leading dimension was even, but it did make
a difference, at least on the V7 sparcs. 
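
A rough C sketch of the paired-load idea, not the code I actually used.
The routine name sdot_paired is hypothetical.  It assumes n is even and
that x and y are 8-byte aligned, which for columns of a matrix is what
an even leading dimension gives you.  Whether the compiler turns each
8-byte copy into a real load double instruction is not guaranteed; you
would have to check the assembler output.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical single precision dot product that fetches two
   consecutive floats per 8-byte access.  Assumes n is even and
   that x and y are 8-byte aligned. */
static float sdot_paired(const float *x, const float *y, size_t n)
{
    float acc0 = 0.0f, acc1 = 0.0f;  /* two independent accumulators */
    size_t i;

    for (i = 0; i + 1 < n; i += 2) {
        float xp[2], yp[2];

        memcpy(xp, x + i, sizeof xp);  /* one 8-byte fetch: x[i], x[i+1] */
        memcpy(yp, y + i, sizeof yp);  /* one 8-byte fetch: y[i], y[i+1] */
        acc0 += xp[0] * yp[0];
        acc1 += xp[1] * yp[1];
    }
    return acc0 + acc1;
}
```

Unrolling further, with more accumulators, would be the way to make
fuller use of the register file.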


Regards, Peter