
Re: altivec



>Peter,
>
>
>>I am not impressed by the 2 GFlop results. It might be all we get, but
>>the AltiVec should have a peak of 8 flops per clock cycle (4 muladds per
>>cycle), so it should be doing better.
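For reference, the peak figure quoted above works out as follows. The clock rate f is left as a symbol because the thread does not give one:

```latex
P_{\text{peak}} = \underbrace{4}_{\text{madds/cycle}} \times \underbrace{2}_{\text{flops/madd}} \times f = 8f,
\qquad
\eta = \frac{2\ \text{GFlop/s}}{8f}
```

For example, at a purely hypothetical f = 500 MHz this gives P_peak = 4 GFlop/s, so 2 GFlop/s would be η = 0.5 of peak.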
>
>The main thing I wonder about is why Java mode slows down the computation.
>All it should do, from my reading, is add an extra stage to the pipeline.
>Since we are using enough registers to handle the longer pipeline, it is
>a real mystery to me why adding the extra stage drops us from 2.2
>GFlop to 2. Makes me wonder if that extra stage is not pipelined . . .

The extra stage is definitely pipelined. A common issue in AltiVec
code is running out of rename registers: the 7400/7410 has 6
vector rename registers, meaning that only six vector operations can
be in flight at any given time; the seventh and later ones
will stall until a rename register is freed up. Looking at my own
matmul kernel code, I don't see any place where this could happen
other than at the end of the row loop, where vec_madd
operations are still in flight when the vector permute operations
for the 4x4 transpose start up.
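Nick's kernel isn't shown, so the following only illustrates one standard way to do a 4x4 transpose with merge-type permutes: eight merges in two rounds. The Python functions vec_mergeh and vec_mergel here only emulate the lane shuffles of the AltiVec intrinsics of the same names on four 32-bit elements (big-endian element numbering). They model the data movement and are not AltiVec code. On the real hardware each of these eight results needs a vector rename register. That is why a burst of them arriving while vec_madd results are still in flight can exhaust the six registers on a 7400/7410.

```python
def vec_mergeh(a, b):
    """Emulate AltiVec vec_mergeh on 4 x 32-bit lanes: interleave elements 0 and 1."""
    return [a[0], b[0], a[1], b[1]]


def vec_mergel(a, b):
    """Emulate AltiVec vec_mergel on 4 x 32-bit lanes: interleave elements 2 and 3."""
    return [a[2], b[2], a[3], b[3]]


def transpose4x4(r0, r1, r2, r3):
    """Transpose four 4-element rows using eight merges (two rounds of four)."""
    # Round 1: pair row 0 with row 2 and row 1 with row 3.
    t0 = vec_mergeh(r0, r2)  # [r0[0], r2[0], r0[1], r2[1]]
    t1 = vec_mergeh(r1, r3)  # [r1[0], r3[0], r1[1], r3[1]]
    t2 = vec_mergel(r0, r2)  # [r0[2], r2[2], r0[3], r2[3]]
    t3 = vec_mergel(r1, r3)  # [r1[2], r3[2], r1[3], r3[3]]
    # Round 2: interleave again to produce the four columns.
    return (vec_mergeh(t0, t1), vec_mergel(t0, t1),
            vec_mergeh(t2, t3), vec_mergel(t2, t3))


if __name__ == "__main__":
    rows = [[4 * i + j for j in range(4)] for i in range(4)]
    for col in transpose4x4(*rows):
        print(col)
```

Running it prints the columns of the 0..15 matrix, starting with [0, 4, 8, 12].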

>
>If we weren't memory bound, I thought I'd be able to get some speedup by
>using the normal FPU along with the AltiVec.  Essentially, since the AltiVec
>is 4 times faster than the regular FPU, you can imagine doing your loop
>such that 4/5 of the loop is done by the AltiVec, and 1/5 is done by the
>FPU.  I implemented this, and got a very nice slowdown.
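The 4/5 : 1/5 split follows from balancing finish times. If the AltiVec path handles elements four times as fast as the FPU path, both paths finish together when n_vec = 4 * n_fpu, i.e. n_vec = 4n/5. Below is a minimal sketch of that bookkeeping. The helper split_work is hypothetical, and it also rounds the vector share down to whole 4-float vectors.

```python
def split_work(n, speedup=4, lanes=4):
    """Split n elements between a vector unit `speedup` times faster than the
    scalar FPU, so both finish at about the same time.

    Returns (n_vec, n_fpu). n_vec is rounded down to a multiple of `lanes`
    so the vector part processes whole vectors only.
    """
    n_vec = (n * speedup) // (speedup + 1)  # balance: n_vec / speedup == n_fpu
    n_vec -= n_vec % lanes                  # keep only whole 4-float vectors
    return n_vec, n - n_vec


if __name__ == "__main__":
    print(split_work(100))  # the 4/5 : 1/5 case
    print(split_work(7))    # leftovers go to the FPU
```

Rounding down pushes the leftover elements onto the scalar side, which can handle odd counts anyway. As the reply below notes, balancing arithmetic throughput this way ignores the shared completion resources.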

This is most likely related to the issue above: the 7400/7410
has an 8-entry completion queue and can retire no more than 2
instructions per cycle. Using the VFPU and FPU simultaneously is an
easy way to fill up the completion queue quickly and cause stalls.
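A toy model makes the retirement limit concrete. The hypothetical function below dispatches up to `width` instructions per cycle into a `queue_size`-entry completion queue and retires at most `width` per cycle, strictly in program order. The defaults (8 entries, 2 per cycle) are the 7400/7410 figures quoted above. Using the same width for dispatch is an assumption, and the uniform latencies fed in are arbitrary rather than real 7400 timings. In this model, sustaining two retirements per cycle needs about 2 x latency instructions in flight. An 8-entry queue therefore only covers latencies of up to 4 cycles. Instructions from a second unit compete for the same entries and retire slots.

```python
from collections import deque


def last_retire_cycle(latencies, queue_size=8, width=2):
    """Toy in-order completion model (assumption-laden, not a 7400 simulator).

    latencies[i] is the execution latency of instruction i in program order.
    Returns the cycle on which the last instruction retires.
    """
    queue = deque()  # completion cycle of each in-flight instruction, in order
    next_op = 0
    retired = 0
    cycle = 0
    n = len(latencies)
    while True:
        # Retire up to `width` finished instructions from the head, in order.
        r = 0
        while queue and r < width and queue[0] <= cycle:
            queue.popleft()
            r += 1
            retired += 1
        if retired == n:
            return cycle
        # Dispatch up to `width` new instructions while the queue has room.
        d = 0
        while next_op < n and d < width and len(queue) < queue_size:
            queue.append(cycle + latencies[next_op])
            next_op += 1
            d += 1
        cycle += 1


if __name__ == "__main__":
    print(last_retire_cycle([4] * 16))  # fits in 8 entries: full rate
    print(last_retire_cycle([6] * 16))  # queue fills, dispatch stalls
```

In this model 16 instructions of latency 4 retire by cycle 11, while latency 6 pushes that to cycle 15 because the queue fills and dispatch stalls.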

These issues may be quite different on a 7450 machine, which has 16
vector rename registers and can retire three instructions per cycle.
As of Macworld New York, all of Apple's shipping G4 machines are
7450s.
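A similar toy model covers the rename-register limit and the 6 versus 16 difference. In the hypothetical finish_cycle below, each vector op holds one rename register from issue until `latency` cycles later. Up to `width` independent ops issue per cycle, in order, whenever a register is free. The width of 2 and the latencies of 4 and 5 are illustrative assumptions, not measured 7400/7450 timings. Going from 4 to 5 mimics one extra pipelined stage, as in the Java-mode question above. The model ignores completion-queue effects.

```python
def finish_cycle(n_ops, width, latency, n_rename):
    """Toy rename-register model: return the cycle the last of n_ops completes.

    Each op occupies one rename register from issue until issue + latency.
    At most `width` ops issue per cycle, and only onto a free register.
    """
    free_at = [0] * n_rename  # cycle at which each rename register frees up
    issued = 0
    cycle = 0
    last_done = 0
    while issued < n_ops:
        slots = width
        for r in range(n_rename):
            if slots == 0 or issued == n_ops:
                break
            if free_at[r] <= cycle:
                free_at[r] = cycle + latency
                last_done = max(last_done, free_at[r])
                issued += 1
                slots -= 1
        cycle += 1
    return last_done


if __name__ == "__main__":
    for regs in (6, 16):
        for lat in (4, 5):
            print(regs, lat, finish_cycle(24, 2, lat, regs))
```

With 24 ops and width 2, six registers give 18 cycles at latency 4 and 22 at latency 5. Sixteen registers give 15 and 16. In this model the extra stage costs four cycles when rename-bound but only one when it is not. This is consistent with a fully pipelined stage still hurting a kernel that is short on rename registers.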

-Nick

-- 

Nicholas Coult, Ph.D.,  web: http://melby.augsburg.edu/~coult
Assistant Professor, Department of Mathematics, Augsburg College
[email protected], phone:  (612) 330-1064 office: Science Hall 137B