Re: UltraSparc kernel results
Hi Clint,
Firstly, thanks very much for doing these tests. The results are
interesting.
>> From [email protected] Fri Oct 13 15:26:20 2000
...
>> by Viet Nguyen & Peter Strazdins. The good news is we get about 90%
>> of performance for double complex, and we modestly beat the vendor for
>> double. We still run only around 80-85% of vendor speed for single
>> precision (the submitted code doesn't help single).
>>
With zgemm(), I feel that it will always be hard for a kernel to do better
than a hand-written implementation (you can get close, but second-order
things like stride-2 accesses may take off the few % that make the
difference).
I'd be interested to hear how the full user-supplied US zgemm()
implementations compare to SunPerf.
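
To illustrate what "stride-2 accesses" means here, below is a minimal sketch of a complex dot product over interleaved storage, the layout that zgemm() operands use. It is not taken from ATLAS, the submitted kernels, or SunPerf; the function name zdot_interleaved and its signature are made up for this example.

```c
#include <stddef.h>

/* Hypothetical helper, for illustration only.
 * Dot product (unconjugated) of two double-complex vectors stored
 * interleaved as re0, im0, re1, im1, ...
 * The result is written to out[0] (real part) and out[1] (imaginary part). */
void zdot_interleaved(size_t n, const double *x, const double *y, double *out)
{
    double re = 0.0, im = 0.0;
    for (size_t k = 0; k < n; k++) {
        /* Real parts sit at even indices and imaginary parts at odd ones,
         * so each component stream is walked with stride 2. */
        double xr = x[2 * k], xi = x[2 * k + 1];
        double yr = y[2 * k], yi = y[2 * k + 1];
        re += xr * yr - xi * yi;
        im += xr * yi + xi * yr;
    }
    out[0] = re;
    out[1] = im;
}
```

A kernel built on a real-valued inner loop has to read the real and imaginary streams at these strided offsets, whereas a hand-written complex routine can schedule the loads for each (re, im) pair together.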
>> That's the good news. The bad news is I got access to an Ultra-5/10,
>> sun's PCI-based low-end ultrasparc, and the submitted kernels don't
>> seem to do very well on those machines; ATLAS's generated code is
>> as good as the kernel there, and both get *completely* waxed by
>> sunperf. My guess is the motherboard can have such an effect
>> because the UltraSparc II has an off-chip cache, and the PCI-based
>> one makes the code really different . . . Anyway, I'll have to
>> investigate this further, maybe I just messed up the build . . .
>>
Hmm, this must be the one based on the Ultra IIi chip. I ran a
benchmark on one of these some time ago, and was so disgusted with the
performance (relative to clock speed), I vowed never to run numeric
codes on that processor again :).
I read an article on the IIi, but there was nothing to suggest that it
should be significantly different from the II for floating point.
Possibly you need to use an explicit prefetch instruction (which SunPerf
uses) to get good performance?
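
As a rough sketch of what explicit prefetching looks like at the C level, the loop below issues software prefetches a fixed distance ahead of the data it is consuming. It assumes a GCC-compatible compiler (for __builtin_prefetch); the function name ddot_prefetch and the PREFETCH_DIST value are hypothetical, and this is not how SunPerf or ATLAS actually do it.

```c
#include <stddef.h>

/* Hypothetical prefetch distance in elements; the right value depends on
 * memory latency and the cost of the loop body, and needs tuning. */
#define PREFETCH_DIST 64

/* Real dot product with software prefetch (illustration only). */
double ddot_prefetch(size_t n, const double *x, const double *y)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        /* Only prefetch addresses that lie inside the arrays. */
        if (i + PREFETCH_DIST < n) {
            /* rw = 0: prefetch for reading; locality = 0: streaming data.
             * Issuing one prefetch per cache line would be enough; doing it
             * every iteration just keeps the sketch short. */
            __builtin_prefetch(&x[i + PREFETCH_DIST], 0, 0);
            __builtin_prefetch(&y[i + PREFETCH_DIST], 0, 0);
        }
        sum += x[i] * y[i];
    }
    return sum;
}
```

Whether the compiler turns the builtin into an actual prefetch instruction depends on the target flags, so it is worth checking the generated assembly before drawing conclusions from timings.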
Regards, Peter