
newbie Athlon optimization question...



Hi, I'm interested in doing some optimization for a single-processor AMD
Athlon configuration, aimed at HPL rather than at ATLAS as a whole.
After a few weeks of testing, I found a function that accounts for a large
share of HPL's run time: ATL_dJIK60x60x60TN60x60x0_a1_b1.  My guess is
that ATLAS generates this function during its build, since it only showed
up in the source tree after I compiled ATLAS, not in the freshly untarred
distribution.  My reasoning went as follows:

First, I compiled both ATLAS and HPL with gprof profiling support.  I ran
HPL in a single-processor configuration and looked at the call tree that
gprof produced.  The functions were called in this order:

...
HPL_pdupdateTT (in HPL_pdupdateTT.c)
cblas_dgemm (in cblas_dgemm.c)
ATL_dgemm (in ATL_gemm.c)
ATL_dGEMM2NN (in ATL_gemmXX.c)
ATL_dmmJIK (in ATL_mmJIK.c)
ATL_dmmJIK2 (in ATL_mmJIK.c)
ATL_dJIK60x60x60TN60x60x0_a1_b1 (in file ATL_dNBmm_b1.c)
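To make the last function name concrete, here is a minimal, untuned C
sketch of what ATL_dJIK60x60x60TN60x60x0_a1_b1 appears to compute.  It
assumes the name decodes as: JIK loop order, a 60x60x60 block, A
transposed and B not ("TN"), leading dimensions 60 and 60 for A and B
with a run-time leading dimension for C (the "0"), and alpha = 1,
beta = 1.  That decoding is an assumption, not something taken from ATLAS
documentation, and naive_jik_tn_kernel is a hypothetical name.  The real
generated kernel will be unrolled and register-blocked; this only shows
the arithmetic it has to perform.

```c
#include <assert.h>
#include <stddef.h>

#define NB 60 /* assumed block size, read off "60x60x60" in the kernel name */

/*
 * Hypothetical, untuned reference for what ATL_dJIK60x60x60TN60x60x0_a1_b1
 * is assumed to compute (the name decoding is an assumption):
 *   C := A^T * B + C   (alpha = 1, beta = 1) on one NB x NB x NB block.
 * A is a K x M block and B a K x N block, both column-major with leading
 * dimension NB ("60x60"); C is column-major with run-time leading
 * dimension ldc (the trailing "0").
 */
void naive_jik_tn_kernel(const double *A, const double *B, double *C,
                         size_t ldc)
{
    assert(ldc >= NB);
    for (size_t j = 0; j < NB; j++) {        /* J loop: columns of C and B */
        const double *b = B + j * NB;        /* column j of B */
        for (size_t i = 0; i < NB; i++) {    /* I loop: rows of C */
            const double *a = A + i * NB;    /* column i of A = row i of A^T */
            double sum = 0.0;
            for (size_t k = 0; k < NB; k++)  /* K loop: unit stride in a and b */
                sum += a[k] * b[k];
            C[i + j * ldc] += sum;           /* beta = 1: accumulate into C */
        }
    }
}
```

Each entry of C gains one 60-term dot product.  With A stored transposed,
the innermost K loop walks both operands at unit stride, which is
presumably why the "TN" form was chosen for the copied blocks.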

gprof reports that the last function consumes 73% of the program's total
CPU time.  As a rough speed test (not for accurate results), I commented
out the call to cblas_dgemm.  The program ran significantly faster,
although its output was of course completely wrong.  So I concluded that,
for HPL on a single-processor Athlon system with my specific
configuration, that cryptic ATLAS function was the culprit.  However, my
real question is more general.  I'd rather not spend my time optimizing
that function, since it seems ATLAS will just generate a more appropriate
one for each system configuration.  If I wanted to concentrate my efforts
in one place, what is the lowest level of statically written ATLAS code?
That is, of the source files in the tarball that are not regenerated
during the build, which would be the most relevant to look at?  I'd like
to make optimizations that aren't specific to my exact machine (again,
solely to improve HPL scores, but since HPL depends on ATLAS, I figured
this list was the right place to ask).  I hope I'm describing this
clearly; I have no idea how tech/math/physics oriented everyone on this
list is =)  Let me know if I'm being unclear...
Thanks for your help, I appreciate it.
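The 73% gprof figure quoted above also bounds what tuning that one kernel
can buy.  By Amdahl's law, if a fraction p of the run time is spent in the
kernel and the kernel is made s times faster, the overall speedup S is as
below; with p = 0.73, even an infinitely fast kernel gives at most about
3.7x, and doubling the kernel's speed gives roughly 1.57x.  This treats
gprof's sampled percentage as exact, which it is not.

$$
S(s) = \frac{1}{(1-p) + p/s}, \qquad
S(2) = \frac{1}{0.27 + 0.365} \approx 1.57, \qquad
\lim_{s\to\infty} S(s) = \frac{1}{1-0.73} \approx 3.7
$$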

-- 
/---------------------------------\
Jeff W.
ICQ# 17989474

"It's substance, not process"

http://dark-techno.org
http://logic-slave.org