Re: error in M cleanup



Clint-

My apologies.  I had been passing lowercase m= and n= in my own tester all
along, while the Makefile takes uppercase M= and N=, so I don't know what I
had actually been testing.  In any case, the files are patched in my home
directory and pass my amended tester, which I'm including below this time
for your comment (I think I should also include compile-time MB and NB in
this tester, for example):

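#!/bin/bash
# Sweep ../CASES/ATL_gemm_SSE.c through the ATLAS mmutstcase tester.
# Usage: tester [s|d|c|z]   (no argument tests all four precisions)

rout=../CASES/ATL_gemm_SSE.c

# Build and run one mmutstcase; succeed only if the tester reports PASS.
# Relies on $pref and $i, which the loop below sets before each call.
passes() {
    make ${pref}mmutstcase pre=$i mmrout=$rout "$@" 2>&1 | grep PASS >/dev/null
}

for i in s d c z ; do
    if [ "$1" = "" ] || [ "$1" = "$i" ] ; then
        # Complex precisions (c, z) use the cmmutstcase target.
        if [ $i = c ] || [ $i = z ] ; then
            pref=c
        else
            pref=""
        fi
        # Baseline case at kb=56: print the precision and the PASS line, if any.
        echo -n $i $(make ${pref}mmutstcase pre=$i kb=56 mmrout=$rout 2>&1 | grep PASS) "  "
        # If the kernel passes with mb=0 and runtime M, drive the sweep through
        # M= instead of compile-time mb=.  The Makefile variables are uppercase
        # M= and N=; lowercase m= and n= are silently ignored.
        mrun="mb="
        if passes kb=56 mb=0 M=57 ; then
            mrun="mb=0 M="
            echo -n "mr "
        fi
        # Same probe for runtime N (nb=0).
        nrun="nb="
        if passes kb=56 nb=0 N=57 ; then
            nrun="nb=0 N="
            echo -n "nr "
        fi
        # For kb = 4, 8, ..., 80, test every nb and mb in [kb, kb+4];
        # stop at the first failure and print its kb_nb_mb triple.
        kb=4
        while [ $kb -le 80 ] ; do
            nb=$kb
            nbe=$(($kb+4))
            while [ $nb -le $nbe ] ; do
                mb=$kb
                mbe=$(($kb+4))
                while [ $mb -le $mbe ] ; do
                    # $nrun$nb and $mrun$mb stay unquoted so that, e.g.,
                    # "mb=0 M=57" splits into two make arguments.
                    if ! passes kb=$kb $nrun$nb $mrun$mb ; then
                        echo -n ${kb}_${nb}_${mb}x
                        exit 1
                    fi
                    mb=$(($mb+1))
                done
                nb=$(($nb+1))
            done
            echo -n "$kb "
            kb=$(($kb+4))
        done
    fi
done
echo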
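
As a quick check that the sweep actually reaches the failing pattern, the dry run below (not part of the tester; count_sweep is a hypothetical helper name) enumerates the same kb/nb/mb triples as the loops above, without calling make, and counts how many have mb mod 4 = 2. It assumes the loop bounds exactly as written: kb = 4, 8, ..., 80 and nb, mb each in [kb, kb+4].

```bash
#!/bin/bash
# Dry run (hypothetical helper): enumerate the (kb, nb, mb) triples the
# tester's nested loops visit, without invoking make, and count how many
# have mb % 4 == 2, the residue seen in the M = 2 + 4i failures.
count_sweep() {
    local kb nb mb total=0 rem2=0
    for ((kb = 4; kb <= 80; kb += 4)); do
        for ((nb = kb; nb <= kb + 4; nb++)); do
            for ((mb = kb; mb <= kb + 4; mb++)); do
                total=$((total + 1))
                if ((mb % 4 == 2)); then
                    rem2=$((rem2 + 1))
                fi
            done
        done
    done
    echo "$total $rem2"
}

read -r total rem2 <<< "$(count_sweep)"
echo "cases: $total, with mb % 4 == 2: $rem2"
```

Because each window [kb, kb+4] starts at a multiple of 4, it contains exactly one mb with remainder 2, so 100 of the 500 cases follow the M = 2 + 4i pattern; when the runtime-M probe passes, those mb values reach the kernel as M=.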


R Clint Whaley <[email protected]> writes:

> Camm,
> 
> The good news is that using your new SSE2 stuff I'm now getting a complete
> DGEMM (not just mmcase) of roughly 2 Gflop/s.  The bad news is that it still
> doesn't always get the right answer.  In particular, there appears to be
> an error in the M cleanup: for any i such that M = 2 + 4i, it produces
> the wrong answer.  Here are some examples that make the tester fail:
> 
> >> make mmutstcase mmrout=../CASES/ATL_gemm_SSE.c mb=0 nb=56 M=2 N=56 K=56
> >> make mmutstcase mmrout=../CASES/ATL_gemm_SSE.c mb=0 nb=56 M=10 N=56 K=56
> 
> It looks like an error in the cleanup of a loop unrolled by 4, but I can't
> be sure.  Can you confirm it's a real error, and not something I'm doing wrong?
> 
> To balance all this with some good news, I include timings below (in MFLOPS,
> by problem size) comparing the new SSE2 DGEMM with the x86 FPU implementation.
> 
> Thanks,
> Clint
> 
>              100    200    300    400    500    600    700    800    900   1000
>           ====== ====== ====== ====== ====== ====== ====== ====== ====== ======
> P4   x86  1025.6 1194.0 1181.2 1238.7 1209.7 1234.3 1247.3 1264.2 1276.8 1242.2
> P4  SSE2  1351.4 1837.0 1944.0 1828.6 1851.9 1878.3 1960.0 1932.1 1944.0 2000.0
> 
>             1200   1400   1600   1800   2000   2200   2400   2600   2800   3000
>           ====== ====== ====== ====== ====== ====== ====== ====== ====== ======
> P4   x86  1256.7 1250.1 1254.5 1262.3 1261.8 1258.6 1261.3 1261.7 1262.0 1260.5
> P4  SSE2  1986.2 1974.1 1974.0 1970.3 1990.0 1999.6 1991.9 1991.6 2002.0 1974.4
> 
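
Clint's guess can be made concrete with a little arithmetic: a loop unrolled by 4 covers M rows in floor(M/4) full passes and leaves M mod 4 rows for cleanup, and every M = 2 + 4i leaves exactly 2. The sketch below assumes a plain unroll-by-4 with a separate remainder path; split_m and UNROLL are hypothetical names, and the actual loop structure of ATL_gemm_SSE.c is not shown in this thread.

```bash
#!/bin/bash
# Illustration only: split M into full unroll-by-4 passes plus a cleanup
# remainder, ASSUMING a plain unrolled loop with a separate remainder path
# (not taken from ATL_gemm_SSE.c itself).
UNROLL=4

# Print "<full passes> <cleanup rows>" for a given M.
split_m() {
    local m=$1
    echo "$((m / UNROLL)) $((m % UNROLL))"
}

for m in 2 6 10 14 56 57; do
    read -r full rem <<< "$(split_m "$m")"
    echo "M=$m: $full unrolled pass(es), cleanup of $rem"
done
```

Under that assumption, M = 2, 6, 10 and 14 all land on the same 2-row remainder while M = 56 needs no cleanup at all, which is consistent with a fault in whatever handles a remainder of 2.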
> 

-- 
Camm Maguire			     			[email protected]
==========================================================================
"The earth is but one country, and mankind its citizens."  --  Baha'u'llah