algol,_n< _b_e_g_i_n _c_o_m_m_e_n_t This is a cheap-and-cheerful rewrite of the PC version of the LINPACK based performance test that underlies the performance tables found on www.top500.org The original C/C++ code was retrieved from www.netlib.org/benchmark/linpack-pc.c I have (arbitrarily) selected to base the rewrite on the C version with compile time options -dDP -dROLL All output generated by the program (other than stop watch related stuff, see below) is sent via BY[4], i.e. select(32). GIER did not have a real-time clock, so I have had to replace calls to "clock()" with the old tested stop watch and then simply type in the number of seconds since last call. This in fact works OK, as the shortest time interval is around 4 seconds and the longest 3 minutes, so the accuracy achieved this way is acceptable. The most practical way to do it is to use the desktop clock and to stop counting when you hear from the GIER sound that calculations have stopped. That was also how we did it in the old days.... The complete run with parameters as they are takes 20 minutes. I gave up on the "getdate()" function - HELP3 was not fully Y2K compliant anyway.... 22. november 2003 Claus Hilberg PS all built-in documentation has been left unmodified. /* * Linpack 100x100 Benchmark In C/C++ For PCs * ******************************************************************** * * Original Source from NETLIB * * Translated to C by Bonnie Toy 5/88 (modified on 2/25/94 to fix * a problem with daxpy for unequal increments or equal increments * not equal to 1. Jack Dongarra) * * To obtain rolled source BLAS, add -DROLL to the command lines. * To obtain unrolled source BLAS, add -DUNROLL to the command lines. * * You must specify one of -DSP or -DDP to compile correctly. * * You must specify one of -DROLL or -DUNROLL to compile correctly. * ******************************************************************** * * Changes in this version * * 1. Function prototypes are declared and function headers have * embedded parameter types to produce code for C and C++ * * 2. Arrays aa and a are declared as [200*200] and [200*201] to * allow compilation with prototypes. * * 3. Function second changed (compiler dependent). * * 4. Timing method changed due to inaccuracy of PC clock (see below). * * 5. Additional date function included (compiler dependent). * * 6. Additional code used as a standard for a series of benchmarks:- * Automatic run time calibration rather than fixed parameters * Initial calibration with display to show linearity * Results displayed at reasonable rate for viewing (5 seconds) * Facilities for typing in details of system used etc. * Compiler details in code in case .exe files used elsewhere * Results appended to a text file (Linpack.txt) * * Roy Longbottom 101323.2241@compuserve.com 14 September 1996 * ************************************************************************ * * Timing * * The PC timer is updated at about 18 times per second or resolution of * 0.05 to 0.06 seconds which is similar to the time taken by the main * time consuming function dgefa on a 100 MHz Pentium. Thus there is no * point in running the dgefa/dges1 combination three times as in the * original version. Main timing for the latter, in the loop run NTIMES, * executes matgen/dgefa, summing the time taken by matgen within the * loop for later deduction from the total time. On a modern PC this sum * can be based on a random selection of 0 or 0.05/0.06. This version * executes the single pass once and the main timing loop five times, * calculating the matgen overhead separately. * ************************************************************************* * * Example of Output * * Rolled Double Precision Linpack Benchmark - PC Version in 'C/C++' * * Compiler Watcom C/C++ 10.5 Win 386 * Optimisation -zp4 -otexan -fp5 -5r -dDP -dROLL * * * norm resid resid machep x[0]-1 x[n-1]-1 * 0.4 7.41628980e-014 1.00000000e-015 -1.49880108e-014 -1.89848137e-014 * * * Times are reported for matrices of order 100 * 1 pass times for array with leading dimension of 201 * * dgefa dgesl total Mflops unit ratio * 0.06000 0.00000 0.06000 11.44 0.1748 1.0714 * * * Calculating matgen overhead * * 10 times 0.11 seconds * 20 times 0.22 seconds * 40 times 0.44 seconds * 80 times 0.87 seconds * 160 times 1.76 seconds * 320 times 3.52 seconds * 640 times 7.03 seconds * * Overhead for 1 matgen 0.01098 seconds * * * Calculating matgen/dgefa passes for 5 seconds * * 10 times 0.71 seconds * 20 times 1.38 seconds * 40 times 2.80 seconds * 80 times 5.66 seconds * * Passes used 70 * * This is followed by output of the normal data for dgefa, dges1, * total, Mflops, unit and ratio with five sets of results for each. * ************************************************************************ * * Example from output file Linpack.txt * * LINPACK BENCHMARK FOR PCs 'C/C++' n @ 100 * * Month run 9/1996 * PC model Escom * CPU Pentium * Clock MHz 100 * Cache 256K * Options Neptune chipset * OS/DOS Windows 95 * Compiler Watcom C/C++ 10.5 Win 386 * OptLevel -zp4 -otexan -fp5 -5r -dDP -dROLL * Run by Roy Longbottom * From UK * Mail 101323.2241@compuserve.com * * Rolling Rolled * Precision Double * norm. resid 0.4 * resid 7.41628980e-014 * machep 1.00000000e-015 (8.88178420e-016 NON OPT) * x[0]-1 -1.49880108e-014 * x[n-1]-1 -1.89848137e-014 * matgen 1 seconds 0.01051 * matgen 2 seconds 0.01050 * Repetitions 70 * Leading dimension 201 * dgefa dgesl total Mflops * 1 pass seconds 0.06000 0.00000 0.06000 * Repeat seconds 0.06092 0.00157 0.06249 10.99 * Repeat seconds 0.06077 0.00157 0.06234 11.01 * Repeat seconds 0.06092 0.00157 0.06249 10.99 * Repeat seconds 0.06092 0.00157 0.06249 10.99 * Repeat seconds 0.06092 0.00157 0.06249 10.99 * Average 10.99 * Leading dimension 200 * Repeat seconds 0.05936 0.00157 0.06093 11.27 * Repeat seconds 0.05936 0.00157 0.06093 11.27 * Repeat seconds 0.05864 0.00157 0.06021 11.40 * Repeat seconds 0.05936 0.00157 0.06093 11.27 * Repeat seconds 0.05864 0.00157 0.06021 11.40 * Average 11.32 * ************************************************************************ * * Examples of Results * * Precompiled codes were produced via a Watcom C/C++ 10.5 compiler. * Versions are available for DOS, Windows 3/95 and NT/Win 95. Both * non-optimised and optimised programs are available. The latter has * options as in the above example. Although these options can place * functions in-line, in this case, daxpy is not in-lined. Optimisation * reduces 18 instructions in the loop in this function to the following: * * L85 fld st(0) * fmul qword ptr [edx] * add eax,00000008H * add edx,00000008H * fadd qword ptr -8H[eax] * inc ebx * fstp qword ptr -8H[eax] * cmp ebx,esi * jl L85 * * Results produced are not consistent between runs but produce similar * speeds when executing at a particular dimension (see above). An example * of other results is 11.4/10.5 Mflops. Most typical double precision * rolled results are: * * Opt No Opt Version/ * MHz Cache Mflops Mflops Make/Options Via * * AM80386DX 40 128K 0.53 0.36 Clone Win/W95 * 80486DX2 66 128K 2.5 1.9 Escom SIS chipset Win/W95 * 80486DX2 66 128K 2.3 1.9 Escom SIS chipset NT/W95 * 80486DX2 66 128K 2.8 2.0 Escom SIS chipset Dos/Dos * Pentium 100 256K 11 4.2 Escom Neptune chipset Win/W95 * Pentium 100 256K 11 5.5 Escom Neptune chipset NT/W95 * Pentium 100 256K 12 4.4 Escom Neptune chipset Dos/Dos * Pentium Pro 200 256K 48 19 Dell XPS Pro200n NT/NT * * The results are as produced when compiled as Linpack.cpp. Compiling as * Linpack.c gives similar speeds but the code is a little different. * *************************************************************************** */; _r_e_a_l ZERO, ONE; _i_n_t_e_g_e_r lda, ldaa, NTIMES, thisyear, thismonth, thisdate; _a_r_r_a_y atime [0:9, 0:15]; _c_o_m_m_e_n_t /*TIME TIME TIME TIME TIME TIME TIME TIME TIME TIME TIME TIME TIME */; _r_e_a_l _p_r_o_c_e_d_u_r_e second ( secpar ); _v_a_l_u_e secpar; _i_n_t_e_g_e_r secpar; _b_e_g_i_n _r_e_a_l secs; select ( 17 ); _i_f ( secpar > 0 ) _t_h_e_n _b_e_g_i_n writetext (|<); secs := readinteger; writecr; _e_n_d _e_l_s_e _b_e_g_i_n secs := 0; writecr; writetext (|<); lyn; writecr; _e_n_d; select ( 32 ); second := secs; _e_n_d; _c_o_m_m_e_n_t this code from Mogens Kjaer /* DATE DATE DATE DATE DATE DATE DATE DATE DATE DATE DATE DATE DATE */; _p_r_o_c_e_d_u_r_e whatdate; _b_e_g_i_n _i_n_t_e_g_e_r date; where(|<, date); thisdate := _i_n_t_e_g_e_r (((_B_o_o_l_e_a_n date) _s_h_i_f_t -20) & _3_00 _1_0 m); thismonth := _i_n_t_e_g_e_r (((_B_o_o_l_e_a_n date) _s_h_i_f_t -10) & _3_0 0 _1_0 m); thisyear := _i_n_t_e_g_e_r ((_B_o_o_l_e_a_n date) & _3_0 0 _1_0 m); thisyear := thisyear + (_i_f thisyear < 60 _t_h_e_n 2000 _e_l_s_e 1900); _e_n_d; _p_r_o_c_e_d_u_r_e printtime ( row ); _i_n_t_e_g_e_r row; _b_e_g_i_n write(|< -d.ddd'dd|>,atime[0,row],atime[1,row], atime[2,row], atime[3,row], atime[4,row], atime[5,row]); writecr; return: _e_n_d; _c_o_m_m_e_n_t /* We would like to declare a[][lda], but c does not allow it. In this function, references to a[i][j] are written a[lda*i+j]. */; _p_r_o_c_e_d_u_r_e matgen ( a, lda, n, b, norma ); _v_a_l_u_e lda, n; _i_n_t_e_g_e_r lda; _a_r_r_a_y a; _i_n_t_e_g_e_r n; _a_r_r_a_y b; _r_e_a_l norma; _b_e_g_i_n _i_n_t_e_g_e_r init, i, j; init := 1325; norma := 0.0; _f_o_r j := 0 _s_t_e_p 1 _u_n_t_i_l n-1 _d_o _b_e_g_i_n _f_o_r i := 0 _s_t_e_p 1 _u_n_t_i_l n-1 _d_o _b_e_g_i_n init := (3125*init) _m_o_d 65536; a[lda*j+i] := (init - 32768.0)/16384.0; _i_f ( a[lda*j+i] > norma ) _t_h_e_n norma := a[lda*j+i]; _e_n_d; _e_n_d; _f_o_r i := 0 _s_t_e_p 1 _u_n_t_i_l n-1 _d_o _b_e_g_i_n b[i] := 0.0; _e_n_d; _f_o_r j := 0 _s_t_e_p 1 _u_n_t_i_l n-1 _d_o _b_e_g_i_n _f_o_r i := 0 _s_t_e_p 1 _u_n_t_i_l n-1 _d_o _b_e_g_i_n b[i] := b[i] + a[lda*j+i]; _e_n_d; _e_n_d; _e_n_d; _c_o_m_m_e_n_t /* We would like to declare a[][lda], but c does not allow it. In this function, references to a[i][j] are written a[lda*i+j]. */; _c_o_m_m_e_n_t /* dgefa factors a double precision matrix by gaussian elimination. dgefa is usually called by dgeco, but it can be called directly with a saving in time if rcond is not needed. (time for dgeco) = (1 + 9/n)*(time for dgefa) . on entry a REAL precision[n][lda] the matrix to be factored. lda integer the leading dimension of the array a . n integer the order of the matrix a . on return a an upper triangular matrix and the multipliers which were used to obtain it. the factorization can be written a = l*u where l is a product of permutation and unit lower triangular matrices and u is upper triangular. ipvt integer[n] an integer vector of pivot indices. info integer = 0 normal value. = k if u[k][k] .eq. 0.0 . this is not an error condition for this subroutine, but it does indicate that dgesl or dgedi will divide by zero if called. use rcond in dgeco for a reliable indication of singularity. linpack. this version dated 08/14/78 . cleve moler, university of new mexico, argonne national lab. functions blas daxpy,dscal,idamax */; _p_r_o_c_e_d_u_r_e dgefa( a, lda, n, ipvt, info ); _v_a_l_u_e n; _a_r_r_a_y a; _i_n_t_e_g_e_r lda; _i_n_t_e_g_e_r n; _i_n_t_e_g_e_r _a_r_r_a_y ipvt; _i_n_t_e_g_e_r info; _b_e_g_i_n _c_o_m_m_e_n_t /* internal variables */; _r_e_a_l t; _i_n_t_e_g_e_r j,k,kp1,l,nm1,ldax; _c_o_m_m_e_n_t /* gaussian elimination with partial pivoting */; info := 0; nm1 := n - 1; _i_f (nm1 _> 0) _t_h_e_n _b_e_g_i_n _f_o_r k := 0 _s_t_e_p 1 _u_n_t_i_l nm1-1 _d_o _b_e_g_i_n _c_o_m_m_e_n_t /* find l = pivot index */; kp1 := k + 1; l := idamax(n-k,a,lda*k+k,1) + k; ipvt[k] := l; _c_o_m_m_e_n_t /* zero pivot implies this column already triangularized */; _i_f (a[lda*k+l] |= ZERO) _t_h_e_n _b_e_g_i_n _c_o_m_m_e_n_t /* interchange if necessary */; _i_f (l |= k) _t_h_e_n _b_e_g_i_n ldax := lda * k; t := a[ldax+l]; a[ldax+l] := a[ldax+k]; a[ldax+k] := t; _e_n_d; _c_o_m_m_e_n_t /* compute multipliers */; t := -ONE/a[lda*k+k]; dscal(n-(k+1),t,a,lda*k+k+1,1); _c_o_m_m_e_n_t /* row elimination with column indexing */; _f_o_r j := kp1 _s_t_e_p 1 _u_n_t_i_l n-1 _d_o _b_e_g_i_n ldax := lda * j; t := a[ldax+l]; _i_f (l |= k) _t_h_e_n _b_e_g_i_n a[ldax+l] := a[ldax+k]; a[ldax+k] := t; _e_n_d; daxpy(n-(k+1),t,a,lda*k+k+1,1, a,lda*j+k+1,1); _e_n_d; _e_n_d _e_l_s_e _b_e_g_i_n info := k; _e_n_d; _e_n_d; _e_n_d; ipvt[n-1] := n-1; _i_f (a[lda*(n-1)+(n-1)] = ZERO) _t_h_e_n info := n-1; return: _e_n_d; _c_o_m_m_e_n_t /* We would like to declare a[][lda], but c does not allow it. In this function, references to a[i][j] are written a[lda*i+j]. */; _c_o_m_m_e_n_t /* dgesl solves the double precision system a * x = b or trans(a) * x = b using the factors computed by dgeco or dgefa. on entry a double precision[n][lda] the output from dgeco or dgefa. lda integer the leading dimension of the array a . n integer the order of the matrix a . ipvt integer[n] the pivot vector from dgeco or dgefa. b double precision[n] the right hand side vector. job integer = 0 to solve a*x = b , = nonzero to solve trans(a)*x = b where trans(a) is the transpose. on return b the solution vector x . error condition a division by zero will occur if the input factor contains a zero on the diagonal. technically this indicates singularity but it is often caused by improper arguments or improper setting of lda . it will not occur if the subroutines are called correctly and if dgeco has set rcond .gt. 0.0 or dgefa has set info .eq. 0 . to compute inverse(a) * c where c is a matrix with p columns dgeco(a,lda,n,ipvt,rcond,z) if (!rcond is too small){ for (j=0,j 1) _t_h_e_n _b_e_g_i_n _f_o_r k := 0 _s_t_e_p 1 _u_n_t_i_l nm1-1 _d_o _b_e_g_i_n l := ipvt[k]; t := b[l]; _i_f (l |= k) _t_h_e_n _b_e_g_i_n b[l] := b[k]; b[k] := t; _e_n_d; daxpy(n-(k+1),t,a,lda*k+k+1,1,b,k+1,1 ); _e_n_d; _e_n_d; _c_o_m_m_e_n_t /* now solve u*x = y */; _f_o_r kb := 0 _s_t_e_p 1 _u_n_t_i_l n-1 _d_o _b_e_g_i_n k := n - (kb + 1); b[k] := b[k]/a[lda*k+k]; t := -b[k]; daxpy(k,t,a,lda*k+0,1,b,0,1 ); _e_n_d; _e_n_d _e_l_s_e _b_e_g_i_n _c_o_m_m_e_n_t /* job = nonzero, solve trans(a) * x = b first solve trans(u)*y = b */; _f_o_r k := 0 _s_t_e_p 1 _u_n_t_i_l n-1 _d_o _b_e_g_i_n t := ddot(k,a,lda*k+0,1,b,0,1); b[k] := (b[k] - t)/a[lda*k+k]; _e_n_d; _c_o_m_m_e_n_t /* now solve trans(l)*x = y */; _i_f (nm1 _> 1) _t_h_e_n _b_e_g_i_n _f_o_r kb := 1 _s_t_e_p 1 _u_n_t_i_l nm1-1 _d_o _b_e_g_i_n k := n - (kb+1); b[k] := b[k] + ddot(n-(k+1),a,lda*k+k+1,1,b,k+1,1); l := ipvt[k]; _i_f (l |= k) _t_h_e_n _b_e_g_i_n t := b[l]; b[l] := b[k]; b[k] := t; _e_n_d; _e_n_d; _e_n_d; _e_n_d; _e_n_d; _c_o_m_m_e_n_t /* constant times a vector plus a vector. jack dongarra, linpack, 3/11/78. */; _p_r_o_c_e_d_u_r_e daxpy( n, da, dx, sx, incx, dy, sy, incy ); _v_a_l_u_e n, da, sx, incx, sy, incy; _i_n_t_e_g_e_r n; _r_e_a_l da; _a_r_r_a_y dx; _i_n_t_e_g_e_r sx, incx; _a_r_r_a_y dy; _i_n_t_e_g_e_r sy, incy; _b_e_g_i_n _i_n_t_e_g_e_r i,ix,iy,m,mp1; mp1 := 0; m := 0; _i_f(n _< 0) _t_h_e_n _g_o_t_o return; _i_f (da = ZERO) _t_h_e_n _g_o_t_o return; _i_f(incx |= 1 £ incy |= 1) _t_h_e_n _b_e_g_i_n _c_o_m_m_e_n_t /* code for unequal increments or equal increments not equal to 1 */; ix := 0; iy := 0; _i_f(incx < 0) _t_h_e_n ix := (-n+1)*incx; _i_f(incy < 0) _t_h_e_n iy := (-n+1)*incy; _f_o_r i := 0 _s_t_e_p 1 _u_n_t_i_l n-1 _d_o _b_e_g_i_n dy[sy+iy] := dy[sy+iy] + da*dx[sx+ix]; ix := ix + incx; iy := iy + incy; _e_n_d _f_o_r; _g_o_t_o return; _e_n_d; _c_o_m_m_e_n_t /* code for both increments equal to 1 */ ROLLED VERSION ; _f_o_r i := 0 _s_t_e_p 1 _u_n_t_i_l n-1 _d_o _b_e_g_i_n dy[sy+i] := dy[sy+i] + da*dx[sx+i]; _e_n_d _f_o_r; return: _e_n_d; _r_e_a_l _p_r_o_c_e_d_u_r_e ddot( n, dx, sx, incx, dy, sy, incy ); _c_o_m_m_e_n_t /* forms the dot product of two vectors. jack dongarra, linpack, 3/11/78. */; _v_a_l_u_e n, sx, incx, sy, incy; _i_n_t_e_g_e_r n; _a_r_r_a_y dx; _i_n_t_e_g_e_r sx, incx; _a_r_r_a_y dy; _i_n_t_e_g_e_r sy, incy; _b_e_g_i_n _r_e_a_l dtemp; _i_n_t_e_g_e_r i,ix,iy,m,mp1; mp1 := 0; m := 0; dtemp := ZERO; _i_f (n _< 0) _t_h_e_n _b_e_g_i_n ddot := ZERO; _g_o_t_o return; _e_n_d; _i_f (incx |= 1 £ incy |= 1) _t_h_e_n _b_e_g_i_n _c_o_m_m_e_n_t /* code for unequal increments or equal increments not equal to 1 */; ix := 0; iy := 0; _i_f (incx < 0) _t_h_e_n ix := (-n+1)*incx; _i_f (incy < 0) _t_h_e_n iy := (-n+1)*incy; _f_o_r i := 0 _s_t_e_p 1 _u_n_t_i_l n-1 _d_o _b_e_g_i_n dtemp := dtemp + dx[sx+ix]*dy[sy+iy]; ix := ix + incx; iy := iy + incy; _e_n_d; ddot := dtemp; _g_o_t_o return; _e_n_d; _c_o_m_m_e_n_t /* code for both increments equal to 1 */ ROLLED VERSION ; _f_o_r i:=0 _s_t_e_p 1 _u_n_t_i_l n-1 _d_o _b_e_g_i_n dtemp := dtemp + dx[sx+i]*dy[sy+i]; _e_n_d _f_o_r; ddot := dtemp; return: _e_n_d; _p_r_o_c_e_d_u_r_e dscal( n, da, dx, sx, incx); _c_o_m_m_e_n_t /* scales a vector by a constant. jack dongarra, linpack, 3/11/78. */; _v_a_l_u_e n, da, sx, incx; _i_n_t_e_g_e_r n, sx; _r_e_a_l da; _a_r_r_a_y dx; _i_n_t_e_g_e_r incx; _b_e_g_i_n _i_n_t_e_g_e_r i,m,mp1,nincx,iu; mp1 := 0; m := 0; _i_f (n _< 0) _t_h_e_n _g_o_t_o return; _i_f (incx |= 1) _t_h_e_n _b_e_g_i_n _c_o_m_m_e_n_t /* code for increment not equal to 1 */; nincx := n*incx; _f_o_r i:=0 _s_t_e_p incx _u_n_t_i_l nincx-1 _d_o dx[sx+i] := da*dx[sx+i]; _g_o_t_o return; _e_n_d; _c_o_m_m_e_n_t /* code for increment equal to 1 */ ROLLED VERSION ; iu := sx + n - 1; _f_o_r i:=sx _s_t_e_p 1 _u_n_t_i_l iu _d_o _b_e_g_i_n dx[i] := da*dx[i]; _e_n_d _f_o_r; return: _e_n_d; _i_n_t_e_g_e_r _p_r_o_c_e_d_u_r_e idamax( n, dx, sx, incx ); _c_o_m_m_e_n_t /* finds the index of element having max. absolute value. jack dongarra, linpack, 3/11/78. */; _v_a_l_u_e n, sx, incx; _i_n_t_e_g_e_r n, sx; _a_r_r_a_y dx; _i_n_t_e_g_e_r incx; _b_e_g_i_n _r_e_a_l dmax; _i_n_t_e_g_e_r i, ix, itemp; _i_f ( n < 1 ) _t_h_e_n _b_e_g_i_n idamax := -1; _g_o_t_o return; _e_n_d; _i_f (n =1 ) _t_h_e_n _b_e_g_i_n idamax := 0; _g_o_t_o return; _e_n_d; _i_f (incx |= 1) _t_h_e_n _b_e_g_i_n _c_o_m_m_e_n_t /* code for increment not equal to 1 */; ix := 1; dmax := abs( dx[sx]); ix := ix + incx; _f_o_r i:=1 _s_t_e_p 1 _u_n_t_i_l n-1 _d_o _b_e_g_i_n _i_f (abs( dx[sx+ix]) > dmax) _t_h_e_n _b_e_g_i_n itemp := i; dmax := abs( dx[sx+ix]); _e_n_d; ix := ix + incx; _e_n_d; _e_n_d _e_l_s_e _b_e_g_i_n _c_o_m_m_e_n_t /* code for increment equal to 1 */; itemp := 0; dmax := abs( dx[sx]); _f_o_r i:=1 _s_t_e_p 1 _u_n_t_i_l n-1 _d_o _b_e_g_i_n _i_f (abs( dx[sx+i]) > dmax) _t_h_e_n _b_e_g_i_n itemp := i; dmax := abs( dx[sx+i]); _e_n_d; _e_n_d; _e_n_d; idamax := itemp; return: _e_n_d; _r_e_a_l _p_r_o_c_e_d_u_r_e epslon ( x ); _c_o_m_m_e_n_t /* estimate unit roundoff in quantities of size x. */; _r_e_a_l x; _b_e_g_i_n _r_e_a_l a,b,c,eps; _c_o_m_m_e_n_t /* this program should function properly on all systems satisfying the following two assumptions, 1. the base used in representing dfloating point numbers is not a power of three. 2. the quantity a in statement 10 is represented to the accuracy used in dfloating point variables that are stored in memory. the statement number 10 and the go to 10 are intended to force optimizing compilers to generate code satisfying assumption 2. under these assumptions, it should be true that, a is not exactly equal to four-thirds, b has a zero for its last bit or digit, c is not exactly equal to one, eps measures the separation of 1.0 from the next larger dfloating point number. the developers of eispack would appreciate being informed about any systems where these assumptions do not hold. ***************************************************************** this routine is one of the auxiliary routines used by eispack iii to avoid machine dependencies. ***************************************************************** this version dated 4/6/83. */; a := 4.0/3.0; eps := ZERO; _f_o_r a := a _w_h_i_l_e (eps = ZERO) _d_o _b_e_g_i_n b := a - ONE; c := b + b + b; eps := abs((c-ONE)); _e_n_d; epslon := eps*abs(x); _e_n_d; _p_r_o_c_e_d_u_r_e dmxpy ( n1, y, n2, ldm, x, m ); _c_o_m_m_e_n_t /* We would like to declare m[][ldm], but c does not allow it. In this function, references to m[i][j] are written m[ldm*i+j]. */; _c_o_m_m_e_n_t /* purpose: multiply matrix m times vector x and add the result to vector y. parameters: n1 integer, number of elements in vector y, and number of rows in matrix m y double [n1], vector of length n1 to which is added the product m*x n2 integer, number of elements in vector x, and number of columns in matrix m ldm integer, leading dimension of array m x double [n2], vector of length n2 m double [ldm][n2], matrix of n1 rows and n2 columns ---------------------------------------------------------------------- */; _v_a_l_u_e n1, n2, ldm; _i_n_t_e_g_e_r n1; _a_r_r_a_y y; _i_n_t_e_g_e_r n2; _i_n_t_e_g_e_r ldm; _a_r_r_a_y x; _a_r_r_a_y m; _b_e_g_i_n _i_n_t_e_g_e_r j,i,jmin; _c_o_m_m_e_n_t /* cleanup odd vector */; j := n2 _m_o_d 2; _i_f (j _> 1) _t_h_e_n _b_e_g_i_n j := j - 1; _f_o_r i:=0 _s_t_e_p 1 _u_n_t_i_l n1-1 _d_o y[i] := (y[i]) + x[j]*m[ldm*j+i]; _e_n_d; _c_o_m_m_e_n_t /* cleanup odd group of two vectors */; j := n2 _m_o_d 4; _i_f (j _> 2) _t_h_e_n _b_e_g_i_n j := j - 1; _f_o_r i:=0 _s_t_e_p 1 _u_n_t_i_l n1-1 _d_o y[i] := ( (y[i]) + x[j-1]*m[ldm*(j-1)+i]) + x[j]*m[ldm*j+i]; _e_n_d; _c_o_m_m_e_n_t /* cleanup odd group of four vectors */; j := n2 _m_o_d 8; _i_f (j _> 4) _t_h_e_n _b_e_g_i_n j := j - 1; _f_o_r i:=0 _s_t_e_p 1 _u_n_t_i_l n1-1 _d_o y[i] := ((( (y[i]) + x[j-3]*m[ldm*(j-3)+i]) + x[j-2]*m[ldm*(j-2)+i]) + x[j-1]*m[ldm*(j-1)+i]) + x[j]*m[ldm*j+i]; _e_n_d; _c_o_m_m_e_n_t /* cleanup odd group of eight vectors */; j := n2 _m_o_d 16; _i_f (j _> 8) _t_h_e_n _b_e_g_i_n j := j - 1; _f_o_r i:=0 _s_t_e_p 1 _u_n_t_i_l n1-1 _d_o y[i] := ((((((( (y[i]) + x[j-7]*m[ldm*(j-7)+i]) + x[j-6]*m[ldm*(j-6)+i]) + x[j-5]*m[ldm*(j-5)+i]) + x[j-4]*m[ldm*(j-4)+i]) + x[j-3]*m[ldm*(j-3)+i]) + x[j-2]*m[ldm*(j-2)+i]) + x[j-1]*m[ldm*(j-1)+i]) + x[j] *m[ldm*j+i]; _e_n_d; _c_o_m_m_e_n_t /* main loop - groups of sixteen vectors */; jmin := (n2 _m_o_d 16)+16; _f_o_r j := jmin-1 _s_t_e_p 16 _u_n_t_i_l n2-1 _d_o _b_e_g_i_n _f_o_r i:=0 _s_t_e_p 1 _u_n_t_i_l n1-1 _d_o y[i] := ((((((((((((((( (y[i]) + x[j-15]*m[ldm*(j-15)+i]) + x[j-14]*m[ldm*(j-14)+i]) + x[j-13]*m[ldm*(j-13)+i]) + x[j-12]*m[ldm*(j-12)+i]) + x[j-11]*m[ldm*(j-11)+i]) + x[j-10]*m[ldm*(j-10)+i]) + x[j- 9]*m[ldm*(j- 9)+i]) + x[j- 8]*m[ldm*(j- 8)+i]) + x[j- 7]*m[ldm*(j- 7)+i]) + x[j- 6]*m[ldm*(j- 6)+i]) + x[j- 5]*m[ldm*(j- 5)+i]) + x[j- 4]*m[ldm*(j- 4)+i]) + x[j- 3]*m[ldm*(j- 3)+i]) + x[j- 2]*m[ldm*(j- 2)+i]) + x[j- 1]*m[ldm*(j- 1)+i]) + x[j] *m[ldm*j+i]; _e_n_d; return: _e_n_d; _b_e_g_i_n _c_o_m_m_e_n_t Arrays of size 200 x 200, as used in the C version, would be on the high side for GIER. I have chosen arrays size 40 x 40 corresponding to a GIER machine with external buffer. I have furthermore elected to work with problems of size 30 x 30 - anything higher than that were beyond my patience; _a_r_r_a_y aa[0:40*40],a[0:40*41],b[0:40],x[0:40]; _r_e_a_l cray,ops,total,norma,normx; _r_e_a_l resid,residn,eps,t1,tm2,epsn,x1,x2; _r_e_a_l mflops; _i_n_t_e_g_e_r _a_r_r_a_y ipvt[0:40]; _i_n_t_e_g_e_r n,i,j,ntimes,info; _i_n_t_e_g_e_r Endit, pass, loop; _r_e_a_l overhead1, overhead2, time1, time2; ZERO := 0; ONE := 1.0; NTIMES := 5; lda := 41; ldaa := 40; cray := .056; n := 30; select ( 32 ); writetext (|<); writecr; writetext (|<); writecr; writetext (|<); writecr; writecr; ops := (2.0*(n*n*n))/3.0 + 2.0*(n*n); matgen(a,lda,n,b,norma); t1 := second ( -1 ); dgefa(a,lda,n,ipvt,info); atime[0,0] := second ( 1 ) - t1; t1 := second ( -1 ); dgesl(a,lda,n,ipvt,b,0); atime[1,0] := second ( 1 ) - t1; total := atime[0,0] + atime[1,0]; _c_o_m_m_e_n_t /* compute a residual to verify results. */; _f_o_r i := 0 _s_t_e_p 1 _u_n_t_i_l n-1 _d_o _b_e_g_i_n x[i] := b[i]; _e_n_d; matgen(a,lda,n,b,norma); _f_o_r i := 0 _s_t_e_p 1 _u_n_t_i_l n-1 _d_o _b_e_g_i_n b[i] := -b[i]; _e_n_d; dmxpy(n,b,n,lda,x,a); resid := 0.0; normx := 0.0; _f_o_r i := 0 _s_t_e_p 1 _u_n_t_i_l n-1 _d_o _b_e_g_i_n resid := ( _i_f ( resid > abs(b[i])) _t_h_e_n resid _e_l_s_e abs(b[i])); normx := ( _i_f ( normx > abs(x[i])) _t_h_e_n normx _e_l_s_e abs(x[i])); _e_n_d; eps := epslon(ONE); residn := resid/( n*norma*normx*eps ); epsn := eps; x1 := x[0] - 1; x2 := x[n-1] - 1; writetext(|<); writetext(|<< x[0]-1 x[n-1]-1|>); writecr; write (|< -d.dddd'-dd|>, residn, resid, epsn, x1, x2); writecr; writecr; writetext(|<); write ( |,n); writecr; writetext(|<<1 pass times for array with leading dimension of|>); write ( |,lda); writecr; writetext(|<< dgefa dgesl total Kflops unit|>); writetext(|<< ratio|>); writecr; atime[2,0] := total; _i_f (total > 0.0) _t_h_e_n _b_e_g_i_n _c_o_m_m_e_n_t NB Kiloflops i stedet for Megaflops; atime[3,0] := ops/(1.0'3*total); atime[4,0] := 2.0/atime[3,0]; _e_n_d _e_l_s_e _b_e_g_i_n atime[3,0] := 0.0; atime[4,0] := 0.0; _e_n_d; atime[5,0] := total/cray; printtime(0); _c_o_m_m_e_n_t /************************************************************************ * Calculate overhead of executing matgen procedure * ************************************************************************/; writecr; writetext (|<); writecr; pass := -20; loop := NTIMES; _f_o_r pass := pass _w_h_i_l_e ( pass < 0 ) _d_o _b_e_g_i_n time1 := second ( -1 ); pass := pass + 1; _f_o_r i := 0 _s_t_e_p 1 _u_n_t_i_l loop-1 _d_o _b_e_g_i_n matgen(a,lda,n,b,norma); _e_n_d; time2 := second ( 1 ); overhead1 := (time2 - time1); write (|, loop ); writetext (|<< times |>); write (|, overhead1 ); writetext (|<< seconds|> ); writecr; _i_f (overhead1 > 5.0) _t_h_e_n _b_e_g_i_n pass := 0; _e_n_d; _i_f (pass < 0) _t_h_e_n _b_e_g_i_n _i_f (overhead1 < 0.1) _t_h_e_n _b_e_g_i_n loop := loop * 10; _e_n_d _e_l_s_e _b_e_g_i_n loop := loop * 2; _e_n_d; _e_n_d; _e_n_d; overhead1 := overhead1 / loop; writetext (|<); write (|, overhead1 ); writetext (|<< seconds|>); writecr; _c_o_m_m_e_n_t /************************************************************************ * Calculate matgen/dgefa passes for 5 seconds * ************************************************************************/; writetext (|<); writecr; pass := -20; ntimes := NTIMES; _f_o_r pass := pass_w_h_i_l_e ( pass < 0 ) _d_o _b_e_g_i_n time1 := second ( -1 ); pass := pass + 1; _f_o_r i := 0 _s_t_e_p 1 _u_n_t_i_l ntimes-1 _d_o _b_e_g_i_n matgen(a,lda,n,b,norma); dgefa(a,lda,n,ipvt,info ); _e_n_d; time2 := second ( 1 ) - time1; write (|, ntimes ); writetext (|<< times |>); write (|, time2 ); writetext (|<< seconds|>); writecr; _i_f (time2 > 5.0) _t_h_e_n _b_e_g_i_n pass := 0; _e_n_d; _i_f (pass < 0) _t_h_e_n _b_e_g_i_n _i_f (time2 < 0.1) _t_h_e_n _b_e_g_i_n ntimes := ntimes * 10; _e_n_d _e_l_s_e _b_e_g_i_n ntimes := ntimes * 2; _e_n_d; _e_n_d; _e_n_d; ntimes := 5.0 * ntimes / time2; _i_f (ntimes = 0) _t_h_e_n ntimes := 1; writetext (|<); write (|, ntimes); writecr; writetext(|<); write (|,lda); writecr; writecr; writetext(|<< dgefa dgesl total Kflops unit|>); writetext(|<< ratio|>); writecr; _c_o_m_m_e_n_t /************************************************************************ * Execute 5 passes * ************************************************************************/; tm2 := ntimes * overhead1; atime[3,6] := 0; _f_o_r j:=1 _s_t_e_p 1 _u_n_t_i_l 5 _d_o _b_e_g_i_n t1 := second ( -1 ); _f_o_r i := 0 _s_t_e_p 1 _u_n_t_i_l ntimes-1 _d_o _b_e_g_i_n matgen(a,lda,n,b,norma); dgefa(a,lda,n,ipvt,info ); _e_n_d; atime[0,j] := (second ( 1 ) - t1 - tm2)/ntimes; t1 := second ( -1 ); _f_o_r i := 0 _s_t_e_p 1 _u_n_t_i_l ntimes-1 _d_o _b_e_g_i_n dgesl(a,lda,n,ipvt,b,0); _e_n_d; atime[1,j] := (second ( 1 ) - t1)/ntimes; total := atime[0,j] + atime[1,j]; atime[2,j] := total; atime[3,j] := ops/(1.0'3*total); atime[4,j] := 2.0/atime[3,j]; atime[5,j] := total/cray; atime[3,6] := atime[3,6] + atime[3,j]; printtime(j); _e_n_d; atime[3,6] := atime[3,6] / 5.0; writetext (|<); write (|, atime[3,6]); writecr; writetext (|<); writecr; _c_o_m_m_e_n_t /************************************************************************ * Calculate overhead of executing matgen procedure * ************************************************************************/; time1 := second ( -1 ); _f_o_r i := 0 _s_t_e_p 1 _u_n_t_i_l loop-1 _d_o _b_e_g_i_n matgen(aa,ldaa,n,b,norma); _e_n_d; time2 := second ( 1 ); overhead2 := (time2 - time1); overhead2 := overhead2 / loop; writetext (|<); write (|, overhead2 ); writetext (|<< seconds|>); writecr; writetext(|<); write (|,ldaa ); writecr; writetext(|<< dgefa dgesl total Mflops unit|>); writetext(|<< ratio|>); writecr; _c_o_m_m_e_n_t /************************************************************************ * Execute 5 passes * ************************************************************************/; tm2 := ntimes * overhead2; atime[3,12] := 0; _f_o_r j := 7 _s_t_e_p 1 _u_n_t_i_l 11 _d_o _b_e_g_i_n t1 := second ( -1 ); _f_o_r i := 0 _s_t_e_p 1 _u_n_t_i_l ntimes-1 _d_o _b_e_g_i_n matgen(aa,ldaa,n,b,norma); dgefa(aa,ldaa,n,ipvt,info ); _e_n_d; atime[0,j] := (second ( 1 ) - t1 - tm2)/ntimes; t1 := second ( -1 ); _f_o_r i := 0 _s_t_e_p 1 _u_n_t_i_l ntimes-1 _d_o _b_e_g_i_n dgesl(aa,ldaa,n,ipvt,b,0); _e_n_d; atime[1,j] := (second ( 1 ) - t1)/ntimes; total := atime[0,j] + atime[1,j]; atime[2,j] := total; atime[3,j] := ops/(1.0'3*total); atime[4,j] := 2.0/atime[3,j]; atime[5,j] := total/cray; atime[3,12] := atime[3,12] + atime[3,j]; printtime(j); _e_n_d; atime[3,12] := atime[3,12] / 5.0; writetext (|< ); write (|, atime[3,12]); writecr; _c_o_m_m_e_n_t /************************************************************************ * Use minimum average as overall Mflops rating * ************************************************************************/; mflops := atime[3,6]; _i_f (atime[3,12] < mflops) _t_h_e_n mflops := atime[3,12]; writecr; write (|, mflops ); writetext (|<< Kflops|>); writecr; whatdate; _c_o_m_m_e_n_t /************************************************************************ * Add results to output file LLloops.txt * ************************************************************************/; writetext (|<<----------------- ----------------- --------- |>); writetext (|<<--------- ---------|>); writecr; writetext (|<); writecr;writecr; writetext ( |<); write (|, thisdate ); writetext(|<); write (|, thismonth ); write (|< dddd|>, thisyear ); writecr; writetext ( |<); writecr; writetext ( |<); writecr; writetext ( |<); writecr; writetext ( |<); writecr; writetext ( |<); writecr; writetext ( |<); writecr; writetext ( |<); writecr; writetext ( |<); writecr; writetext ( |<); writecr; writetext ( |<); writecr; writetext ( |<); writecr; writetext ( |<); writecr; writetext ( |<); writecr; writetext ( |<); write (|,residn); writecr; writetext ( |<); write (|< d.ddddd'dd|>,resid); writecr; writetext ( |<); write (|< d.ddddd'dd|>,epsn); writecr; writetext ( |<); write (|< d.ddddd'dd|>,x1); writecr; writetext ( |<); write (|< d.ddddd'dd|>,x2); writecr; writetext ( |<); write (|,overhead1); writecr; writetext ( |<); write (|,overhead2); writecr; writetext ( |<); write (|,ntimes); writecr; writetext ( |<); write (|,lda); writecr; writetext ( |<< dgefa dgesl total Kflops|>); writecr; writetext ( |<<1 pass seconds |>); write ( |<-dddddd.ddd|>, atime[0,0], atime[1,0], atime[2,0]); writecr; _f_o_r i := 1 _s_t_e_p 1 _u_n_t_i_l 5 _d_o _b_e_g_i_n writetext(|<); write ( |<-dddddd.ddd|>, atime[0,i], atime[1,i], atime[2,i], atime[3,i]); writecr; _e_n_d; writetext(|< ); write (|, atime[3,7]); writecr; writetext(|<); write (|,ldaa); writecr; _f_o_r i := 7 _s_t_e_p 1 _u_n_t_i_l 11 _d_o _b_e_g_i_n writetext(|<); write ( |<-dddddd.ddd|>, atime[0,i], atime[1,i], atime[2,i], atime[3,i]); writecr; _e_n_d; writetext(|<); write (|,atime[3,12]); writecr; select ( 17 ); writetext(|<); writecr; Endit := lyn; _e_n_d; _e_n_d; t<