Fra DDHFwiki
Spring til navigation Spring til søgning

In the article The Design of the GIER ALGOL Compiler, Part II Peter Naur writes the following:

... For these reasons we wish to base our assessment of the performance of the system on analyses of the time spent on the various language functions during the execution of realistic, practical algorithms. By this approach we will be able to detect the bottlenecks of the execution. Analyses of this kind have been made for algorithms for inverting matrices, for finding eigenvalues of symmetric matrices, and for definite integration by Simpson's rule. These analyses very definitely point to two bottlenecks: (1) subscripting, and (2) transfer of control to a segment which is already present in the core store. The relative importance of these two items varies greatly, not only with the program, but also with the manner in which the program happens to be segmented. However, as a rough average it appears that these two items together account for well over half the execution time of many realistic programs. As a further conclusion it may be stated that such programs might be speeded up by a factor of two or more if two or three special machine instructions designed to take care of these two problems were included in the machine.

I've tried to take a look at point no. 2, transfer of control to a segment already in core. Not on the real GIER machine, but on the simulator. The idea is to add an extra instruction to speed up the transfer.

Transfer of control

The segmentation of the compiled GIER ALGOL program works like this:

  1. The compiler delivers the compiled program on the drum in tracks of 40 words.
  2. Each track begins with the constants needed for this segment, followed by the generated instructions.
  3. The first track is brought into core when the execution begins, and control is passed to the first instruction in the segment.
  4. The last instruction in the segment contains a call into the Running System (RS, the part of the GIER ALGOL system resident in core). RS looks up the next segment number in a table, and if it is already in core, transfers execution to the segment. The segment is loaded from the drum if it is not present.
  5. RS keeps track of the segments in core when the stack grows and releases the program segments as needed.
  6. A priority system is used to release segments from core when new segments are brought in.

We take a look on what happens when execution is transferred from on segment to another segment already in core:

The last word of the segment contains a subroutine call to the label c1 in RS:

hs   c1, xxx

The right halfword of the cell contains the number of the first executable instruction in the next segment as the address. The word is f-marked (bit 41) if control is to be passed to the right halfword of the cell.

The sequence c1 in RS contains the following:

c1:  gr  b21   , arn s     ;   save R:= R;
     gt  b9            IRC ;   rel addr:= set RC (relpart(R));
     arn(b1)   D   1       ;   Raddr:= track:= track + 1; 
d19h:hv (b7) , vy[*errunit];   _g_o_t_o table search [top table]

The first word saves the R register and fetches the calling hs instruction. The second word saves the relative address in b9. The third word increases the track number in b1 and transfers it to the address part of the R register. Finally, control is transferred to the address in b7 which points to the start of the table of segments in core.

The track table is a series of instructions like this:

     ca  0     , hh  138
     ca  1012  , hh  97
     ca  1011  , hh  56
     ca  1010  , hh  15

The left instruction compares the address of the instruction with the number in the address part of the R register. If they differ, the right instruction is skipped.

If the segment is not found in the table execution continues with code that loads the segment into core.

So if the address part of R contains 1011, control is passed to the right hand instruction in word 56. The segment 1011 starts at word 57. The right hand instruction in cell 56 contains a hs call of a10:

a10: d7=a10-2[used by a5]  ;
b9:  gs  b11   , ps  s _-_1  ; track is in: curr place:= s;
a11:         [rel addr]    ; set exit addr: s:= s + rel addr;
b10: arn_-_4_9_2[priority] D 1 ; set priority: priority:= Raddr:= priority + 1;
c65:                       ;
b11: ga _-_1[curr place] VNO ;   _i_f priority = 512 _t_h_e_n
     arn b2    , hv  a6    ;     _b_e_g_i_n Raddr:= top place; _g_o_t_o clean up _e_n_d;
     arn b21               ;   R:= save R;
     hh  s1            LRC ;  _i_f go _t_h_e_n _g_o_ _t_o _i_f LRB _t_h_e_n right s1
     hv  s1            LRA ;                   _e_l_s_e left s1;

Note that b9 contains the offset into the segment where execution should start. The code increases the priority of the segment with one (cleans up the priority list if 512 is reached) and transfers control to the left or right instruction depending on the flag RB which is set if the hs c1 was f-marked.

If the core store contains many segments the list of ca's will be long, and it takes time to go through these instructions.

Implementing the PC instruction

In order to speed up the process of going through the ca instructions I've tried to implement a new instruction on the GIER machine, the pc instruction.

This instruction should be called with the address of the first ca instruction in the track table. The pc instruction fetches the words pointed by the address one by one and compares the address with the address in R. If a match is found, control is transferred to the right hand instruction in the address found in the right hand side of the cell.

The list is exhausted when a word is loaded that does not have bit 40 set, as the first instruction after the ca table is a full word instruction.

I've used the microcode of the ca instruction as a template, you can find this on page 120 in Teknisk beskrivelse af Gier, bind 1.

The microcode for the pc instruction looks like this:

MA Gs Gm FL Betingelse nr. Hop Kommentar
1 AC00-39.H40-41
H00-39 2 H:=R;
2 AD1 OT 3 r1:=r2;
3 OT AD1 4 loop: r2:=r1;
4 step læs0-41 5 LI:=ferrit[r2];
5 step 6
6 H00-39
¬LI40 1 if -,LI[40] then begin
R:=H; h:=false; Mode1 end;
6 LI0-9
LI40 7 if LI[40] then O:=LI[0-9];
7 Add00-19 AD2 8 s2:=MD+H;
8 AD2
9 R:=s2;
9 Tæl i OT AC≠0 3 if R≠0 then begin r1:=r1+1; goto loop end;
9 LI10-19 TD AC=0 10 if R=0 then TD:=LI[10-19];
10 TD skrå
Mode 1
1 r1:=TD; h:=true; Mode1;

I've tried to follow the conventions for microcode listed on page 50 in Teknisk beskrivelse af Gier, bind 1.

The GIER ALGOL 4 compiler needs to be patched in order to use the new instruction. The tape T1, L1 (26) 20.07.70.flx is modified like this:

flx2a <"T1, L1 (26) 20.07.70.flx" | tac | sed '0,/_s/{//d;}' | tac | \
       sed -e '/_i redefine/{ N; s/_s/e14=117,e27=1/g }' | \
       sed -e 's/hv(b7)/pc(b7)/g' | \
       sed -e 's/c5:hv_d_2_-_3,arnb5/c5:pc_d_2_-_3,arnb5/g' >BUILDga4drumbufturbo.asc

The instruction hv(b7) occurs twice.

Testing Turbo GIER ALGOL 4

I've tried a series of GIER ALGOL 4 programs on four different GIER machines, two with buffer and two without buffer, and two with a classic GIER ALGOL 4 compiler and two with a turbo GIER ALGOL 4 compiler.

The results are listed in the table below. The speedup, in percent, is listed as well. You can click on the .asc file to see a listing of the program as a PDF file. Note that some of the lines are truncated at the margin.

Program Function Time, sec.
GA4 Classic
Time, sec.
GA4 Turbo
% Notes
Calculation of large numbers
Written by Jørgen Kjær with
improvements by Thorkil Naur
sqrt(r) 5331.8 4923.3 7.7 380 decimals, buffer GIER
no index check
sqrt2(r) 1247.6 1077.9 13.6
sqrt3(r) 389.2 365.0 6.2
sqrt(B) 388.1 363.9 6.2
sqrt(r) 8195.6 7892.0 3.7 380 decimals, no buffer GIER
no index check
sqrt2(r) 2147.2 2008.1 6.5
sqrt3(r) 379.9 364.9 3.9
sqrt(B) 377.6 361.5 4.3
Solve Geocache GC84BA2
9439.92 8811.91 6.7 No buffer GIER
9583.25 8912.39 7.0 Buffer GIER
Matrix inversion
Topsøe INVERT2 algorithm
603.71 599.31 0.7 No index check
1407.11 1250.24 11.1 Index check
No buffer
36.59 36.05 1.5 No index check
67.66 64.94 4.0 Index check
inv1.asc run on GIER ALGOL III
N=21 GA3: 60.49
Solve Geocache GC7J6KQ
12949.57 12120.39 6.4 No buffer
3397.18 3077.07 9.4 Buffer
Solve Geocache GC7J6KQ
2402.93 2295.14 4.5 No buffer
2427.87 2274.49 6.3 Buffer
nqueen18.asc N=12 13968.89 12368.89 11.5 No buffer
14069.38 12469.37 11.4 Buffer
Linear equations. Topsøe LEQ1 algorithm
("Det Gauss" without determinant calculation).
end for added on central loops.
N=20 14.112 14.004 0.8 No buffer
N=20 12.920 12.783 1.1 Buffer
N=60 262.368 261.448 0.4 Buffer
Like pc1.asc, but without first end for
N=60 408.109 375.170 8.1 Buffer
N=10000 184.573 180.017 2.5 No buffer
184.588 180.031 2.5 Buffer
N=30000 52.809 50.206 4.9 No buffer
52.822 50.218 4.9 Buffer
Project Euler problem 61
No index check 7938.42 7036.65 11.4 No buffer
6655.42 6091.44 8.5 Buffer
Index check 7893.65 7045.71 10.7 No buffer
8032.21 7153.58 10.9 Buffer
Project Euler problem 112
No end for 30380.22 28390.72 6.5 No buffer
30569.63 28580.12 6.5 Buffer
With end for 25329.12 24158.28 4.6 No buffer
25518.52 24347.68 4.6 Buffer
Project Euler problem 12
38157.34 38153.31 0.01 No buffer
38154.11 38150.08 0.01 Buffer
N a:=b in loop
No buffer
N=640 659.797 612.932 7.1
N=680 689.896 642.808 6.8
N=720 729.221 679.751 6.8
N=760 5649.694 5649.645 0.001
N=800 5908.468 5895.573 0.2
Pentomino from DR program
428386 408163 4.7 No buffer
280782 251104 10.6 Buffer
sudoku7.asc tracks transferred:
24794.2 24784.9 0.04 No buffer
tracks transferred:
3407.5 3097.7 9.1 Buffer
SLIP version of sudoku7.asc