GIER/Turbo GIER ALGOL 4

In the article The Design of the GIER ALGOL Compiler, Part II Peter Naur writes the following:

... For these reasons we wish to base our assessment of the performance of the system on analyses of the time spent on the various language functions during the execution of realistic, practical algorithms. By this approach we will be able to detect the bottlenecks of the execution. Analyses of this kind have been made for algorithms for inverting matrices, for finding eigenvalues of symmetric matrices, and for definite integration by Simpson's rule. These analyses very definitely point to two bottlenecks: (1) subscripting, and (2) transfer of control to a segment which is already present in the core store. The relative importance of these two items varies greatly, not only with the program, but also with the manner in which the program happens to be segmented. However, as a rough average it appears that these two items together account for well over half the execution time of many realistic programs. As a further conclusion it may be stated that such programs might be speeded up by a factor of two or more if two or three special machine instructions designed to take care of these two problems were included in the machine.

I've tried to take a look at point no. 2, transfer of control to a segment already in core. Not on the real GIER machine, but on the simulator. The idea is to add an extra instruction to speed up the transfer.

Transfer of control

The segmentation of the compiled GIER ALGOL program works like this:

The compiler delivers the compiled program on the drum in tracks of 40 words.
Each track begins with the constants needed for this segment, followed by the generated instructions.
The first track is brought into core when the execution begins, and control is passed to the first instruction in the segment.
The last instruction in the segment contains a call into the Running System (RS, the part of the GIER ALGOL system resident in core). RS looks up the next segment number in a table, and if it is already in core, transfers execution to the segment. The segment is loaded from the drum if it is not present.
RS keeps track of the segments in core when the stack grows and releases the program segments as needed.
A priority system is used to release segments from core when new segments are brought in.

We take a look on what happens when execution is transferred from on segment to another segment already in core:

The last word of the segment contains a subroutine call to the label c1 in RS:

hs   c1, xxx

The right halfword of the cell contains the number of the first executable instruction in the next segment as the address. The word is f-marked (bit 41) if control is to be passed to the right halfword of the cell.

The sequence c1 in RS contains the following:

c1:  gr  b21   , arn s     ;   save R:= R;
     gt  b9            IRC ;   rel addr:= set RC (relpart(R));
     arn(b1)   D   1       ;   Raddr:= track:= track + 1; 
d19h:hv (b7) , vy[*errunit];   _g_o_t_o table search [top table]

The first word saves the R register and fetches the calling hs instruction. The second word saves the relative address in b9. The third word increases the track number in b1 and transfers it to the address part of the R register. Finally, control is transferred to the address in b7 which points to the start of the table of segments in core.

The track table is a series of instructions like this:

     ca  0     , hh  138
     ca  1012  , hh  97
     ca  1011  , hh  56
     ca  1010  , hh  15

The left instruction compares the address of the instruction with the number in the address part of the R register. If they differ, the right instruction is skipped.

If the segment is not found in the table execution continues with code that loads the segment into core.

So if the address part of R contains 1011, control is passed to the right hand instruction in word 56. The segment 1011 starts at word 57. The right hand instruction in cell 56 contains a hs call of a10:

a10: d7=a10-2[used by a5]  ;
b9:  gs  b11   , ps  s _-_1  ; track is in: curr place:= s;
a11:         [rel addr]    ; set exit addr: s:= s + rel addr;
b10: arn_-_4_9_2[priority] D 1 ; set priority: priority:= Raddr:= priority + 1;
c65:                       ;
b11: ga _-_1[curr place] VNO ;   _i_f priority = 512 _t_h_e_n
     arn b2    , hv  a6    ;     _b_e_g_i_n Raddr:= top place; _g_o_t_o clean up _e_n_d;
     arn b21               ;   R:= save R;
     hh  s1            LRC ;  _i_f go _t_h_e_n _g_o_ _t_o _i_f LRB _t_h_e_n right s1
     hv  s1            LRA ;                   _e_l_s_e left s1;

Note that b9 contains the offset into the segment where execution should start. The code increases the priority of the segment with one (cleans up the priority list if 512 is reached) and transfers control to the left or right instruction depending on the flag RB which is set if the hs c1 was f-marked.

If the core store contains many segments the list of ca's will be long, and it takes time to go through these instructions.

Implementing the PC instruction

In order to speed up the process of going through the ca instructions I've tried to implement a new instruction on the GIER machine, the pc instruction.

This instruction should be called with the address of the first ca instruction in the track table. The pc instruction fetches the words pointed by the address one by one and compares the address with the address in R. If a match is found, control is transferred to the right hand instruction in the address found in the right hand side of the cell.

The list is exhausted when a word is loaded that does not have bit 40 set, as the first instruction after the ca table is a full word instruction.

I've used the microcode of the ca instruction as a template, you can find this on page 120 in Teknisk beskrivelse af Gier, bind 1.

The microcode for the pc instruction looks like this:

MA	Gs	Gm	FL	Betingelse	Hop	Kommentar
1	AC00-39.H40-41 sub	H00-39			2	H:=R;
2	AD1	OT			3	r1:=r2;
3	OT	AD1			4	loop: r2:=r1;
4		step	læs0-41		5	LI:=ferrit[r2];
5		step			6
6	H00-39 0h Mode1	AC00-39 step		¬LI40	1	if -,LI[40] then begin R:=H; h:=false; Mode1 end;
6	LI0-9 o10-41	MD00-9 MD10-39 step		LI40	7	if LI[40] then O:=LI[0-9];
7	Add00-19	AD2			8	s2:=MD+H;
8	AD2 o10-41	AC00-9 AC10-39			9	R:=s2;
9		Tæl i OT		AC≠0	3	if R≠0 then begin r1:=r1+1; goto loop end;
9	LI10-19	TD		AC=0	10	if R=0 then TD:=LI[10-19];
10	TD skrå Mode 1 1h	OT step			1	r1:=TD; h:=true; Mode1;

I've tried to follow the conventions for microcode listed on page 50 in Teknisk beskrivelse af Gier, bind 1.

The GIER ALGOL 4 compiler needs to be patched in order to use the new instruction. The tape T1, L1 (26) 20.07.70.flx is modified like this:

flx2a <"T1, L1 (26) 20.07.70.flx" | tac | sed '0,/_s/{//d;}' | tac | \
       sed -e '/_i redefine/{ N; s/_s/e14=117,e27=1/g }' | \
       sed -e 's/hv(b7)/pc(b7)/g' | \
       sed -e 's/c5:hv_d_2_-_3,arnb5/c5:pc_d_2_-_3,arnb5/g' >BUILDga4drumbufturbo.asc

The instruction hv(b7) occurs twice.

Testing Turbo GIER ALGOL 4

I've tried a series of GIER ALGOL 4 programs on four different GIER machines, two with buffer and two without buffer, and two with a classic GIER ALGOL 4 compiler and two with a turbo GIER ALGOL 4 compiler.

The results are listed in the table below. The speedup, in percent, is listed as well. You can click on the .asc file to see a listing of the program as a PDF file. Note that some of the lines are truncated at the margin.

Program	Function	Time, sec. GA4 Classic	Time, sec. GA4 Turbo	%	Notes
demon5tn.asc Calculation of large numbers Written by Jørgen Kjær with improvements by Thorkil Naur	sqrt(r)	5331.8	4923.3	7.7	380 decimals, buffer GIER no index check
	sqrt2(r)	1247.6	1077.9	13.6
	sqrt3(r)	389.2	365.0	6.2
	sqrt(B)	388.1	363.9	6.2
	sqrt(r)	8195.6	7892.0	3.7	380 decimals, no buffer GIER no index check
	sqrt2(r)	2147.2	2008.1	6.5
	sqrt3(r)	379.9	364.9	3.9
	sqrt(B)	377.6	361.5	4.3
enigma1.asc Solve Geocache GC84BA2		9439.92	8811.91	6.7	No buffer GIER
enigma1.asc Solve Geocache GC84BA2		9583.25	8912.39	7.0	Buffer GIER
inv1.asc Matrix inversion Topsøe INVERT2 algorithm	N=61 Buffer	603.71	599.31	0.7	No index check
	N=61 Buffer	1407.11	1250.24	11.1	Index check
	N=21 No buffer	36.59	36.05	1.5	No index check
	N=21 No buffer	67.66	64.94	4.0	Index check
inv1ga3.asc inv1.asc run on GIER ALGOL III	N=21	GA3: 60.49
lorenz2.asc Solve Geocache GC7J6KQ		12949.57	12120.39	6.4	No buffer
lorenz2.asc Solve Geocache GC7J6KQ		3397.18	3077.07	9.4	Buffer
lorenz3.asc Solve Geocache GC7J6KQ		2402.93	2295.14	4.5	No buffer
lorenz3.asc Solve Geocache GC7J6KQ		2427.87	2274.49	6.3	Buffer
nqueen18.asc	N=12	13968.89	12368.89	11.5	No buffer
nqueen18.asc	N=12	14069.38	12469.37	11.4	Buffer
pc1.asc Linear equations. Topsøe LEQ1 algorithm ("Det Gauss" without determinant calculation). end for added on central loops.	N=20	14.112	14.004	0.8	No buffer
	N=20	12.920	12.783	1.1	Buffer
	N=60	262.368	261.448	0.4	Buffer
pc1b.asc Like pc1.asc, but without first end for	N=60	408.109	375.170	8.1	Buffer
pc3.asc a:=sin(0.1)+exp(0.1)+sqrt(0.1)+arctan(0.1)	N=10000	184.573	180.017	2.5	No buffer
pc3.asc a:=sin(0.1)+exp(0.1)+sqrt(0.1)+arctan(0.1)	N=10000	184.588	180.031	2.5	Buffer
pc4.asc a:=sqrt(0)	N=30000	52.809	50.206	4.9	No buffer
pc4.asc a:=sqrt(0)	N=30000	52.822	50.218	4.9	Buffer
pc5.asc Project Euler problem 61	No index check	7938.42	7036.65	11.4	No buffer
	No index check	6655.42	6091.44	8.5	Buffer
	Index check	7893.65	7045.71	10.7	No buffer
	Index check	8032.21	7153.58	10.9	Buffer
pc6.asc Project Euler problem 112	No end for	30380.22	28390.72	6.5	No buffer
	No end for	30569.63	28580.12	6.5	Buffer
	With end for	25329.12	24158.28	4.6	No buffer
	With end for	25518.52	24347.68	4.6	Buffer
pc7.asc Project Euler problem 12		38157.34	38153.31	0.01	No buffer
pc7.asc Project Euler problem 12		38154.11	38150.08	0.01	Buffer
pc8.asc N a:=b in loop No buffer	N=640	659.797	612.932	7.1
	N=680	689.896	642.808	6.8
	N=720	729.221	679.751	6.8
	N=760	5649.694	5649.645	0.001
	N=800	5908.468	5895.573	0.2
pentomino4.asc Pentomino from DR program		428386	408163	4.7	No buffer
pentomino4.asc Pentomino from DR program		280782	251104	10.6	Buffer
sudoku7.asc	tracks transferred: 972876	24794.2	24784.9	0.04	No buffer
sudoku7.asc	tracks transferred: 6255	3407.5	3097.7	9.1	Buffer
sudokus7.asc SLIP version of sudoku7.asc		1122.83

GIER/Turbo GIER ALGOL 4

Transfer of control

Implementing the PC instruction

Testing Turbo GIER ALGOL 4

Navigationsmenu

Søg