Rational/R1000s400/How the R1000 boots
This page will not delve into the purely hardware aspects of bringing the system up, but to get going, we have to look at one piece of hardware:
This is the top right corner of pdf page 13 of the IOC Schematic, and very representative of the engineering of the R1000's hardware.
The first thing to notice is that while today 'A0' would always be the least significant bit of the address bus, things were not so clear cut in the 1980s, and in this case it is the most significant bit. This circuit inverts that bit for the first sixteen clock cycles after the CPU has been reset, while the 68K20 reads the initial stack pointer and the address where execution is to start from the EEPROM chip, which at all other times is mapped at address 0x80000000.
Unlike the "normal" way of doing bank-switching, where the CPU writes to some register which then changes how memory is mapped, this does the job automatically, is not subject to subsequent software errors, and does so with admirable economy of circuitry.
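As a rough model of the effect, the sketch below flips the top address bit for the first sixteen clocks after reset and leaves addresses alone afterwards. The function name and the idea of passing in a clock count are illustrative assumptions; the real thing is a few gates and a counter on the IOC board, not software.

```python
RESET_WINDOW_CLOCKS = 16     # per the text: the first sixteen clocks after reset
MSB = 0x8000_0000            # the bit the schematic calls 'A0'


def bus_address(cpu_address, clocks_since_reset):
    """Address the memory system sees (hypothetical model, not the schematic)."""
    if clocks_since_reset < RESET_WINDOW_CLOCKS:
        # During the reset window the top bit is inverted, so the
        # vector fetches at 0x0 and 0x4 land in the EEPROM.
        return (cpu_address ^ MSB) & 0xFFFF_FFFF
    return cpu_address & 0xFFFF_FFFF
```

With this model the reset fetches from 0x00000000 and 0x00000004 read 0x80000000 and 0x80000004, and any later access to low addresses reaches whatever is really mapped there.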
T=1µs RESET VECTOR
80000000  00 07 ff fc  .LWORD 0x0007fffc
80000004  80 00 00 24  .CODE 0x80000024
[…]
80000024  4e 71  NOP
80000026  4e 71  NOP
80000028  42 87  CLR.L D7
8000002a  42 86  CLR.L D6
8000002c  42 b8 f4 00  CLR.L IO_DREG5_p24
80000030  42 b8 fe 00  CLR.L IO_CPU_CONTROL_PSU_MARGIN_BREG4_p23
80000034  42 b8 f3 00  CLR.L IO_SENREG_p25
80000038  42 b8 fc 00  CLR.L IO_CONTROL_p28
8000003c  42 b8 f9 00  CLR.L IO_CLEAR_BERR_p24
80000040  42 b8 f2 00  CLR.L IO_FRONT_PANEL_LED_p27
80000044  42 b8 f5 00  CLR.L IO_FIFO_INIT_p68_p69
80000048  42 b8 fd 00  CLR.L IO_CLR_PFINT_p23
8000004c  46 fc 27 00  MOVE.W #0x2700,SR
80000050  42 80  CLR.L D0
80000052  4e 7b 00 02  MOVEC D0,CACR
80000056  2e 3c 80 00 00 00  MOVE.L #0x80000000,D7
8000005c  10 38 90 03  MOVE.B IO_UART_COMMAND,D0
80000060  20 3c 00 00 82 35  MOVE.L #0x00008235,D0
80000066  51 c8 ff fe  DBF D0,0x80000066
8000006a  42 38 90 03  CLR.B IO_UART_COMMAND
8000006e  60 00 01 74  BRA CHECKSUM_EEPROM
[…]
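The first two longwords of the listing are the reset vector. A minimal sketch of decoding them, using only the eight bytes quoted above (the function name is made up for illustration):

```python
import struct

# First eight bytes of the EEPROM, copied from the listing.
RESET_VECTOR = bytes.fromhex("0007fffc80000024")


def decode_reset_vector(raw):
    """Return (initial stack pointer, initial program counter)."""
    # The 68k family is big-endian: two unsigned 32-bit values.
    return struct.unpack(">II", raw[:8])


sp, pc = decode_reset_vector(RESET_VECTOR)
print(f"SP=0x{sp:08x} PC=0x{pc:08x}")   # SP=0x0007fffc PC=0x80000024
```

The stack pointer sits four bytes below 0x80000, which is the top of 512 KB, and execution starts at the two NOPs at 0x80000024.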
No attempt is made to verify that the CPU is sound. That has always divided people into two camps: one side argues that if the CPU is faulty, it won't get far, if anywhere, so the test is pointless; the other argues that by testing that all bits in all registers work etc., "heisenbugs" will be discovered earlier. As we shall soon see, the IOC code is a lot more sceptical about pretty much everything other than its own CPU.
Strangely enough, once running, the kernel does not seem to trust the integrity of the CPU registers while the CPU is in stop-mode, awaiting the next interrupt:
00009e74  AwaitInterrupt():
00009e74  48 e7 ff fe  MOVEM.L A6-A0+D7-D0,-(A7)
00009e78  4c f9 7f ff 00 00 a9 a4  MOVEM.L REG_SAVE_D0,D0-D7+A0-A6
00009e80  42 b8 f0 00  CLR.L IO_CLR_RUN_LED_p16
00009e84  4e 72 20 00  STOP #0x2000
00009e88  b0 b9 00 00 a9 a4  CMP.L REG_SAVE_D0,D0
00009e8e  66 76  BNE 0x9f06
00009e90  b2 b9 00 00 a9 a8  CMP.L REG_SAVE_D1,D1
00009e96  66 6e  BNE 0x9f06
00009e98  b4 b9 00 00 a9 ac  CMP.L REG_SAVE_D2,D2
00009e9e  66 66  BNE 0x9f06
00009ea0  b6 b9 00 00 a9 b0  CMP.L REG_SAVE_D3,D3
00009ea6  66 5e  BNE 0x9f06
00009ea8  b8 b9 00 00 a9 b4  CMP.L REG_SAVE_D4,D4
00009eae  66 56  BNE 0x9f06
00009eb0  ba b9 00 00 a9 b8  CMP.L REG_SAVE_D5,D5
00009eb6  66 4e  BNE 0x9f06
00009eb8  bc b9 00 00 a9 bc  CMP.L REG_SAVE_D6,D6
00009ebe  66 46  BNE 0x9f06
00009ec0  be b9 00 00 a9 c0  CMP.L REG_SAVE_D7,D7
00009ec6  66 3e  BNE 0x9f06
00009ec8  b1 f9 00 00 a9 c4  CMPA.L REG_SAVE_A0,A0
00009ece  66 36  BNE 0x9f06
00009ed0  b3 f9 00 00 a9 c8  CMPA.L REG_SAVE_A1,A1
00009ed6  66 2e  BNE 0x9f06
00009ed8  b5 f9 00 00 a9 cc  CMPA.L REG_SAVE_A2,A2
00009ede  66 26  BNE 0x9f06
00009ee0  b7 f9 00 00 a9 d0  CMPA.L REG_SAVE_A3,A3
00009ee6  66 1e  BNE 0x9f06
00009ee8  b9 f9 00 00 a9 d4  CMPA.L REG_SAVE_A4,A4
00009eee  66 16  BNE 0x9f06
00009ef0  bb f9 00 00 a9 d8  CMPA.L REG_SAVE_A5,A5
00009ef6  66 0e  BNE 0x9f06
00009ef8  bd f9 00 00 a9 dc  CMPA.L REG_SAVE_A6,A6
00009efe  66 06  BNE 0x9f06
00009f00  4c df 7f ff  MOVEM.L (A7)+,D0-D7+A0-A6
00009f04  4e 75  RTS
00009f06  9e fc 01 00  SUBA.W #0x0100,A7
00009f0a  50 fa 06 7b  PANIC.W #0x67b
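The shape of that routine can be sketched in a few lines: park the live registers, load a known pattern from the save area, stop, and on wake-up compare every register against the pattern before restoring. Everything here (the dict-based "registers", the `stop` callback, the exception name) is a hypothetical stand-in for illustration, not the kernel's actual structure.

```python
REGISTER_NAMES = [f"D{i}" for i in range(8)] + [f"A{i}" for i in range(7)]


class RegisterCorruption(Exception):
    """Stand-in for the kernel's PANIC."""


def await_interrupt(cpu_regs, reg_save, stop):
    saved = dict(cpu_regs)                  # MOVEM.L ...,-(A7)
    for name in REGISTER_NAMES:             # MOVEM.L REG_SAVE_D0,D0-D7+A0-A6
        cpu_regs[name] = reg_save[name]
    stop()                                  # STOP #0x2000: sleep until an interrupt
    for name in REGISTER_NAMES:             # the CMP.L / CMPA.L chain
        if cpu_regs[name] != reg_save[name]:
            raise RegisterCorruption(name)
    cpu_regs.update(saved)                  # MOVEM.L (A7)+,... then RTS
```

A7 is deliberately absent from the list, just as in the listing: it is the stack pointer that holds the parked registers.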
Otherwise the reset code is pretty trivial, clearing out a bunch of registers. (The "_pxx" suffix on the symbols is the register's page in the IOC schematic.)
IOC EEPROM checksum check
800001e4  CHECKSUM_EEPROM:
800001e4  41 f9 80 00 00 00  LEA.L 0x80000000,A0
800001ea  76 0f  MOVEQ.L #0x0f,D3
800001ec  43 f9 80 00 01 f6  LEA.L 0x800001f6,A1
800001f2  60 00 ff 78  BRA CHECKSUM_FUNC
800001f6  41 f9 80 00 20 00  LEA.L 0x80002000,A0
800001fc  76 0e  MOVEQ.L #0x0e,D3
800001fe  43 f9 80 00 02 08  LEA.L 0x80000208,A1
80000204  60 00 ff 66  BRA CHECKSUM_FUNC
80000208  41 f9 80 00 40 00  LEA.L 0x80004000,A0
8000020e  76 0d  MOVEQ.L #0x0d,D3
80000210  43 f9 80 00 02 1a  LEA.L 0x8000021a,A1
80000216  60 00 ff 54  BRA CHECKSUM_FUNC
8000021a  21 fc 00 00 00 0e f2 00  MOVE.L #0x0000000e,IO_FRONT_PANEL_LED_p27
The IOC EEPROM originally was four chips, three for code and one for configuration data, and only the first three are checked. The RAM cannot be trusted to work yet, so instead of a regular function call, the 'return address' is loaded into the A1 register and the subroutine is jumped to.
The front panel LEDs will indicate that these tests passed, and the code falls into the next test, but first we look at the checksum subroutine:
8000016c  CHECKSUM_FUNC:
8000016c  74 56  MOVEQ.L #0x56,D2
8000016e  32 3c 1f f9  MOVE.W #0x1ff9,D1
80000172  d4 18  ADD.B (A0)+,D2
80000174  51 c9 ff fc  DBF D1,0x80000172
80000178  4a 18  TST.B (A0)+
8000017a  32 3c 00 04  MOVE.W #0x0004,D1
8000017e  d4 18  ADD.B (A0)+,D2
80000180  51 c9 ff fc  DBF D1,0x8000017e
80000184  4a 02  TST.B D2
80000186  66 02  BNE 0x8000018a
80000188  4e d1  JMP (A1)
The first thing to notice is that the checksum skips a byte: it covers only [0x0000…0x1ff9] & [0x1ffb…0x1fff]. Looking at the tail end of the "sub-eeproms":
80001ffa  00 92 11 05 21 1b
80003ffa  00 92 11 05 19 97
80005ffa  00 92 11 05 17 ff
It looks like they contain a (skipped) zero byte, 5th November 1992 ("Remember, remember..."), possibly a version number and a checksum adjustment value to make the sum zero.
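The checksum itself is easy to restate in Python. This is a sketch of what CHECKSUM_FUNC computes over one 8 KB "sub-eeprom"; the function names are invented here, and the trailer decoding simply follows the guessed layout above (skipped byte, BCD date, possible version, adjustment byte), which remains a guess.

```python
SEED = 0x56        # MOVEQ.L #0x56,D2
SKIPPED = 0x1FFA   # offset of the byte the TST.B (A0)+ steps over
SIZE = 0x2000      # one 8 KB "sub-eeprom"


def checksum(image):
    """8-bit sum as computed by CHECKSUM_FUNC; zero means 'good'."""
    assert len(image) == SIZE
    total = SEED + sum(image[:SKIPPED]) + sum(image[SKIPPED + 1:])
    return total & 0xFF


def adjustment_byte(image):
    """Value for the last byte that makes the checksum come out zero."""
    body = bytearray(image)
    body[-1] = 0
    return (-checksum(body)) & 0xFF


def decode_trailer(tail):
    """Split the six bytes at offset 0x1ffa using the guessed layout."""
    skipped, yy, mm, dd, maybe_version, adjust = tail
    return {
        "skipped": skipped,
        "date": f"19{yy:02x}-{mm:02x}-{dd:02x}",   # BCD digits, century assumed
        "maybe_version": maybe_version,
        "adjust": adjust,
    }
```

For example, `decode_trailer(bytes.fromhex("00921105211b"))` gives the date string "1992-11-05" and an adjustment byte of 0x1b for the first chip.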
If the sum is zero, the jump through A1 returns and the tests progress; if not, it is time to get the message out:
8000018a  10 38 90 03  MOVE.B IO_UART_COMMAND,D0
8000018e  11 fc 00 4e 90 02  MOVE.B #0x4e,IO_UART_MODE
80000194  11 fc 00 bd 90 02  MOVE.B #0xbd,IO_UART_MODE
8000019a  11 fc 00 23 90 03  MOVE.B #0x23,IO_UART_COMMAND
800001a0  43 f9 80 00 01 c4  LEA.L 0x800001c4,A1
800001a6  20 3c 00 00 82 35  MOVE.L #0x00008235,D0
800001ac  51 c8 ff fe  DBF D0,0x800001ac
800001b0  11 d9 90 00  MOVE.B (A1)+,IO_UART_DATA
800001b4  66 f0  BNE 0x800001a6
800001b6  21 c3 f2 00  MOVE.L D3,IO_FRONT_PANEL_LED_p27
800001ba  21 fc 00 00 00 01 00 0c  MOVE.L #0x00000001,0xc
800001c2  60 c7  .CONST 0x60,0xc7
800001c4  0d 0a 49 4f 43 20 45 45  .TXT '\r\n'
800001cc  50 52 4f 4d 20 63 68 65  .TXT 'IOC EEPROM checksum failure\r\n'
The instruction at 0x800001c2 is probably supposed to be a jump back to 0x8000018a, but it has an odd displacement (0xc7), so its target would be the odd address 0x8000018b, which the CPU cannot execute from. It may be intended wizardry, attempting to launch the low level debug monitor.
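The branch arithmetic can be checked with a few lines of Python. This is a minimal sketch of 68k BRA target calculation for the 8-bit and 16-bit displacement forms only (the 68020's 32-bit form, signalled by a low byte of 0xff, is not handled); `bra_target` is a name made up for this example.

```python
def bra_target(address, opcode):
    """Target of a 68k BRA with an 8- or 16-bit displacement."""
    assert opcode[0] == 0x60, "not a BRA"
    assert opcode[1] != 0xFF, "32-bit displacement form not handled"
    pc = address + 2                       # displacement is relative to opcode + 2
    if opcode[1] != 0x00:                  # BRA.S: displacement is the low byte
        disp = int.from_bytes(opcode[1:2], "big", signed=True)
    else:                                  # BRA.W: displacement is the next word
        disp = int.from_bytes(opcode[2:4], "big", signed=True)
    return (pc + disp) & 0xFFFF_FFFF


print(hex(bra_target(0x800001C2, bytes.fromhex("60c7"))))      # 0x8000018b
print(hex(bra_target(0x800001F2, bytes.fromhex("6000ff78"))))  # 0x8000016c
```

The second call is a sanity check against a branch whose destination is known: the `BRA CHECKSUM_FUNC` at 0x800001f2 does resolve to 0x8000016c. A displacement byte of 0xc6 instead of 0xc7 would have produced the even address 0x8000018a.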
Personally I would have written the D3 code to the front-panel LEDs first, but again, this could also be intentional, in order to indicate problems with the console terminal before the EEPROM checksum errors.
Self tests
The next thing tested is the serial I/O chip - UART - driving the console terminal. One interesting thing about these tests is that if they fail, the code will attempt to print a message ... on the console.
80000222  4d f9 80 00 02 28  LEA.L 0x80000228,A6
80000228  41 f8 90 02  LEA.L IO_UART_MODE,A0
8000022c  70 01  MOVEQ.L #0x01,D0
8000022e  10 80  MOVE.B D0,(A0)
80000230  10 80  MOVE.B D0,(A0)
80000232  b0 10  CMP.B (A0),D0
80000234  66 00 fe 52  BNE _TEST_FAILED
80000238  b0 10  CMP.B (A0),D0
8000023a  66 00 fe 4c  BNE _TEST_FAILED
8000023e  d0 00  ADD.B D0,D0
80000240  66 ec  BNE 0x8000022e
80000242  70 fe  MOVEQ.L #-0x02,D0
80000244  10 80  MOVE.B D0,(A0)
80000246  10 80  MOVE.B D0,(A0)
80000248  b0 10  CMP.B (A0),D0
8000024a  66 00 fe 3c  BNE _TEST_FAILED
8000024e  b0 10  CMP.B (A0),D0
80000250  66 00 fe 36  BNE _TEST_FAILED
80000254  e3 18  ROL.B #0x1,D0
80000256  65 ec  BCS 0x80000244
The first instruction loads the address of the test code into A6. If the system manages to get to the low-level debugger, this can then tell precisely where things failed.
The UART chip (p21 of IOC schematic) is a Signetics 2661 (Search: "SCN2661 site:bitsavers.org") and it has a stacked MODE register, which means that only every other time you read or write to 0xffff9002 you get the same register. This is why the two subtests do two writes and two reads for each loop, the first subtest testing that each of the eight bits can store a '1' and the second that each can store a '0'.
This is heavy duty testing, in sharp contrast to the flawlessness of the vastly more complex M68K being taken for granted, and the subsequent tests are similarly thorough.
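To make the two loops concrete, here is a sketch that generates the same bit patterns the assembly walks through and runs them against a toy model of a stacked register. The `StackedModeRegister` class is a deliberately simplified assumption (two bytes behind one address, accessed alternately); it does not model anything else about the 2661.

```python
def walking_ones():
    """First loop: MOVEQ #1, then ADD.B D0,D0 until the byte becomes zero."""
    d0 = 0x01
    while d0:
        yield d0
        d0 = (d0 + d0) & 0xFF


def walking_zeros():
    """Second loop: MOVEQ #-2, then ROL.B #1 for as long as carry is set."""
    d0 = 0xFE
    while True:
        yield d0
        carry = d0 >> 7                      # ROL.B copies the top bit into carry
        d0 = ((d0 << 1) | carry) & 0xFF
        if not carry:                        # BCS falls through when carry is clear
            break


class StackedModeRegister:
    """Toy model: two registers behind one address, accessed alternately."""

    def __init__(self):
        self.regs = [0, 0]
        self.pointer = 0

    def write(self, value):
        self.regs[self.pointer] = value & 0xFF
        self.pointer ^= 1

    def read(self):
        value = self.regs[self.pointer]
        self.pointer ^= 1
        return value


def mode_register_test(uart):
    """Two writes, two reads per pattern, as in the listing."""
    for pattern in (*walking_ones(), *walking_zeros()):
        uart.write(pattern)
        uart.write(pattern)
        if uart.read() != pattern or uart.read() != pattern:
            return False
    return True
```

Each generator yields eight values, one per bit position, so the whole test is sixteen write/write/read/read rounds.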
I will stop quoting the assembly from here and just give the addresses; the code is fully disassembled in our [AutoArchaeologist].
The test at 0x8000025c configures the UART for 9600,N,8,1 and "local loopback", which means the characters transmitted are "short-circuited" to the receiver, and then all 256 different byte values are transmitted, received and checked.
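In outline, a loopback test of this kind looks like the sketch below. The `LoopbackUart` class is a hypothetical stand-in for the hardware in local loopback mode; the real code also has to poll status bits and time out, which is left out here.

```python
class LoopbackUart:
    """Hypothetical stand-in for a UART in local loopback mode."""

    def __init__(self):
        self._rx = None

    def transmit(self, byte):
        self._rx = byte & 0xFF      # loopback: transmitter feeds the receiver

    def receive(self):
        return self._rx


def loopback_test(uart):
    """Return the first byte value that fails to come back, or None."""
    for value in range(256):
        uart.transmit(value)
        if uart.receive() != value:
            return value
    return None
```

Returning the offending value rather than a plain pass/fail makes it easier to see whether a single data bit is at fault.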
I think the next test (0x800002c4) is supposed to test the "transmit hold" register automatically filling the "transmit shift register", but it may originally have been an attempt to measure the clock frequency of the dedicated UART crystal. If it was, the limits have been thrown wide open.
At this point the console is declared healthy and the message `R1000-400 IOC SELFTEST 1.3.2` printed.
Next up is testing the `512 KB memory` (0x8000038a).
This is probably the most important test, in the sense that everybody agrees that the most common hardware fault was sick RAM chips on the IOC board. Given that, it is surprising that all it tells you is:
R1000-400 IOC SELFTEST 1.3.2
512 KB memory ... * * * * * * * FAILED
That gives no hint as to which chips might be bad. We ended up hacking the code to get it to tell us which chips it didn't like when we repaired our IOC board.
Again, a very thorough test, but I'll leave the details as an exercise for the reader.
After testing the RAM chips themselves, the RAM parity-check circuitry is tested (0x80000568).
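One common way to narrow a RAM failure down is to collect which data bits ever disagreed. The sketch below shows that general idea only; it is not the patch we applied, and mapping a bit position to a physical chip needs the schematic, which is not attempted here. Both function names and the 32-bit default width are assumptions.

```python
def failing_bits(expected_actual_pairs):
    """OR together the XOR of every mismatch; set bits are suspect data lines."""
    mask = 0
    for expected, actual in expected_actual_pairs:
        mask |= expected ^ actual
    return mask


def suspect_bit_positions(mask, width=32):
    """List the bit numbers set in the mask (bit 0 = least significant)."""
    return [bit for bit in range(width) if mask & (1 << bit)]
```

For instance, if every mismatch differs only in bit 5, `suspect_bit_positions` returns `[5]`, which points at whatever chip stores that bit.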