Rational/R1000s400/How the R1000 boots

This page will not delve into the purely hardware aspects of bringing the system up, but to get going, we have to look at one piece of hardware:

This is the top right corner of pdf page 13 of the IOC Schematic, and very representative of the engineering of the R1000's hardware.

The first thing to notice is that while today 'A0' would always be the least significant bit of the address bus, things were not so clear-cut in the 1980s, and in this case it is the most significant bit. The circuit inverts that bit for the first sixteen clock cycles after the CPU has been reset, while the 68K20 reads the stack pointer and the address where execution is to start from the EEPROM chip, which at all other times is mapped at address 0x80000000.

Unlike the "normal" way of doing bank-switching, where the CPU writes to some register which then changes how memory is mapped, this does the job automatically and is not subject to subsequent software errors, and it does so with admirable economy of circuitry.
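The effect of the circuit can be modelled in a few lines of Python. This is only a sketch: the function name and the cycle counter are invented for illustration, and it assumes the inversion covers exactly the first sixteen clock cycles as described above. The one outside fact used is that a 68020 fetches its initial stack pointer from address 0 and its initial program counter from address 4 after reset.

```python
RESET_WINDOW_CYCLES = 16   # from the text: first sixteen clock cycles after reset
MSB = 0x80000000           # the address bit the schematic labels 'A0'


def bus_address(cpu_address, cycles_since_reset):
    """Address seen by the memory system (hypothetical model of the circuit)."""
    if cycles_since_reset < RESET_WINDOW_CYCLES:
        # During the reset window the most significant address bit is inverted.
        return cpu_address ^ MSB
    return cpu_address


# Reset vector fetches: the CPU asks for 0x0 and 0x4, the EEPROM answers.
initial_sp_fetch = bus_address(0x00000000, 0)
initial_pc_fetch = bus_address(0x00000004, 1)

# Later accesses are passed through untouched.
normal_fetch = bus_address(0x80000024, 100)
```

With this model the two vector fetches land on 0x80000000 and 0x80000004, which is where the listing below starts.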

T=1µs RESET VECTOR

   80000000 00 07 ff fc                .LWORD  0x0007fffc
   80000004 80 00 00 24                .CODE   0x80000024
   […]
   80000024 4e 71                      NOP
   80000026 4e 71                      NOP
   80000028 42 87                      CLR.L   D7
   8000002a 42 86                      CLR.L   D6
   8000002c 42 b8 f4 00                CLR.L   IO_DREG5_p24
   80000030 42 b8 fe 00                CLR.L   IO_CPU_CONTROL_PSU_MARGIN_BREG4_p23
   80000034 42 b8 f3 00                CLR.L   IO_SENREG_p25
   80000038 42 b8 fc 00                CLR.L   IO_CONTROL_p28
   8000003c 42 b8 f9 00                CLR.L   IO_CLEAR_BERR_p24
   80000040 42 b8 f2 00                CLR.L   IO_FRONT_PANEL_LED_p27
   80000044 42 b8 f5 00                CLR.L   IO_FIFO_INIT_p68_p69
   80000048 42 b8 fd 00                CLR.L   IO_CLR_PFINT_p23
   8000004c 46 fc 27 00                MOVE.W  #0x2700,SR
   80000050 42 80                      CLR.L   D0
   80000052 4e 7b 00 02                MOVEC   D0,CACR
   80000056 2e 3c 80 00 00 00          MOVE.L  #0x80000000,D7
   8000005c 10 38 90 03                MOVE.B  IO_UART_COMMAND,D0
   80000060 20 3c 00 00 82 35          MOVE.L  #0x00008235,D0
   80000066 51 c8 ff fe                DBF     D0,0x80000066
   8000006a 42 38 90 03                CLR.B   IO_UART_COMMAND
   8000006e 60 00 01 74                BRA     CHECKSUM_EEPROM
   […]

No attempt is being made to verify that the CPU is sound. That has always divided people into two camps: one side argues that if the CPU is faulty, it won't get far, if anywhere, anyway, so the test is pointless, and the other side argues that by testing that all bits in all registers work etc., "heisenbugs" will be discovered earlier. As we shall soon see, the IOC code is a lot more sceptical about pretty much everything else than its own CPU.

Strangely enough, once running, the kernel does not seem to trust the integrity of the CPU registers while the CPU is in stop-mode, awaiting the next interrupt:

   00009e74                         AwaitInterrupt():
   00009e74 48 e7 ff fe                  MOVEM.L A6-A0+D7-D0,-(A7)
   00009e78 4c f9 7f ff 00 00 a9 a4      MOVEM.L REG_SAVE_D0,D0-D7+A0-A6
   00009e80 42 b8 f0 00                  CLR.L   IO_CLR_RUN_LED_p16
   00009e84 4e 72 20 00                  STOP    #0x2000
   00009e88 b0 b9 00 00 a9 a4            CMP.L   REG_SAVE_D0,D0
   00009e8e 66 76                        BNE     0x9f06
   00009e90 b2 b9 00 00 a9 a8            CMP.L   REG_SAVE_D1,D1
   00009e96 66 6e                        BNE     0x9f06
   00009e98 b4 b9 00 00 a9 ac            CMP.L   REG_SAVE_D2,D2
   00009e9e 66 66                        BNE     0x9f06
   00009ea0 b6 b9 00 00 a9 b0            CMP.L   REG_SAVE_D3,D3
   00009ea6 66 5e                        BNE     0x9f06
   00009ea8 b8 b9 00 00 a9 b4            CMP.L   REG_SAVE_D4,D4
   00009eae 66 56                        BNE     0x9f06
   00009eb0 ba b9 00 00 a9 b8            CMP.L   REG_SAVE_D5,D5
   00009eb6 66 4e                        BNE     0x9f06
   00009eb8 bc b9 00 00 a9 bc            CMP.L   REG_SAVE_D6,D6
   00009ebe 66 46                        BNE     0x9f06
   00009ec0 be b9 00 00 a9 c0            CMP.L   REG_SAVE_D7,D7
   00009ec6 66 3e                        BNE     0x9f06
   00009ec8 b1 f9 00 00 a9 c4            CMPA.L  REG_SAVE_A0,A0
   00009ece 66 36                        BNE     0x9f06
   00009ed0 b3 f9 00 00 a9 c8            CMPA.L  REG_SAVE_A1,A1
   00009ed6 66 2e                        BNE     0x9f06
   00009ed8 b5 f9 00 00 a9 cc            CMPA.L  REG_SAVE_A2,A2
   00009ede 66 26                        BNE     0x9f06
   00009ee0 b7 f9 00 00 a9 d0            CMPA.L  REG_SAVE_A3,A3
   00009ee6 66 1e                        BNE     0x9f06
   00009ee8 b9 f9 00 00 a9 d4            CMPA.L  REG_SAVE_A4,A4
   00009eee 66 16                        BNE     0x9f06
   00009ef0 bb f9 00 00 a9 d8            CMPA.L  REG_SAVE_A5,A5
   00009ef6 66 0e                        BNE     0x9f06
   00009ef8 bd f9 00 00 a9 dc            CMPA.L  REG_SAVE_A6,A6
   00009efe 66 06                        BNE     0x9f06
   00009f00 4c df 7f ff                  MOVEM.L (A7)+,D0-D7+A0-A6
   00009f04 4e 75                        RTS
   00009f06 9e fc 01 00                  SUBA.W  #0x0100,A7
   00009f0a 50 fa 06 7b                  PANIC.W #0x67b


Otherwise the reset code is pretty trivial, clearing out a bunch of registers. (The "_pxx" suffix on the symbols is the register's page in the IOC schematic.)

IOC EEPROM checksum check

  800001e4                        CHECKSUM_EEPROM:
  800001e4 41 f9 80 00 00 00              LEA.L   0x80000000,A0
  800001ea 76 0f                          MOVEQ.L #0x0f,D3
  800001ec 43 f9 80 00 01 f6              LEA.L   0x800001f6,A1
  800001f2 60 00 ff 78                    BRA     CHECKSUM_FUNC
  800001f6 41 f9 80 00 20 00              LEA.L   0x80002000,A0
  800001fc 76 0e                          MOVEQ.L #0x0e,D3
  800001fe 43 f9 80 00 02 08              LEA.L   0x80000208,A1
  80000204 60 00 ff 66                    BRA     CHECKSUM_FUNC
  80000208 41 f9 80 00 40 00              LEA.L   0x80004000,A0
  8000020e 76 0d                          MOVEQ.L #0x0d,D3
  80000210 43 f9 80 00 02 1a              LEA.L   0x8000021a,A1
  80000216 60 00 ff 54                    BRA     CHECKSUM_FUNC
  8000021a 21 fc 00 00 00 0e f2 00        MOVE.L  #0x0000000e,IO_FRONT_PANEL_LED_p27

The IOC EEPROM originally was four chips, three for code and one for configuration data, and only the first three are checked. The RAM cannot be trusted to work yet, so instead of a regular function call, the 'return address' is loaded into the A1 register and the subroutine is jumped to.

The front-panel LEDs will indicate that these tests passed, and the code falls into the next test, but first we look at the checksum subroutine:

   8000016c                        CHECKSUM_FUNC:
   8000016c 74 56                          MOVEQ.L #0x56,D2
   8000016e 32 3c 1f f9                    MOVE.W  #0x1ff9,D1
   80000172 d4 18                          ADD.B   (A0)+,D2
   80000174 51 c9 ff fc                    DBF     D1,0x80000172
   80000178 4a 18                          TST.B   (A0)+
   8000017a 32 3c 00 04                    MOVE.W  #0x0004,D1
   8000017e d4 18                          ADD.B   (A0)+,D2
   80000180 51 c9 ff fc                    DBF     D1,0x8000017e
   80000184 4a 02                          TST.B   D2
   80000186 66 02                          BNE     0x8000018a
   80000188 4e d1                          JMP     (A1)

The first thing to notice is that the checksum skips a byte: it covers only [0x0000…0x1ff9] and [0x1ffb…0x1fff]. Looking at the tail end of the "sub-eeproms":

   80001ffa 00 92 11 05 21 1b
   80003ffa 00 92 11 05 19 97
   80005ffa 00 92 11 05 17 ff

It looks like they contain a (skipped) zero byte, 5th November 1992 ("Remember, remember..."), possibly a version number and a checksum adjustment value to make the sum zero.
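As a cross-check, the subroutine can be transcribed into Python. This is a sketch rather than anything taken from the Rational sources: the function names are invented, and `adjustment_byte` assumes the guess above is right, namely that the last byte of each 8 KB chunk is the value chosen to bring the sum to zero.

```python
CHUNK_SIZE = 0x2000       # each "sub-eeprom" is 8 KB
SKIPPED_OFFSET = 0x1ffa   # the byte stepped over by TST.B (A0)+
SEED = 0x56               # initial value loaded by MOVEQ.L #0x56,D2


def eeprom_checksum(chunk):
    """8-bit sum over one chunk, as CHECKSUM_FUNC computes it in D2."""
    if len(chunk) != CHUNK_SIZE:
        raise ValueError("expected exactly one 8 KB chunk")
    total = SEED
    for offset, byte in enumerate(chunk):
        if offset != SKIPPED_OFFSET:
            total = (total + byte) & 0xff   # ADD.B wraps at 8 bits
    return total


def adjustment_byte(chunk):
    """Last-byte value that makes the checksum zero (assumed role of that byte)."""
    trial = bytearray(chunk)
    trial[-1] = 0
    return (-eeprom_checksum(trial)) & 0xff


# Synthetic chunk, not real EEPROM contents: filler data plus a tail
# laid out like the ones shown above.
image = bytearray(range(256)) * 32
image[0x1ffa:0x2000] = bytes([0x00, 0x92, 0x11, 0x05, 0x21, 0x00])
image[-1] = adjustment_byte(image)
```

A chunk passes when `eeprom_checksum` returns zero. Because offset 0x1ffa is excluded, that byte can be changed freely without invalidating the checksum.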

If the sum is zero, the jump through A1 returns and the tests progress; if not, it is time to get the message out:

   8000018a 10 38 90 03                    MOVE.B  IO_UART_COMMAND,D0
   8000018e 11 fc 00 4e 90 02              MOVE.B  #0x4e,IO_UART_MODE
   80000194 11 fc 00 bd 90 02              MOVE.B  #0xbd,IO_UART_MODE
   8000019a 11 fc 00 23 90 03              MOVE.B  #0x23,IO_UART_COMMAND
   800001a0 43 f9 80 00 01 c4              LEA.L   0x800001c4,A1
   800001a6 20 3c 00 00 82 35              MOVE.L  #0x00008235,D0
   800001ac 51 c8 ff fe                    DBF     D0,0x800001ac
   800001b0 11 d9 90 00                    MOVE.B  (A1)+,IO_UART_DATA
   800001b4 66 f0                          BNE     0x800001a6
   800001b6 21 c3 f2 00                    MOVE.L  D3,IO_FRONT_PANEL_LED_p27
   800001ba 21 fc 00 00 00 01 00 0c        MOVE.L  #0x00000001,0xc
   800001c2 60 c7                          .CONST  0x60,0xc7
   800001c4 0d 0a 49 4f 43 20 45 45        .TXT    '\r\n'
   800001cc 50 52 4f 4d 20 63 68 65        .TXT    'IOC EEPROM checksum failure\r\n'

The instruction at 0x800001c2 is probably supposed to be a jump back to 0x8000018a, but its relative address field is odd, which puts the branch target at an odd address and makes the instruction invalid (the 68K raises an address error when asked to fetch an instruction from an odd address). It may be intended wizardry, attempting to launch the low-level debug monitor.
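The arithmetic behind that observation is easy to check. The helper below is a hypothetical sketch that decodes only the short form of BRA, where the second byte is a signed 8-bit displacement relative to the address of the instruction plus two.

```python
def bra_s_target(address, opcode_bytes):
    """Target of a short BRA (0x60 followed by an 8-bit displacement)."""
    opcode, displacement = opcode_bytes
    if opcode != 0x60:
        raise ValueError("not a BRA instruction")
    if displacement in (0x00, 0xff):
        # On the 68020 these values select the 16-bit and 32-bit forms.
        raise ValueError("not the short form of BRA")
    if displacement >= 0x80:
        displacement -= 0x100          # sign-extend the 8-bit field
    return address + 2 + displacement


as_found = bra_s_target(0x800001c2, (0x60, 0xc7))
one_less = bra_s_target(0x800001c2, (0x60, 0xc6))
```

Here `as_found` comes out as 0x8000018b, while a displacement of 0xc6 would have given 0x8000018a, the start of the failure handler.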

Personally I would have written the D3 code to the front-panel LEDs first, but again, this could also be intentional, in order to indicate problems with the console terminal before the EEPROM checksum errors.

Self tests

The next thing tested is the serial I/O chip - UART - driving the console terminal. One interesting thing about these tests is that if they fail, the code will attempt to print a message ... on the console.

   80000222 4d f9 80 00 02 28              LEA.L   0x80000228,A6
   80000228 41 f8 90 02                    LEA.L   IO_UART_MODE,A0
   8000022c 70 01                          MOVEQ.L #0x01,D0
   8000022e 10 80                          MOVE.B  D0,(A0)
   80000230 10 80                          MOVE.B  D0,(A0)
   80000232 b0 10                          CMP.B   (A0),D0
   80000234 66 00 fe 52                    BNE     _TEST_FAILED
   80000238 b0 10                          CMP.B   (A0),D0
   8000023a 66 00 fe 4c                    BNE     _TEST_FAILED
   8000023e d0 00                          ADD.B   D0,D0
   80000240 66 ec                          BNE     0x8000022e
   
   80000242 70 fe                          MOVEQ.L #-0x02,D0
   80000244 10 80                          MOVE.B  D0,(A0)
   80000246 10 80                          MOVE.B  D0,(A0)
   80000248 b0 10                          CMP.B   (A0),D0
   8000024a 66 00 fe 3c                    BNE     _TEST_FAILED
   8000024e b0 10                          CMP.B   (A0),D0
   80000250 66 00 fe 36                    BNE     _TEST_FAILED
   80000254 e3 18                          ROL.B   #0x1,D0
   80000256 65 ec                          BCS     0x80000244

The first instruction loads the address of the test code into A6, so that if the system manages to get to the low-level debugger, it can tell precisely where things failed.

The UART chip (p21 of the IOC schematic) is a Signetics 2661 (Search: "SCN2661 site:bitsavers.org") and it has a stacked MODE register, which means that only every other time you read or write to 0xffff9002 do you get the same register. This is why the two subtests do two writes and two reads for each loop, the first subtest testing that the eight bits can store a '1' and the second that they can store a '0'.
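The two loops are a walking-ones and a walking-zeros test. The Python sketch below mimics them against a toy model of a two-deep stacked register. The class is an assumption made for illustration (it simply alternates between two storage cells on every access) and is not a faithful model of the 2661.

```python
class StackedModeRegister:
    """Toy model: two cells behind one address, alternating on each access."""

    def __init__(self):
        self._cells = [0, 0]
        self._index = 0

    def write(self, value):
        self._cells[self._index] = value & 0xff
        self._index ^= 1

    def read(self):
        value = self._cells[self._index]
        self._index ^= 1
        return value


def walking_ones():
    value = 0x01                                # MOVEQ.L #0x01,D0
    while value:                                # BNE loops until D0 is zero
        yield value
        value = (value + value) & 0xff          # ADD.B D0,D0


def walking_zeros():
    value = 0xfe                                # MOVEQ.L #-0x02,D0
    while True:
        yield value
        carry = value >> 7                      # bit rotated out by ROL.B
        value = ((value << 1) | carry) & 0xff   # ROL.B #0x1,D0
        if not carry:                           # BCS loops while carry is set
            break


def mode_register_test(register):
    """Two writes, then two reads, for every pattern, as in the listing."""
    for pattern in list(walking_ones()) + list(walking_zeros()):
        register.write(pattern)
        register.write(pattern)
        if register.read() != pattern or register.read() != pattern:
            return False
    return True
```

Each generator yields eight patterns, so every bit of both cells is checked once holding a lone '1' and once holding a lone '0'.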

This is heavy-duty testing, in sharp contrast to the flawlessness of the vastly more complex M68K being taken for granted, and the subsequent tests are similarly thorough.

I will stop quoting the assembly from here and just quote the addresses; the code is fully disassembled in our [AutoArchaeologist].

The test at 0x8000025c configures the UART for 9600,N,8,1 and "local loopback", which means the characters transmitted are "short-circuited" to the receiver, and then all 256 different byte values are transmitted, received and checked.

I think the next test (0x800002c4) is supposed to test the "transmit hold" register automatically filling the "transmit shift register", but it may originally have been an attempt to measure the clock frequency of the dedicated UART crystal. If it was, the limits have been thrown wide open.

At this point the console is declared healthy and the message `R1000-400 IOC SELFTEST 1.3.2` is printed.

Next up is testing the `512 KB memory` (0x8000038a).

This is probably the most important test, in the sense that everybody agrees that the most common hardware fault was sick RAM chips on the IOC board. Given that, it is surprising that all it tells you is:

   R1000-400 IOC SELFTEST 1.3.2 
      512 KB memory ... * * * * * * * FAILED

This provides no hint as to which chips might be bad. We ended up hacking the code to get it to tell us which chips it didn't like when we repaired our IOC board.

Again, a very thorough test, but I'll leave the details as an exercise to the reader.

After testing the RAM chips themselves, the RAM parity-check circuitry is tested (0x80000568).