Rational/R1000s400/Logbook

From DDHFwiki



Emulator status

  Git version  Seconds simulated  Duration  Rate (slowdown)  Commit date
  b7e744d3     1499.5s            16.46h    see note         2024-12-18
  1f0eb2bd     1244.0s            21.00h    60.8             2024-11-29
  aba2f750     1217.2s            35.73h    106              2024-10-28
  939c0651     1212.1s            57.17h    170              2024-09-27
  f600c9af     1213.0s            71.53h    212              2024-08-31
  3c13f94e     1210.3s            86.34h    257              2024-07-21
  955c0228     1214.4s            106.7h    316              2024-06-29

Note: The b7e744d3 run continued, simulating 250 seconds of idle, so its rate cannot be compared to the rest.

2025-12-31 7% greener R1000

The emulator runs at only 2.32% of the hardware's speed, but uses only 2.17% as much electricity.

QED: The emulator is 7% more energy efficient than the hardware.
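The QED is a ratio of ratios. A minimal sketch of the arithmetic, reading "uses only 2.17% as much electricity" as the emulator's power draw relative to the hardware (our interpretation), so that energy per unit of simulated work is power divided by speed:

```c
#include <assert.h>

/*
 * Energy efficiency of the emulator relative to the hardware.
 *   speed_ratio: emulator speed as a fraction of hardware speed
 *   power_ratio: emulator power draw as a fraction of hardware power
 * Energy per unit of work is power / speed, so the emulator needs
 * power_ratio / speed_ratio of the hardware's energy for the same work;
 * its relative efficiency is the inverse of that.
 */
double relative_efficiency(double speed_ratio, double power_ratio)
{
    assert(speed_ratio > 0.0 && power_ratio > 0.0);
    return speed_ratio / power_ratio;
}
```

With the figures above, relative_efficiency(0.0232, 0.0217) comes out at about 1.069, i.e. roughly 7% more simulated work per unit of energy.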

Over the midwinter break, the emulator has been put through a couple of longer tests than the usual "does it boot?" test.

First we let it boot the original disk images from PAM's machine, which took 33 hours.

Second, we did it again, but this time with writable disk images, and then we shut it down after enabling:

   package Operator is:
   […]
   procedure Archive_On_Shutdown (On : Boolean := True);
   function Get_Archive_On_Shutdown return Boolean;
   -- Archive_On_Shutdown causes the next shutdown to store internal
   -- state in 'archive' form, allowing upgrades and conversion of
   -- internal data structures.  It typically takes several hours to
   -- complete a shutdown or restart with archive conversions.

That run lasted 62 hours, and ended with the system shutting itself down as expected.

We have started analyzing those disk-images, to see if we can learn more about the internal state from the 'archive' form of the metadata, and some clues have emerged.

2024-12-01 Going downhill!

If you look at the table above, you will notice that the software emulation is not just getting faster, it is getting faster faster.

We are now in "minute for second" territory, where the software emulation takes a minute to simulate a second of machine time; we started out in the "hour for second" wilderness.

The schematics are reduced to essentially a single SystemC component for each board, and we are now at the "where is this signal needed, and by when in the clock cycle is it needed?" stage, which allows us to optimize out many signal updates and avoid executing relatively complex code.

One example is the rotator part of the FIU board. Conceptually it is quite simple, but the RUN_UDIAG microcode is written to test how it is implemented, so as long as we want to use RUN_UDIAG, we have to implement it that way, even though that is fairly complex to do in software.

But we only need to execute that rotator code when something else needs the output of the rotator, and optimizing for that just cut 15 minutes off our 21-hour baseline.

That is a respectable 1.2% speedup, and compound interest being what it is, we only need to find another 206 of those before we have speed parity with a 44-year-old computer.
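A minimal sketch of the "only compute it when someone needs it" idea behind the rotator optimization. It assumes a hypothetical 64-bit input and uses a plain rotate-left as a stand-in for the real FIU rotator, whose diagnostic-friendly implementation is far more complex; the struct, names and widths are illustrative only, not taken from the emulator sources:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical lazily evaluated signal: the expensive output is only
 * recomputed when an input has changed AND a consumer asks for it. */
struct lazy_rotator {
    uint64_t data;      /* input value (hypothetical width) */
    unsigned amount;    /* rotate amount, 0..63 */
    bool     dirty;     /* inputs changed since the last evaluation */
    uint64_t out;       /* cached output */
    unsigned evals;     /* how many times the expensive path ran */
};

static uint64_t rotate_left64(uint64_t v, unsigned n)
{
    n &= 63;
    return n ? (v << n) | (v >> (64 - n)) : v;
}

/* Called by the producer every cycle: cheap, it only records inputs. */
void rot_set_inputs(struct lazy_rotator *r, uint64_t data, unsigned amount)
{
    assert(amount < 64);
    if (r->data != data || r->amount != amount) {
        r->data = data;
        r->amount = amount;
        r->dirty = true;
    }
}

/* Called only by consumers that actually need the output this cycle. */
uint64_t rot_output(struct lazy_rotator *r)
{
    if (r->dirty) {
        r->out = rotate_left64(r->data, r->amount);
        r->dirty = false;
        r->evals++;
    }
    return r->out;
}
```

Producers can call rot_set_inputs() every cycle at negligible cost; the rotate itself runs only inside rot_output(), and only when the inputs changed since it last ran.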

2024-10-23 Everything is simpler in software

We have reached a bit of a milestone with the emulator:

When we started there were almost 800 components on 46 pages of MEM32 schematics, and that was even after we cheated and left out all but one set of the inverter drivers for the signals to the DRAM banks.

Now we are down to a single component, which does all the actual work, plus the diagnostic processor, which does nothing but exist.

To add insult to injury, the source code for that single component is only 532 lines in our Python/C++ chimera language.

The TYP and VAL boards have had almost all functionality merged into a single component as well, 758 and 545 lines of code respectively.

IOC, FIU and SEQ are putting up more of a fight.

One of the runs in progress will probably come in at around 36 hours to login.

2024-10-01 faster and faster

The changes we make to the emulator at present are subtractive: two component activations per micro cycle here, another one there, and so on.

But because the total number of activations is now down to around 400 per micro cycle, each one saved matters more and more in terms of overall performance.

The state of play right now is:

(act/µc = component activations per micro cycle)

   act/µc        %   board
   14.000     3.47   emu
   34.944     8.67   typ
   40.956    10.16   val
   52.599    13.05   ioc
   59.520    14.77   mem32
   84.113    20.87   fiu
  116.871    29.00   seq
  403.000   100.00   Total

Still ways to go…
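The figures in the table can be reproduced, and the value of a single saved activation estimated, with a few lines of arithmetic. This sketch ASSUMES run time is proportional to the number of activations, which is only a first approximation:

```c
#include <assert.h>
#include <stddef.h>

/* Activations per micro cycle, per board, as in the table above. */
struct board_load {
    const char *name;
    double act_per_uc;
};

static const struct board_load boards[] = {
    {"emu", 14.000},   {"typ", 34.944}, {"val", 40.956}, {"ioc", 52.599},
    {"mem32", 59.520}, {"fiu", 84.113}, {"seq", 116.871},
};

#define N_BOARDS (sizeof boards / sizeof boards[0])

double total_activations(void)
{
    double sum = 0.0;
    for (size_t i = 0; i < N_BOARDS; i++)
        sum += boards[i].act_per_uc;
    return sum;
}

/* Percentage of all activations that board i accounts for. */
double share_percent(size_t i)
{
    assert(i < N_BOARDS);
    return 100.0 * boards[i].act_per_uc / total_activations();
}

/*
 * Speedup from removing `saved` activations per micro cycle, assuming
 * run time is proportional to activations. The same saving is worth
 * more the smaller the total already is.
 */
double speedup_from_saving(double total, double saved)
{
    assert(saved >= 0.0 && saved < total);
    return total / (total - saved);
}
```

Under that assumption, saving one activation out of 403 is worth roughly 0.25%, and the same single activation is worth more every time the total shrinks.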

2024-08-31 below 72 hours

The latest integration run booted in 71½ hours: less than three days, and 33% faster than two months ago.

2024-08-11 Making progress

As can be seen from the table above, we're making some serious progress with the emulator performance now.

The current effort is ditching as much of the diagnostics as possible, retaining only what is necessary to run FRU_P2UCODE, FRU_P3UCODE, RUN_UDIAG and to boot the environment.

With diagnostics out of the way, a lot of "condensation" becomes possible.

We have condensed each board onto a single schematic sheet, starting around a square meter, but rapidly shrinking.

On the test-bed right now is a branch where the entire VAL datapath has been condensed into a single "chip".

As a result, the VAL board now uses 30% fewer SystemC activations than the TYP board.

The next step is to integrate the RF address generator, which currently calculates the addresses more often than necessary, and multiplexes the C address onto the A+B address busses during H2.

But first, we optimized how the C address is calculated. The resulting C++ code looks a lot like an implementation of the microcode encoding described on the front page of the VAL schematic, but there are a couple of twists relating to the CSA (Control Stack Accelerator), which were not on the VAL schematics.

We also have similar improvements to MEM32 and SEQ in the pipeline; they will all show up in the table above once they complete their test runs.

2024-07-13 The Scavenger

The Memory-Monitor, which lives on the FIU board, is the control-logic common to all the memory boards in the system.

A part of the Memory-Monitor is a circuit called "Scavenger" which is described in the 1982 document Functional Specification of the Memory Monitor on pdf page 24.

The 1984 document R1000 Hardware on pdf page 70 says:

The memory monitor also has a scavenger RAM (page 61). This was intended to provide a garbage collection scheme for the memory manager. However, that scheme was abandoned and the scavenger RAM is no longer used.

We have just found out experimentally that "is no longer used" is not the same as "can be removed".

An emulator run without the scavenger runs fine until the Virtual Memory has been started, and then it keels over with:

  23:15:55 !!! EEDB Assert_Failure Subsystem_Map Inconsistency found in
               Reset_Kernel_Node
  23:15:55 *** EEDB.Init Format_Error Exception = <Exception: Unit = 3586580, Ord
               = 1>, from PC = #163C13, #3AF
           *** Calling task (16#C985404#) will be stopped in wait service
  23:15:55 !!! Internal_diagnostic (trace) ** Start of trace **
  23:15:55 !!! Internal_diagnostic (trace) Task: #C987804
  23:15:55 !!! Internal_diagnostic (trace)     Frame:  1,  Pc = #92013, #64D
  23:15:55 !!! Internal_diagnostic (trace)     in rendezvous with #C985404
  23:15:55 !!! Internal_diagnostic (trace) Task: #C985404
  23:15:55 !!! Internal_diagnostic (trace)     Frame:  1,  Pc = #163C13, #151B
  23:15:55 !!! Internal_diagnostic (trace)     Frame:  2,  Pc = #163C13, #1B52
  23:15:55 !!! Internal_diagnostic (trace)     Frame:  3,  Pc = #163C13, #19B1
  23:15:55 !!! Internal_diagnostic (trace)     Frame:  4,  Pc = #92813, #C8
  23:15:55 !!! Internal_diagnostic (trace)     Frame:  5,  Pc = #92813, #D6
  23:15:55 !!! Internal_diagnostic (trace)     Frame:  6,  Pc = #92413, #E4
  23:15:55 !!! Internal_diagnostic (trace)     Frame:  7,  Pc = #92013, #CC
  23:15:55 !!! Internal_diagnostic (trace) ** End of trace **

But other than that, as can be seen in the table above, the emulator is getting faster. We have pulled out all the parity error checks, but not the ECC checks, and we have started to eliminate the diagnostic circuitry, retaining only what is necessary to run FRU::P2UCODE, FRU::P3UCODE, RUN_UDIAG and the Environment.

2024-07-04 Improvement on second IOC board - back to state as of 2023-11-16

We found the problem that gave us a set-back on the second IOC board. While replacing one of the chips, we damaged the IO.DCLK line, so it no longer provided a signal to 5 latches. We have soldered on an extra wire to fix this, and the board now boots like it did back on 2023-11-16.

So this IOC now passes self-test again, but it still fails to communicate properly with SCSI over the UniBus interface.

At least the set-back has been reset now...

2024-06-15 End of no-parity experiment

For now we have come to the end of the experiment where we got rid of all parity and ECC checking.

It speeds the emulator up by 20-ish percent, which is nothing to sneeze at, but we have concluded that we are not yet ready to continue without the diagnostic subsystem.

2024-06-13 n_running_r1000++, and perhaps a lead on IOC2 trouble

Finally got time to take a look at the R1000. With one of our previously hacked BIOS versions, the R1000 complained about "G6".

So after a little work with a side-cutter and a soldering iron, a new SRAM-chip on G6 was in place and the IOC was yet again happy about Life, the Universe and Everything :-)


Another attempt to identify the elusive problem on IOC board 2 may have turned something up:

The trigger is when BYTE0 is asserted, causing a BERR and in turn the M68K to trap. The trace hints that SRAM G47 (DQ1 in bank 0) has become slow: DQ1 goes high just after CLK.2X clocks, and that causes the parity on BYTE0 to fail. Several other traces show the same pattern, with DQ1 much slower than the other lines. Next time we will replace G47, fingers crossed!

2024-06-09 Parity elimination

The R1000 computer conservatively stores a parity bit with every byte stored in a RAM chip or passed over the backplane or foreplane busses, and as a general rule the parity bits are passed through rather than recalculated. The DRAM storage on the MEM32 boards uses a 9-bit ECC code instead.

All in all, that takes a fair bit of logic in the hardware, and in software it gets slightly worse, because calculating parity is a somewhat expensive operation.

We do not expect the simulated RAM chips or busses to have bit errors, so all this is surplus to requirements for the emulator, but removing it to speed things up comes at the cost of many tests failing, because they validate the parity circuits.
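To show where the per-byte cost of parity comes from, here is a sketch of the software counterpart of what a 74F280-style parity generator does in hardware, once by XOR folding and once by table lookup. The choice of even parity and the byte-lane order are assumptions, not taken from the R1000 schematics; odd parity is simply the complement:

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* XOR-fold a byte down to one bit: 1 if the byte holds an odd number of
 * 1 bits. That is the even-parity bit (byte + parity bit then hold an
 * even number of 1s); odd parity is its complement. */
static inline unsigned parity8_fold(uint8_t b)
{
    b ^= (uint8_t)(b >> 4);
    b ^= (uint8_t)(b >> 2);
    b ^= (uint8_t)(b >> 1);
    return b & 1u;
}

/* Table-driven alternative: one lookup per byte instead of three
 * shift/XOR steps. Call parity_table_init() once before use. */
static uint8_t parity_table[256];
static bool parity_table_ready;

void parity_table_init(void)
{
    for (unsigned i = 0; i < 256; i++)
        parity_table[i] = (uint8_t)parity8_fold((uint8_t)i);
    parity_table_ready = true;
}

/* Check a 64-bit word against 8 stored parity bits, where bit i of
 * `stored` belongs to byte i (least significant byte first, an assumed
 * lane order). Returns a mask of the bytes that fail. */
uint8_t check_word_parity(uint64_t word, uint8_t stored)
{
    assert(parity_table_ready);
    uint8_t bad = 0;
    for (int i = 0; i < 8; i++) {
        uint8_t byte = (uint8_t)(word >> (8 * i));
        if (parity_table[byte] != ((stored >> i) & 1u))
            bad |= (uint8_t)(1u << i);
    }
    return bad;
}
```

Every byte moved through a simulated RAM or bus costs at least one such lookup and compare, which is the overhead the no-parity branch avoids.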

For the last couple of weeks we have worked to eliminate all the parity checking in a separate branch, with the goal that run_udiag and booting the environment are the only two things that must work.

We had to modify that goal slightly, because run_udiag actually tests the ECC circuitry near the end, so instead of ending in success, it now ends with code 29F2 - ECC_EVENT_NOT_TAKEN.

With all that said, the run which just ended now took only 106 hours, 315 times slower than hardware.

The run we started instead seems to be a percent faster or so.

No news on the hardware front, everybody is busy.

2024-05-07 Gentlemen: Start your FPGAs

We have branched Release2 of the GitHub project to celebrate the fact that we have simulated the HW-true schematics all the way to login.

That took 59 days and 19 hours on the MacBook M2.

The point of this branch is to provide a well-documented launch pad, should anybody want to create an FPGA-based Rational R1000.

2024-04-26 n_running_r1000--

When we tried to boot the working R1000 yesterday, we got IOC RAM errors :-(

2024-04-15 Spring has sprung

And that means a boatload of distractions, so activity is a bit low right now.

On the MacBook M2 the hardware schematic simulation is chugging along, 4000 times slower than the hardware (1250 Hz instead of 5 MHz), but it has gotten to CMVC.11.8.2D, and in another three weeks it should be at the login prompt.

We are a bit stuck on the IOC#2 board. The best current hypothesis is that the different SRAM speeds cause the byte parity lines to flutter at an inconvenient time, but we do not have enough 25ns chips to verify that theory, and DIP versions of that SRAM are hard to come by.

Given the prospect that more SRAM chips are guaranteed to die on us, and the fact that surface mount versions of CY7C187 are still obtainable, making a small adapter board, for instance with three chips per board, looks more and more attractive.

2024-03-27 Integrated simulated MEM32

We have started to implement the MEM32 board's core functionality as a single integrated SystemC/C++ class.

The first stage is to implement the tag-RAM logic so that its contents are kept identical to those of the working "discrete" simulation of the MEM32 board.

So far we are at around 350 lines of code, with a lot of duplication in order to have the same precise memory layout as the discrete model, so we can compare them with memcmp(3).
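A minimal sketch of why the identical memory layout pays off: with both models storing their state in the same struct, one memcmp(3) per cycle answers "do they still agree?", and only on a mismatch is it worth looking for where. The struct and its sizes are hypothetical placeholders, not the actual MEM32 tag-RAM layout:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical tag-RAM state, kept in exactly this layout by both the
 * discrete and the integrated model. Sizes are placeholders. */
struct tag_state {
    uint64_t tag[1024];
    uint8_t  lru[1024];
};

/*
 * Returns -1 if the two states are identical, otherwise the byte offset
 * of the first difference, which offsetof() can map back to a field.
 * Both states should be zero-initialized so padding bytes compare equal.
 */
long first_difference(const struct tag_state *discrete,
                      const struct tag_state *integrated)
{
    assert(discrete != NULL && integrated != NULL);
    if (memcmp(discrete, integrated, sizeof *discrete) == 0)
        return -1;                      /* common case: models agree */
    const unsigned char *a = (const unsigned char *)discrete;
    const unsigned char *b = (const unsigned char *)integrated;
    for (size_t i = 0; i < sizeof *discrete; i++)
        if (a[i] != b[i])
            return (long)i;
    return -1;                          /* not reached */
}
```

The fast path is a single memcmp(); the byte-by-byte scan only runs once, at the cycle where the models first disagree.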

The Functional Specification of the Memory Monitor has been a good help in this, and it has mostly been a matter of waiting for the test runs to crash, and then looking at the end of the 100+GB trace file to find out why.

The second stage will be doing the same for the DRAM arrays. We do not anticipate any challenges here; once the tag logic is ironed out, this should follow trivially.

The third stage will be to migrate the MEM32's output signals from the discrete model to the integrated model.

We do not have any documentation which explains the meaning and timing of the control signals, so some Prototyping™ (The methodology formerly known as "Trial&Error™") will be involved.

The fourth and final stage will be to eliminate the discrete simulation to reap the "integration-dividend".

This will mostly be about simulating enough of the Diagnostic Archipelago to keep the programs on the IOP happy.

2024-03-19 and more debugging

The following measurements were done with the termination resistor pin of A22 cut (no line terminator).

The last session lacked measurements of some of the SRAM pins. VCC between H2 and H19 is 4.91-4.95V, with an average ripple of 140mV, but the ripple increases to 450mV every 12.8µs (78.1kHz) for about 100ns.

Other observations:

Pin 10 (/WE) switches continuously at 10MHz (/CE not asserted). The signal superimposed on A22 is at the same frequency, and it is not unlikely that the A22 and /WE traces run close to each other.


The Q output of H40 (DQ8) has a slightly different shape than on any of the other 72 SRAMs. Interestingly enough, the noise stays with H40 even when the SRAM is exchanged with one from another position.

(Figures: DQ8 and DQ!8)


The SRAM output, when not driven, sits at 1.8-1.9V.

2024-03-14 and debugging

Assuming the IOC2 problems are rooted in SRAMs causing parity errors at times when they are not meant to be tested, we set out to systematically measure the IOC2 SRAMs to find out if they are working within spec, starting with the address lines. Address-wise, the SRAMs are organized in two banks of 256KB, a low and a high bank, 512KB in total. Physically, the SRAMs are grouped in 4 blocks, each holding the low and high bank of the same byte: 16+2 SRAMs per block, sharing the same address lines (with separate chip selects):

Table 1
Physical SRAM layout (Bank / Byte)
  Row     Column H   Column G
  41-48   H/1        L/0
  33-40   L/1        H/0
  31-32   Parity 1   Parity 0
  18-19   Parity 3   Parity 2
  10-17   H/3        L/2
  2-9     L/3        H/2
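The chip counts can be cross-checked with a little arithmetic. This sketch only uses figures from this logbook (64Kx1 chips, two 256KB banks, four bytes, one parity bit per byte) and reproduces both the 16+2 chips per block and the 72 SRAMs mentioned in the 2024-03-19 entry:

```c
#include <assert.h>

/* IOC SRAM organization as described above, built from 64K x 1 chips. */
enum {
    CHIP_BITS     = 64 * 1024,  /* one 64K x 1 SRAM */
    BYTE_LANES    = 4,          /* bytes 0..3 */
    BITS_PER_BYTE = 8,
    BANKS         = 2,          /* low and high bank */
};

static_assert(CHIP_BITS == 65536, "64K x 1 chips");

/* Data chips per bank: one chip per bit of each byte lane. */
int data_chips_per_bank(void)   { return BYTE_LANES * BITS_PER_BYTE; }

/* Parity chips per bank: one parity bit per byte lane. */
int parity_chips_per_bank(void) { return BYTE_LANES; }

/* Bytes of data held by one bank. */
long bank_bytes(void) { return (long)CHIP_BITS * data_chips_per_bank() / 8; }

/* All SRAMs on the board. */
int total_chips(void)
{
    return BANKS * (data_chips_per_bank() + parity_chips_per_bank());
}

/* One physical block: low + high bank of one byte, plus their parity chips. */
int chips_per_block(void) { return BANKS * BITS_PER_BYTE + BANKS; }
```

32 data chips per bank give 256KB, 4 parity chips per bank bring each bank to 36 chips, and two banks make the 72 SRAMs on the board.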

First set of measurements were done on address lines on the physical ends of each block.

Table 2
SRAM address line noise, Pk-Pk (mV)
  Pin  H2   H19  G2   G19  H31  H48  G31  G48  Line
  1    170  220  180  170  240  310  185  250  A14
  2    140  210  130  210  175  135  155  205  A15
  3    165  210  85   135  140  130  170  200  A16
  4    200  200  170  180  145  150  175  115  A17
  5    140  165  125  170  175  180  210  130  A18
  6    175  145  110  200  175  150  165  140  A19
  7    105  210  145  200  150  115  205  130  A20
  8    190  135  200  140  165  115  210  190  A21
  14   775  365  570  275  180  445  195  275  A22
  15   230  340  130  95   160  185  205  190  A23
  16   125  150  130  150  170  120  175  165  A24
  17   150  150  115  135  240  232  230  150  A25
  18   130  200  90   160  150  145  200  170  A26
  19   195  180  160  155  165  170  195  205  A27
  20   135  195  115  135  140  130  245  185  A28
  21   130  190  110  150  135  130  175  195  A29
       Bank 0    Bank 1    Bank 2    Bank 3

(The measurements should probably have included all pins and not just the address lines...)

In an attempt to locate the source of the noise on A22 (pin 14) on BYTE3, a second set of measurements was done. The values below are in steps of 10mV, based on a "manual" average of what was measured on the scope:

Table 3
Noise on pin 14
  Chip  Pk-Pk (mV)  New chip
  H2    770         *
  H3    760
  H4    730
  H5    720         *
  H6    700         *
  H7    670
  H8    660         *
  H9    630
  H10   610         *
  H11   580         *
  H12   540
  H13   520         *
  H14   480
  H15   450
  H16   420         *
  H17   400         *
  H18   370         *
  H19   350         *

Clearly, there is the least noise at the drivers in the center and the most noise toward the termination resistors at the edge of the board. The other banks show the same pattern, although not as pronounced (Table 2).
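To put a number on "clearly", a least-squares line through the Table 3 values, with the chip position (2 for H2 up to 19 for H19) on the x axis, gives the noise gradient along the bank. This is our own quick analysis of the numbers above, not part of the original measurements:

```c
#include <assert.h>
#include <stddef.h>

/* Pin 14 noise, Pk-Pk in mV, for chips H2..H19 (Table 3). */
static const double noise_mv[] = {
    770, 760, 730, 720, 700, 670, 660, 630, 610,
    580, 540, 520, 480, 450, 420, 400, 370, 350,
};

/* Least-squares slope of y against x = first_pos, first_pos + 1, ... */
double noise_slope(const double *y, size_t n, int first_pos)
{
    assert(n >= 2);
    double sx = 0.0, sy = 0.0, sxx = 0.0, sxy = 0.0;
    for (size_t i = 0; i < n; i++) {
        double x = first_pos + (double)i;
        sx  += x;
        sy  += y[i];
        sxx += x * x;
        sxy += x * y[i];
    }
    return (n * sxy - sx * sy) / (n * sxx - sx * sx);
}
```

For noise_slope(noise_mv, 18, 2) this comes out at about -26 mV per chip position: the noise falls off steadily from H2 toward H19.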

The termination resistors are DIL16 chips, R220/330, serving multiple lines. One of the lines next to A22 is a CLK.2X line running at 10MHz. Zooming in on the A22 noise, it turns out that the noise is actually at the CLK.2X frequency.

This led to the last test: cutting the termination resistor pin of A22 to see if the resistor is somehow responsible for the noise. It is not; the chip side becomes nice and steady, while the noise on the board side (A22) remains, although at a lower mean level. The line is actively driven low by the driver.
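Assuming "R220/330" denotes the usual split TTL termination, 220 Ω from the line to +5V and 330 Ω to ground (our reading, not confirmed from the IOC schematics), the termination looks to the line like a voltage source behind a resistor, which is what the driver pulls against when it holds A22 low. A minimal sketch of that Thevenin calculation:

```c
#include <assert.h>

/* Thevenin equivalent of a split (pull-up / pull-down) line termination. */
struct thevenin {
    double volts;   /* open-circuit voltage seen by the line */
    double ohms;    /* source resistance seen by the line */
};

struct thevenin split_termination(double vcc, double r_up, double r_down)
{
    assert(r_up > 0.0 && r_down > 0.0);
    struct thevenin t;
    t.volts = vcc * r_down / (r_up + r_down);   /* voltage divider */
    t.ohms  = r_up * r_down / (r_up + r_down);  /* r_up || r_down */
    return t;
}
```

With 5V, 220 Ω and 330 Ω this gives a 3.0V source behind 132 Ω.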

Earlier, while we were exchanging one of the SRAMs, our desoldering machine spewed out some solder, which we hoped we had cleaned up, so a close re-inspection was done, without finding any remains.

We tried removing all the new SRAMs on BYTE3, but the noise remained.

  1. Solder residue still on the board after the spillage? Table 2 does not seem to agree (identical patterns on the other banks).
  2. A solder bridge between A22 (pin 14) and a neighbor pin on one of the exchanged SRAMs? Not likely: pin 13 is data in, which is stable, and pin 15 is A23, which is much less noisy than A22.
  3. A defective termination-resistor chip? The pin-cut test does not support this.
  4. A defective driver? Wouldn't the noise amplitude then be higher closer to the driver?
  5. Do the new SRAMs radiate the noise? No, we tried removing them, and the noise stayed.
  6. Do the old SRAMs radiate the noise? Table 3 does not support this.

But then what???

2024-03-09 Still debugging

We're still debugging IOC2, which has done us the favour of failing much more predictably.

Current hypothesis is that when the M68K CPU writes to registers in UNIBUS space, the IOC is designed such that the parity information put on the UNIBUS comes from the RAM on the IOC board, even though that RAM takes no part in the UNIBUS transaction.

During the EEPROM based selftest, this parity information is read back and causes a bus-error signal to get raised, which the EEPROM has not yet prepared for.

One possible explanation for why we see this is that we have replaced the defective 64Kx1 SRAM chips with "NOS" chips which are slightly slower (35ns), so the parity-error signal may take longer to stabilize.

Work continues on the Emulation, where we are both speeding up the "Optimized" schematics and working to get the HW-true schematics to pass all tests and boot again.

2024-02-19 T-12y and counting

If we can speed up the emulator by one percent every week, it will run as fast as the actual hardware in a little over 12 years.

And since 1%/week does seem to be our rate of improvement right now, it is probably time for a new strategy.
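The compound-interest arithmetic behind estimates like the 12 years above can be sketched as follows. The slowdown factor to plug in is an input, for instance the 420x quoted for the Optimized branch further down; with gain = 0.01 the result counts weeks:

```c
#include <assert.h>

/*
 * Number of successive improvements needed to reach speed parity.
 *   slowdown: current wall-clock time per simulated second (1.0 = parity)
 *   gain:     fraction of the remaining run time each improvement removes,
 *             e.g. 0.01 for "1% per week"
 * Each improvement compounds on the previous ones, like interest.
 */
int steps_to_parity(double slowdown, double gain)
{
    assert(slowdown >= 1.0 && gain > 0.0 && gain < 1.0);
    int steps = 0;
    double t = slowdown;          /* run time relative to the hardware */
    while (t > 1.0) {
        t *= 1.0 - gain;          /* each step removes `gain` of what is left */
        steps++;
    }
    return steps;
}
```

The closed form is ceil(ln(slowdown) / -ln(1 - gain)); the loop just makes the compounding explicit.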

Until now the optimizations have been conditioned on keeping everything working, including all the diagnostic facilities, but this is increasingly becoming a headache for us, because the diagnostic facilities, sensibly, test the implementing circuits rather than the desired architecture, and they often test the implementing circuits in ways the desired architecture does not use them.

To give a concrete example:

The architectural view of a micro instruction on the TYP board may be: "Take whatever is on the TYP bus, and store it at location X in the register file"

The implementing circuitry needs to put the TYP bus on the A or B bus to the ALU, tell the ALU to pass that bus through to its output, where it is put on the ALU bus, from there to the C bus, and then set up the right address for the register file and strobe the Write Enable signal to the actual register file RAM devices.

But in the implementing circuitry, the C bus does not go to the register file RAM devices, only the A and B busses do, so there is an extra step of gating the C bus to both the A and B busses, because both sides of the register file need to be written.

Because the diagnostic tests test the implementing circuitry, they test that the TYP bus can be placed on the A and B busses, that the C bus can be placed on the A and B busses, and that the ALU can pass a value through unharmed.

If we try to optimize things by using true dual-port register file RAM devices, which is what the architecture calls for, and get rid of the extra bus switch-yard, which the reduction of the architecture to implementing circuitry brought into existence, the diagnostics will fail.
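The contrast between the two views can be sketched in a few lines. Both register files below are hypothetical models (the size and word width are placeholders, not taken from the TYP schematics); the point is only that the architectural version writes once, while the implementing version must route the C bus onto both the A and B busses and write both sides:

```c
#include <assert.h>
#include <stdint.h>

#define RF_WORDS 1024   /* hypothetical size, not from the TYP schematics */

/* Architectural view: a true dual-port register file. The C bus value is
 * written once, and the A and B ports read the same storage. */
struct rf_dual {
    uint64_t r[RF_WORDS];
};

void rf_dual_write(struct rf_dual *rf, unsigned addr, uint64_t c_bus)
{
    assert(addr < RF_WORDS);
    rf->r[addr] = c_bus;
}

uint64_t rf_dual_read(const struct rf_dual *rf, unsigned addr)
{
    assert(addr < RF_WORDS);
    return rf->r[addr];
}

/* Implementing-circuit view: the RAMs only connect to the A and B busses,
 * so a write must gate the C bus onto both busses and write both sides,
 * or the two copies diverge. This is the switch-yard the diagnostics test. */
struct rf_split {
    uint64_t a_side[RF_WORDS];
    uint64_t b_side[RF_WORDS];
};

void rf_split_write(struct rf_split *rf, unsigned addr, uint64_t c_bus)
{
    assert(addr < RF_WORDS);
    uint64_t a_bus = c_bus;     /* C bus gated onto the A bus */
    uint64_t b_bus = c_bus;     /* C bus gated onto the B bus */
    rf->a_side[addr] = a_bus;   /* write enable, A side */
    rf->b_side[addr] = b_bus;   /* write enable, B side */
}

uint64_t rf_split_read_a(const struct rf_split *rf, unsigned addr)
{
    assert(addr < RF_WORDS);
    return rf->a_side[addr];
}

uint64_t rf_split_read_b(const struct rf_split *rf, unsigned addr)
{
    assert(addr < RF_WORDS);
    return rf->b_side[addr];
}
```

Both models give the same architectural result, but only the split one has the intermediate bus transfers that the low-level diagnostics check.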

So it may be time for us to give up on the low-level diagnostics, and rely only on the micro-code based diagnostics going forward.

We will try out this idea on the MEM32 board, because it is, all things considered, very simple and it has no microcode.

2024-02-04 Lookin' good

The museum was open to the public today, and we used the real Facit terminal to log into the emulator.

Seems to work, as expected, but it is slooooow…

2024-01-27 And we have liftoff!

Ladies & Gentlemen!

The R1000/s400 software emulator works:

  09:19:31 +++ Operator Enable_Terminal  244
  09:19:31 +++ Operator Enable_Terminal  245
  09:19:31 +++ Operator Enable_Terminal  246
  09:19:31 +++ Operator Enable_Terminal  247
  09:19:31 +++ Operator Enable_Terminal  248
  09:19:32 +++ Operator Enable_Terminal  249
  09:19:32 !!! Machine_Initialization_Start Exception_While_Starting
               TMS_Elaborate in context !Machine.Initialization.Rational
  
  ====>> Elaborator Database <<====
  EEDB:
           
  ====>> Console Command Interpreter (System Job 223) <<====
  username: pam
  password: 
  session: s_1
  99/01/22 09:20:04 --- pam.s_1 logging in.
  
  ====>> Ci.Interpret (PAM.S_1 Job 221) <<====
  command: what.users
  
  ====>> What.Users (PAM.S_1 Job 219) <<====
  User Status on January 22, 1999 at 9:20:13 AM
  
       User       Line  Job   S      Time     Job Name                           &
  
  ==============  ====  ===  ====  =========  ===================================&
  ======
                                              
  *SYSTEM            -    4  RUN   13:54.446  System
                          5  RUN   13:54.447  Daemons
                        223  IDLE     29.439  Console Command Interpreter
                        246  IDLE     53.480  Rational_Access Commands Rev 1_0_2
                        248  IDLE     57.840  Print Spooler
                        250  IDLE   1:03.681  Smooth Snapshots
                        253  IDLE     58.227  Ftp Server
                                              
  PAM                -  219  RUN       1.377  WHAT.USERS
                        221  IDLE      4.966  CI.INTERPRET("!machine.device ..., &
  ...)
                                              
  NETWORK_PUBLIC     -  225  IDLE     37.757  Archive Server
  
  ====>> Ci.Interpret (PAM.S_1 Job 221) <<====

The bad news is that it takes an hour just to login…

2024-01-27 … and warmer

  NATIVE_DEBUGGER.11.2.0D
  CROSS_DEVELOPMENT.DELTA
  INITIALIZE.11.2.4D         
          
  ====>> Environment Log <<====
  09:17:25 +++ Operator Enable_Terminal  16
  09:18:55 !!! !Machine.Initialization.Rational.Dtia
               Unable_To_Start_Dtia_Rpc_Server ERROR  Activity does not define a
               valid load view for the subsystem of !TOOLS.DTIA_RPC_MECHANISMS.
               REV11_4_SPEC.UNITS.TARGET_INTERFACE.ELABORATION'SPEC

2024-01-27 Getting warmer

  TOOLS_INTEGRATION.DELTA    
  CMVC.11.8.2D               
  DESIGN_FACILITY.DELTA
  ARCHIVE.11.4.0D            
  NATIVE_DEBUGGER.11.2.0D
  CROSS_DEVELOPMENT.DELTA
  INITIALIZE.11.2.4D         
          
  ====>> Environment Log <<====
  09:17:25 +++ Operator Enable_Terminal  16
  
  ====>> Elaborator Database <<====
  EEDB:

2024-01-26 Thinking ahead

While we wait to see where the emulation croaks next time, it is a good time to take a step back and think ahead.

The primary goal of this entire project was to produce a software emulation of the R1000/s400 computer, so that the uniqueness of the computer, and of the Rational Environment will not be lost to humanity, when the last hardware finally releases the magic smoke.

The question I want to discuss here is a bit like »Let's go see Rome on our vacation.«

There may be only one set of longitude and latitude, but there are dozens of Romes at that location: the Pope's Rome, Cæsar's Rome, the Fascists' Rome, the Independent Rome, the culinary Rome, the Opera Rome and so on. Which one, or which ones, of those Romes do you want to experience, and how long is your vacation?

Likewise, there are many R1000s to emulate: there is the R1000 running the Rational Environment, there is the R1000 implementing the Rational Architecture, there is the R1000 son-of-son of the original four-CPU monster s100, there is the R1000 with the amazing diagnostic abilities, there is the R1000 with type-checking in hardware, there is the R1000 with distributed microcode, there is the R1000 which is bit-oriented, there is the R1000 without a linear address space, and so on. How much does the R1000 interest you?

Because we were forced to implement the R1000 emulator from the hardware schematics, we have ended up with a software emulation which allows one to study all these aspects of the R1000, but at glacial speed: I have not tried running the "Hw" branch schematics for a long time, but last I tried they were several thousand times slower than the hardware.

If you want to understand how the ECC circuitry and hardware work, you can probably live with that; if you want to demo the Rational Environment's semantic view of large Ada projects … not so much.

What we have right now is the "Hw" branch, which is supposed to be chip for chip, wire for wire, identical to the two R1000/s400 computers we have in Datamuseum.dk, except for the RESHA board. If you want to study ECC, this is the one you want.

The other branch is the "Optimized" branch, which sports weird and complex chips, some of which contain a third of the FIU board, and others which have weird back-door connections to the emulator to load the microcode faster. This branch currently runs only 420 times slower than the hardware, which is neither fish nor fowl, but it may transpire that it actually works.

If so, it will be perfectly defensible to stop here, we have reached the goal we set ourselves and we can leave the performance issue as an exercise for future generations.

But that would be neither fun nor satisfying, so what do we do instead?

One possibility would be to validate that the Hw branch also works, and then convert it into VHDL or Verilog, synthesize it to an FPGA and create a 1:10 scale model of the R1000/s400.

For many reasons, mostly lack of skill, I cannot do that. If you want to, I'll do everything I can to help you, because I'd love to have one on my desk, but I cannot do it.

Another possibility is to continue filing away at the "Optimized" branch, shaving a percent of performance here and a percent there. I can do that, and if I can shave 1% every week, I will reach speed parity with the hardware … in 12 years. There is a good chance that at some point I will understand the hardware well enough to get down to six components: IOC, FIU, SEQ, TYP, VAL & MEM32, but they will no longer teach you anything about the actual hardware of the R1000; in fact, the "Optimized" branch already does not do that.

Another option is to use the observability provided by the "Optimized branch" to start implementing a "traditional" software emulation at the macro-instruction level. That will undoubtedly give us the fastest running emulator, but it will be tough going, because we need to understand both the hardware and the microcode to find out what the instructions actually do.

Going down a layer, we can implement a traditional software emulation at the micro-code level. This has the benefit that we do not need to understand the microcode instructions in relation to each other, or indeed anything about the macro instructions or Ada; we just need to execute all the actually used micro instruction words correctly. I expect such a simulator to run at least several times faster than the hardware, even on a tiny computer like a Raspberry Pi, and to ensure the correctness of that emulation, it can be compared directly, bit for bit, with the Optimized branch or even the Hw branch.
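A sketch of that bit-for-bit checking, assuming both emulators can be stepped one micro cycle at a time and expose their state in the same layout. The state struct and the toy step functions are hypothetical stand-ins, not the R1000's real micro-architecture:

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, heavily simplified machine state. All fields are 64 bits
 * wide, so in practice the struct has no padding and memcmp(3) is safe. */
struct ustate {
    uint64_t upc;        /* micro program counter */
    uint64_t regs[16];
};

typedef void (*step_fn)(struct ustate *);

/*
 * Run two implementations in lockstep, one micro cycle at a time.
 * Returns -1 if their states never differ within `cycles` cycles,
 * otherwise the number of the first cycle after which they differ.
 */
long lockstep(step_fn candidate, step_fn reference,
              struct ustate *a, struct ustate *b, long cycles)
{
    assert(cycles >= 0);
    for (long c = 0; c < cycles; c++) {
        candidate(a);
        reference(b);
        if (memcmp(a, b, sizeof *a) != 0)
            return c;
    }
    return -1;
}

/* Toy "microcode": each step adds the micro PC to r0 and advances. */
void toy_step(struct ustate *s)
{
    s->regs[0] += s->upc;
    s->upc++;
}

/* The same toy, with a deliberate bug from micro address 10 onwards. */
void toy_step_buggy(struct ustate *s)
{
    s->regs[0] += (s->upc >= 10) ? s->upc + 1 : s->upc;
    s->upc++;
}
```

Stepping toy_step_buggy against toy_step from identical zeroed states reports cycle 10, the first micro cycle where the bug changes the state.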

In terms of preservation, in the museum sense of the word, there is no doubt that the Hw branch is the real deal.

In terms of presentation of the Rational Environment, the Hw branch is useless, and an FPGA solution will have to be "refreshed" every couple of decades, because sooner or later the magic smoke escapes.

A microcode emulation would allow presentation of the Rational Environment with high fidelity, and if written in a stable programming language, it can last forever.

So I should probably start to think about that…

/phk

2024-01-25 Finally getting somewhere

2024-01-23 Two down, two to go

We are running two test runs in parallel, and they both passed PRETTY_PRINTER just now.

We have seen that once before, but this time we think we know why they made it.

The real test is in 24 hours, when they reach OS_COMMANDS, which we have never managed to get past.

2024-01-22 Well, that would do it...

The python script delivered its verdict overnight, and … ehh … duh ?

  scsi_d 1 WRITE_6 0a000ec003000000
  scsi_d 1 READ_6  08000ec001000000
  scsi_d 1 READ_6  08000ec101000000

The hex-strings are SCSI CDBs and the '03' in the WRITE_6 command means that three sectors are transferred.

That was news to us; until now we had not seen, or at least not noticed, the R1000 CPU using anything but single-sector transfers.

But we do support multi-sector SCSI transfers (DFS uses them when it pushes/pulls programs, for instance), so that in itself should not be a problem.

However the trace reveals something we have not seen before:

  mailbox f 4080000f 81020033 0f000103 00050852
  mailbox d 4080000d 81020134 0f000103 00050852
  mailbox c 4080000c 81020235 0f800103 00050852
  scsi_d 1 WRITE_6 0a000ec003000000

The three mailboxes are out of order and non-contiguous: f-d-c.

Each mailbox has an attached 1Kbyte buffer, so if the mailboxes had been in c-d-e order, it would have worked fine.

In other words: Scatter/gather disk-I/O.

That, finally, explains what the "IO BUS MAP" on page 23 and 24 of the IOC schematic is good for.

Fortunately the fix looks trivial:

   -       dma_read(sd->ctl->dma_seg, sd->ctl->dma_adr, ptr, len);
   +       for (xlen = 0; xlen < len; xlen += (1<<10)) {
   +               dma_read(sd->ctl->dma_seg, sd->ctl->dma_adr + xlen, pp + xlen, 1<<10);
   +       }

and similar for write.
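The patch relies on dma_read() translating each 1 KiB chunk separately. A self-contained sketch of that kind of per-page translation shows why one contiguous copy fails when the mailboxes come in f-d-c order. The map layout and sizes here are hypothetical; the real IO BUS MAP format is not described in this entry:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define IO_PAGE_SHIFT 10                      /* 1 KiB, one mailbox buffer */
#define IO_PAGE_SIZE  (1u << IO_PAGE_SHIFT)
#define IO_MAP_PAGES  64                      /* hypothetical map size */

/* Hypothetical I/O bus map: physical page number for each 1 KiB page of
 * I/O address space. */
struct io_map {
    uint32_t phys_page[IO_MAP_PAGES];
};

/*
 * Gather `len` bytes starting at I/O address `io_adr` into `dst`.
 * Each 1 KiB page is translated on its own, so pages that are adjacent
 * in I/O space may come from anywhere in physical memory.
 */
void dma_gather(const struct io_map *map, const uint8_t *phys_mem,
                uint32_t io_adr, uint8_t *dst, size_t len)
{
    while (len > 0) {
        uint32_t page = io_adr >> IO_PAGE_SHIFT;
        uint32_t off  = io_adr & (IO_PAGE_SIZE - 1);
        size_t chunk  = IO_PAGE_SIZE - off;   /* stop at the page boundary */
        if (chunk > len)
            chunk = len;
        assert(page < IO_MAP_PAGES);
        const uint8_t *src = phys_mem
            + ((size_t)map->phys_page[page] << IO_PAGE_SHIFT) + off;
        memcpy(dst, src, chunk);
        dst    += chunk;
        io_adr += (uint32_t)chunk;
        len    -= chunk;
    }
}
```

Copying the same 3 KiB with one memcpy from the first page's physical address would return mailbox f's buffer followed by whatever physically follows it, instead of the buffers of d and c.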

This is going to be an interesting and slightly tense week…

2024-01-21 Aha!

Armed with two binary traces, one from a run which stopped at PRETTY_PRINTER (6GB) and one which managed to pass that (11GB), we have now found out that the failing run reads sector 0xec1 from drive 1 and gets all zeros back, whereas the good run reads the same sector and gets non-zero data back.

This would not be inconsistent with Exception 16#20 being "Numeric Error (zero divide)" according to pdf page 69 in Guru Course day 1.

The non-zero data in the good case is not identical to the content of the disk-image when we start the emulator, so it must be data written during the run.

A Python script is now trudging through the smaller trace to find that disk write operation.

2024-01-11 IOC2 goes bad(er)

Yesterday we went through our TODO list and were thrown off at several points along the way.

For instance, it transpired that the disk images we preserved from the disks in PAM's machine did not include the three "diagnostic cylinders at the end": they had only 1655 of the disk's 1658 cylinders.

Until we ran the DISKX.M200 program this did not matter, so we never noticed, and because the emulator correctly emulated the disks as having 1658 cylinders, it had no problem with DISKX.

But we had configured the SCSI2SD disk emulator from the size of the images, so it presented only 1655 cylinders in the "mode pages", and when we tried to run DISKX on the hardware, the kernel panicked when it saw I/O requests to cylinders past the end of the disk, as it should.

The visual inspection found no relevant differences between the two boards, and the handwritten correction to IOFF0A on the schematic must have predated the s400, because it was incorporated in the PCB layout.

Next we soldered connections to some of the interesting chips on the solder side of the IOC, removed all the other boards from the machine to make room to get at the connections, and tried to make sense of things.

But by now the board did not even complete the IOC EEPROM self-test: it failed with the red LED on the front panel indicating that the M68K had halted itself. Eventually we could show that two of the byte parity signals from the IOC RAM were noisy, even in this state where nothing was accessing them, and we decided to call it a night. If we can reproduce that noise with the board on the workbench, it should be quite easy to track down where the noise comes from. It could be the 74F280 parity generators, but it is far more likely to be another couple of 64Kx1 SRAM chips heading for the eWaste bin.

The emulator now reaches PRETTY_PRINTER in 38h43m.

2024-01-09 IOC2 in the memory wringer

Happy New Year!

This last Sunday the museum was open, and we used the chance to give the SRAM on IOC2 a really good workout with the M68K CPU.

Using the excellent "vasm" by Volker Barthelmann et al., we have created an ad hoc tool-chain which allows us to write test code in M68K assembler and get commands out, ready to be pasted into the IOC's built-in low-level debugger.

The low-level debugger's commands are described on the first page of the IOC Schematics. They are not too bad to use manually once you have memorized them and the addresses you are working with, but for entering even short, trivial programs, cutting and pasting directly from the tool-chain is much more convenient.

But after beating the RAM up for several hours in various creative ways, we found no reason to think the M68K is able to trigger the parity-related or parity-adjacent situation which causes huge disk writes to stall.

We also tried another interesting experiment: we overwrote PROGRAM_2.M200 with DISKX.M200. This is pretty trivial to do, since the DFS filesystem lays files out in contiguous sectors, and the RPi we use as console server has a "backdoor" USB connection to the SCSI2SD disk emulator board. With the sector number where the file starts, taken from the emulator's "dfs" CLI command, good old dd(1) will do the job.

DISKX.M200 is a "DFS based disk exerciser", described in not too much detail on page 27 of the Command Interfaces manual.

By putting DISKX in PROGRAM_2's place, we can boot the system and load DISKX via the EEPROM prompts without ever doing a DFS "PUSH", and thus without doing any of the large disk writes which hang.

This overwrite trick is very handy: we checked that it also works for CLI.M200, and for that matter we can patch KERNEL.M200 or FS.M200, or even write our own programs to run.

But back to DISKX: when run this way in the emulator it works fine, but run on IOC2 it fails hard and fast:

  Initializing M400S I/O Processor Kernel 4_2_18
  Disk  0 is ONLINE and WRITE ENABLED
  Disk  1 is ONLINE and WRITE ENABLED
  IOP Kernel is initialized
  Exercize unit 0 [Y] ?
  Exercize unit 1 [Y] ?
  Disk unit => 0, using cylinders [1648..1654]
  Disk unit => 1, using cylinders [1648..1654]
  DFS based
  I/O Processor Kernel Crash: error 066D (hex) at PC=00004B82
  Trapped into debugger.
  RD0 00000676  RD1 00000000  RD2 00000001  RD3 00000001
  RD4 00000444  RD5 00000000  RD6 80001900  RD7 00000001
  RA0 0000E820  RA1 0003FF80  RA2 0000E83C  RA3 00020578
  RA4 00026350  RA5 0003FF4E  RA6 0003FF8A  ISP 0000FAB0
   PC 0000A158  USP 0003FF4E  ISP 0000FAB0  MSP FDFF742D  SR 2704
  VBR 00000000 ICCR 00000001 ICAR DB359DB2 XSFC 7 XDFC 1

Crash error 066D is listed on pdf page 4 of the IOC Schematics as "unimplemented", but from our disassembly of KERNEL_0.M200 it looks like a catch-all "Something's horribly wrong with disk I/O".

Unfortunately, JMP instructions are used to get to the error-emitting code, so no trace is left on the stack of which of the dozen or so different conditions caused it.

Patching those JMPs to JSRs in KERNEL_0.M200 and trying again is now on our TODO list.

The fact that DISKX.M200 fails almost instantly, despite doing only single-sector transfers, hints that the problem, whatever it is, has more to do with the intensity of DMA transfers than with the length of the individual transfers, which might indicate some kind of thermal issue.

Based on nothing but Mike Druke remembering that Signetics 74F373s caused some grief back in the day, we previously replaced one which is very prominent in the UNIBUS::PB path (IOREG12 @ K20), but that did not magically fix the problem.

The latch signal for that 74F373 is generated in the top right corner of IOC schematic page 25 (= pdf page 35), by IOFF0A, a 74F74, and the schematics have a handwritten correction which inverts that signal.

There is quite a gap in ECO levels between the working and non-working IOC boards, so we are also going to do a very detailed visual audit to see if we can spot any differences, and in particular look for this one, since it goes to the heart of the trouble we see.

2023…2012

  • 2014-2018 - The project got stuck for lack of a sufficiently beefy 5V power-supply, and then phk disappeared while he built a new house.

Many thanks to

  • Erlo Haugen
  • Grady Booch
  • Greg Bek
  • Pierre-Alain Muller
  • Michael Druke
  • Pascal Leroy