Rational/R1000s400/Logbook
Emulator status
Git version | Seconds simulated | Duration | Rate | Commit date |
---|---|---|---|---|
b7e744d3 | 1499.5s | 16.46h | see note | 2024-12-18 |
1f0eb2bd | 1244.0s | 21.00h | 60.8 | 2024-11-29 |
aba2f750 | 1217.2s | 35.73h | 106 | 2024-10-28 |
939c0651 | 1212.1s | 57.17h | 170 | 2024-09-27 |
f600c9af | 1213.0s | 71.53h | 212 | 2024-08-31 |
3c13f94e | 1210.3s | 86.34h | 257 | 2024-07-21 |
955c0228 | 1214.4s | 106.7h | 316 | 2024-06-29 |
Note: The b7e744d3 run continued, simulating 250 seconds of idle, so its ratio cannot be compared to the rest.
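The "Rate" column appears to be wall-clock seconds spent per simulated second. That reading is an assumption, but it reproduces the listed values. A minimal sketch of the arithmetic (the function name `slowdown` is made up for this illustration):

```python
def slowdown(seconds_simulated: float, duration_hours: float) -> float:
    """Wall-clock seconds spent per simulated second (assumed meaning of 'Rate')."""
    return duration_hours * 3600.0 / seconds_simulated


# A few rows copied from the table above: (commit, seconds simulated, duration in hours)
runs = [
    ("1f0eb2bd", 1244.0, 21.00),
    ("aba2f750", 1217.2, 35.73),
    ("939c0651", 1212.1, 57.17),
]

for commit, sim_seconds, hours in runs:
    print(f"{commit}: {slowdown(sim_seconds, hours):.1f}")
```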
2025-12-31 7% greener R1000
The emulator runs only 2.32% of the hardware speed, but uses only 2.17% as much electricity.
QED: The emulator is 7% more energy efficient than the hardware.
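The 7% follows from dividing the two percentages, assuming "efficiency" means work done per unit of electricity. A small sketch of that calculation (the variable and function names are ours):

```python
def efficiency_gain_percent(speed_fraction: float, power_fraction: float) -> float:
    """Relative gain in work-per-unit-of-electricity, in percent."""
    return (speed_fraction / power_fraction - 1.0) * 100.0


# Figures from the entry above: 2.32% of the speed, 2.17% of the electricity.
gain = efficiency_gain_percent(0.0232, 0.0217)
print(f"{gain:.1f}% more energy efficient")
```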
Over the midwinter break, the emulator has been put through a couple of longer tests than the usual "does it boot?" test.
First we let it boot the original disk images from PAM's machine which took 33 hours.
Second we did it again, but this time with writable disk images, and then we shut it down after enabling:
    package Operator is:
        […]
        procedure Archive_On_Shutdown (On : Boolean := True);
        function Get_Archive_On_Shutdown return Boolean;
        -- Archive_On_Shutdown causes the next shutdown to store internal
        -- state in 'archive' form, allowing upgrades and conversion of
        -- internal data structures. It typically takes several hours to
        -- complete a shutdown or restart with archive conversions.
That run lasted 62 hours, and ended with the system shutting itself down as expected.
We have started analyzing those disk-images, to see if we can learn more about the internal state from the 'archive' form of the metadata, and some clues have emerged.
2024-12-01 Going downhill!
If you look at the table above, you will notice that the software emulation is getting much faster.
We are now in "minute for second" territory, where the software emulation takes a minute to simulate a second of machine time; we started out in the "hour for second" wilderness.
Schematics are reduced to essentially a single SystemC component for each board, and we are in the "where is this signal needed, and by when in the clock cycle is it needed?" phase, which allows us to optimize out many updates of signals and avoid relatively complex code execution.
One example is the rotator part of the FIU board: conceptually it is quite simple, but the RUN_UDIAG microcode is written to test how it is implemented, so as long as we want to use RUN_UDIAG, we have to implement it that way, even though it is fairly complex to do so in software.
But we only need to execute that rotator code when something else needs the output of the rotator, and optimizing for that just cut 15 minutes off our 21 hour baseline.
That is a respectable 1.2% speedup, and compound interest being what it is, we only need to find another 206 of those, before we will have speed parity with a 44 year old computer.
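The rotator optimization is a case of evaluating on demand. The sketch below shows the general idea only. It is not the emulator's code: the class name, the 64-bit width and the rotate-right operation are assumptions made for the example.

```python
class LazyRotator:
    """Recompute the rotated value only when a consumer asks for it."""

    WIDTH = 64  # hypothetical width, chosen only for this illustration

    def __init__(self) -> None:
        self._value = 0
        self._amount = 0
        self._cached = 0
        self._dirty = False
        self.evaluations = 0  # counts how often the expensive part actually ran

    def set_inputs(self, value: int, amount: int) -> None:
        # Cheap: just remember the inputs and mark the output as stale.
        self._value = value & ((1 << self.WIDTH) - 1)
        self._amount = amount % self.WIDTH
        self._dirty = True

    def output(self) -> int:
        # Expensive part runs only if the inputs changed since the last read.
        if self._dirty:
            mask = (1 << self.WIDTH) - 1
            v, n = self._value, self._amount
            self._cached = ((v >> n) | (v << (self.WIDTH - n))) & mask
            self._dirty = False
            self.evaluations += 1
        return self._cached


rot = LazyRotator()
rot.set_inputs(1, 1)   # inputs change twice ...
rot.set_inputs(1, 4)
result = rot.output()  # ... but the rotation is computed once, on demand
```

Input changes that nobody reads cost almost nothing, which is where the saving comes from.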
2024-10-23 Everything is simpler in software
We have reached a bit of a milestone with the emulator:
When we started there were almost 800 components on 46 pages of MEM32 schematics, and that was even after we cheated and left out all but one set of the inverter drivers for the signals to the DRAM banks.
Now we are down to a single component, which does all the actual work plus the diagnostic processor which does nothing but exist.
To add insult to injury, the source-code for that single component is only 532 lines in our Python/C++ chimera language.
The TYP and VAL boards have had almost all functionality merged into a single component as well, 758 and 545 lines of code respectively.
IOC, FIU and SEQ are putting up more of a fight.
One of the runs in progress will probably come in around 36 hours to login.
2024-10-01 faster and faster
The changes we make in the emulator at present are subtractive: two component activations per micro cycle here, another one there, and so on.
But because the total number of activations is now down to around 400 per micro cycle, each one saved matters more and more in terms of overall performance.
The state of play right now is:
act/µc | % | board |
---|---|---|
14.000 | 3.47 | emu |
34.944 | 8.67 | typ |
40.956 | 10.16 | val |
52.599 | 13.05 | ioc |
59.520 | 14.77 | mem32 |
84.113 | 20.87 | fiu |
116.871 | 29.00 | seq |
403.000 | 100.00 | Total |
Still ways to go…
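The percentage column in the table above is each board's share of the total activations per micro cycle. A short sketch that recomputes it from the first column (the dictionary name is ours):

```python
# Activations per micro cycle, copied from the table above.
activations = {
    "emu": 14.000,
    "typ": 34.944,
    "val": 40.956,
    "ioc": 52.599,
    "mem32": 59.520,
    "fiu": 84.113,
    "seq": 116.871,
}

total = sum(activations.values())
shares = {board: 100.0 * count / total for board, count in activations.items()}

for board, share in shares.items():
    print(f"{board:6s} {share:6.2f} %")
print(f"total  {total:.0f} activations per micro cycle")
```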
2024-08-31 below 72hours
The latest integration run booted in 71½ hours: less than three days, and 33% faster than two months ago.
2024-08-11 Making progress
As can be seen from the table above, we're making some serious progress with the emulator performance now.
The current effort is ditching as much of the diagnostics as possible, retaining only what is necessary to run FRU_P2UCODE, FRU_P3UCODE, RUN_UDIAG and to boot the environment.
With diagnostics out of the way, a lot of "condensation" becomes possible.
We have condensed each board onto a single schematic sheet, starting around a square meter, but rapidly shrinking.
On the test-bed right now is a branch where the entire VAL datapath has been condensed into a single "chip".
As a result, the VAL board now uses 30% fewer SystemC activations than the TYP board.
The next step is to integrate the RF address generator, which currently calculates the addresses more often than necessary, and multiplexes the C address onto the A+B address busses during H2.
But first, we optimized how the C-address was calculated, and the resulting C++ code looks a lot like it implements the microcode encoding as described on the front page of the VAL schematic, but there are a couple of twists relating to the CSA (Control Stack Accelerator), which were not on the VAL schematics.
We also have similar improvements to MEM32 and SEQ in the pipeline; they will all show up in the table above after they complete their test-runs.
2024-07-13 The Scavenger
The Memory-Monitor, which lives on the FIU board, is the control-logic common to all the memory boards in the system.
A part of the Memory-Monitor is a circuit called "Scavenger" which is described in the 1982 document Functional Specification of the Memory Monitor on pdf page 24.
The 1984 document R1000 Hardware on pdf page 70 says:
The memory monitor also has a scavenger RAM (page 61). This was intended to provide a garbage collection scheme for the memory manager. However, that scheme was abandoned and the scavenger RAM is no longer used.
We have just experimentally found out, that "is no longer used" is not the same as "can be removed".
An emulator run without the scavenger runs fine until the Virtual Memory has been started, and then it keels over with:

    23:15:55 !!! EEDB Assert_Failure Subsystem_Map Inconsistency found in Reset_Kernel_Node
    23:15:55 *** EEDB.Init Format_Error Exception = <Exception: Unit = 3586580, Ord = 1>, from PC = #163C13, #3AF
    *** Calling task (16#C985404#) will be stopped in wait service
    23:15:55 !!! Internal_diagnostic (trace) ** Start of trace **
    23:15:55 !!! Internal_diagnostic (trace) Task: #C987804
    23:15:55 !!! Internal_diagnostic (trace) Frame: 1, Pc = #92013, #64D
    23:15:55 !!! Internal_diagnostic (trace) in rendezvous with #C985404
    23:15:55 !!! Internal_diagnostic (trace) Task: #C985404
    23:15:55 !!! Internal_diagnostic (trace) Frame: 1, Pc = #163C13, #151B
    23:15:55 !!! Internal_diagnostic (trace) Frame: 2, Pc = #163C13, #1B52
    23:15:55 !!! Internal_diagnostic (trace) Frame: 3, Pc = #163C13, #19B1
    23:15:55 !!! Internal_diagnostic (trace) Frame: 4, Pc = #92813, #C8
    23:15:55 !!! Internal_diagnostic (trace) Frame: 5, Pc = #92813, #D6
    23:15:55 !!! Internal_diagnostic (trace) Frame: 6, Pc = #92413, #E4
    23:15:55 !!! Internal_diagnostic (trace) Frame: 7, Pc = #92013, #CC
    23:15:55 !!! Internal_diagnostic (trace) ** End of trace **
But other than that, as can be seen in the table above, the emulator is getting faster. We have pulled out all the parity error checks, but not the ECC checks, and we have started to eliminate the diagnostic circuitry, retaining only what is necessary to run FRU::P2UCODE, FRU::P3UCODE, RUN_UDIAG and the Environment.
2024-07-04 Improvement on second IOC board - back to state as of 2023-11-16
We found the problem that gave us a set-back on the second IOC board. While replacing one of the chips, the IO.DCLK line was damaged so it no longer provided a signal to 5 latches. We have soldered an extra wire on to fix this, and the board now boots like it did back in 2023-11-16.
So this IOC now passes self test again, but SCSI communication over the UniBus interface still fails to work properly.
At least the set-back has been reset now...
2024-06-15 End of no-parity experiment
For now we have come to the end of the experiment where we got rid of all parity and ECC checking.
It speeds the emulator up by 20-ish percent, which is nothing to sneeze at, but we have concluded that we are not yet ready to continue without the diagnostic subsystem.
2024-06-13 n_running_r1000++, and perhaps a lead on IOC2 trouble
Finally got time to take a look at the R1000. With one of our previously hacked BIOS the R1000 complained about "G6".
So after a little work with a side-cutter and a soldering iron, a new SRAM-chip on G6 was in place and the IOC was yet again happy about Life, the Universe and Everything :-)
Another attempt to identify the elusive problem on IOC board 2 may have turned something up:
The trigger is when BYTE0 is asserted, causing a BERR and in turn the M68K to trap. The trace hints that SRAM G47 (DQ1 in bank 0) has become slow. DQ1 goes high just after CLK.2X clocks, and that causes the parity on BYTE0 to fail. Several other traces show the same pattern: DQ1 is much slower than the other lines. Next time we will replace G47, fingers crossed!
2024-06-09 Parity elimination
The R1000 computer conservatively stores a parity-bit with any byte stored in a RAM chip or passed over the backplane or foreplane busses, and as a general rule the parity bits are passed through rather than recalculated. The DRAM storage on the MEM32 boards uses a 9-bit ECC code instead.
All in all, that takes a fair bit of logic in the hardware and in software it gets even slightly worse because calculating the parity is a slightly expensive operation.
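For readers unfamiliar with per-byte parity, here is a minimal sketch of the calculation the emulator would otherwise repeat for every byte. Whether the R1000 uses odd or even parity is not stated here, so the `odd` flag is an assumption of the example, as are the function names.

```python
def parity_bit(byte: int, odd: bool = True) -> int:
    """Return the parity bit to store alongside an 8-bit value."""
    ones = bin(byte & 0xFF).count("1")
    even_bit = ones & 1          # bit that makes the total number of ones even
    return even_bit ^ 1 if odd else even_bit


def parity_ok(byte: int, stored_bit: int, odd: bool = True) -> bool:
    """Check a byte against the parity bit that travelled with it."""
    return parity_bit(byte, odd) == (stored_bit & 1)


print(parity_bit(0x00), parity_bit(0x01), parity_bit(0xFF, odd=False))
```

Doing this on every bus transfer is cheap in hardware, where it is a handful of XOR gates, but adds up in software.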
We do not expect the simulated RAM chips or busses to have bit-errors, so all this is surplus to requirements for the emulator, but removing it to speed things up comes at the cost of many tests failing, because they validate the parity circuits.
For the last couple of weeks we have worked to eliminate all the parity-checking in a separate branch, with the goal that run_udiag and booting the environment are the only two things that must work.
We had to modify that goal slightly, because run_udiag actually tests the ECC circuitry near the end, so instead of ending in success, it now ends with code 29F2 - ECC_EVENT_NOT_TAKEN.
With all that said, the run which just ended now took only 106 hours, 315 times slower than hardware.
The run we started instead seems to be a percent faster or so.
No news on the hardware front, everybody is busy.
2024-05-07 Gentlemen: Start your FPGAs
We have branched Release2 of the github project to celebrate the fact, that we have simulated the HW-true schematics all the way to login.
That took 59 days and 19 hours on the MacBook M2.
The point of this branch is to provide a well documented launch-pad, should anybody want to create an FPGA based Rational R1000.
2024-04-26 n_running_r1000--
When we tried to boot the working R1000 yesterday, we got IOC RAM errors :-(
2024-04-15 Spring has sprung
And that means a boatload of distractions, so activity is a bit low right now.
On the MacBook M2 the hardware schematic simulation is chugging along, 4000 times slower than hardware (1250 Hz instead of 5 MHz), but it has gotten to CMVC.11.8.2D and in another three weeks it should be at the login prompt.
We are a bit stuck on the IOC#2 board. The best current hypothesis is that the different SRAM speeds cause the byte parity lines to flutter at an inconvenient time, but we do not have enough 25ns chips to verify that theory, and DIP versions of that SRAM are hard to come by.
Given the prospect that more SRAM chips are guaranteed to die on us, and the fact that surface mount versions of CY7C187 are still obtainable, making a small adapter board, for instance with three chips per board, looks more and more attractive.
2024-03-27 Integrated simulated MEM32
We have started to implement the MEM32 board's core functionality as a single integrated SystemC/C++ class.
The first stage is to implement the tag-RAM logic so the contents are maintained identical to the working "discrete" simulation of the MEM32 board.
So far we are at around 350 lines of code, with a lot of duplication in order to have precisely the same memory layout as the discrete model, so we can compare them with memcmp(3).
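The comparison idea can be sketched as follows: keep both models' state in the same byte layout, compare the raw bytes, and report the first offset where they differ. The layout below (a 64-bit tag plus a 16-bit flag word per entry) is purely hypothetical and is not the MEM32 tag-RAM format.

```python
import struct
from typing import Optional

# Hypothetical entry layout for illustration only: 64-bit tag + 16-bit flags.
TAG_ENTRY = struct.Struct("<QH")


def pack_tags(entries) -> bytes:
    """Serialize (tag, flags) pairs into one flat byte string."""
    return b"".join(TAG_ENTRY.pack(tag, flags) for tag, flags in entries)


def first_difference(a: bytes, b: bytes) -> Optional[int]:
    """Like memcmp(3), but returns the offset of the first mismatch (or None)."""
    if a == b:
        return None
    for offset, (x, y) in enumerate(zip(a, b)):
        if x != y:
            return offset
    return min(len(a), len(b))  # one buffer is a prefix of the other


discrete = pack_tags([(1, 0), (2, 3)])
integrated = pack_tags([(1, 0), (2, 4)])
offset = first_difference(discrete, integrated)
if offset is not None:
    print(f"models diverge at byte {offset}, entry {offset // TAG_ENTRY.size}")
```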
The Functional Specification of the Memory Monitor has been of good help in this, and it has mostly been a matter of waiting for the test-runs to crash, and then looking at the end of the 100+GB trace-file to find out why.
The second stage will be doing the same for the DRAM arrays. We do not anticipate any challenges here: once the tag-logic is ironed out, this should follow trivially.
The third stage will be to migrate the MEM32's output signals from the discrete model to the integrated model.
We do not have any documentation which explains the meaning and timing of the control signals, so some Prototyping™ (The methodology formerly known as "Trial&Error™") will be involved.
The fourth and final stage will be to eliminate the discrete simulation to reap the "integration-dividend".
This will mostly be about simulating enough of the Diagnostic Archipelago to keep the programs on the IOP happy.
2024-03-19 and more debugging
The following measurements were made with the termination resistor pin of A22 cut (no line terminator).
The last session lacked measurements of some of the SRAM pins. VCC between H2 and H19 is 4.91-4.95 V with ripple averaging 140 mV, but increasing to 450 mV every 12.8 µs (78.1 kHz) for about 100 ns.
Other observations:
Pin 10 /WE switches at 10MHz continuously (/CE not asserted) - The superimposed signal on A22 is at the same frequency, and it is not unlikely that the A22 and /WE lines run close to each other.
The Q-output of H40 (DQ8) has a slightly different shape than on any other of the 72 SRAMs. Interestingly enough, the noise sticks with H40 even when exchanging the SRAM with one from another position.
(Scope images: DQ8 and DQ!8)
SRAM output while not driven is at 1.8-1.9V.
2024-03-14 and debugging
Assuming the IOC2 problems are rooted in SRAMs causing parity errors at times where they are not meant to be tested, we set out to systematically measure the IOC2 SRAMs to find out whether they are working within specs, starting with the address lines. Address-wise, the SRAMs are organized in two banks of 256KB, a low and a high bank, 512KB in total. Physically, the SRAMs are grouped in 4 blocks, each holding the low and high bank of the same byte: 16+2 SRAMs in each block sharing the same address lines (with separate chip selects).

Physical SRAM layout (Bank / Byte):

row/column | H | G |
---|---|---|
41-48 | H/1 | L/0 |
33-40 | L/1 | H/0 |
31-32 | Parity 1 | Parity 0 |
18-19 | Parity 3 | Parity 2 |
10-17 | H/3 | L/2 |
2-9 | L/3 | H/2 |
First set of measurements were done on address lines on the physical ends of each block.
SRAM address line noise, Pk-Pk (mV):

Pin | H2 | H19 | G2 | G19 | H31 | H48 | G31 | G48 | Line |
---|---|---|---|---|---|---|---|---|---|
1 | 170 | 220 | 180 | 170 | 240 | 310 | 185 | 250 | A14 |
2 | 140 | 210 | 130 | 210 | 175 | 135 | 155 | 205 | A15 |
3 | 165 | 210 | 85 | 135 | 140 | 130 | 170 | 200 | A16 |
4 | 200 | 200 | 170 | 180 | 145 | 150 | 175 | 115 | A17 |
5 | 140 | 165 | 125 | 170 | 175 | 180 | 210 | 130 | A18 |
6 | 175 | 145 | 110 | 200 | 175 | 150 | 165 | 140 | A19 |
7 | 105 | 210 | 145 | 200 | 150 | 115 | 205 | 130 | A20 |
8 | 190 | 135 | 200 | 140 | 165 | 115 | 210 | 190 | A21 |
14 | 775 | 365 | 570 | 275 | 180 | 445 | 195 | 275 | A22 |
15 | 230 | 340 | 130 | 95 | 160 | 185 | 205 | 190 | A23 |
16 | 125 | 150 | 130 | 150 | 170 | 120 | 175 | 165 | A24 |
17 | 150 | 150 | 115 | 135 | 240 | 232 | 230 | 150 | A25 |
18 | 130 | 200 | 90 | 160 | 150 | 145 | 200 | 170 | A26 |
19 | 195 | 180 | 160 | 155 | 165 | 170 | 195 | 205 | A27 |
20 | 135 | 195 | 115 | 135 | 140 | 130 | 245 | 185 | A28 |
21 | 130 | 190 | 110 | 150 | 135 | 130 | 175 | 195 | A29 |
 | Bank 0 | | Bank 1 | | Bank 2 | | Bank 3 | | |
(The measurements should probably have included all pins and not just the address lines...)
In an attempt to locate the source of the noise on pin 14 (A22) of BYTE3, a second set of measurements was done. The values below are accurate to about ±10 mV, based on a "manual" average of what was measured on the scope.

Noise on pin 14:

Chip | Pk-Pk (mV) | Is new |
---|---|---|
H2 | 770 | * |
H3 | 760 | |
H4 | 730 | |
H5 | 720 | * |
H6 | 700 | * |
H7 | 670 | |
H8 | 660 | * |
H9 | 630 | |
H10 | 610 | * |
H11 | 580 | * |
H12 | 540 | |
H13 | 520 | * |
H14 | 480 | |
H15 | 450 | |
H16 | 420 | * |
H17 | 400 | * |
H18 | 370 | * |
H19 | 350 | * |
Clearly, there is least noise at the drivers at the center and most noise toward the termination resistors at the edge of the board. The same pattern is seen on the other banks, although not as pronounced (Table 2).
The termination resistors are DIL16 chips R220/330 serving multiple lines. A line next to A22 is a CLK.2X line working at 10MHz. Zooming in on the A22 noise, it turns out that the noise is actually at CLK.2X frequency.
This led to the last test done: cutting the termination resistor pin of A22 to see if the resistor is somehow responsible for the noise - It is not; the chip-side becomes nice and steady, while the board side (A22) noise remains, although at a lower mean level. The line is actively driven low by the driver.
Earlier, while exchanging one of the SRAMs our de-solder machine spewed out some solder, which we hoped to have cleaned up, so a close-in re-inspection was done, without finding any remains.
We tried to remove all the new SRAMs on BYTE3, but the noise remains.
- Solder residue still on the board after the spillage? - Table 2 does not seem to agree (identical patterns on other banks).
- Solder between A22 (pin 14) and a neighbor pin on one of the exchanged SRAMs? - Not likely, pin 13 is data in which is stable and pin 15 is A23 which is much less noisy than A22.
- Defective termination-resistor chip? - The pin-cut-test does not support this.
- Defective driver? - Wouldn't the noise amplitude be higher closer to the driver?
- New SRAMs radiate the noise? - No, we tried to remove these, same noise.
- Old SRAMs radiate the noise? - Table 3 does not support this.
But then what???
2024-03-09 Still debugging
We're still debugging IOC2, which has done us the favour of failing much more predictably.
Current hypothesis is that when the M68K CPU writes to registers in UNIBUS space, the IOC is designed such that the parity information put on the UNIBUS comes from the RAM on the IOC board, even though that RAM takes no part in the UNIBUS transaction.
During the EEPROM based selftest, this parity information is read back and causes a bus-error signal to get raised, which the EEPROM has not yet prepared for.
One possible explanation for why we see this is that we have replaced the defective 64Kx1 SRAM chips with "NOS" chips which are slightly slower (35ns), so the parity-error signal may take longer to stabilize.
Work continues on the Emulation, where we are both speeding up the "Optimized" schematics and working to get the HW-true schematics to pass all tests and boot again.
2024-02-19 T-12y and counting
If we can speed up the emulator by one percent every week, it will run as fast as the actual hardware in a little over 12 years.
And since 1%/week does seem to be our rate of improvement right now, it is probably time for a new strategy.
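The estimate is compound-interest arithmetic: how many weekly gains of 1% does it take to erase a given slowdown? A small sketch follows. The function name is ours, and which slowdown figure the estimate was based on is not stated in this entry.

```python
import math


def weeks_to_parity(slowdown: float, weekly_gain: float = 0.01) -> int:
    """Weeks of compounding speed-ups needed before slowdown / (1+gain)^n <= 1."""
    return math.ceil(math.log(slowdown) / math.log(1.0 + weekly_gain))


# Hypothetical slowdown factors, only to show how the number of weeks scales.
for factor in (100, 420, 1000):
    weeks = weeks_to_parity(factor)
    print(f"{factor:5d}x slower: {weeks} weeks, about {weeks / 52:.1f} years")
```

Because the growth is logarithmic, even halving the slowdown only shaves about 70 weeks off the wait at this rate, which is why a change of strategy looks more attractive than more of the same.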
Until now the optimizations have been conditioned on keeping everything working, including all the diagnostic facilities, but this is increasingly becoming a headache for us, because the diagnostic facilities, sensibly, test the implementing circuits rather than the desired architecture, and they often test the implementing circuits in ways the desired architecture does not use them.
To give a concrete example:
The architectural view of a micro instruction on the TYP board may be: "Take whatever is on the TYP bus, and store it at location X in the register file"
The implementing circuitry needs to put the TYP bus on the A or B bus to the ALU, tell the ALU to pass that bus through to its output where it is put on the ALU bus, from there to the C bus, and then set up the right address for the register file and strobe the Write Enable signal to the actual register file RAM devices.
But in the implementing circuitry, the C bus does not go to the register file RAM devices, only the A and B busses do, so there is an extra step of gating the C bus to both the A and B busses, because both sides of the register file need to be written.
Because the diagnostic tests test the implementing circuitry, they test that the TYP bus can be placed on the A and B busses, that the C bus can be placed on the A and B busses, and that the ALU can pass a value through unharmed.
If we try to optimize things by using true dual-port register file RAM devices, which is what the architecture calls for, and get rid of the extra bus-switch-yard, which the reduction of the architecture to implementing circuitry brought into existence, the diagnostics will fail.
So it may be time for us to give up on the low-level diagnostics, and rely only on the micro-code based diagnostics going forward.
We will try out this idea on the MEM32 board, because it is, all things considered, very simple and it has no microcode.
2024-02-04 Lookin' good
The museum was open to the public today and we used the real Facit terminal to log into the emulator:
Seems to work, as expected, but it is slooooow…
2024-01-27 And we have liftoff!
Ladies & Gentlemen!
The R1000/s400 software emulator works:
    09:19:31 +++ Operator Enable_Terminal 244
    09:19:31 +++ Operator Enable_Terminal 245
    09:19:31 +++ Operator Enable_Terminal 246
    09:19:31 +++ Operator Enable_Terminal 247
    09:19:31 +++ Operator Enable_Terminal 248
    09:19:32 +++ Operator Enable_Terminal 249
    09:19:32 !!! Machine_Initialization_Start Exception_While_Starting TMS_Elaborate in context !Machine.Initialization.Rational
    ====>> Elaborator Database <<====
    EEDB:
    ====>> Console Command Interpreter (System Job 223) <<====
    username: pam
    password:
    session: s_1
    99/01/22 09:20:04 --- pam.s_1 logging in.
    ====>> Ci.Interpret (PAM.S_1 Job 221) <<====
    command: what.users
    ====>> What.Users (PAM.S_1 Job 219) <<====
    User Status on January 22, 1999 at 9:20:13 AM
    User Line Job S Time Job Name &
    ============== ==== === ==== ========= ===================================& ======
    *SYSTEM - 4 RUN 13:54.446 System
        5 RUN 13:54.447 Daemons
        223 IDLE 29.439 Console Command Interpreter
        246 IDLE 53.480 Rational_Access Commands Rev 1_0_2
        248 IDLE 57.840 Print Spooler
        250 IDLE 1:03.681 Smooth Snapshots
        253 IDLE 58.227 Ftp Server
    PAM - 219 RUN 1.377 WHAT.USERS
        221 IDLE 4.966 CI.INTERPRET("!machine.device ..., & ...)
    NETWORK_PUBLIC - 225 IDLE 37.757 Archive Server
    ====>> Ci.Interpret (PAM.S_1 Job 221) <<====
The bad news is that it takes an hour just to login…
2024-01-27 … and warmer
    NATIVE_DEBUGGER.11.2.0D
    CROSS_DEVELOPMENT.DELTA
    INITIALIZE.11.2.4D
    ====>> Environment Log <<====
    09:17:25 +++ Operator Enable_Terminal 16
    09:18:55 !!! !Machine.Initialization.Rational.Dtia Unable_To_Start_Dtia_Rpc_Server ERROR Activity does not define a valid load view for the subsystem of !TOOLS.DTIA_RPC_MECHANISMS. REV11_4_SPEC.UNITS.TARGET_INTERFACE.ELABORATION'SPEC
2024-01-27 Getting warmer
    TOOLS_INTEGRATION.DELTA
    CMVC.11.8.2D
    DESIGN_FACILITY.DELTA
    ARCHIVE.11.4.0D
    NATIVE_DEBUGGER.11.2.0D
    CROSS_DEVELOPMENT.DELTA
    INITIALIZE.11.2.4D
    ====>> Environment Log <<====
    09:17:25 +++ Operator Enable_Terminal 16
    ====>> Elaborator Database <<====
    EEDB:
2024-01-26 Thinking ahead
While we wait to see where the emulation croaks next time, it is a good time to take a step back and think ahead.
The primary goal of this entire project was to produce a software emulation of the R1000/s400 computer, so that the uniqueness of the computer, and of the Rational Environment will not be lost to humanity, when the last hardware finally releases the magic smoke.
The question I want to discuss here is a bit like »Let's go see Rome on our vacation.«
There may be only one set of longitude and latitude, but there are dozens of Romes at that location: The Pope's Rome, The Cæsar's Rome, The Fascists' Rome, The Independent Rome, The culinary Rome, The Opera Rome and so on. Which one, or which ones, of those Romes do you want to experience, and how long is your vacation ?
Likewise there are many R1000's to emulate: There is the R1000 running the Rational Environment, there is the R1000 implementing the Rational Architecture, there is the R1000 son-of-son of the original four-CPU monster s100, there is the R1000 with the amazing diagnostic abilities, there is the R1000 with type-checking in hardware, there is the R1000 with distributed microcode, there is the R1000 which is bit-oriented, there is the R1000 without linear address-space and so on. How much does the R1000 interest you ?
Because we were forced to implement the R1000 emulator from the hardware schematics, we have ended up with a software emulation which allows one to study all these aspects of the R1000, but at glacial speed: I have not tried running the "Hw" branch schematics for a long time, but last I tried they were several thousand times slower than the hardware.
If you want to understand how the ECC circuitry and hardware works you can probably live with that, if you want to demo the Rational Environment's semantic view of large Ada Projects … not so much.
What we have right now is the "Hw" branch, which is supposed to be chip for chip, wire for wire, identical to the two R1000/s400 computers we have in Datamuseum.dk, except for the RESHA board. If you want to study ECC, this is the one you want.
The other branch is the "Optimized" branch, which sports weird and complex chips, some of which contain 1/3 of the FIU board and others which have weird back-door connections to the emulator to load the microcode faster. This branch currently only runs 420 times slower than the hardware, and that is neither fish nor fowl, but it may transpire that it actually works.
If so, it will be perfectly defensible to stop here, we have reached the goal we set ourselves and we can leave the performance issue as an exercise for future generations.
But that would be neither fun nor satisfying, so what do we do instead ?
One possibility would be to validate that Hw branch also works, and then convert that into VHDL or Verilog, synthesize it to a FPGA and create a 1:10 scale-model of the R1000/s400.
For many reasons, mostly lack of skill, I cannot do that. If you want to, I'll do everything I can do to help you, because I'd love to have one on my desk, but I cannot do it.
One possibility is to continue to file away on the "Optimized" branch, shaving a percent performance here, and a percent performance there. I can do that, and if I can shave 1% every week, I will reach speed parity with the hardware … in 12 years. There is a good chance that at some point I will understand the hardware well enough to get down to six components: IOC, FIU, SEQ, TYP, VAL & MEM32, but they will no longer teach you anything about the actual hardware of the R1000, in fact the "Optimized" branch already does not do that.
Another option is to use the observability provided by the "Optimized branch" to start implementing a "traditional" software emulation at the macro-instruction level. That will undoubtedly give us the fastest running emulator, but it will be tough going, because we need to understand both the hardware and the microcode to find out what the instructions actually do.
Going down a layer, we can implement a traditional software emulation at the micro-code level. This has the benefit that we do not need to understand the microcode instructions in relation to each other, or indeed anything about the macro instructions or Ada, we just need to execute all the actually used micro instruction words correctly. I expect such a simulator to run at least several times faster than the hardware, even on a tiny computer like a Raspberry Pi, and to ensure the correctness of that emulation, it can be compared directly bit for bit with the Optimized branch or even the Hw branch.
In terms of preservation, in the museum sense of the word, there is no doubt that the Hw branch is the real deal.
In terms of presentation of the Rational Environment, the Hw branch is useless, and an FPGA solution will have to be "refreshed" every couple of decades, because sooner or later the magic smoke escapes.
A microcode emulation would allow presentation of the Rational Environment with high fidelity, and if written in a stable programming language, it can last forever.
So I should probably start to think about that…
/phk
2024-01-25 Finally getting somewhere
2024-01-23 Two down, two to go
We are running two test-runs in parallel, and they both passed PRETTY_PRINTER just now.
We have seen that once before, but this time we think we know why they made it.
The real test is in 24 hours, when they reach OS_COMMANDS, which we have never managed to get past.
2024-01-22 Well, that would do it...
The python script delivered its verdict overnight, and … ehh … duh ?
    scsi_d 1 WRITE_6 0a000ec003000000
    scsi_d 1 READ_6 08000ec001000000
    scsi_d 1 READ_6 08000ec101000000
The hex-strings are SCSI CDBs and the '03' in the WRITE_6 command means that three sectors are transferred.
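For reference, a 6-byte SCSI CDB carries the opcode in byte 0, a 21-bit logical block address in bytes 1-3 and the transfer length in byte 4, where 0 means 256 blocks. The sketch below decodes the hex strings from the trace. It assumes the two trailing bytes in the trace are padding, and the function name is ours.

```python
def decode_cdb6(hex_cdb: str):
    """Decode a READ(6)/WRITE(6) CDB given as a hex string."""
    raw = bytes.fromhex(hex_cdb)
    names = {0x08: "READ_6", 0x0A: "WRITE_6"}
    opcode = raw[0]
    lba = ((raw[1] & 0x1F) << 16) | (raw[2] << 8) | raw[3]  # 21-bit block address
    count = raw[4] or 256                                   # 0 encodes 256 blocks
    return names.get(opcode, hex(opcode)), lba, count


for cdb in ("0a000ec003000000", "08000ec001000000", "08000ec101000000"):
    name, lba, count = decode_cdb6(cdb)
    print(f"{name} lba=0x{lba:x} sectors={count}")
```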
That was news to us: until now we had not seen, or at least not noticed, the R1000 CPU using anything but single sector transfers.
But we do support multi-sector SCSI transfers, DFS uses them when it push/pull's programs for instance, so that in itself should not be a problem.
However the trace reveals something we have not seen before:
    mailbox f 4080000f 81020033 0f000103 00050852
    mailbox d 4080000d 81020134 0f000103 00050852
    mailbox c 4080000c 81020235 0f800103 00050852
    scsi_d 1 WRITE_6 0a000ec003000000
The three mailboxes are out of order and non-contiguous: f-d-c.
Each mailbox has an attached 1Kbyte buffer, so if the mailboxes had been in c-d-e order, it would have worked fine.
In other words: Scatter/gather disk-I/O.
That, finally, explains what the "IO BUS MAP" on page 23 and 24 of the IOC schematic is good for.
Fortunately the fix looks trivial:
    - dma_read(sd->ctl->dma_seg, sd->ctl->dma_adr, ptr, len);
    + for (xlen = 0; xlen < len; xlen += (1<<10)) {
    +     dma_read(sd->ctl->dma_seg, sd->ctl->dma_adr + xlen, pp + xlen, 1<<10);
    + }
and similar for write.
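The gather side of the idea can be illustrated in isolation: the data for one multi-sector write is assembled from several 1 Kbyte buffers, taken in mailbox order rather than address order. This is a sketch of the concept only. It does not model the emulator's `dma_read` or the IO bus map, and all names in it are made up.

```python
PAGE = 1 << 10  # each mailbox has a 1 Kbyte buffer attached


def gather(buffers: dict, order: list, length: int) -> bytes:
    """Assemble `length` bytes by visiting the mailbox buffers in the given order."""
    out = bytearray()
    for mailbox in order:
        remaining = length - len(out)
        if remaining <= 0:
            break
        out += buffers[mailbox][:min(PAGE, remaining)]
    return bytes(out)


# Hypothetical buffer contents, one recognisable fill byte per mailbox.
buffers = {0xC: b"C" * PAGE, 0xD: b"D" * PAGE, 0xF: b"F" * PAGE}

# Mailboxes arrive in f-d-c order, as in the trace above.
data = gather(buffers, [0xF, 0xD, 0xC], 3 * PAGE)
print(data[:1], data[PAGE:PAGE + 1], data[2 * PAGE:2 * PAGE + 1])
```

Reading the three buffers as one contiguous block starting at the first mailbox would have picked up the wrong data for the second and third sectors.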
This is going to be an interesting and slightly tense week…
2024-01-21 Aha!
Armed with two binary traces, one from a run which stopped at PRETTY_PRINTER (6GB) and one which managed to pass that (11GB), we have now found out that the failing run reads sector 0xec1 from drive 1 and gets all zeros back, whereas the good run reads the same sector and gets non-zero data back.
This would not be inconsistent with Exception 16#20 being "Numeric Error (zero divide)" according to pdf page 69 in Guru Course day 1.
The non-zero data in the good case is not identical to the content of the disk-image when we start the emulator, so it must be data written during the run.
A Python script is now trudging through the smaller trace to find that disk write operation.
2024-01-11 IOC2 goes bad(er)
Yesterday we went through our TODO list and were thrown off at several points along the way.
For instance, it transpired that the disk images we preserved from the disks in PAM's machine did not include the three "diagnostic cylinders" at the end; they had only 1655 of the disk's 1658 cylinders.
Until we ran the DISKX.M200 program this did not matter, and we never noticed, and because the emulator correctly emulated the disks as having 1658 cylinders, it had no problem with DISKX.
But we configured the SCSI2SD disk emulator from the size of the images, so it presented only 1655 cylinders in the "mode pages" and when we tried to run DISKX on the hardware, the kernel panic'ed when it saw I/O requests to cylinders past the end of the disk, as it should have.
The visual inspection found no relevant differences between the two boards, and the handwritten correction to IOFF0A on the schematic must have predated the s400, because it was incorporated in the PCB layout.
Next we soldered connections to some of the interesting chips on the solder-side of the IOC, removed all the other boards from the machine to make space to get to the connections, and tried to make sense of things.
But by now the board did not even complete the IOC EEPROM self-test; it failed with the RED led on the front-panel indicating that the M68K had halted itself. Eventually we could show that two of the byte parity signals from the IOC RAM were noisy, even in this state where nothing was accessing them, and we decided to call it a night. If we can reproduce that noise with the board on the workbench, it should be quite easy to track down where the noise comes from. It could be the 74F280 parity generators, but it is far more likely to be another couple of 64Kx1 SRAM chips heading for the eWaste bin.
The emulator now reaches PRETTY_PRINTER in 38h43m.
2024-01-09 IOC2 in the memory wringer
Happy New Year!
This last Sunday the museum was open, and we used the chance to give the SRAM on IOC2 a really good workout with the M68K CPU.
Using the excellent "vasm" by Volker Barthelmann et al, we have created an ad hoc tool-chain which allows us to write test-code in M68K assembler and get commands out, ready to be pasted into the IOC's built-in low-level debugger.
The low-level debugger's commands are described on the first page of The IOC Schematics and not too bad to use manually, once you memorize them, and the addresses you are working with, but for entering even trivial short programs, cut and paste directly from the tool-chain is much more convenient.
But after beating the RAM up for several hours in various creative ways, we found no reason to think the M68K is able to trigger the parity-related or -adjacent situation which causes huge disk-writes to stall.
We also tried another interesting experiment: we overwrote PROGRAM_2.M200 with DISKX.M200. This is pretty trivial to do, since the DFS filesystem lays files out in contiguous sectors, and the RPi we use as console server has a "backdoor" USB connection to the SCSI2SD disk emulator board. With the sector number where the file starts, taken from the emulator's "dfs" CLI command, good old dd(1) will do the job.
DISKX.M200 is a "DFS based disk exerciser", described in not too much detail on page 27 of the Command Interfaces manual.
By putting DISKX in PROGRAM_2's place, we can boot the system and via the EEPROM prompts load DISKX without ever doing a DFS "PUSH" and thus without doing any of the large disk writes which hang.
This overwrite trick is very handy, and we checked that it also works for CLI.M200, and for that matter, we can patch KERNEL.M200, FS.M200 or even write our own programs to run.
But back to DISKX: When run this way in the emulator, it works fine; run on IOC2, it fails hard and fast:
Initializing M400S I/O Processor Kernel 4_2_18
Disk 0 is ONLINE and WRITE ENABLED
Disk 1 is ONLINE and WRITE ENABLED
IOP Kernel is initialized
Exercize unit 0 [Y] ?
Exercize unit 1 [Y] ?
Disk unit => 0, using cylinders [1648..1654]
Disk unit => 1, using cylinders [1648..1654]
DFS based
I/O Processor Kernel Crash: error 066D (hex) at PC=00004B82
Trapped into debugger.
RD0 00000676 RD1 00000000 RD2 00000001 RD3 00000001
RD4 00000444 RD5 00000000 RD6 80001900 RD7 00000001
RA0 0000E820 RA1 0003FF80 RA2 0000E83C RA3 00020578
RA4 00026350 RA5 0003FF4E RA6 0003FF8A ISP 0000FAB0
PC 0000A158 USP 0003FF4E ISP 0000FAB0 MSP FDFF742D
SR 2704 VBR 00000000 ICCR 00000001 ICAR DB359DB2
XSFC 7 XDFC 1
Crash error 066D is listed on pdf page 4 of IOC Schematics as "unimplemented", but from our disassembly of KERNEL_0.M200, it looks like a catch-all "Something's horribly wrong with disk-I/O".
Unfortunately JMP instructions are used to get to the error-emitting code, so no trace is left on the stack of which of the about a dozen different conditions caused it.
Patching those JMP's to JSR in KERNEL_0.M200 and trying again is now on our TODO list.
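On the M68K the JMP and JSR opcode words differ in a single bit: JMP is `0x4EC0 | ea` and JSR is `0x4E80 | ea`, with the same effective-address field and the same extension words, so the patch can be made in place. Because JSR pushes a return address, the crash dump would then show which call site was taken. A minimal sketch of such a patch follows; the helper names are hypothetical, and finding the right byte offsets inside KERNEL_0.M200 is left to the disassembly:

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

#define M68K_JMP	0x4ec0u	/* 0100 1110 11 <ea> */
#define M68K_JSR	0x4e80u	/* 0100 1110 10 <ea> */
#define M68K_EAMASK	0xffc0u	/* everything except the <ea> field */

/* True if `opw` is a JMP with a valid (control) addressing mode. */
static bool
is_jmp(uint16_t opw)
{
	unsigned mode = (opw >> 3) & 7, reg = opw & 7;

	if ((opw & M68K_EAMASK) != M68K_JMP)
		return (false);
	return (mode == 2 || mode == 5 || mode == 6 ||
	    (mode == 7 && reg <= 3));
}

/*
 * Turn the big-endian opcode word at image[off] from JMP into JSR.
 * Returns false, and changes nothing, if there is no JMP at `off`.
 */
static bool
patch_jmp_to_jsr(uint8_t *image, size_t len, size_t off)
{
	uint16_t opw;

	assert(image != NULL);
	if (off % 2 != 0 || off + 2 > len)
		return (false);
	opw = (uint16_t)((image[off] << 8) | image[off + 1]);
	if (!is_jmp(opw))
		return (false);
	opw = (uint16_t)((opw & ~M68K_EAMASK) | M68K_JSR);
	image[off] = (uint8_t)(opw >> 8);
	image[off + 1] = (uint8_t)(opw & 0xff);
	return (true);
}
```

For instance, a `JMP` to an absolute long address (`4E F9`) becomes `JSR` (`4E B9`) with the four address bytes left untouched. Whether the extra word on the stack upsets the error-emitting code is something only a test run will tell.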
The fact that DISKX.M200 fails almost instantly, despite doing only single-sector transfers, hints that the problem, whatever it is, has to do with the intensity of DMA transfers more than the actual length of the individual transfers, which might indicate some kind of thermal issue.
Based on nothing but Mike Druke remembering that Signetics 74F373 caused some grief back in the day, we have previously replaced one such which is very prominent in the UNIBUS::PB path (IOREG12 @ K20), but that did not magically fix the problem.
The latch signal for that 74F373 is generated in the top right corner of IOC schematic page 25 (= pdf page 35), by IOFF0A, a 74F74, and the schematics have a handwritten correction which inverts that signal.
There is quite a gap in ECO levels between the working and non-working IOC boards, so we are also going to do a very detailed visual audit, to see if we can spot any differences, and in particular look for this one, since it goes to the heart of the trouble we see.
2023…2012
- 2023 Logbook entries - Working to get the second R1000 running and still working on emulation
- 2022 Logbook entries - We are getting somewhere with the emulation
- 2021 Logbook entries - We discover there is no instruction set, and start simulating hardware
- 2020 Logbook entries - Covid-19 happens, and we start to create a software emulation
- 2019 Logbook entries - We fix the IOC RAM error, and Pierre-Alain Muller drops by for startup
- 2014-2018 - The project got stuck for lack of a sufficiently beefy 5V power-supply, and then phk disappeared while he built a new house.
- 2013 Logbook entries - Documentation preservation and we run into the IOC RAM error
- 2012 Logbook entries - Preservation starts and we borrow tapes from Pierre-Alain Muller
Many thanks to
- Erlo Haugen
- Grady Booch
- Grek Bek
- Pierre-Alain Muller
- Michael Druke
- Pascal Leroy