Rational/R1000s400/Logbook

From DDHFwiki


2022-11-27 Options

We are of course disappointed that the emulator died after three months, but to be honest, we never expected it to get that far in the first place.

Now that it has stopped, we are thinking hard about what we do next, because a three-month turn-around rules out trial & error as a viable strategy: actuarial tables give us, at best, 50 sequential experiments.

We have come up with this list of calamities we cannot rule out, in no specific order:

A) An inaccuracy in the redrawn schematics.

B) A deficiency in one of the SystemC component classes

C) A race condition due to imprecise emulation of component timing

D) Wrong responses to I/O requests sent to the IOP

E) Wrong Cluster Number or other EEPROM or NOVRAM content

F) The schematics do not match the hardware 100%

G) MacBook execution error, Cosmic rays, etc.

Their probabilities are different, but falsifying any one of them will take a lot of time and effort.

The emitted diagnostic is undoubtedly an important clue, but in a language it will take considerable time for us to master.

And while we ponder these options, the MacBook could be chugging away to get us more data, but what should we run on it ?

If we restart the exact same run, in three+ months we will learn if:

A) The problem is exactly reproducible, ie: it crashes, microcode trace RAMs are identical.

B) Approximately reproducible, ie: it crashes approximately the same, but microcode trace RAM tells a different story.

C) Not reproducible, ie: The emulation fails in a different way.

D) Transient, ie: The emulation continues to login in 5-6 months.

If we start one of the optimized branches, we only learn something about this failure, if the optimization introduced no new problems, about which we have almost zero confidence.

On the other hand, if new problems have been introduced, we will know much sooner than three months, and if not, we will learn about the current problem in approx 6 weeks instead of 3 months.

We have also not explored what the performance impact would be, if we ran two instances of the emulator in parallel.

So why not do both ?

We will - once this entry has been saved in the logbook.

So what are our options ?

One obvious and compelling option is to switch to a hardware based approach.

Find a suitable FPGA evaluation board, convert our netlist to VHDL, come up with component models in VHDL, and run tests in a matter of days and hours instead of months.

It is of course nowhere near as simple as that, but if we jumped on it, we might have the first test-result before the MacBook delivers the next one.

It is however not cheap.

We need a pretty good FPGA to fit all of the R1000, the M68K20 and five 8052 CPUs, and we need 32+MB of RAM for the MEM32 board, external or internal to the FPGA.

That is of course assuming we can find a M68k20 model to use, and that we reverse engineer the RESHA board, because we only have preliminary schematics for that.

Partitioning the IOC board so the IOP can run on an external support processor is probably both faster and more feasible.

Also: None of us are VHDL sharks.

On the plus side: The fantastic diagnostic subsystem of the R1000 will help a lot.

Another option is to speed up how fast we can run tests on the software emulation.

We have looked at getting more CPU cores engaged, ideally one per board.

SystemC has threads, but they are notoriously slow because of the cross-thread locking they require, and in particular, they all seem to use the same central event-scheduling data-structures, so there is no real prospect of any gain by running a thread per board.

A more promising avenue would be to run each board in a separate UNIX process, and implement the front- & backplane in shared memory, using atomic instructions to (spin-)lock the boards each simulated 200ns clock period.

Such an implementation will run at the speed of the slowest board, and we currently see boards run 100-200 times slower, when simulated individually.

If we assume such a model ends up running only 100 times slower than hardware, that means the five processes must synchronize every (200ns * 100 = 20µs), which is not unreasonable.

This is clearly worth looking into.
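As a rough illustration of the lock-step idea, here is a minimal Python sketch. It is not the emulator's code: `threading.Barrier` stands in for the shared-memory spin-lock, threads stand in for UNIX processes, and the board names and function names are placeholders invented for the example. It only shows the two points made above: every board meets at a barrier once per simulated clock period, and at a 100 times slowdown that barrier is reached every 20µs of real time.

```python
import threading

CLOCK_PERIOD_NS = 200   # simulated clock period, from the text
SLOWDOWN = 100          # assumed emulation/hardware ratio, from the text


def sync_interval_us(period_ns=CLOCK_PERIOD_NS, slowdown=SLOWDOWN):
    """Real time between synchronization points, in microseconds."""
    return period_ns * slowdown / 1000.0


def run_lockstep(board_names, cycles):
    """Run one worker per board; all meet at a barrier every simulated cycle."""
    barrier = threading.Barrier(len(board_names))
    progress = {name: 0 for name in board_names}

    def board(name):
        for _ in range(cycles):
            progress[name] += 1   # stand-in for simulating one clock period
            barrier.wait()        # no board starts cycle n+1 before all finish n

    threads = [threading.Thread(target=board, args=(n,)) for n in board_names]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return progress


if __name__ == "__main__":
    # Placeholder board names, one worker per board.
    print(sync_interval_us(), "µs between barriers")
    print(run_lockstep(["board0", "board1", "board2", "board3", "board4"], 1000))
```

The barrier makes the whole group advance at the pace of the slowest worker, which is the "speed of the slowest board" property described above.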

Finally, it is time we complete the support for snapshots, so that we can restart a three month run from T - N hours, instead of not getting $200 when we cross "Start". This has been in the design from the very start, but until now we have just not needed it enough.

So much work to do, so little time...

2022-11-24 All good things

After 84 days & 15½ hours, and 1957 seconds of simulated time:

  ====>> Kernel.11.5.8 <<====
  Kernel:
  
  Kernel assert failure detected by kernel debugger stub.  System shutting down.
  Exception: !Lrm.System.Assertion_Error, from PC = #F6413, #707
  
  ***************************************
  Sequencer has detected a machine check.

The overall simulation ratio was 1/3736, not quite a second per hour, but close.
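For reference, the ratio can be recomputed from the figures quoted in this entry. This is only a back-of-the-envelope sketch, and `simulation_ratio` is a name made up for the example. With the rounded inputs (84 days, 15½ hours, 1957 seconds) the result lands about one unit away from the reported 1/3736, presumably because the elapsed time quoted here is rounded.

```python
def simulation_ratio(days, hours, simulated_seconds):
    """Real seconds spent per simulated second."""
    wall_seconds = days * 86400 + hours * 3600
    return wall_seconds / simulated_seconds


ratio = simulation_ratio(84, 15.5, 1957)
print(f"simulation ratio: 1/{ratio:.0f}")
print(f"simulated seconds per real hour: {3600 / ratio:.2f}")
```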

2022-11-24 Getting further

  IMAGE.11.4.2D
  CORE_EDITOR.11.6.2D
  TOOLS.11.5.1D
  OE_MECHANISMS.11.1.2D
  OBJECT_EDITOR.11.6.1D
  MAIL.DELTA
  OS_COMMANDS.11.6.2D
  
  ====>> Kernel.11.5.8 <<====
  Kernel:

2022-11-15 It keeps dripping

  ====>> Elaborator Database <<====
  COMPILER_UTILITIES.11.51.0D
  SEMANTICS.11.50.3D
  R1000_DEPENDENT.11.51.0D
  R1000_CHECKING.11.51.0D
  R1000_CODE_GEN.11.51.0D

  ====>> Kernel.11.5.8 <<====
  Kernel:

2022-11-13 Houston, this is not a problem

This morning on the simulated console:

  DISK_CLEANER.11.1.3D
  PARSER.11.50.1D
  PRETTY_PRINTER.11.50.3D
  DIRECTORY.11.4.6D
  INPUT_OUTPUT.11.7.0D
                           
  ====>> Environment Log <<====
  00:01:06 !!! Product_Authorization Invalid for Work_Orders
  00:01:06 !!! Product_Authorization Invalid for Cmvc
  00:01:06 !!! Product_Authorization Invalid for Insight
  00:01:07 !!! Product_Authorization Invalid for Rpc
  00:01:07 !!! Product_Authorization Invalid for Tcp/Ip
  00:01:07 !!! Product_Authorization Invalid for Rci
  00:01:07 !!! Product_Authorization Invalid for X Interface
  00:01:08 !!! Product_Authorization Invalid for Rs6000_Aix_Ibm
  00:01:08 !!! Product_Authorization Invalid for Cmvc.Source_Control
  00:01:08 !!! Product_Authorization Invalid for Rcf
  00:01:09 !!! Product_Authorization Invalid for Testmate
  00:01:09 !!! Product_Authorization Invalid for Lrm_Interface
  00:01:09 !!! Product_Authorization Invalid for Fundamental Session
  00:01:10 !!! Product_Authorization Invalid for Telnet
  00:01:10 !!! Product_Authorization Invalid for Dtia
  00:01:10 !!! Product_Authorization Invalid for X_Library
  00:01:10 !!! Product_Authorization Invalid for Ftp
  
  ====>> Kernel.11.5.8 <<====
  Kernel: 

That is probably because the disk-image is from PAM's machine, (cluster number 408459), whereas the IOC-EEPROM image, where the cluster number is stored, is from Terma's machine (cluster number 453305).

If we are lucky, we get a login-prompt sooner this way, if lack of authorization eliminates these layered products from the workload.

2022-11-10 Loss of grid power

We lost grid power for 30 minutes today:

20221110 power cut.png

But the MacBook M2 coasted right through, and has added two more packages in the last three days:

  BASIC_MANAGERS.11.3.0D
  ADA_MANAGEMENT.11.50.4D

2022-11-07 Action!

Things are happening now:

  the virtual memory system is up

  ====>> Kernel.11.5.8 <<====
  Kernel: START_NETWORK_IO
  Kernel: START_ENVIRONMENT
  TRACE LEVEL: INFORMATIVE

  ====>> Environment Elaborator <<====
  Elaborating subsystem:  ENVIRONMENT_DEBUGGER
  Elaborating subsystem:  ABSTRACT_TYPES
  Elaborating subsystem:  MISCELLANEOUS
  Elaborating subsystem:  OS_UTILITIES
  Elaborating subsystem:  ELABORATOR_DATABASE

  ====>> Elaborator Database <<====
  NETWORK.11.1.5D
  OM_MECHANISMS.11.2.0D

This is from the MacBook M2, after emulating approx 1560 seconds of CPU time.

We have scanned all the documents we received from Grady Booch, and archived them in the Datamuseum.dk Bit Archive and are reading our way through them.

There has also been progress on the optimized branches of the emulator, but not enough to warrant a detailed update yet.

2022-10-22 Potato-Vacation

Week 42 used to be "potato-vacation" in Denmark, where kids were out of school to help get the potatoes harvested. These days we call it "autumn-vacation" where people close down their "summer-houses" or just stay inside and read.

We did a first quick read of the documents Grady sent us, and it is obvious that this is really going to further our understanding of the R1000 machine and software.

The HW-identical simulation is at 1223 seconds in 52 days and simulating.

We have also implemented "turbo download" which speeds up loading the microcode:

20221022 r1000 pref.svg

The purple and corresponding green-ish lines are the MacBook Pro (M2) running the HW-identical emulation.

The big drop at 3.8 seconds is when the microcode has been loaded and started.

The small peak at 4.75 seconds is when the microcode has initialized and waits for the download of the segments specified in the configuration. The peak at 5 seconds is the pause to show the copyright message on the operator console.

The cyan and corresponding orange lines are the "megacomp4" branch, running on a Lenovo T14s. Only the first minor peak is visible. Notice that performance is approximately the same as the rr000 run, but on a machine with only half the performance.

The yellow and corresponding blue-ish lines are with "turbo download" enabled, which saves a second of simulated time and about 15 minutes of real time.

The red and corresponding black lines are with both "turbo download" and "direct download". Now the microcode download is almost instant, and we are executing microcode in a matter of (real) minutes.

So what is "turbo download" ?

On the real machine, the IOP pours the microcode onto the DIAGBUS, interleaving "experiments" for multiple cards in order to speed up things.

But because SystemC is single-threaded, that parallelism does not happen, since only one DIPROC executes at a time.

"Turbo download" cheats. Instead of passing the received DIAGBUS data to the DIPROC and its interrupt routine, the thread which subscribed to the "elastic" buffer writes the experiment directly into the DIPROC RAM. Since each DIPROC has its own elastic-thread, that runs in parallel, which shaves a second of emulated time.

"Direct download" takes this a step further: it recognizes specific experiments, for instance LOAD_CONTROL_STORE_200.FIU, picks the microcode bits out, and sticks them directly into the SystemC component's shared memory context, without involving the DIPROC and the thread executing the SystemC emulation at all.

Not only is that parallel, it is also much faster, as can be seen from the diagonal lines: it shaves one (real) hour off our test-runs.

2022-10-17 Packet from Grady

1101 seconds emulated in 46 days, still nothing new on the console.

This morning a packet arrived with a donation from Grady Booch:

20221017 pakke1.png

Amongst the gems were:

20221017 pakke2.png

All these documents will be scanned and made available in our BitArchive as soon as possible.

We have also experimented with a new branch where we start to remove "excess" functionality in order to gain speed and flexibility.

First to go were the parity-checks and ECC checks. Unless we explicitly want to trigger them, they will never happen, so there is no need for them in the "show-the-Environment" version of the simulator.

Next step is to also remove the diagnostic archipelago, both as a matter of speed, but also because it makes very concrete demands on the circuitry in order to function, demands which for instance prevent us from using dual-port components for the TYP+VAL register-files.

The first thing is to stop using the DIPROCs. So far we have implemented downloading of microcode, register-files and dispatch ram, by taking the data from the DIAGBUS and stuffing it directly into the shared memory context of the applicable components. Works nicely.

This then frees us from the exact layout of the diagnostic registers, for instance to rearrange bits in the microcode words to make for more and wider SystemC busses etc.

2022-10-09 A quarter of the way ?

921 seconds emulated in 936 hours, so we are about ¼ of the way to login.

The optimized schematics run at about 2100 seconds per second, 43% faster than the hw-identical simulation, and our current test-run has started initializing the virtual memory, so it looks correct-ish.

If we let it run, it will overtake the hw-identical simulation in a couple of months, call it mid-December, and get to the login-prompt three weeks sooner, early January rather than late January.

One side effect of both of these runs is that they give us a trace of which disk-sectors the kernel reads and writes during VM initialization. This will be valuable information when we resume trying to figure out the on-disk layout.

2022-10-01 Now talking to the emulator

733 seconds emulated in 740 hours, otherwise known as "a month".

A three day run with the optimized schematics, booted into "KERNEL mode" allowed us to issue kernel commands via the serial port for the first time:

  CLI/CRASH MENU - options are:
    1 => enter CLI
    2 => make a CRASHDUMP tape
    3 => display CRASH INFO
    4 => Boot DDC configuration
    5 => Boot EEDB configuration
    6 => Boot STANDARD configuration
  Enter option [enter CLI] : 4
  […]
  Starting R1000 Environment - it now owns this console.
  
  ====>> Kernel.11.5.8 <<====
  ERROR_LOG <<====
  00:51:13 --- TCP_IP_Driver.Worker.Finalized
  
  ====>> Kernel.11.5.8 <<====
  Kernel: SHOW_CONFIGURATION_BITS
    IOP 0 POWER ON
    CPU 0 POWER ON
    OPERATOR MODE => AUTOMATIC
    KERNEL DEBUGGER AUTO BOOT => TRUE
    KERNEL AUTO BOOT          => FALSE
    EEDB AUTO BOOT            => FALSE
    KERNEL DEBUGGER WAIT ON CRASH    => FALSE
    KERNEL DEBUGGER DIALOUT ON CRASH => FALSE
    DIAGNOSTIC MODEM CAN DIALOUT => FALSE
    DIAGNOSTIC MODEM CAN ANSWER  => TRUE
    Processor revision => 2.0
    IOP revision => 4.2.18
  Kernel: SHOW_DISK_SUMMARY
  DISK STATUS SUMMARY
  
              Q   IOP    Total      Total    Seek  Soft  Hard    Un   Total
   Vol  Unt  Len  Len    Reads      Writes   Errs   Ecc   Ecc  Recov   Errs
  --------------------------------------------------------------------------
    --    0   --   --         --         --    --    --    --     --     --
    --    1   --   --         --         --    --    --    --     --     --
  
  no disk IO in progress
  
  Debugging information:
  Ready_Volume mask => 0
  Busy_Event_Page => ( 1023, DATA, 259, 193)
  Volume_Offline_Event_Page => ( 1023, DATA, 259, 194)
  Kernel:

2022-09-25 10 minutes in 25 days

We just crossed 600 seconds of emulated time, after running for 25 days, so the "end of January" prognosis still holds.

We have implemented enough of the Xecom XE1201 integrated modem that the

  {diagnostic modem: received DISCONNECT event}

Message no longer appears.

The optimized "megacomp4" version of the schematics is nearly 60% faster than the hardware-identical "main" branch.

2022-09-18 The emulation goes ever ever on…

424 seconds emulated in 419 hours, nothing new output on the console, as expected.

2022-09-13 Good R1000 news

The R1000 was still in a bad mood this afternoon:

R1000-400 IOC SELFTEST 1.3.2 
   512 KB memory ... * * * * * * * FAILED

This provided an opportunity to try our new boot image, which provided us with the following hint:

Defect chips detected: H34

20220915 181319.jpg

So not the slightly suspicious H40, but the oscilloscope verified the too well-known latch-up on H34, so little doubt remained on what was needed...

Since this is probably not the last time we have to do this, the procedure we have followed is as follows:

Cut the chip away from its legs (much easier to de-solder the legs one at a time).

20220915 181515.jpg


Cleaning the pads was not quite as easy, but succeeded:

20220915 190626.jpg


Then solder in the replacement:

20220915 191306.jpg


With a new H34 in place, the R1000 became happy again:

 R1000-400 IOC SELFTEST 1.3.2 
    512 KB memory ... [OK]
    Memory parity ... [OK]
    I/O bus control ... [OK]
    I/O bus map ... [OK]
    I/O bus map parity ... [OK]
    I/O bus transactions ... [OK]
    PIT ... [OK]
    Modem DUART channel ... [OK]
    Diagnostic DUART channel ... [OK]
    Clock / Calendar ... Warning: Calendar crystal out of spec! ... [OK]
Checking for RESHA board                                                                                                       
    RESHA EEProm Interface ... [OK]                                                                                            
Downloading RESHA EEProm 0 - TEST
Downloading RESHA EEProm 1 - LANCE 
Downloading RESHA EEProm 2 - DISK  
Downloading RESHA EEProm 3 - TAPE  
    DIAGNOSTIC MODEM ... DISABLED
    RESHA VME sub-tests ... [OK]
    LANCE chip Selftest ... [OK]
    RESHA DISK SCSI sub-tests ... [OK]
    RESHA TAPE SCSI sub-tests ... [OK]
    Local interrupts ... [OK]
    Illegal reference protection ... [OK]
    I/O bus parity ... [OK]
    I/O bus spurious interrupts ... [OK]
    Temperature sensors ... [OK]
    IOC diagnostic processor ... [OK]
    Power margining ... [OK]
    Clock margining ... [OK]
Selftest passed

Restarting R1000-400S September 15th, 1922 at 17:25:23

OPERATOR MODE MENU - options are:
    1 => Change BOOT/CRASH/MAINTENANCE options
    2 => Change IOP CONFIGURATION
    3 => Enable manual crash debugging (EXPERTS ONLY)
    4 => Boot IOP, prompting for tape or disk
    5 => Boot SYSTEM

Enter option [Boot SYSTEM] : 

The Rational R1000-400 is fully functional again!

2022-09-13 Thank you for holding…

The MacBook Pro chews on, 13 days so far and 329 seconds of emulated time, a little better than "an hour per second".

So far about 11300 disk-accesses have taken place as part of the virtual memory startup.

Here is a plot of the simulation rate and accumulated time:

R1000 20220913 long.svg

The jump at 110 seconds is where the virtual memory startup commences.

If we zoom in on the left side:

R1000 20220913 short.svg

…we see the plot also contains a much shorter run using optimized schematics (and on the MacMini), which is about 33% faster.

We are almost through the list of trivial optimizations, and it looks like the first hard one should be the TYP and VAL register files, which are quite slow to emulate, because they have so many and so fast control signals.

Making the central RAM component of the register files dual-port will probably help, but then we may start to fail diagnostic experiments which rely on the specific topology of the busses.

There are also some medium-hard optimizations, for instance building 64 and 72 bit shift registers for MDR, WDR &c, and modifying the permutation tables in the DIPROC to make them more SystemC-busable.


2022-09-08 A really obscure bug

(Emulation still running, 186 hours in real time, 207 seconds emulated time, 5400 disk accesses)

We noticed a file called ENP100.M200 in the DFS filesystem. It contains a low-level debug/exerciser program for the ENP100 network processor, so we launched an IOC-CLI-only emulator and tried to run it, in order to get the IOP memory mapping correct, even though we have no plans to implement the ENP100 in the emulator.

When we tried the DOWNLOAD command, which downloads the ENP100 firmware, the IOC bailed out with a PASCAL error:

   ENP100> download
   
   PASCAL error #1  at location 00010808
   Last  called  from  location 00029480
   Last  called  from  location 00029702
   Last  called  from  location 0002F632
   Last  called  from  location 0003336C
   Last  called  from  location 00033454
   Last  called  from  location 000338CC
   
   Abort : Run time error
   PASCAL error #1
   From ENP100

One part of the download is the IP-number to use, which is read from the DFS file TCP_IP_HOST_ID which contains 192.5.60.20.

To convert the IP-number to a 32 bit integer, each quarter is converted to binary, and multiplied by a scaling factor, and this is where it goes wrong: 192 * 0x01000000 is 0xc0000000, but the multiplication routine used (at 0x1028c) sees that as a negative result and bails out.

Using a class-A IP-number, one that is less than 128.0.0.0, works fine.

Neither https://www.rfc-editor.org/rfc/rfc1117 nor https://www.rfc-editor.org/rfc/rfc1166 lists any IP numbers (obviously) assigned to Rational, so we suspect they used an unofficial class-A network internally, probably 89/8 like everybody else.
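The overflow is easy to reproduce outside the IOP. The sketch below does not reproduce the multiplication routine at 0x1028c; it only models how a product looks when read back as a signed 32 bit integer, which is the assumption behind the explanation above. The helper names are invented for the example.

```python
def to_int32(value):
    """Interpret the low 32 bits of value as a signed two's complement integer."""
    value &= 0xFFFFFFFF
    return value - 0x100000000 if value & 0x80000000 else value


def scaled_first_octet(octet):
    """First octet of the IP-number times its scaling factor, as signed 32 bit."""
    return to_int32(octet * 0x01000000)


for octet in (89, 127, 128, 192):
    product = scaled_first_octet(octet)
    verdict = "negative, would bail out" if product < 0 else "ok"
    print(f"{octet:3d} * 0x01000000 = {product & 0xFFFFFFFF:#010x} -> {verdict}")
```

Any first octet of 128 or above sets the sign bit, which matches the observation that only addresses below 128.0.0.0 get through.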

2022-09-04 good and bad R1000 news

The good news is that the emulator still runs and has progressed to:

  […]
  Starting R1000 Environment - it now owns this console.
  {diagnostic modem: received DISCONNECT event}
  
  
  ====>> Kernel.11.5.8 <<====
  Kernel: CHANGE_GHOST_LOGGING
  WANT TRACING: FALSE
  WANT LOGGING: FALSE
  Kernel: START_VIRTUAL_MEMORY
  ALLOW PAGE FAULTS: YES
  
  ====>> ERROR_LOG <<====
  22:32:20 --- TCP_IP_Driver.Worker.Finalized
  
  ====>> CONFIGURATOR <<====
  starting diagnosis of configuration
  starting virtual memory system

And if it keeps running we can start to log in … ehh … sometime early next year.

Time to start optimizations in earnest now.

The bad news is that when we tried to start the real R1000 today, the IOC reported RAM errors.

When we repaired this IOC three years ago we thought the RAM chip in position H40 behaved slightly suspect, but we did not replace it. Maybe it has finally failed now ?

2022-08-31 Same place, a week later

We have started a run on the MacBook Pro, with the intent to let it run until something happens.

Something can either be a microcode halt, or boot progressing from:

   Starting R1000 Environment - it now owns this console.

To the kernel signing in two or three minutes later:

   ====>> Kernel.11.5.8 <<====
   Kernel: CHANGE_GHOST_LOGGING
   […]

That will take the better part of a week, since we are simulating the HW-identical "main" schematics.

2022-08-24 The debugging never ends

We're getting further and further:

  Loading : KAB.11.0.1.MLOAD
  Loading : KMI.11.0.0.MLOAD
  Loading : KKDIO.11.0.3.MLOAD
  Loading : KKD.11.0.0S.MLOAD
  Loading : KK.11.5.9K.MLOAD
  Loading : EEDB.11.2.0D.MLOAD
  Loading : UOSU.11.3.0D.MLOAD
  Loading : UED.10.0.0R.MLOAD
  Loading : UM.11.1.5D.MLOAD
  Loading : UAT.11.2.2D.MLOAD
  851/1529 wired/total pages loaded.
      The use of this system is subject to the software license terms and
      conditions agreed upon between Rational and the Customer.
  
                           Copyright 1992 by Rational.
  
                            RESTRICTED RIGHTS LEGEND
  
      Use, duplication, or disclosure by the Government is subject to
      restrictions as set forth in subparagraph (c)(1)(ii) of the Rights in
      Technical Data and Computer Software clause at DoD FAR Supplement
      252.227-7013.
  
  
                  Rational
                  3320 Scott Blvd.
                  Santa Clara, California 95054-3197
  
  
  Starting R1000 Environment - it now owns this console.
  
  ***************************************
  Sequencer has detected a machine check.
  
  ************************************************
  Booting R1000 IOP after R1000 Halt or Machine Check detected
    Boot Reason code = 0C, from PC 0001ADA2
  
  Restarting R1000-400S August 24th, 1921 at 02:25:27

It looks like the µcode stops at location 0x204, which according to page 150 in the Knowledge Transfer Manual is 0204_HAVE_MULTI_BIT_MEMORY_ERROR.

Preliminary debugging indicates that is indeed happening; next will be to find out why. Test runs take around 70 minutes to fail.

2022-08-23 Waiting for the cows to come home

Things take time, so until we have all the requisite test-runs in, we cannot be certain, but it looks like we pass all the self-tests, after implementing the missing bits of the R1000/IOP memory access in the "megacomp3" branch.

We have also tried to boot the Environment, and it looks like the first and only sector of KAB.11.0.1.MLOAD correctly gets read into IOP RAM at 0x40000, the R1000 is notified through the "response FIFO", DMA's the sector into R1000-land and signals completion through the "request FIFO", which interrupts the IOP, which services the interrupt.

But the interrupt remains raised, so it immediately re-services the interrupt, and after doing that 255 times, it reads a 0xffff value which trips a range-check.

Looking, it is not obvious where the irq_lower() call should go, but it is clearly not there now.

2022-08-21 Kitting up for the next phase

Since we are nowhere near par speedwise, and since we get further and further into the selftest, some effort has gone into finding the best platform to run tests on.

Ideally it should have a fast CPU and, given the potential runtimes of months, backup power would be nice, even though Denmark has one of the most stable power-grids.

Having tried running the R1000 emulation on various machines we have access to, it appeared that Apple's M1 beat every other computer, hands down. As in: it runs the emulation almost twice as fast as a new "Lenovo T14s Gen 2" laptop.

We have therefore procured a MacBook Pro, which gives us a fast CPU with built in UPS. It has the newer M2 chip, but we see no statistically significant speed difference from the M1 chip.

Simulating the 24 seconds necessary to run the uDIAG test, takes 32000 seconds, almost 9 hours, at an average emulation/hw ratio of 1300.

However, this number is slightly misleading, as the microcode loading emulates much faster, yet still takes up ¼ of the emulated time. Looking only at the microcode execution time, 16 seconds emulated in 29000 seconds, the performance drops to a ratio of 1800.

Here is a plot as a function of the emulated time:

R1000 20220821 perf.svg

The first plateau at 1.4% of hardware speed is the microcode loading.

The repetitive square-wave pattern is the tests of each of the four cache-sets (A/B times Early/Late), and as can be seen, that takes up the majority of the test run.

The FRU tests currently fail with:

  *** ERROR reported by P3URF:
  The IOC board got a memory parity error while the microcode was
  restoring the register file contents from IOC memory.
  Field replaceable units :
       IOC Board
  *** P3UCODE failed ***
 

That is consistent with current thinking that the R1000 interface to the IOC RAM is our next hurdle.

Currently we use a "megacomponent" which implements all 512Kx26 ram as a single SystemC class, and it uses the IOP-emulation's RAM as backing store, and since the IOP-emulation does not maintain parity bits, the error makes sense.

We have yet to try to move the IOP RAM entirely into the SystemC space, by having the IOP perform memory cycles through the 68K20 SystemC "shell-component", but that is probably the end result, as that seems the only realistic way to keep the hardware-identical "main" branch of the schematics working - with its 72 chip IOP SRAM bank.

If we are lucky, we can identify limited memory ranges which the R1000 CPU accesses, and "cache" the rest, which likely includes all the actual 68K instructions, outside the SystemC model.

But first, we need to make that P3UCODE subtest fail faster.

2022-08-16 It seems to be working

We have now replicated the successful uDIAG run three times, so it is clearly not a fluke.

The connection from the R1000 CPU to the IOP's 512KByte RAM is not implemented, and we expected uDIAG to test that, but it seems not.

One of the runs was done on an Apple Mac-Mini with the M1 ARM CPU, it ran nearly twice as fast as the T14s laptop.

2022-08-14 Unexpectedly good news

  CLI> x run_udiag
  preparing to run the Confidence Test (uDIAG)
  The long version stress tests the DRAMs but runs 2 minutes longer
  Do you want to run the long version [N] ? n
    Loading from file DIAG.M200_UCODE  bound on November 15, 1989  13:02:00
    Loading Register Files and Dispatch Rams .... [OK]
    Loading Control Store ............ [OK]
  the Confidence test (uDIAG) passed
  CLI>
  Begin statistics
    61559.274716 s        Wall Clock Time
       23.083458220 s     SystemC simulation
    1/2666.8              SystemC Simulation ratio
    51981.890124900 s     IOC simulation
    51949.796301 s        IOC stopped
  40179247                IOC instructions
    63148.231 s           User time
     1631.137 s           System time
   109780                 Max RSS
  End statistics

This was the megacomp3 branch of the schematics, git rev 7d92f98c7415251d59fe.

2022-08-12 Indeed exciting news

  […]
  CLEAR_DRIVE_HIT.M32
  RESET.M32
  LOAD_CONFIG.M32
  Phase 3 passed
  Diagnostic execution menu
      1 => Test the foreplane
      2 => Run all tests
      3 => Run all tests applicable to a given FRU (board)
      4 => Run a specific test
      0 => Return to main menu
  Please enter option :

Using the un-optimized, (ie: identical to HW) schematics, this took a bit over five days, 4187 times slower than the real machine.

2022-08-05 Maybe exciting news

We found another 8/0 misreading, and now the tests just keep running.

Until they stop, one way or another, we will not know what the status is, but it looks good-ish.

Update:

run_udiag has run for 28½ hours now, currently toodling around at microaddress 0x26a5…7, which according to the information in the Knowledge Transfer Manual (pdf pg 136) is in the MEM_TEST area.

At the same time, a "FRU" test has been running 37⅓ hours, and has gotten past P2ABUS and is currently chugging through P2MM.

It is incredibly boring and incredibly exciting at the same time :-)

Both these machines use the 'main' branch of the schematics, identical to the schematics in the binder we got from Terma.

We expect run_udiag to fail when it gets to SYS_IOC_TEST because the M68K20's RAM is not the same as the SystemC model's RAM.

2022-08-02 It's 30% faster to ask a friend…

The video of Michael Druke's presentation from his July 5th visit is now up on YouTube:

<youtube>https://www.youtube.com/watch?v=QFLZCKt-eZs</youtube>

But there were of course more questions to ask than we could get through in a single day.

There is a signal called BUZZ_OFF~ which pops up all over the place like this:

R1000 buzz off.png

The net effect of this signal is to turn a lot of tri-state bus-drivers off during the first quarter ("Q1") of the machine-cycle, but not because another driver needs the bus, since they are also gated by the BUZZ_OFF~ signal.

So why then ?

As Mike explains in the presentation, there are no truly digital signals, they are all analog when you put the scope on them, and he explained in a later email that »The reason for (BUZZ_OFF) is suppressing noise glitches when the busses switch at the beginning of the cycle.«

That makes a lot of sense, once you think about it.

By always putting a bus into "Hi-Z" state between driven periods, the inputs will drain some of the charge away, and the voltage will drift towards the middle from whatever side the bus was driven.

Next time the bus is driven, the driver chips will have to do less work, and it totally eliminates any risk of "shoot-through" if one driver is slow to release while another is fast to drive.

(Are there books with this kind of big-computer HW-design wisdom ?)

Our emulation does use truly digital signals; it is not subject to ground-bounce, reflections, leakage, capacitance and all those pesky physical phenomena, so BUZZ_OFF~ is needlessly triggering a lot of components, twice every single machine-cycle - ten million times per simulated second of time.

Preliminary experiments indicate a 30% speedup without the BUZZ_OFF~ signal, but we need to run the full gamut of tests before we can be sure it is OK.

2022-07-31 P2FP and P2EVNT passes

In both cases trivial typos and misunderstandings.

Next up is P2ABUS which tests the address bus, which takes us into semi-murky territory, including the fidelity of our PAL-to-SystemC conversion.

On a recent tour of the museum, a guest asked why we use the "simulation / real" ratio as our performance metric, and the answer is that when the performance gap is on the order of thousands, percentages are not very communicative:

  Machine      Branch     Ratio  Percentage  Performance
  CI-server    main        4173     0.024 %    -99.976 %
  CI-server    megacomp2   2380     0.042 %    -99.958 %
  T14s laptop  megacomp2   1142     0.088 %    -99.912 %
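
The percentage columns follow directly from the measured ratio. A minimal Python sketch of that arithmetic (the function name is hypothetical, it is not taken from the emulator sources):

```python
def ratio_to_percent(ratio):
    """Return (percent of hardware speed, performance change in percent)."""
    percent = 100.0 / ratio
    return percent, percent - 100.0


# The three measurements from the table above.
for machine, branch, ratio in [
    ("CI-server", "main", 4173),
    ("CI-server", "megacomp2", 2380),
    ("T14s laptop", "megacomp2", 1142),
]:
    percent, change = ratio_to_percent(ratio)
    print(f"{machine:12s} {branch:10s} {ratio:5d} {percent:6.3f} % {change:8.3f} %")
```

All three rows end up within a tenth of a percent of each other, which is why the ratio is the more useful number.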

But we are getting closer to the magic threshold of "kHz instead of MHz".

2022-07-24 VAL: valeō

  […]
  TEST_Z_CNTR_WALKING.VAL
    Loading from file PHASE2_MULT_TEST.M200_UCODE  bound on July 16, 1986  14:31:44
    Loading Register Files and Dispatch Rams .... [OK]
    Loading Control Store  [OK]
  TEST_MULTIPLIER.VAL
  CLEAR_PARITY.VAL
  LOAD_WCS_UIR.VAL
  RESET.VAL
  P2VAL passed

This also means that we are, to some limited extent, able to execute microcode.

2022-07-23 Et tu TYP?

With a similar workaround, the P2TYP test completes.

2022-07-22 Moving along

After some work on the disassembly of the .M200 IOP programs, specifically the P2VAL.M200 program, it transpired that the "COUNTER OVERFLOW" test failed because the P2VAL program busy-waits for the experiment to complete, and the simulated IOP runs too fast:

  0002077c  PUSHTXT "TEST_LOOP_CNTR_OVERFLOW.VAL"
            [push other arguments for ExpLoad]
  000207a0  JSR ExpLoad(PTR.L, PTR.L)
            [push other arguments for ExpXmit]
  000207ae  JSR ExpXmit(EXP.L, NODE.B)
  000207b6  MOVE.L #-5000,D7
            [push arguments for DiProcPing]
  000207ca  JSR DiProcPing(adr.B, &status.B, &b80.B, &b40.B)
  000207d2  ADDQ.L #0x1,D7
  000207d4  BEQ 0x207de
            [check proper status]
  000207dc  BNE 0x207bc
  000207de  [...]

We need a proper fix for this, preferably something which does not involve slowing the DIAGBUS down all the time.

In the meantime, we can work around the problem by patching the constant -5000 from the CLI:

  dfs patch P2VAL.M200 0x7b8 0xff 0xfe 0xec 0x78
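
The four patch bytes are the 32-bit big-endian immediate operand of the MOVE.L instruction. The Python sketch below decodes them and works out the file offset, assuming the program is loaded at 0x20000 (an inference from offset 0x7b8 matching address 0x207b8; the helper names are hypothetical):

```python
import struct

LOAD_ADDRESS = 0x20000   # assumption: listing address minus file offset
MOVE_L_AT = 0x207b6      # address of MOVE.L #-5000,D7 in the listing
OPCODE_BYTES = 2         # one 16 bit opcode word precedes the immediate


def immediate_offset():
    """File offset of the 32 bit immediate operand."""
    return MOVE_L_AT + OPCODE_BYTES - LOAD_ADDRESS


def decode(patch_bytes):
    """Interpret four bytes as a big-endian signed 32 bit integer."""
    return struct.unpack(">l", bytes(patch_bytes))[0]


old = struct.pack(">l", -5000)
new = bytes([0xff, 0xfe, 0xec, 0x78])

print(hex(immediate_offset()))   # 0x7b8
print(old.hex(), decode(old))    # ffffec78 -5000
print(new.hex(), decode(new))    # fffeec78 -70536
```

Since D7 counts up towards zero, the patched constant gives the experiment roughly fourteen times as many DiProcPing round-trips to complete.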

That gets us to:

  […]
  TEST_Z_CNTR_WALKING.VAL   
    Loading from file PHASE2_MULT_TEST.M200_UCODE  bound on July 16, 1986  14:31:44
    Loading Register Files and Dispatch Rams .... [OK]
    Loading Control Store  [OK]
  TEST_MULTIPLIER.VAL   
  *** ERROR reported by P2VAL:
  An error in the multiplier logic was detected  (P2VAL).
  Field replaceable units :
          VALUE Board
  *** P2VAL failed ***

Which can either be a problem with the multiplier circuit, which we have not seen activated until now, or failing microcode execution, which we have also not seen much of yet.

The multiplication circuit on the VAL board is quite complex, it takes up 7 full pages, because the 16x16=>32 multiplier had to be built out of four 8x8=>16 multiplier chips and 4-bit adders to combine their output.
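
The arithmetic behind that decomposition can be sketched in Python. This only illustrates how four 8x8 partial products combine into a 32-bit result for unsigned operands; it is not a model of the actual VAL board circuit, and the function names are hypothetical:

```python
def mul8(a, b):
    """Stand-in for one 8x8=>16 multiplier chip (unsigned)."""
    assert 0 <= a < 256 and 0 <= b < 256
    return a * b


def mul16(a, b):
    """16x16=>32 unsigned multiply built from four 8x8 partial products."""
    a_hi, a_lo = a >> 8, a & 0xff
    b_hi, b_lo = b >> 8, b & 0xff
    low = mul8(a_lo, b_lo)                        # weight 2**0
    cross = mul8(a_lo, b_hi) + mul8(a_hi, b_lo)   # weight 2**8
    high = mul8(a_hi, b_hi)                       # weight 2**16
    # In hardware these additions are what the adder chips are for.
    return (high << 16) + (cross << 8) + low


print(hex(mul16(0x1234, 0xabcd)))
```

The shifted additions in the last line are where the carry chains live, which goes some way to explain why the circuit fills so many pages.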

2022-07-17 Lots of cleanup

With all boards passing unit-tests, the next step is to start to execute micro-code, first diagnostic and when that works, the real thing.

Such a juncture is a good opportunity for a bit of cleanup, and this is currently ongoing.

Right now the FRU program errors out with:

  Running FRU P2VAL
  TEST_LOOP_CNTR_OVERFLOW.VAL*** ERROR reported by P2VAL:
  VAL LOOP COUNTER overflow does not work correctly (P2VAL).

Getting to the point of failure takes 5 hours on our fastest machine (at 1/1300 speed ratio with all boards), but if we tell FRU to run P2VAL directly, it launches P2FP instead, which, after some unknown micro-instructions have executed, fails with a generic error message (see previous entry.)

2022-07-05 Mike Druke visits

Today Mike Druke and his wife finally got to visit us; this was yet another much-anticipated event rudely postponed by Covid-19.

We showed Mike a running R1000 machine, in this case PAM's machine, but using the IOC board from the Terma machine. We also toured our little exhibition, for the occasion augmented with a Nova2 computer from storage, and on the way to lunch we stopped to demo our 50+ year old GIER computer.

In the afternoon Mike gave a wonderful talk about Rational, the people, the company, the ideas and the computers.

The video recording from Mike's talk will go into our bit-archive and be posted online, when the post-processing is finished.

Work on the emulator continues and has reached the major milestone where microcode is being executed:

   Running FRU P2FP
     Loading from file FPTEST.M200_UCODE  bound on January 29, 1990 17:26:52
     Loading Register Files and Dispatch Rams .... [OK]
     Loading Control Store  [OK]
   *** ERROR reported by P2FP:
   ABORT -> uCODE DID NOT HALT AT CORRECT ADDRESS

Now we need to figure out what the diagnostic microcode was supposed to do and once we understand that, figure out why it did not.

2022-07-02 FRU and DFS hacking

Going forward, the FRU program is going to be our primary test-driver, and the emulation already passes phase-1, which seems to more or less consist of the same experiments as the TEST_$BOARD.EM scripts.

The first test which fails in phase-2 is the attempt to make the request-FIFO on the IOC generate an interrupt, and that is understandable, because that part of the SystemC code is not hooked up to the MC68K20 emulation.

But in order to get to that point the P2IOC test spends some hours on other experiments, and because FRU expects all boards to be "plugged in", that is still pretty slow.

That catapulted an old entry from the TODO list to the top, so now the emulation has a "dfs" cli command, which allows reading, writing, editing (with sed(1)) of files in the DFS filesystem, and a special subcommand "dfs neuter" to turn an experiment into a successful no-op.

With that in place, and when neutering eight experiments, it only takes a couple of minutes to get to the WRITE_REQUEST_QUEUE_FIFO.IOC experiment.

When run individually the P2FIU, P2SEQ, P2MEM, P2STOP, P2COND and P2CSA tests all seem to pass.

The P2TYP and P2VAL tests both fail on "LOOP COUNTER overflow does not work correctly", which sounds simple, and P2EVNT fails with "The signal for the GP_TIMER event is stuck low on the backplane" which may simply be because the IOP cannot read the STATUS register yet.

So all in all, no unexpected or unexpectedly bad news from FRU … yet.

2022-06-28 +83% more running R1000 computers

Today we transplanted the IOC and PSU from Terma's R1000 to PAM's R1000, slotted in a SCSI2SD and powered it up.

There were a fair number of intermediate steps, transport, adapting power-cables, swapping two PAL-chips that had gotten swapped after the readout etc. etc.

But the important thing is that it came up.

That means that we "just" need to get RAM working on one of the two spare IOCs we have, and one way or another, get a power-supply going, then the world will have two running R1000 computers, instead of just one.

2022-06-20 IOP fined for speeding

The error from the SEQ board transpired to be the IOP downloading data faster than the DIPROC could get them stuffed into the SystemC model.

Unlike when normal experiments are run, when downloading the IOP just blasts bytes down the DIAGBUS as fast as it can, and by interleaving downloads to multiple boards, for instance {SEQ, TYP, SEQ, VAL}…, the DIPROCs get enough time to do their thing.

If we had tied the 68K20 emulation, the DIAGBUS and the DIPROCs to the SystemC clock at all times, that would just work, but it would also be a lot slower.

So we cheat: The 68K20 emulation and the i8052 emulation of the DIPROC run asynchronously to the SystemC model, only synchronizing when it is needed to perform a bus-transaction, and the DIAGBUS has infinite baud-rate.

Therefore we have added a [small hack] to delay DOWNLOAD commands from the IOP if the targeted DIPROC is still in RUNNING state.

Now the FPTEST starts running, and comes back with:

  CLI> fptest
    Loading from file FPTEST.M200_UCODE  bound on January 29, 1990 17:26:52
    Loading Register Files and Dispatch Rams .... [OK]
    Loading Control Store  [OK]
  VAL  bad FIU bits = FFFF_FFFF_FFFF_FFFF
  TYP  bad TYP bits = FFFF_FFFF_FFFF_FFFF
  VAL  bad VAL bits = FFFF_FFFF_FFFF_FFFF
  TEST AGAIN [Y] ?

Which is an improvement.

However, it is not obvious to us that FPTEST is what we should be attempting now.

The FPTEST.CLI script contains:

  x rdiag fptest;

That makes RDIAG.M200 interpret FPTEST.DIAG, which contains:

  init_state;push p2fp interactive;

And to us "p2" sounds a lot like "phase two".

There is another script to RDIAG called GENERIC.DIAG which looks like a comprehensive test:

  init_state;
  run all p1dcomm;
  [#eq,[model],100]
  run p1sys;
  [end]
  [#eq,[model],200]
  run p1ioc;
  [end]
  run p1val;
  run p1typ;
  run p1seq;
  run p1fiu;
  run allmem p1mem;
  run all p1sf;
  init_state;
  [#eq,[model],100]
  run p2ioa;
  [end]
  [#eq,[model],200]
  run p2ioc;
  [end]
  [#eq,[model],100]
  run p2sys;
  [end]
  run p2val;
  run p2typ;
  run p2seq;
  run p2fiu;
  run allmem p2mem;
  init_state;
  run p2uadr;
  run p2fp;
  run p2evnt;
  run p2stop;
  run p2abus;
  run p2csa;
  run p2mm;
  [#eq,[model],100]
  run p2sbus;
  [end]
  run p2cond;
  run all p2ucode;
  run all p3rams;
  run all p3ucode

Running that instead we get:

  CLI> x rdiag generic
  Running FRU P1DCOMM
  Running FRU P1DCOMM
  P1DCOMM Failed
  The test that found the failure was P1DCOMM
  
  ONE_BOARD_FAILED_HARD_RESET
  Field replaceable units : 
          Backplane / Backplane Connector
          All Memory Boards
  Diagnostic Failed

That looks actionable...

2022-06-19 First attempt at FPTEST

With all boards passing their unit-tests, next step is the FPTEST.

Until now the 68K20 emulator's only contact with the SystemC code has been through the asynchronous DIAGBUS, but one of the first things FPTEST does is to reset the R1000 processor, and therefore we had to implement the SystemC model of the 68K20, so it can initiate write cycles to DREG4 at address 0xfffffe00.

That got us this far:

   CLI> fptest
     Loading from file FPTEST.M200_UCODE  bound on January 29, 1990 17:26:52
     Loading Register Files and Dispatch Rams ....
   Experiment error :
   Board      : Sequencer
   Experiment : LOAD_DISPATCH_RAMS_200.SEQ
   Status     : Error

   Abort : Experiment error
   Fatal experiment error.
   From DBUSULOAD

   Abort : Experiment error
   Fatal experiment error.
   From P2FP
   CLI>

2022-06-04 All boards pass unit test

Fixing two timing problems in the simulation made the TEST_MEM32.EM pass, and with that we have zeros in the entire right hand column in the table above.

2022-05-29 SEQ passes unit test

Have we mentioned zero vs eight confusion in the schematics yet ?

Final 08 seq.png

And with that, the emulated SEQ passes TEST_SEQ.EM

Now we just need to track down the final problems with MEM32.

2022-05-21 Watching the grass grow

Spring has slowed down work on the R1000 Emulator, but some progress is being made.

The SEQ board is now down to only two failing subtests:

   RESOLVE_RAM_(OFFSET_PART)_TEST                                       FAILED 
   TOS_REGISTER_TEST_4                                                  FAILED

or rather, all the other errors were phantom failures due to two colliding optimizations, one by Rational engineers and one by us:

   125c 93           |    |            MOVC    A,@A+DPTR
   125d b4 ff f1     |    |            CJNE    A,#0xff,0x1251
   1260 74 02        |t   |            MOV     A,#0x02
   1262 f2           |    |            MOVX    @R0,A
   1263 08           |    |            INC     R0
   1264 02 05 1c     |    |            LJMP    EXECUTE

The above is a snippet of the DIPROC(1) code, the end of a loop used extensively on the SEQ board.

The Rational optimization is the instruction at 0x1262, which we think initiates a reset of the Diagnostic FSM.

Normally, the INC, LJMP and the instructions which pick up and decode the next bytecode-instruction would leave the FSM plenty of time to get things done, but since our emulated DIPROC executes all non-I/O instructions instantly (See: [[1]]) some of the SEQ testcases, notably LATCHED_STACK_BIT_1_FRU.SEQ, would fail.

The failure mode was that the bytecode expected to read a pattern like "ABAABB" from the hardware, but would get "CABAAB", which sent us on a wild goose-chase for non-existent clock-skew problems.

Have we mentioned before that one should never optimize until things actually work ?

2022-05-08 Slowly making way

As can be seen in the table above, the main DRAM array now works on the emulated MEM32 board.

It takes 48 hours to run that test, because the entire DRAM array is tested 16 times, very comprehensively:

  TESTING TILE  4 -  TILE_MEM32_DATA_STORE

  DYNAMIC RAM DATA PATH TEST                                 PASSED
  DYNAMIC RAMS ADDRESS TEST                                  PASSED
  DYNAMIC RAM ZERO TEST - LOCAL SET 0                        PASSED
  DYNAMIC RAM ZERO TEST - LOCAL SET 1                        PASSED
  DYNAMIC RAM ZERO TEST - LOCAL SET 2                        PASSED
  DYNAMIC RAM ZERO TEST - LOCAL SET 3                        PASSED
  DYNAMIC RAM ZERO TEST - LOCAL SET 4                        PASSED
  DYNAMIC RAM ZERO TEST - LOCAL SET 5                        PASSED
  DYNAMIC RAM ZERO TEST - LOCAL SET 6                        PASSED
  DYNAMIC RAM ZERO TEST - LOCAL SET 7                        PASSED
  DYNAMIC RAM ONES TEST - LOCAL SET 0                        PASSED
  DYNAMIC RAM ONES TEST - LOCAL SET 1                        PASSED
  DYNAMIC RAM ONES TEST - LOCAL SET 2                        PASSED
  DYNAMIC RAM ONES TEST - LOCAL SET 3                        PASSED
  DYNAMIC RAM ONES TEST - LOCAL SET 4                        PASSED
  DYNAMIC RAM ONES TEST - LOCAL SET 5                        PASSED
  DYNAMIC RAM ONES TEST - LOCAL SET 6                        PASSED
  DYNAMIC RAM ONES TEST - LOCAL SET 7                        PASSED

  TILE  4 -  TILE_MEM32_DATA_STORE                           PASSED


While "FAILURE" is printed five times on the console, there are actually only two failing experiments:

  TESTING TILE  3 -  TILE_MEM32_TAGSTORE

  TAGSTORE SHORTS/STUCK-ATS TEST                             PASSED
  TAGSTORE ADDRESS PATTERN TEST                              PASSED
  TAGSTORE PARITY TEST1                                      PASSED
  TAGSTORE PARITY TEST2                                                FAILED

            FAILING EXPERIMENT IS :  TEST_TAGSTORE_PARITY_2

  TAGSTORE RAMS ZERO TEST                                    PASSED
  TAGSTORE RAMS ONES TEST                                    PASSED
  LRU UPDATE TEST                                                      FAILED
            FAILING EXPERIMENT IS :  TEST_LRU_UPDATE
  
  TILE  3 -  TILE_MEM32_TAGSTORE                                       FAILED

Despite some effort, we have still not figured out what the problem is. We suspect a timing issue near or with the tag-RAM.

2022-04-16 A long overdue update

As can be seen in the table above, the simulated SEQ board is down to 12 FAILURE messages, and what the table does not show is that the MEM32 board simulation now completes, but takes more than 24 hours to do so, which makes the daily CI cron(8) job fail catastrophically.

The bug which has taken us almost a month to fix turned out to be the i8052 emulator's CPL C, Complement Carry, instruction not complementing, in a DIPROC bytecode-instruction we had not previously encountered: Calculate Even/Odd parity for a multi-byte word.
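
To illustrate why such a bug only shows up in parity code, here is a minimal Python sketch of a parity calculation which toggles a carry flag once per set bit. It is an assumption about the general technique, not a transcription of the DIPROC firmware, and the names are hypothetical:

```python
def parity(data, complement_works=True):
    """Return 1 if the number of set bits in data is odd, else 0.

    complement_works=False mimics an emulator where the
    complement-carry instruction silently does nothing.
    """
    carry = 0
    for byte in data:
        for bit in range(8):
            if ((byte >> bit) & 1) and complement_works:
                carry ^= 1      # the complement-carry step
    return carry


word = bytes([0x03, 0x01])                    # three bits set
print(parity(word))                           # 1
print(parity(word, complement_works=False))   # 0
```

With a non-complementing instruction the result never leaves its initial value, so every word appears to have the same parity, and nothing else in the emulation is affected.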

Along the way we have attended to much other stuff, tracing, python code for decoding scan-chains, "mega components" etc. and, notably, python generated component SystemC models.

Initially all 12 thousand electrical networks in the simulated part of the system were sc_signal_resolved instances.

sc_signal_resolved is the most general signal type in SystemC, having four possible levels, '0', '1', 'Z' and 'X', and allowing multiple 'writers', but it is therefore also the slowest.
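
What "resolved" means can be sketched in a few lines of Python. This mirrors the usual four-level resolution rule ('Z' yields to any driven value, conflicting drivers give 'X'); it is an illustration, not SystemC code, and the function name is hypothetical:

```python
def resolve(drivers):
    """Resolve the values all writers drive onto a single net."""
    result = "Z"                 # nobody driving: high impedance
    for value in drivers:
        if value == "Z":
            continue             # this writer has released the net
        if result == "Z":
            result = value       # first active driver wins so far
        elif result != value:
            result = "X"         # conflict, or an unknown was driven
    return result


print(resolve(["Z", "1", "Z"]))  # 1
print(resolve(["0", "1"]))       # X
```

Every write to such a net has to run through a rule like this for all its writers, which a plain bool or integer signal never needs to do.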

Migrating to faster types, bool for single wire binary networks and uint%d_t for single-driver binary busses, requires component models for all the combinations we may encounter, and writing those by hand got old really fast.

For true Tri-state signals, we will still need to use the sc_signal_resolved type, but a lot of Tri-state output chips are used as binary drivers, by tying their OE pin to ground, so relying on the type of a component to tell us what type its output has misses a lot of optimization opportunities.

And thus we now have Python "models" of components, which automatically produce adapted SystemC component models.

Here is an example of the 2149 SRAM model:

class SRAM2149(PartFactory):

    ''' 2149 CMOS Static RAM 1024 x 4 bit '''

    def state(self, file):
        file.fmt('''
                |       uint8_t ram[1024];
                |       bool writing;
                |''')

    def sensitive(self):
        for node in self.comp:
            if node.pin.name[0] != 'D' and not node.net.is_const():
                yield "PIN_" + node.pin.name

    def doit(self, file):
        ''' The meat of the doit() function '''

        super().doit(file)

        file.fmt('''
                |       unsigned adr = 0;
                |
                |       BUS_A_READ(adr);
                |       if (state->writing)
                |               BUS_DQ_READ(state->ram[adr]);
                |
                |''')

        if not self.comp.nodes["CS"].net.is_pd():
            file.fmt('''
                |       if (PIN_CS=>) {
                |               TRACE(<< "z");
                |               BUS_DQ_Z();
                |               next_trigger(PIN_CS.negedge_event());
                |               state->writing = false;
                |               return;
                |       }
                |''')

        file.fmt('''
                |
                |
                |       if (!PIN_WE=>) {
                |               BUS_DQ_Z();
                |               state->writing = true;
                |       } else {
                |               state->writing = false;
                |               BUS_DQ_WRITE(state->ram[adr]);
                |       }
                |       TRACE(
                |           << " cs " << PIN_CS?
                |           << " we " << PIN_WE?
                |           << " a " << BUS_A_TRACE()
                |           << " dq " << BUS_DQ_TRACE()
                |           << " | "
                |           << std::hex << adr
                |           << " "
                |           << std::hex << (unsigned)(state->ram[adr])
                |       );
                |''')

Notice how the code to put the output in high-impedance "3-state" mode is only produced if the chip's CS pin is not pulled down.

Note also that the code handles the address bus and data bus as a unit, by calling C++ macros generated by common python code. This allows the same component model to be used for wider "megacomp" variants of the components.

This is particularly important for the MEM32 board, which has 64(Type)+64(Value)+9(Ecc) DRAM chips in each of the two memory banks. The simulation runs much faster with just two "1MX64" and one "1MX9" components, than it does with 137 "1MX1" components in each bank.

This optimization is what disabused us of the notion that the CHECK_MEMORY_ONES.M32 experiment hung: it did not, it just took several hours to run, and it is run once for each of the eight "sets" of memory.

With the current 11 failures, the entire MEM32 test takes 140 seconds of simulated time, 7½ hours in our fastest "megacomp2" version of the schematics on our fastest machine.

However our "CI" machine is somewhat slower, and runs the un-optimized "main" version of the schematics, which means the next daily "CI" run is started before the previous one completed, and with them using the same filenames, they both crash.

So despite the world distracting us with actual work, travel, talks, social events, and notably the first ever opening of Datamuseum.dk for the public, we are still making good progress.

2022-03-06 Do not optimize until it works, unless …

It is very old wisdom in computing that it does not matter how fast you can make a program which does not work, and usually we stick firmly to that wisdom.

However, there are exceptions, and the R1000-emulator is one of them.

When the computer was designed, the abstract architecture had to be implemented with the available chips in the 74Sxx and later 74Fxx families of TTL chips, and there being no 64 bit buffers in those families, a buffer for one of the busses was decomposed into 8 parallel 8 bit busses, each running through a 74F245 chip, etc.

In hardware the 8 chips operate in parallel, in software, at least with SystemC, they are sequential, so there is a performance impact.

What is worse, there is a debugging impact as well, because instead of the trace-file telling what the state of the 64 bits is, in a single line, it contains eight lines of 8 bits, in random order.

Therefore we operate with three branches in the R1000.HwDoc github repository: "main", "optimized" and "megacomp".

"Main" is the schematics as they are on paper. That is the branch reported in the table above.

"Optimized" is primarily deduplication of multi-buffered signals, that is signals where multiple outputs in parallel are required to drive all the inputs of that signal, a canonical example being the address lines of the DRAM chips on the MEM32 board.

Finally in "megacomp" we invent our own chips, like a 64 bit version of the 74F245, whereby we both improve the clarity of the schematics, and make the simulation run faster, almost twice as fast as "main" at this point.

Here is the same table as above, for the "megacomp" branch, and run on the fastest CPU currently available to this project:

  Test               Wall Clock    SystemC     Ratio  Exp run  Exp fail
  expmon_reset_all       51.787   0.026151  1/1980.3        0         0
  expmon_test_fiu      1275.507  17.799928    1/71.7       95         0
  expmon_test_ioc      1018.086  11.231571    1/90.6       29         0
  expmon_test_mem32    5331.993  30.000000   1/177.7       28         9
  expmon_test_seq      1183.407  13.081077    1/90.5      108        32
  expmon_test_typ      3629.642   7.468383   1/486.0       73         2
  expmon_test_val      3625.022   7.434761   1/487.6       66         0
  novram                 69.302   0.035584  1/1947.6        0         0
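
The Ratio column is the wall clock time divided by the simulated SystemC time. A small Python sketch which recomputes it from the other two columns (the function name is hypothetical):

```python
def ratio_column(wall_clock, systemc):
    """Format the slowdown as it appears in the Ratio column."""
    return f"1/{wall_clock / systemc:.1f}"


rows = [
    ("expmon_reset_all", 51.787, 0.026151),
    ("expmon_test_fiu", 1275.507, 17.799928),
    ("expmon_test_ioc", 1018.086, 11.231571),
    ("expmon_test_mem32", 5331.993, 30.000000),
    ("expmon_test_seq", 1183.407, 13.081077),
    ("expmon_test_typ", 3629.642, 7.468383),
    ("expmon_test_val", 3625.022, 7.434761),
    ("novram", 69.302, 0.035584),
]

for name, wall_clock, systemc in rows:
    print(f"{name:18s} {ratio_column(wall_clock, systemc)}")
```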

Note that the megacomponents have caused one of the TYP tests to fail, so the old wisdom does apply after all. (The table shows two failures because both the individual test and the entire test-script report "FAILED" on the console.)

2022-03-05 We will not need to emulate the ENP-100

The R1000/s400 has two ethernet interfaces, one is on the internal IO/bus and can be directly accessed by the IOC and, presumably, the R1000 CPU, the other is on a VME daughter-board, mounted on the RESHA board.

Strangely enough, the TCP/IP protocol only seems to be supported on the latter, whereas the "direct" ethernet port is for use only in cluster configurations.

The VME board is an "ENP-100" from Communication Machinery Corp. of 125 Cremona Drive, Santa Barbara, CA 93117.

R1000 enp100.jpg

The board contains a full 12.5 MHz 68k20 computer, including boot-code EPROMs, 512K RAM, two serial ports and an Ethernet interface.

The firmware for this board is downloaded from the R1000 CPU, and implements a TCP/IP stack, including TELNET and FTP services.

Interestingly, the TCP/IP implementation ignores all packets with IP or TCP options, so no contemporary computers can talk with it, until "modern" options are disabled.

We have no hardware documentation for the ENP-100 board, but we expect emulation is feasible, given enough time and effort.

Fortunately it seems the R1000 can boot without the ENP-100 board, it complains a bit, but it boots.

That takes emulation of the ENP-100 out of the critical path, and makes it optional to even do it.

2022-03-01 The fish will appreciate this

Below the R1000 two genuine and quite beefy Papst fans blow cooling air up through the card-cage.

For a machine which most likely will end up in a raised-floor computing room, it can be done no other way.

However, if the machine is housed anywhere else, an air-filter is required to not suck all sorts of crap into the electronics.

And of course, air-filters should be maintained, so we pulled out the fan-tray and found that the filter mat was rapidly deteriorating, literally falling apart.

Not being air-cooling specialists, we initially ordered normal fan-filters, the kind that looks like loose felt made of plastic, but the exhaust temperature on the top of the machine climbed to over 54°C.

So what was the original filter material, and where could we buy it?

It looks a lot like the material used on the front of the obscure but deservedly renowned concrete Rauna Njord speakers, designed by Bo Hansson in the early 1980s, and that material also fell to pieces after about a decade.

Surfing fora where vintage hifi-nerds have restored Rauna Njord we found "PPI 10 Polyurethane foam" mentioned, and that transpires to be what water-filters for aquariums are made from.

A trip to the local aquarium shop got us a 50x50x3cm sheet of filter material, and the promise that the fish will really appreciate us buying it.

We cut a 12.5cm wide strip and parted it lengthwise in two slices of roughly equal thickness, using two wooden strips as guides and a sharp bread-knife.

It is almost too bad that one cannot see the sporty blue color when it is mounted below the R1000:

Luftfilter1.png

The two small pieces in the middle are the largest fragment of the old air filter and an off-cut from the 17mm thick slice:

Luftfilter2.png

In the spirit of scientific inquiry, we measured the temperature with both thicknesses.

With the 17mm thick filter, the exhaust rose to above 52°C.

With the 13mm thick filter, it stabilized around 41°C.

That is a pretty convincing demonstration of the conventional wisdom, that axial fans should push air, not pull it.

So why are the filters on the "pull" side of the fans in the R1000, when the fan-tray is plenty deep for filters to be mounted on the "push" side?

Maybe this is an after-market modification, trying to convert an unfiltered "data-center fan-tray" into a filtered "office fan-tray" ?

2022-02-20 TEST_VAL.EM passes

We're making progress.

Now we are going to focus on TYP, where most of the failing tests have something to do with parity checking.

2022-02-12 TEST_FIU.EM passes

As can be seen on the status above, there are no longer any failures on the FIU board when running TEST_FIU.EM.

The speed in the table above is when simulating the unadulterated schematics.

Concurrent with fixing bugs we are working on two levels of optimized schematics, one where buffered signals are deduplicated, and one where we use "megacomponents", for instance 64 bit wide versions of 74F240 etc.

The "megacomp" version of FIU runs twice as fast, 1.4% of hardware speed.

2022-01-10 SystemC performance is weird

As often alluded to, the performance of a SystemC simulation is … not ideal … from a usability point of view, so we spend a lot of time thinking about it and measuring it, and it is not at all intuitive for software people like us.

Take the following (redrawn) sheet from the IOC schematic:

IOC RESPONSE FIFO

This is the 2048x16 bit FIFO buffer through which the IOC sends replies to the R1000 CPU.

None of the tests in the "TEST_IOC.EM" file gets anywhere near this FIFO, yet the simulation runs 15% faster if this sheet is commented out, because this sheet uses a lot of free-running clocks:

   1 * 2X~     @ 20 MHz        20 MHz
   2 * H2.PHD  @ 10 MHz        20 MHz
   2 * H1E     @ 10 MHz        20 MHz
   1 * H2E     @ 10 MHz        10 MHz
   1 * Q1~      @ 5 MHz         5 MHz
   2 * Q2~      @ 5 MHz        10 MHz
   1 * Q3~      @ 5 MHz         5 MHz
   ----------------------------------
   Simulation load             90 MHz

Where the clocks feed into edge-sensitive chips, for instance "Q1~" to "FOREG0" (left center), only one of the flanks needs to be simulated, but where they feed state-sensitive gates, like "2X~" into "FFNAN0A" (near the top), the "FOO" class instance is called for both flanks, effectively doubling the frequency of the 10MHz clock signal.
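
The total in the clock table is the number of instances times the listed frequency, summed over all the clocks. A small Python sketch of that sum, taking the frequencies in the table at face value (the function name is hypothetical):

```python
def simulation_load(clocks):
    """Sum of instance count times clock frequency, in MHz."""
    return sum(count * mhz for count, _name, mhz in clocks)


# (instances, clock signal, MHz) as listed for the response FIFO sheet
fifo_clocks = [
    (1, "2X~", 20),
    (2, "H2.PHD", 10),
    (2, "H1E", 10),
    (1, "H2E", 10),
    (1, "Q1~", 5),
    (2, "Q2~", 5),
    (1, "Q3~", 5),
]

print(simulation_load(fifo_clocks), "MHz")   # 90 MHz
```

Each of those 90 million events per simulated second costs at least one call into a component model, whether or not any test ever touches the FIFO.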

To make matters even worse, there is an identical FIFO feeding requests the opposite way, from R1000 to IOC, on the next sheet.

And to really drive the point home, all the simulation runs will have to include the IOC board.

In SystemC a FIFO is one of the primitive objects, which can simulate these two pages much faster than this, but to do that we need enough of the machine simulated well enough to run the experiments which test the FIFOs.

Until then, we can save oceans of time by simply commenting these two FIFOs out.

2022-01-08 Making headway with FIU

We are making headway with the simulated FIU board, currently 19 tests fail, the 16 "Execute from WCS" and three parity-related tests. We hope the 16 WCS tests have a common cause.

On the FIU we have found the first test-case which depends on undocumented behaviour: TEST_ABUS_PARITY.FIU fails if OFFREG does not have even parity when the test starts.

Simulating the IOC and FIU boards, the simulation currently clocks around 1/380 of hardware speed; if the TYP, VAL and SEQ boards are also simulated, speed drops to 1/3000 of hardware speed. Not bad, but certainly not good.

We have started playing with "mega-symbols" for instance 64bit versions of the 74F240, 74F244 and 74F374. There is a speed advantage, but the major advantage right now is that debugging operates on the entire bus-width at the same time.

2021…2012

  • 2014-2018 - The project got stuck for lack of a sufficiently beefy 5V power-supply, and then phk disappeared while he built a new house.

Many thanks to

  • Erlo Haugen
  • Grady Booch
  • Grek Bek
  • Pierre-Alain Muller
  • Michael Druke
  • Pascal Leroy