Rational/R1000s400/Logbook


2022-08-05 Maybe exciting news

We found another 8/0 misreading, and now the tests just keep running.

Until they stop, one way or another, we will not know what the status is, but it looks good-ish.

Update:

run_udiag has run for 28½ hours now, currently toodling around at microaddress 0x26a5…7, which according to the information in the Knowledge Transfer Manual (pdf pg 136) is in the MEM_TEST area.

At the same time, a "FRU" test has been running for 37⅓ hours, has gotten past P2ABUS and is currently chugging through P2MM.

It is incredibly boring and incredibly exciting at the same time :-)

Both these machines use the 'main' branch of the schematics, identical to the schematics in the binder we got from Terma.

We expect run_udiag to fail when it gets to SYS_IOC_TEST because the M68K20's RAM is not the same as the SystemC model's RAM.

2022-08-02 It's 30% faster to ask a friend…

The video of Michael Druke's presentation from his July 5th visit is now up on YouTube.

But there were of course more questions to ask than we could get through in a single day.

There is a signal called BUZZ_OFF~ which pops up all over the place like this:

R1000 buzz off.png

The net effect of this signal is to turn a lot of tri-state bus-drivers off during the first quarter ("Q1") of the machine-cycle, but not because another driver needs the bus, since they are also gated by the BUZZ_OFF~ signal.

So why then ?

As Mike explains in the presentation, there are no truly digital signals, they are all analog when you put the scope on them, and he explained in a later email that »The reason for (BUZZ_OFF) is suppressing noise glitches when the busses switch at the beginning of the cycle.«

That makes a lot of sense, once you think about it.

By always putting a bus into "Hi-Z" state between driven periods, the inputs will drain some of the charge away, and the voltage will drift towards the middle from whatever side the bus was driven.

Next time the bus is driven, the driver chips will have to do less work, and it totally eliminates any risk of "shoot-through" if one driver is slow to release while another is fast to drive.

(Are there books with this kind of big-computer HW-design wisdom ?)

Our emulation does use truly digital signals; it is not subject to ground-bounce, reflections, leakage, capacitance and all those pesky physical phenomena, so BUZZ_OFF~ needlessly triggers a lot of components, twice every single machine-cycle - ten million times per simulated second of time.

Preliminary experiments indicate a 30% speedup without the BUZZ_OFF~ signal, but we need to run the full gamut of tests before we can be sure it is OK.

2022-07-31 P2FP and P2EVNT passes

In both cases trivial typos and misunderstandings.

Next up is P2ABUS which tests the address bus, which takes us into semi-murky territory, including the fidelity of our PAL-to-SystemC conversion.

On a recent tour of the museum, a guest asked why we use the "simulation / real" ratio as our performance metric, and the answer is that when the performance gap is on the order of thousands, percentages are not very communicative:

Machine       Branch      Ratio   Percentage   Performance
CI-server     main        4173    0.024 %      -99.976 %
CI-server     megacomp2   2380    0.042 %      -99.958 %
T14s laptop   megacomp2   1142    0.088 %      -99.912 %
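For reference, converting between the two representations is trivial; here is the CI-server "main" line from the table, worked through in Python:

  ratio = 4173                              # simulation is 4173 times slower than real time
  speed = 1 / ratio                         # fraction of real-time speed
  print("%.3f %%" % (100 * speed))          # -> 0.024 %
  print("%.3f %%" % (100 * speed - 100))    # -> -99.976 %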

But we are getting closer to the magic threshold of "kHz instead of MHz".

2022-07-24 VAL: valeō

  […]
  TEST_Z_CNTR_WALKING.VAL
    Loading from file PHASE2_MULT_TEST.M200_UCODE  bound on July 16, 1986  14:31:44
    Loading Register Files and Dispatch Rams .... [OK]
    Loading Control Store  [OK]
  TEST_MULTIPLIER.VAL
  CLEAR_PARITY.VAL
  LOAD_WCS_UIR.VAL
  RESET.VAL
  P2VAL passed

This also means that we are, to some limited extent, able to execute microcode.

2022-07-23 Et tu TYP?

With a similar workaround, the P2TYP test completes.

2022-07-22 Moving along

After some work on the disassembly of the .M200 IOP programs, specifically the P2VAL.M200 program, it transpired that the "COUNTER OVERFLOW" test failed because the P2VAL program busy-waits for the experiment to complete, and the simulated IOP runs too fast:

  0002077c  PUSHTXT "TEST_LOOP_CNTR_OVERFLOW.VAL"
            [push other arguments for ExpLoad]
  000207a0  JSR ExpLoad(PTR.L, PTR.L)
            [push other arguments for ExpXmit]
  000207ae  JSR ExpXmit(EXP.L, NODE.B)
  000207b6  MOVE.L #-5000,D7
            [push arguments for DiProcPing]
  000207ca  JSR DiProcPing(adr.B, &status.B, &b80,B, &b40.B)
  000207d2  ADDQ.L #0x1,D7
  000207d4  BEQ 0x207de
            [check proper status]
  000207dc  BNE 0x207bc
  000207de  [...]

We need a proper fix for this, preferably something which does not involve slowing the DIAGBUS down all the time.

In the meantime, we can work around the problem by patching the constant -5000 from the CLI:

  dfs patch P2VAL.M200 0x7b8 0xff 0xfe 0xec 0x78
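
As a sanity check of the patch (assuming, as the disassembly above suggests, that the four bytes replace the 32-bit immediate of the MOVE.L at 0x207b6), the constant goes from -5000 to a roughly 14 times larger magnitude, giving the busy-wait loop correspondingly more iterations:

  old = -5000
  new = int.from_bytes(bytes([0xff, 0xfe, 0xec, 0x78]), "big", signed=True)
  print(hex(old & 0xffffffff))   # 0xffffec78, the original immediate
  print(new, new / old)          # -70536, ~14.1 times as many loop iterations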

That gets us to:

  […]
  TEST_Z_CNTR_WALKING.VAL   
    Loading from file PHASE2_MULT_TEST.M200_UCODE  bound on July 16, 1986  14:31:44
    Loading Register Files and Dispatch Rams .... [OK]
    Loading Control Store  [OK]
  TEST_MULTIPLIER.VAL   
  *** ERROR reported by P2VAL:
  An error in the multiplier logic was detected  (P2VAL).
  Field replaceable units :
          VALUE Board
  *** P2VAL failed ***

Which can either be a problem with the multiplier circuit, which we have not seen activated until now, or failing microcode execution, which we have also not seen much of yet.

The multiplication circuit on the VAL board is quite complex; it takes up 7 full pages, because the 16x16=>32 multiplier had to be built out of four 8x8=>16 multiplier chips and 4-bit adders to combine their outputs.
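
To see why that decomposition works, here is the arithmetic identity worked through in a small Python sketch (illustrative only, not the emulator's code):

  def mul16(a, b):
      ''' 16x16=>32 multiplication built from four 8x8=>16 partial products. '''
      a_hi, a_lo = a >> 8, a & 0xff
      b_hi, b_lo = b >> 8, b & 0xff
      # The four partial products, shifted and summed - roughly what the
      # adders on the board have to combine.
      return ((a_hi * b_hi) << 16) + ((a_hi * b_lo) << 8) + ((a_lo * b_hi) << 8) + (a_lo * b_lo)

  assert all(mul16(a, b) == a * b for a, b in [(0xffff, 0xffff), (0x1234, 0x5678), (0, 0xabcd)])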

2022-07-17 Lots of cleanup

With all boards passing unit-tests, the next step is to start to execute micro-code, first the diagnostic kind and, when that works, the real thing.

Such a juncture is a good opportunity for a bit of cleanup, and this is currently ongoing.

Right now the FRU program errors out with:

  Running FRU P2VAL
  TEST_LOOP_CNTR_OVERFLOW.VAL*** ERROR reported by P2VAL:
  VAL LOOP COUNTER overflow does not work correctly (P2VAL).

Getting to the point of failure takes 5 hours on our fastest machine (at a 1/1300 speed ratio with all boards), but if we tell FRU to run P2VAL directly, it launches P2FP instead, which, after an unknown number of micro-instructions have executed, fails with a generic error message (see previous entry).

2022-07-05 Mike Druke visits

Today Mike Druke and his wife finally got to visit us; this was yet another much anticipated event rudely postponed by Covid-19.

We showed Mike a running R1000 machine, in this case PAM's machine, but using the IOC board from the Terma machine. We also toured our little exhibition, for the occasion augmented with a Nova2 computer from the magazines, and on the way to lunch we stopped to demo our 50+ year old GIER computer.

In the afternoon Mike gave a wonderful talk about Rational, the people, the company, the ideas and the computers.

The video recording from Mike's talk will go into our bit-archive and be posted online, when the post-processing is finished.

Work on the emulator continues and has reached the major milestone where microcode is being executed:

   Running FRU P2FP
     Loading from file FPTEST.M200_UCODE  bound on January 29, 1990 17:26:52
     Loading Register Files and Dispatch Rams .... [OK]
     Loading Control Store  [OK]
   *** ERROR reported by P2FP:
   ABORT -> uCODE DID NOT HALT AT CORRECT ADDRESS

Now we need to figure out what the diagnostic microcode was supposed to do and once we understand that, figure out why it did not.

2022-07-02 FRU and DFS hacking

Going forward, the FRU program is going to be our primary test-driver, and the emulation already passes phase-1, which seems to more or less consist of the same experiments as the TEST_$BOARD.EM scripts.

The first test which fails in phase-2 is the attempt to make the request-FIFO on the IOC generate an interrupt, and that is understandable, because that part of the SystemC code is not hooked up to the MC68K20 emulation.

But in order to get to that point, the P2IOC test spends some hours on other experiments, and because FRU expects all boards to be "plugged in", that is still pretty slow.

That catapulted an old entry from the TODO list to the top, so now the emulation has a "dfs" cli command, which allows reading, writing, editing (with sed(1)) of files in the DFS filesystem, and a special subcommand "dfs neuter" to turn an experiment into a successful no-op.

With that in place, and with eight experiments neutered, it only takes a couple of minutes to get to the WRITE_REQUEST_QUEUE_FIFO.IOC experiment.

When run individually the P2FIU, P2SEQ, P2MEM, P2STOP, P2COND and P2CSA tests all seem to pass.

The P2TYP and P2VAL tests both fail on "LOOP COUNTER overflow does not work correctly", which sounds simple, and P2EVNT fails with "The signal for the GP_TIMER event is stuck low on the backplane" which may simply be because the IOP cannot read the STATUS register yet.

So all in all, no unexpected or unexpectedly bad news from FRU … yet.

2022-06-28 +83% more running R1000 computers

Today we transplanted the IOC and PSU from Terma's R1000 to PAM's R1000, slotted in a SCSI2SD and powered it up.

There were a fair number of intermediate steps, transport, adapting power-cables, swapping two PAL-chips that had gotten swapped after the readout etc. etc.

But the important thing is that it came up.

That means that we "just" need to get RAM working on one of the two spare IOCs we have, and one way or another, get a power-supply going, then the world will have two running R1000 computers, instead of just one.

2022-06-20 IOP fined for speeding

The error from the SEQ board transpired to be the IOP downloading data faster than the DIPROC could stuff it into the SystemC model.

Unlike when normal experiments are run, during downloads the IOP just blasts bytes down the DIAGBUS as fast as it can, and by interleaving downloads to multiple boards, for instance {SEQ, TYP, SEQ, VAL}…, the DIPROCs get enough time to do their thing.

If we had tied the 68K20 emulation, the DIAGBUS and the DIPROCs to the SystemC clock at all times, that would just work, but it would also be a lot slower.

So we cheat: The 68K20 emulation and the i8052 emulation of the DIPROC run asynchronously to the SystemC model, only synchronizing when needed to perform a bus-transaction, and the DIAGBUS has an infinite baud-rate.

Therefore we have added a [small hack] to delay DOWNLOAD commands from the IOP if the targeted DIPROC is still in RUNNING state.
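
In pseudo-Python the hack amounts to something like the sketch below; the names are hypothetical and it glosses over the actual DIAGBUS plumbing, but it shows the idea:

  RUNNING = 0x01   # assumed status value for "DIPROC still executing an experiment"

  def deliver_download(diproc, payload, wait_a_bit):
      ''' Hold a DOWNLOAD back until the targeted DIPROC has gone idle. '''
      while diproc.status == RUNNING:
          wait_a_bit()             # let the SystemC side advance some cycles
      diproc.receive(payload)      # now the i8052 model can keep up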

Now the FPTEST starts running, and comes back with:

  CLI> fptest
    Loading from file FPTEST.M200_UCODE  bound on January 29, 1990 17:26:52
    Loading Register Files and Dispatch Rams .... [OK]
    Loading Control Store  [OK]
  VAL  bad FIU bits = FFFF_FFFF_FFFF_FFFF
  TYP  bad TYP bits = FFFF_FFFF_FFFF_FFFF
  VAL  bad VAL bits = FFFF_FFFF_FFFF_FFFF
  TEST AGAIN [Y] ?

Which is an improvement.

However, it is not obvious to us that FPTEST is what we should be attempting now.

The FPTEST.CLI script contains:

  x rdiag fptest;

That makes RDIAG.M200 interpret FPTEST.DIAG, which contains:

  init_state;push p2fp interactive;

And to us "p2" sounds a lot like "phase two".

There is another script to RDIAG called GENERIC.DIAG which looks like a comprehensive test:

  init_state;
  run all p1dcomm;
  [#eq,[model],100]
  run p1sys;
  [end]
  [#eq,[model],200]
  run p1ioc;
  [end]
  run p1val;
  run p1typ;
  run p1seq;
  run p1fiu;
  run allmem p1mem;
  run all p1sf;
  init_state;
  [#eq,[model],100]
  run p2ioa;
  [end]
  [#eq,[model],200]
  run p2ioc;
  [end]
  [#eq,[model],100]
  run p2sys;
  [end]
  run p2val;
  run p2typ;
  run p2seq;
  run p2fiu;
  run allmem p2mem;
  init_state;
  run p2uadr;
  run p2fp;
  run p2evnt;
  run p2stop;
  run p2abus;
  run p2csa;
  run p2mm;
  [#eq,[model],100]
  run p2sbus;
  [end]
  run p2cond;
  run all p2ucode;
  run all p3rams;
  run all p3ucode

Running that instead we get:

  CLI> x rdiag generic
  Running FRU P1DCOMM
  Running FRU P1DCOMM
  P1DCOMM Failed
  The test that found the failure was P1DCOMM
  
  ONE_BOARD_FAILED_HARD_RESET
  Field replaceable units : 
          Backplane / Backplane Connector
          All Memory Boards
  Diagnostic Failed

That looks actionable...

2022-06-19 First attempt at FPTEST

With all boards passing their unit-tests, next step is the FPTEST.

Until now the 68K20 emulator's only contact with the SystemC code has been through the asynchronous DIAGBUS, but one of the first things FPTEST does is to reset the R1000 processor, and therefore we had to implement the SystemC model of the 68K20, so it can initiate write cycles to DREG4 at address 0xfffffe00.

That got us this far:

   CLI> fptest
     Loading from file FPTEST.M200_UCODE  bound on January 29, 1990 17:26:52
     Loading Register Files and Dispatch Rams ....
   Experiment error :
   Board      : Sequencer
   Experiment : LOAD_DISPATCH_RAMS_200.SEQ
   Status     : Error

   Abort : Experiment error
   Fatal experiment error.
   From DBUSULOAD

   Abort : Experiment error
   Fatal experiment error.
   From P2FP
   CLI>

2022-06-04 All boards pass unit test

Fixing two timing problems in the simulation made the TEST_MEM32.EM pass, and with that we have zeros in the entire right hand column in the table above.

2022-05-29 SEQ passes unit test

Have we mentioned zero vs eight confusion in the schematics yet ?

Final 08 seq.png

And with that, the emulated SEQ passes TEST_SEQ.EM.

Now we just need to track down the final problems with MEM32.

2022-05-21 Watching the grass grow

Spring has slowed down work on the R1000 Emulator, but some progress is being made.

The SEQ board is now down to only two failing subtests:

   RESOLVE_RAM_(OFFSET_PART)_TEST                                       FAILED 
   TOS_REGISTER_TEST_4                                                  FAILED

or rather, all the other errors were phantom failures due to two colliding optimizations, one by Rational engineers and one by us:

   125c 93           |    |            MOVC    A,@A+DPTR
   125d b4 ff f1     |    |            CJNE    A,#0xff,0x1251
   1260 74 02        |t   |            MOV     A,#0x02
   1262 f2           |    |            MOVX    @R0,A
   1263 08           |    |            INC     R0
   1264 02 05 1c     |    |            LJMP    EXECUTE

The above is a snippet of the DIPROC(1) code, the end of a loop used extensively on the SEQ board.

The Rational optimization is the instruction at 0x1262, which we think initiates a reset of the Diagnostic FSM.

Normally, the INC, LJMP and the instructions which pick up and decode the next bytecode-instruction would leave the FSM plenty of time to get things done, but since our emulated DIPROC executes all non-I/O instructions instantly (see [[1]]), some of the SEQ testcases, notably LATCHED_STACK_BIT_1_FRU.SEQ, would fail.

The failure mode was that the bytecode expected to read a pattern like "ABAABB" from the hardware, but would get "CABAAB", which sent us on a wild goose-chase for non-existent clock-skew problems.

Have we mentioned before that one should never optimize until things actually work ?

2022-05-08 Slowly making way

As can be seen in the table above, the main DRAM array now works on the emulated MEM32 board.

It takes 48 hours to run that test, because the entire DRAM array is tested 16 times, very comprehensively:

  TESTING TILE  4 -  TILE_MEM32_DATA_STORE

  DYNAMIC RAM DATA PATH TEST                                 PASSED
  DYNAMIC RAMS ADDRESS TEST                                  PASSED
  DYNAMIC RAM ZERO TEST - LOCAL SET 0                        PASSED
  DYNAMIC RAM ZERO TEST - LOCAL SET 1                        PASSED
  DYNAMIC RAM ZERO TEST - LOCAL SET 2                        PASSED
  DYNAMIC RAM ZERO TEST - LOCAL SET 3                        PASSED
  DYNAMIC RAM ZERO TEST - LOCAL SET 4                        PASSED
  DYNAMIC RAM ZERO TEST - LOCAL SET 5                        PASSED
  DYNAMIC RAM ZERO TEST - LOCAL SET 6                        PASSED
  DYNAMIC RAM ZERO TEST - LOCAL SET 7                        PASSED
  DYNAMIC RAM ONES TEST - LOCAL SET 0                        PASSED
  DYNAMIC RAM ONES TEST - LOCAL SET 1                        PASSED
  DYNAMIC RAM ONES TEST - LOCAL SET 2                        PASSED
  DYNAMIC RAM ONES TEST - LOCAL SET 3                        PASSED
  DYNAMIC RAM ONES TEST - LOCAL SET 4                        PASSED
  DYNAMIC RAM ONES TEST - LOCAL SET 5                        PASSED
  DYNAMIC RAM ONES TEST - LOCAL SET 6                        PASSED
  DYNAMIC RAM ONES TEST - LOCAL SET 7                        PASSED

  TILE  4 -  TILE_MEM32_DATA_STORE                           PASSED


While "FAILURE" is printed five times on the console, there is actually only two failing experiments:

  TESTING TILE  3 -  TILE_MEM32_TAGSTORE

  TAGSTORE SHORTS/STUCK-ATS TEST                             PASSED
  TAGSTORE ADDRESS PATTERN TEST                              PASSED
  TAGSTORE PARITY TEST1                                      PASSED
  TAGSTORE PARITY TEST2                                                FAILED

            FAILING EXPERIMENT IS :  TEST_TAGSTORE_PARITY_2

  TAGSTORE RAMS ZERO TEST                                    PASSED
  TAGSTORE RAMS ONES TEST                                    PASSED
  LRU UPDATE TEST                                                      FAILED
            FAILING EXPERIMENT IS :  TEST_LRU_UPDATE
  
  TILE  3 -  TILE_MEM32_TAGSTORE                                       FAILED

Despite some effort, we have still not figured out what the problem is. We suspect a timing issue near or with the tag-RAM.

2022-04-16 A long overdue update

As can be seen in the table above, the simulated SEQ board is down to 12 FAILURE messages, and what the table does not show is that the MEM32 board simulation now completes, but takes more than 24 hours to do so, which makes the daily CI cron(8) job fail catastrophically.

The bug which took us almost a month to fix turned out to be the i8052 emulator's CPL C (Complement Carry) instruction not complementing, in a DIPROC bytecode-instruction we had not previously encountered: calculating even/odd parity for a multi-byte word.
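
For illustration (this is not the DIPROC's actual bytecode, just the arithmetic it has to perform), parity of a multi-byte word can be computed by complementing a flag once per set bit, which is exactly where a broken Complement Carry instruction hurts:

  def parity(word_bytes):
      ''' Return 1 for odd parity, 0 for even parity, over all bytes. '''
      p = 0
      for byte in word_bytes:
          for _ in range(8):
              p ^= byte & 1        # in effect a CPL C for every set bit
              byte >>= 1
      return p

  assert parity([0xff, 0x01]) == 1   # nine set bits  -> odd parity
  assert parity([0xf0, 0x0f]) == 0   # eight set bits -> even parity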

Along the way we have attended to much other stuff: tracing, python code for decoding scan-chains, "mega components" etc., and, notably, python-generated SystemC component models.

Initially, all 12 thousand electrical networks in the simulated part of the system were sc_signal_resolved instances.

Sc_signal_resolved is the most general signal type in SystemC, having four possible levels, '0', '1', 'Z' and 'X' and allowing multiple 'writers', but it is therefore also the slowest.

Migrating to faster types, bool for single wire binary networks and uint%d_t for single-driver binary busses, requires component models for all the combinations we may encounter, and writing those by hand got old really fast.

For true Tri-state signals, we will still need to use the sc_signal_resolved type, but a lot of Tri-state output chips are used as binary drivers, by tying their OE pin to ground, so relying on the type of a component to tell us what type its output has misses a lot of optimization opportunities.

And thus we now have Python "models" of components, which automatically produce adapted SystemC component models.

Here is an example of the 2149 SRAM model:

class SRAM2149(PartFactory):

    ''' 2149 CMOS Static RAM 1024 x 4 bit '''

    def state(self, file):
        file.fmt('''
                |       uint8_t ram[1024];
                |       bool writing;
                |''')

    def sensitive(self):
        for node in self.comp:
            if node.pin.name[0] != 'D' and not node.net.is_const():
                yield "PIN_" + node.pin.name

    def doit(self, file):
        ''' The meat of the doit() function '''

        super().doit(file)

        file.fmt('''
                |       unsigned adr = 0;
                |
                |       BUS_A_READ(adr);
                |       if (state->writing)
                |               BUS_DQ_READ(state->ram[adr]);
                |
                |''')

        if not self.comp.nodes["CS"].net.is_pd():
            file.fmt('''
                |       if (PIN_CS=>) {
                |               TRACE(<< "z");
                |               BUS_DQ_Z();
                |               next_trigger(PIN_CS.negedge_event());
                |               state->writing = false;
                |               return;
                |       }
                |''')

        file.fmt('''
                |
                |
                |       if (!PIN_WE=>) {
                |               BUS_DQ_Z();
                |               state->writing = true;
                |       } else {
                |               state->writing = false;
                |               BUS_DQ_WRITE(state->ram[adr]);
                |       }
                |       TRACE(
                |           << " cs " << PIN_CS?
                |           << " we " << PIN_WE?
                |           << " a " << BUS_A_TRACE()
                |           << " dq " << BUS_DQ_TRACE()
                |           << " | "
                |           << std::hex << adr
                |           << " "
                |           << std::hex << (unsigned)(state->ram[adr])
                |       );
                |''')

Notice how the code to put the output in high-impedance "3-state" mode is only produced if the chip's CS pin is not pulled down.

Note also that the code handles the address bus and data bus as a unit, by calling C++ macros generated by common python code. This allows the same component model to be used for wider "megacomp" variants of the components.
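
As a rough illustration of how such a width-parameterized macro could be generated (hypothetical names and simplified output, not the project's actual code generator):

  def gen_bus_read_macro(bus_name, pin_names):
      ''' Emit a C++ macro that assembles an N-bit bus value from individual pins. '''
      width = len(pin_names)
      terms = [
          "((uint64_t)PIN_%s.read() << %d)" % (pin, width - 1 - i)
          for i, pin in enumerate(pin_names)
      ]
      return "#define BUS_%s_READ(dst) do { (dst) = %s; } while (0)" % (
          bus_name, " | ".join(terms))

  # The same generator serves a 4-bit DQ bus on the 2149 and a 64-bit bus
  # on a "megacomp" DRAM; only the pin list changes.
  print(gen_bus_read_macro("DQ", ["DQ0", "DQ1", "DQ2", "DQ3"]))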

This is particularly important for the MEM32 board, which has 64(Type)+64(Value)+9(Ecc) DRAM chips in each of the two memory banks. The simulation runs much faster with just two "1MX64" and one "1MX9" components, than it does with 137 "1MX1" components in each bank.

This optimization is what disabused us of the notion that the CHECK_MEMORY_ONES.M32 experiment hung: it did not, it just took several hours to run - and it is run once for each of the eight "sets" of memory.

With the current 11 failures, the entire MEM32 test takes 140 seconds of simulated time, 7½ hours in our fastest "megacomp2" version of the schematics on our fastest machine.

However our "CI" machine is somewhat slower, and runs the un-optimized "main" version of the schematics, which means the next daily "CI" run is started before the previous one completed, and with them using the same filenames, they both crash.

So despite the world distracting us with actual work, travel, talks, social events, and notably the first ever opening of Datamuseum.dk for the public, we are still making good progress.

2022-03-06 Do not optimize until it works, unless …

It is very old wisdom in computing that it does not matter how fast you can make a program which does not work, and usually we stick firmly to that wisdom.

However, there are exceptions, and the R1000-emulator is one of them.

When the computer was designed, the abstract architecture had to be implemented with the available chips in the 74Sxx and later 74Fxx families of TTL chips, and there being no 64 bit buffers in those families, a buffer for one of the busses was decomposed into 8 parallel 8 bit busses, each running through a 74F245 chip, etc.

In hardware the 8 chips operate in parallel; in software, at least with SystemC, they are evaluated sequentially, so there is a performance impact.

What is worse, there is a debugging impact as well, because instead of the trace-file telling us what the state of the 64 bits is, in a single line, it contains eight lines of 8 bits, in random order.

Therefore we operate with three branches in the R1000.HwDoc github repository: "main", "optimized" and "megacomp".

"Main" is the schematics as they are on paper. That is the branch reported in the table above.

"Optimized" is primarily deduplication of multi-buffered signals, that is signals where multiple outputs in parallel are required to drive all the inputs of that signal, a canonical example being the address lines of the DRAM chips on the MEM32 board.

Finally in "megacomp" we invent our own chips, like a 64 bit version of the 74F245, whereby we both improve the clarity of the schematics, and make the simulation run faster, almost twice as fast as "main" at this point.

Here is the same table as above, for the "megacomp" branch, and run on the fastest CPU currently available to this project:

Test                 Wall Clock (s)   SystemC (s)   Ratio      Exp run   Exp fail
expmon_reset_all           51.787       0.026151    1/1980.3         0          0
expmon_test_fiu          1275.507      17.799928    1/71.7          95          0
expmon_test_ioc          1018.086      11.231571    1/90.6          29          0
expmon_test_mem32        5331.993      30.000000    1/177.7         28          9
expmon_test_seq          1183.407      13.081077    1/90.5         108         32
expmon_test_typ          3629.642       7.468383    1/486.0         73          2
expmon_test_val          3625.022       7.434761    1/487.6         66          0
novram                     69.302       0.035584    1/1947.6         0          0

Note that the megacomponents have caused one of the TYP tests to fail, so the old wisdom does apply after all. (The table shows two failures because both the individual test and the entire test-script report "FAILED" on the console.)

2022-03-05 We will not need to emulate the ENP-100

The R1000/s400 has two ethernet interfaces, one is on the internal IO/bus and can be directly accessed by the IOC and, presumably, the R1000 CPU, the other is on a VME daughter-board, mounted on the RESHA board.

Strangely enough, the TCP/IP protocol only seems to be supported on the latter, whereas the "direct" ethernet port is for use only in cluster configurations.

The VME board is an "ENP-100" from Communication Machinery Corp. of 125 Cremona Drive, Santa Barbara, CA 93117.

R1000 enp100.jpg

The board contains a full 12.5 MHz 68k20 computer, including boot-code EPROMs, 512K RAM, two serial ports and an Ethernet interface.

The firmware for this board is downloaded from the R1000 CPU, and implements a TCP/IP stack, including TELNET and FTP services.

Interestingly, the TCP/IP implementation ignores all packets with IP or TCP options, so no contemporary computer can talk with it until "modern" options are disabled.

We have no hardware documentation for the ENP-100 board, but we expect emulation is feasible, given enough time and effort.

Fortunately it seems the R1000 can boot without the ENP-100 board, it complains a bit, but it boots.

That takes emulation of the ENP-100 out of the critical path, and makes it optional to even do it.

2022-03-01 The fish will appreciate this

Below the R1000 two genuine and quite beefy Papst fans blow cooling air up through the card-cage.

For a machine which most likely will end up in a raised-floor computing room, it can be done no other way.

However, if the machine is housed anywhere else, an air-filter is required to not suck all sorts of crap into the electronics.

And of course, air-filters should be maintained, so we pulled out the fan-tray and found that the filter mat was rapidly deteriorating, literally falling apart.

Not being air-cooling specialists, we initially ordered normal fan-filters, the kind that looks like loose felt made of plastic, but the exhaust temperature on the top of the machine climbed to over 54°C.

So what was the original filter material, and where could we buy it?

It looks a lot like the material used on the front of the obscure but deservedly renowned concrete Rauna Njord speakers, designed by Bo Hansson in the early 1980s, and that material also fell to pieces after about a decade.

Surfing fora where vintage hifi-nerds have restored Rauna Njord speakers, we found "PPI 10 Polyurethane foam" mentioned, and that transpires to be what water-filters for aquariums are made from.

A trip to the local aquarium shop got us a 50x50x3cm sheet of filter material, and the promise that the fish will really appreciate us buying it.

We cut a 12.5cm wide strip and parted it lengthwise in two slices of roughly equal thickness, using two wooden strips as guides and a sharp bread-knife.

It is almost too bad that one cannot see the sporty blue color when it is mounted below the R1000:

Luftfilter1.png

The two small pieces in the middle are the largest fragment of the old air filter and an off-cut from the 17mm thick slice:

Luftfilter2.png

In the spirit of scientific inquiry, we measured the temperature with both thicknesses.

With the 17mm thick filter, the exhaust temperature rose to above 52°C.

With the 13mm thick filter, it stabilized around 41°C.

That is a pretty convincing demonstration of the conventional wisdom, that axial fans should push air, not pull it.

So why are the filters on the "pull" side of the fans in the R1000, when the fan-tray is plenty deep for filters to be mounted on the "push" side?

Maybe this is an after-market modification, trying to convert an unfiltered "data-center fan-tray" into a filtered "office fan-tray" ?

2022-02-20 TEST_VAL.EM passes

We're making progress.

Now we are going to focus on TYP, where most of the failing tests have something to do with parity checking.

2022-02-12 TEST_FIU.EM passes

As can be seen on the status above, there are no longer any failures on the FIU board when running TEST_FIU.EM.

The speed in the table above is when simulating the unadulterated schematics.

Concurrent with fixing bugs we are working on two levels of optimized schematics, one where buffered signals are deduplicated, and one where we use "megacomponents", for instance 64 bit wide versions of 74F240 etc.

The "megacomp" version of FIU runs twice as fast, 1.4% of hardware speed.

2022-01-10 SystemC performance is weird

As often alluded to, the performance of a SystemC simulation is … not ideal … from a usability point of view, so we spend a lot of time thinking about it and measuring it, and it is not at all intuitive for software people like us.

Take the following (redrawn) sheet from the IOC schematic (click for full size)

IOC RESPONSE FIFO

This is the 2048x16 bit FIFO buffer through which the IOC sends replies to the R1000 CPU.

None of the tests in the "TEST_IOC.EM" file gets anywhere near this FIFO, yet the simulation runs 15% faster if this sheet is commented out, because this sheet uses a lot of free-running clocks:

   1 * 2X~     @ 20 MHz        20 MHz
   2 * H2.PHD  @ 10 MHz        20 MHz
   2 * H1E     @ 10 MHz        20 MHz
   1 * H2E     @ 10 MHz        10 MHz
   1 * Q1~      @ 5 MHz         5 MHz
   2 * Q2~      @ 5 MHz        10 MHz
   1 * Q3~      @ 5 MHz         5 MHz
   ----------------------------------
   Simulation load             90 MHz

Where a clock feeds an edge sensitive chip, for instance "Q1~" into "FOREG0" (left center), only one of the flanks needs to be simulated, but where it feeds state sensitive gates, like "2X~" into "FFNAN0A" (near the top), the "FOO" class instance is called for both flanks, effectively doubling the frequency of the 10MHz clock signal.
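
Reading the leading factor in the list above as the number of clock flanks that end up being simulated, the total works out as stated:

  clocks_mhz = [(1, 20), (2, 10), (2, 10), (1, 10), (1, 5), (2, 5), (1, 5)]   # (flanks, MHz)
  print(sum(n * f for n, f in clocks_mhz), "MHz")   # -> 90 MHz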

To make matters even worse, there is an identical FIFO feeding requests the opposite way, from R1000 to IOC, on the next sheet.

And to really drive the point home, all the simulation runs will have to include the IOC board.

In SystemC a FIFO is one of the primitive objects, which could simulate these two pages much faster than this, but to make that substitution we need enough of the machine simulated well enough to run the experiments which test the FIFOs.

Until then, we can save oceans of time by simply commenting these two FIFOs out.

2022-01-08 Making headway with FIU

We are making headway with the simulated FIU board, currently 19 tests fail, the 16 "Execute from WCS" and three parity-related tests. We hope the 16 WCS tests have a common cause.

On the FIU we have found the first test-case which depends on undocumented behaviour: TEST_ABUS_PARITY.FIU fails if OFFREG does not have even parity when the test starts.

Simulating the IOC and FIU boards, the simulation currently clocks around 1/380 of hardware speed; if the TYP, VAL and SEQ boards are also simulated, speed drops to 1/3000 of hardware speed. Not bad, but certainly not good.

We have started playing with "mega-symbols" for instance 64bit versions of the 74F240, 74F244 and 74F374. There is a speed advantage, but the major advantage right now is that debugging operates on the entire bus-width at the same time.

2021…2012

  • 2014-2018 - The project got stuck for lack of a sufficiently beefy 5V power-supply, and then phk disappeared while he built a new house.

Many thanks to

  • Erlo Haugen
  • Grady Booch
  • Greg Bek
  • Pierre-Alain Muller
  • Michael Druke
  • Pascal Leroy