Rational/R1000s400/Logbook

From DDHFwiki

TODO

  • Do we need to format the disks to 1024 byte sectors before we mount them in the R1000 ?
    • Greg Bek: Should be able to boot off tape and initialize the disk, which will format it correctly. If it is a disk that came from Rational at some point it is probably already sectored correctly.
  • What SCSI disks can we use, how many disks do we need, and how do we get R1000 to accept them ?
    • Greg Bek: 1 is all you need. The first disk is partitioned into two parts: 1) DFS (Diagnostic File System), about 300MB 2) Environment, the rest

Other notes

  • Greg Bek: Due to Y2K issues and software licensing issues, you may need to set the clock back before 2000.
  • Nico de Jong: I recognize the problems you had reading 8200 tapes on an 8505. I have had the same experience. For some weird reason, some 8200 tapes can only be read on an 8505 when the tape to be read is write-protected! It is some time since I had to do that, but I seem to remember that it was especially a problem with Fuji tapes.

The Intermedia conversion system in the museum has the ability to dump tapes, including the recording of blocksizes etc.

2019-10-17

Peter has returned with a working PSU :-) - The R1000 is now up and running again drawing 155 amps.

Almost full log of the session: Fil:20191017 1954 R1000.pdf

The Fujitsu 2266 disk appears to be faulty, R1000 complains about disk errors (our old acquaintance General Error?):

  Options are:                                                                          
      0 => Exit                                                                         
      1 => Initialize disk (for experts only)
      2 => Initialize disk, drop USR defects (internal use only)
      3 => Show MFG and USR bad block locations
      4 => Show only USR bad block locations
      5 => Install new DFS only
      6 => Show bad block count and DOS limits
  Enter option : 3
  Enter unit number of disk to format/build/scan (usually 0) : 0
  CS1=0038 CS2=0040 DS=11C0 ER1=0100 ER2=4000 EC1=0000 EC2=0000 DC=0000 DA=0005
  CS1=4038 CS2=0040 DS=11C0 ER1=0000 ER2=0000 EC1=0000 EC2=0000 DC=0000 DA=0005
  CS1=0038 CS2=0040 DS=11C0 ER1=0100 ER2=4000 EC1=0000 EC2=0000 DC=0000 DA=0005
  CS1=4038 CS2=0040 DS=11C0 ER1=0000 ER2=0000 EC1=0000 EC2=0000 DC=0000 DA=0005
  CS1=4038 CS2=0040 DS=11C0 ER1=0000 ER2=0000 EC1=0000 EC2=0000 DC=0000 DA=0005
  CS1=0038 CS2=0040 DS=11C0 ER1=0100 ER2=4000 EC1=0000 EC2=0000 DC=0000 DA=0005
  CS1=0038 CS2=0040 DS=11C0 ER1=0100 ER2=4000 EC1=0000 EC2=0000 DC=0000 DA=0005
  CS1=4038 CS2=0040 DS=11C0 ER1=0000 ER2=0000 EC1=0000 EC2=0000 DC=0000 DA=0005
  CS1=4038 CS2=0040 DS=11C0 ER1=0000 ER2=0000 EC1=0000 EC2=0000 DC=0000 DA=0005
  CS1=4038 CS2=0040 DS=11C0 ER1=0000 ER2=0000 EC1=0000 EC2=0000 DC=0000 DA=0005
  CS1=0038 CS2=0040 DS=11C0 ER1=0100 ER2=4000 EC1=0000 EC2=0000 DC=0000 DA=0005
  CS1=4038 CS2=0040 DS=11C0 ER1=0000 ER2=0000 EC1=0000 EC2=0000 DC=0000 DA=0005
  CS1=4038 CS2=0040 DS=11C0 ER1=0000 ER2=0000 EC1=0000 EC2=0000 DC=0000 DA=0005
  CS1=4038 CS2=0040 DS=11C0 ER1=0000 ER2=0000 EC1=0000 EC2=0000 DC=0000 DA=0005
  CS1=0038 CS2=0040 DS=11C0 ER1=0100 ER2=4000 EC1=0000 EC2=0000 DC=0000 DA=0005
  CS1=0038 CS2=0040 DS=11C0 ER1=0100 ER2=4000 EC1=0000 EC2=0000 DC=0000 DA=0005
  CS1=4038 CS2=0040 DS=11C0 ER1=0000 ER2=0000 EC1=0000 EC2=0000 DC=0000 DA=0005
  CS1=4038 CS2=0040 DS=11C0 ER1=0000 ER2=0000 EC1=0000 EC2=0000 DC=0000 DA=0005
  CS1=0038 CS2=0040 DS=11C0 ER1=0100 ER2=4000 EC1=0000 EC2=0000 DC=0000 DA=0005
  CS1=0038 CS2=0040 DS=11C0 ER1=0100 ER2=4000 EC1=0000 EC2=0000 DC=0000 DA=0005
  ** ABORT: Can't retrieve labels due to disk errors.
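
The status lines above alternate between two patterns. A minimal Python sketch for tallying them, assuming only the NAME=HEX layout visible in the log; the function names are illustrative, and no meaning is assigned to the individual register bits:

```python
import re
from collections import Counter

# Matches one "NAME=HEX" pair as printed in the console log.
FIELD = re.compile(r"([A-Z0-9]+)=([0-9A-Fa-f]+)")

def parse_status_line(line):
    """Return a dict of register name -> integer value for one log line."""
    return {name: int(value, 16) for name, value in FIELD.findall(line)}

def tally(lines):
    """Count how often each (ER1, ER2) pair occurs in the log lines."""
    counts = Counter()
    for line in lines:
        regs = parse_status_line(line)
        if "ER1" in regs and "ER2" in regs:
            counts[(regs["ER1"], regs["ER2"])] += 1
    return counts

sample = [
    "CS1=0038 CS2=0040 DS=11C0 ER1=0100 ER2=4000 EC1=0000 EC2=0000 DC=0000 DA=0005",
    "CS1=4038 CS2=0040 DS=11C0 ER1=0000 ER2=0000 EC1=0000 EC2=0000 DC=0000 DA=0005",
]
for (er1, er2), n in tally(sample).items():
    print(f"ER1={er1:04X} ER2={er2:04X}: {n} line(s)")
```

Feeding it the full console capture instead of `sample` would show how often the error pattern occurs relative to the clean one.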

A Seagate ST41200N is now installed as DISK 1, the Fujitsu remains as DISK 0 for now. R1000 recognizes the Seagate but wants to format it:

  Initializing M400S I/O Processor Kernel 4_2_16
  Spinning up disk 1
  Spinning up disk 0
  Disk  1 is ONLINE and WRITE ENABLED
  IOP Kernel is initialized
  Enable line printer for console output [N] ? 
      RECOVERY 14.04 92/09/17 10:00:00\
  Options are:
      0 => Exit
      1 => Initialize disk (for experts only)
      2 => Initialize disk, drop USR defects (internal use only)
      3 => Show MFG and USR bad block locations
      4 => Show only USR bad block locations
      5 => Install new DFS only
      6 => Show bad block count and DOS limits
  Enter option : 3
  Enter unit number of disk to format/build/scan (usually 0) : 1
  ** ABORT: Disk has no labels.
  Options are:
      0 => Exit
      1 => Initialize disk (for experts only)
      2 => Initialize disk, drop USR defects (internal use only)
      3 => Show MFG and USR bad block locations
      4 => Show only USR bad block locations
      5 => Install new DFS only
      6 => Show bad block count and DOS limits
  Enter option : 1
  Enter unit number of disk to format/build/scan (usually 0) : 1
  Disk has no labels.
  Drive types are:
      1 - Fujitsu 2263
      2 - Fujitsu 2266
      3 - SEGATE ST41200N
      0 - Other
  Enter drive type : 3
  Enter HDA serial number : TJ617458
  Disk must be formated.
  Formatting the drive will take about 35 minutes.
  Elapsed time is 00:32:32
  Writing bad block information.
  Writing boot label.
  Writing DFS label.
  Do you want to build a diagnostic file system on this unit [Y] ? 
  Enter last cylinder to be used by the DFS [ Hint => 76 ]:76
  Enter first cylinder to be used for read/write diagnostics [ Hint => 1889 ]:1889
  Writing shared label.
  Constructing free list.
  Writing free list.
  Allocating and initializing directory.
  Creating predefined files.
  KERNEL_0.M200
  KERNEL_1.M200
  KERNEL_2.M200
  FS_0.M200
  FS_1.M200
  FS_2.M200
  PROGRAM_0.M200
  PROGRAM_1.M200
  PROGRAM_2.M200
  DFS_BOOTSTRAP.M200
  ERROR_LOG
  Do you want to load files into the DFS on this unit [Y] ? y
  Tape drive unit number : 0
  Do you want to display filenames as they are loaded [Y] ? y
  Reading -> DFS_BOOTSTRAP.M200
  Reading -> KERNEL_0.M200
  ... 3160 files later ...
  Reading -> DDC.M200_CONFIG
  Elapsed time is 00:10:40
  Options are:
      0 => Exit
      1 => Initialize disk (for experts only)
      2 => Initialize disk, drop USR defects (internal use only)
      3 => Show MFG and USR bad block locations
      4 => Show only USR bad block locations
      5 => Install new DFS only
      6 => Show bad block count and DOS limits
  Enter option : 0
  Boot disk has been rebuilt or the IOP was booted from tape.
  You must crash the machine to exit.

Next week, boot from disk and see how far we get. PSU got *hot*, but survived the 1½ hour session.

2019-10-03

After rummaging through our entire workshop, it transpires that we have no soldering iron with sufficient power to desolder the capacitors from the thick copper on the PCB.

One of our members, Peter, has offered to attempt the repair in his own workshop, and he picked it up tonight.

2019-09-12

PSU dismantled, and the visibly defective electrolytic capacitor has been desoldered together with two tantalum capacitors. Unfortunately, when the parts were originally mounted, the leads were pinched and cut, which made them difficult to pull through the PCB today since the holes are almost exactly the size of the leads. The insulation on the wires to the transformer has deteriorated quite a bit and will need some repair.

Some better images of the PSU as a whole:

R1000 PSU Overview.jpg

Overview


R1000 PSU Side 1.jpg

One half of the PSU


R1000 PSU Side 2.jpg

Other half of the PSU


R1000 PSU degraded isolation.jpg

Defective insulation on some wires


R1000 PSU capacitor side.jpg

One of the two capacitor boards on the 5V rails.


R1000 PSU capacitor PCB.jpg

Backside of PCB carrying the damaged capacitor.



Each of the two PCBs carries 5 tantalum capacitors of 15µF and 5 SXF capacitors of 6800µF (30mm x 18mm, lead spacing 7.5mm).

2019-09-05

We arrived today expecting a fight, armed with various debugging plans, but the Rational just started, booted and was as happy as could be?!? After some configuration, and booting the kernel "M400S_KERNEL_0.M200" (thanks to Pierre-Alain for supplying that information), the R1000 now responds with:

   R1000-400 IOC SELFTEST 1.3.2 
      512 KB memory ... [OK]
      Memory parity ... [OK]
      I/O bus control ... [OK]
      I/O bus map ... [OK]
      I/O bus map parity ... [OK]
      I/O bus transactions ... [OK]
      PIT ... [OK]
      Modem DUART channel ... [OK]
      Diagnostic DUART channel ... [OK]
      Clock / Calendar ... [OK]
  Checking for RESHA board
      RESHA EEProm Interface ... [OK]
  Downloading RESHA EEProm 0 - TEST
  Downloading RESHA EEProm 1 - LANCE 
  Downloading RESHA EEProm 2 - DISK  
  Downloading RESHA EEProm 3 - TAPE  
      DIAGNOSTIC MODEM ... DISABLED
      RESHA VME sub-tests ... [OK]
      LANCE chip Selftest ... [OK]
      RESHA DISK SCSI sub-tests ... [OK]
      RESHA TAPE SCSI sub-tests ... [OK]
      Local interrupts ... [OK]
      Illegal reference protection ... [OK]
      I/O bus parity ... [OK]
      I/O bus spurious interrupts ... [OK]
      Temperature sensors ... [OK]
      IOC diagnostic processor ... [OK]
      Power margining ... [OK]
      Clock margining ... [OK]
  Selftest passed
  
  Restarting R1000-400S January 14th, 1901 at 22:56:43
  
  OPERATOR MODE MENU - options are:
      1 => Change BOOT/CRASH/MAINTENANCE options
      2 => Change IOP CONFIGURATION
      3 => Enable manual crash debugging (EXPERTS ONLY)
      4 => Boot IOP, prompting for tape or disk
      5 => Boot SYSTEM
  
  Enter option [Boot SYSTEM] : 5
  
  Logical tape drive 0 is an 8mm cartridge tape drive.
  Logical tape drive 1 is declared non-existent.
  Logical tape drive 2 is declared non-existent.
  Logical tape drive 3 is declared non-existent.
  Booting I/O Processor with Bootstrap version 0.4
  
  Boot from (Tn or Dn)  [D0] : T0
  
  Tape_Boot_1.2.0  920401
  Waiting for tape unit ready.
  Strike any key to abort.....................
  End of Tape Reached.rewinding
  
  Select files to boot [D=DEFAULT, O=OPERATOR_SUPPLIED] : [D]
  Skipping..
  Loading FS_0.M200
  
  Loading RECOVERY.M200
  Skipping.................
  Loading M400S_KERNEL_0.M200
  
  Initializing M400S I/O Processor Kernel 4_2_16
  Spinning up disk 0
  IOP Kernel is initialized
  Enable line printer for console output [N] ? 
      RECOVERY 14.04 92/09/17 10:00:00\
  Options are:
      0 => Exit
      1 => Initialize disk (for experts only)
      2 => Initialize disk, drop USR defects (internal use only)
      3 => Show MFG and USR bad block locations
      4 => Show only USR bad block locations
      5 => Install new DFS only
      6 => Show bad block count and DOS limits
  Enter option : 
  *** AC power is L

The last line is probably from the moment I cut the power after seeing significant white-gray smoke coming up from the machine...

The following smell-test suggested that the PSU should be checked:

R1000 PSU Broken Cap.jpg

The capacitors are located between the 5V power rails (the two black blocks on each side of the cap).

Next job: Acquire and exchange 10 x 6800µF 6.3V capacitors (L<31, D<=18)

2019-08-29

Disappointment! - We had hoped to get further in the boot process, but met an "unwilling" machine that didn't even present itself. The PSU powered up with its fan, but that was it - no lights, no 5V, -12V or +12V. During power-off, these lights turned on for a very short period, indicating the PSU is capable but unwilling (Inhibit line active?). We are not completely sure of the reason, but we will debug the issue next week, starting with checking the RESHA diagrams, followed by checking the IOC RTC battery (which was replaced last week).

2019-08-22

After further tests we finally took the leap, grabbed the cutter and soldering iron, and replaced the three suspected memory chips. The boot sequence with the original BIOS now gives:

   R1000-400 IOC SELFTEST 1.3.2 
      512 KB memory ... [OK]
      Memory parity ... [OK]
      I/O bus control ... [OK]
      I/O bus map ... [OK]
      I/O bus map parity ... [OK]
      I/O bus transactions ... [OK]
      PIT ... [OK]
      Modem DUART channel ... [OK]
      Diagnostic DUART channel ... [OK]
      Clock / Calendar ... [OK]
  Checking for RESHA board
    --  Bench mode (ID 7) detected Skipping RESHA tests
      Local interrupts ... [OK]
      Illegal reference protection ... [OK]
      I/O bus parity ... [OK]
      I/O bus spurious interrupts ... [OK]
      Temperature sensors ... [OK]
      IOC diagnostic processor ... [OK]
      Power margining ... [OK]
      Clock margining ... [OK]
  Selftest passed
  
  Restarting R1000-400S January 1st, 1901 at 00:03:56
  
  Logical tape drive 0 is an 8mm cartridge tape drive.
  Logical tape drive 1 is declared non-existent.
  Logical tape drive 2 is declared non-existent.
  Logical tape drive 3 is declared non-existent.
  Booting I/O Processor with Bootstrap version 0.4
  
  Boot from (Tn or Dn)  [D0] : 

Success!

Next step: Mount the board into its rightful place and see how far it gets now...

2019-06-06

Decided to make further measurements with the oscilloscope in order to rule out other causes. Probing all pins on a good and a bad RAM chip did not reveal anything.

Tried to piggyback H11 with a new RAM chip, and the questionable bit settled midway between VCC and GND. It could be that the good chip tried to pull down while the bad chip pulled up. Double-checked chip-select on H3, the only other identified chip that may drive the same data line (D30), and chip-select is completely passive during the troublesome period.

Previous tests went between [0x00000000..0x00040000[ and [0x00040000..0x00080000[. Now tried to run the tests between [0x00001000..0x00021000[ and [0x00041000..0x00061000[ as well as between [0x00001200..0x00021200[ and [0x00041200..0x00061200[ to test whether run-length could be a factor. The exact same failing addresses indicate run-length is not a factor.

Tried a reverse scan: Set(addr), Clr(addr), Set(addr) then Get(addr-4). The same data bits are affected in both banks, but the failing addresses are not identical. Some patterns are similar, and some other patterns appear.

Tried another test with: Set(addr), Clr(addr), Set(addr), Get(addr+4), branch-test delay, then another Get(addr+4) - The second Get reads out correctly. This indicates that the chip does hold the right value, but that it is read out incorrectly in some circumstances.

2019-05-23

Previous software tests indicated problems with certain bits at some address patterns: Bits 7 and 23 in the low bank and bit 30 in the high bank showed issues. The fault manifests itself at some addresses when the following memory accesses are done in quick succession: Set(addr), Clr(addr), Set(addr) then Get(addr+4) - The Get(addr+4) returns incorrect values only on these bits.
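
The access pattern described above can be sketched as follows. This is a Python illustration against a simulated memory, not the code actually run on the IOC: `Memory`, `FlakyMemory` and `scan` are hypothetical names, "Set" and "Clr" are assumed to mean writing all ones and all zeros, and the fault model is invented purely so the sketch has something to detect.

```python
ALL_ONES = 0xFFFFFFFF

class Memory:
    """Simulated 32-bit word RAM; stands in for real hardware access."""
    def __init__(self):
        self.words = {}
    def write(self, addr, value):
        self.words[addr] = value & ALL_ONES
    def read(self, addr):
        return self.words.get(addr, 0)

class FlakyMemory(Memory):
    """Invented fault model: a read that directly follows three writes to
    the previous word returns bit 30 flipped at selected addresses."""
    def __init__(self, bad_addrs):
        super().__init__()
        self.bad_addrs = set(bad_addrs)
        self.recent = []                      # last three written addresses
    def write(self, addr, value):
        super().write(addr, value)
        self.recent = (self.recent + [addr])[-3:]
    def read(self, addr):
        value = super().read(addr)
        hit = self.recent == [addr - 4] * 3 and addr in self.bad_addrs
        self.recent = []
        return value ^ (1 << 30) if hit else value

def scan(mem, start, end, fill=0):
    """Run Set, Clr, Set on each word, then Get the following word.

    Returns {address: mask of differing bits} for every mismatch.
    The range is half-open: start is included, end is not.
    """
    for addr in range(start, end, 4):
        mem.write(addr, fill)                 # known background value
    failures = {}
    for addr in range(start, end - 4, 4):
        mem.write(addr, ALL_ONES)             # Set(addr)
        mem.write(addr, 0)                    # Clr(addr)
        mem.write(addr, ALL_ONES)             # Set(addr)
        got = mem.read(addr + 4)              # Get(addr+4)
        if got != fill:
            failures[addr + 4] = got ^ fill
    return failures

bad = scan(FlakyMemory({0x40010}), 0x40000, 0x40020)
for addr, mask in bad.items():
    print(f"0x{addr:08X}: bits differ, mask 0x{mask:08X}")
```

Collecting the XOR mask per address, rather than a plain pass/fail, is what makes it possible to tie a failure to specific data bits.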

Today all RAM chips were checked with an oscilloscope to verify and possibly identify the problem. H11, G10 and G41 showed different behavior on the oscilloscope, and these chips happen to map to the exact bits identified by the software tests.

H40 did show a little flickering on the DC levels, but the edges seemed OK.

The above input to the RAM-chips looks like this:

R1000 RAM Input.jpg

The output of a healthy chip looks like this:

R1000 RAM Output Healthy.jpg

The output of the sick chips looks like this:

R1000 RAM Output Sick.jpg


Next step will be to replace them. We have replacements ready, so stay tuned...

2019-03-07

Tried to patch the EEPROM in various ways, and learned a lot more.

If we skip the offending memory check, (and the EEPROM checksum because we're lazy) we get all the way to the boot device prompt (tape/disk).

We got two-way serial connection to the console port: TX is TTL level, RX is RS-232 level.

In trivial homebrew tests, the RAM does not fail, but what we call "ramtest_5" repeatedly does.

Big discovery of the night: The two top address bits of the EEPROM are swapped on the PCB, so the middle two quarters are swapped in the image we try to reverse-engineer. After fixing that, the contents make a lot more sense.
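
Exchanging the two most significant address bits leaves the first and last quarters of the image in place and swaps the two in between. A minimal Python sketch of the fix-up on a raw dump follows; the function names are hypothetical, and the per-address helper is only there to cross-check the quarter swap:

```python
def swap_top_address_bits(addr, nbits):
    """Exchange the two most significant of nbits address bits."""
    hi, lo = nbits - 1, nbits - 2
    if ((addr >> hi) & 1) != ((addr >> lo) & 1):
        addr ^= (1 << hi) | (1 << lo)
    return addr

def unswap_image(image):
    """Return the image with its second and third quarters exchanged."""
    if len(image) % 4:
        raise ValueError("image length must be a multiple of 4")
    q = len(image) // 4
    return image[:q] + image[2 * q:3 * q] + image[q:2 * q] + image[3 * q:]

# Cross-check on a tiny 16-byte "image" with 4 address bits.
demo = bytes(range(16))
fixed = unswap_image(demo)
assert all(fixed[a] == demo[swap_top_address_bits(a, 4)] for a in range(16))
print(fixed.hex())
```

Applying `unswap_image` twice returns the original dump, so the same function converts in either direction.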

2019-02-28

Managed to power up the IOC board stand-alone. 5V @ 35A required (30A didn't seem to be quite enough).

Procedure:

  1. 5V @ 35A connected to 3 Capacitors at edge (to distribute load).
  2. Reset (GB113) to Ground (page 23 of R1000_SCHEM_IOC.pdf).
  3. CTS# (GB055) to 5V (page 25)
  4. Power On
  5. Release Reset (GB113)

TTL Serial output read from CPDRV0 @N1 (pin 2 or 3).

As expected output is still:

  R1000-400 IOC SELFTEST 1.3.2 
     512 KB memory ... * * * * * * * FAILED

EEPROM 28256 is not compatible with EPROM 27256!

2013-06-27

More email exchanges with Greg Bek and Michael Druke, both ex-Rational people.

Our IOC board does actually have 72 64K*1 DRAM chips, four banks of 18 chips.
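
As a sanity check on the chip count, the arithmetic can be written out in a few lines of Python. The split into 16 data chips plus 2 parity chips per bank is an assumption (one parity bit per byte), not something read off the schematics, and how the four banks map onto the data bus is not covered here:

```python
CHIPS = 72
BITS_PER_CHIP = 64 * 1024            # one 64K x 1 DRAM
BANKS = 4

raw_kb = CHIPS * BITS_PER_CHIP // 8 // 1024
data_kb = 512                        # size reported by the selftest
overhead_kb = raw_kb - data_kb
chips_per_bank = CHIPS // BANKS

# Assumption: 16 data chips + 2 parity chips in each bank of 18.
DATA_CHIPS_PER_BANK = 16
bank_kb = DATA_CHIPS_PER_BANK * BITS_PER_CHIP // 8 // 1024

print(raw_kb, overhead_kb, chips_per_bank, bank_kb * BANKS)
# prints: 576 64 18 512
```

The 64 KB of overhead is exactly one ninth of the raw capacity, which is what one parity bit per data byte would cost.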

Read the EEPROM from the IOC board, and have started disassembling the self-test code, to find out what it actually does.

Powered up with only IOC and RESHA boards seated, same result, which at least means that we can test in that config.

Also tried powering the IOC board up out of the cage, but our +5V supply can only deliver 7 Ampere, and that's not enough.


2013-06-20

Greg Bek replied to email and told me I was looking at the wrong 68k processor: The daughterboard on the RESHA is just for TCP/IP.

The failing RAM test is the 68k on the IOC board.

Opened the card cage, where this label greets you:

R1000 Cardcage Label.png

Visual inspection of the IOC revealed nothing of note.

Changed the Lithium battery which is used for the RTC.

Gently reseated all socketed chips on the IOC.

Gently reseated all the other cards in the card cage.

Still fails RAM test.

The IOC Schematic manual mentions a low-level debugger, but it seems to require RAM test to pass before you can enter it.

ID-information on the IOC:

R1000 IOC ID1.png R1000 IOC ID2.png R1000 IOC ID3.png

2013-06-13

Finally!

Our workshop area has been rearranged, the GIER computer moved and I could finally uncrate the R1000s400.

After a visual inspection, I did "the smoke-test" and can happily report that the smoke stayed inside.

Unfortunately, we don't get very far in the boot sequence:

  R1000-400 IOC SELFTEST 1.3.2 
     512 KB memory ... * * * * * * * FAILED

This is 512KB DRAM memory on a small daughterboard on a CMC ENP100 VME board mounted on the RESHA board:

R1000 RESHA.png

I have tried various obvious remedies, reseating connectors etc, but to no avail.

2013-04-05

A few small details have to fall into place before I can start to power up the machine: Our new workshop area is done now, and the last missing bit is rearranging the power circuits.

In the meantime scanning of documentation progresses about a binder per week or so, and should be complete before summer.

2012-09-27

Scanned in the "Guru Course" material while hosting COMAL kickoff meeting. Sheet-feeding scanners are a good thing.

I have started a new wiki page for the documentation: Rational/R1000s400/Documentation

2012-08-30

The 8200 Ole brought was from an IBM RS/6000, and either because of special firmware/settings or a fault in that drive, it refused to do anything more advanced than a SCSI Inquiry and Test-Unit-Ready.

In the end I wrote a set of tapes on the known-suspect 8200 drive, and test-read them on the known-good 8505 drive with success.

I have now, finally, convinced myself that I have written good copies of Pierre-Alain's tapes, and more importantly, that I can do so again, if the Exabyte drive in the R1000 cannot read them for reasons of alignment etc.

Received email from Greg Bek with answers to a lot of my questions, so the next thing on the program is to find a corner with power and space to set up the machine. This probably involves moving other stuff out of the way, so it may take some weeks.

2012-08-16

Read Pierre-Alain's original tapes on the EXB-8505 drive, and got the exact same result as the read on the EXB-8200 drive, so now the originals can go back to Pierre-Alain.

Failed utterly to make the 8200 read a tape written on the 8505.

Ole has promised to bring in a known good 8200 drive; that will be the next attempt.

2012-08-09

Spent tonight trying to write copies of the tapes I read last Thursday.

The Exabyte 8200 drive I have used has clearly shown itself to be faulty, or at the very least flaky.

Tried using an 8505 drive instead, but worry that it may have used compression while writing the tapes.

Plan:

1. Re-read Pierre-Alain's tapes with the 8505 to check that the on-disk copy is a good read. I have no reason to doubt this, based on my analysis, but I want to be 100% sure before I return the tapes to Pierre-Alain.

2. Try to read the tapes I wrote tonight with the flaky 8200 drive. If it can read (some of) them, they are not compressed.

2012-08-02

Tonight I set up a FreeBSD computer with an Exabyte 8200 tape drive, and used it to read the three tapes Pierre-Alain Muller mailed me last month.

The tapes have a rather complex block structure, and I have read them in using a program which also stores the information about block-sizes, tape-marks etc, so that it should be possible to write exact copies of the tapes to install from.

The result is three files:

 -rw-r--r--  1 phk   wheel  41837637 Aug  2 18:19 20120802_R1000_CATALOG
 -rw-r--r--  1 phk   wheel  13141316 Aug  2 18:20 20120802_R1000_DFS
 -rw-r--r--  1 phk   wheel  78362000 Aug  2 18:23 20120802_R1000_ENVIRONMENT
 MD5 (20120802_R1000_CATALOG) = 6102fb4c10fa580af4fb0e508cfd4127
 MD5 (20120802_R1000_DFS) = 5e61879f066484cacdd1e895cea65b41
 MD5 (20120802_R1000_ENVIRONMENT) = dce495e2b575e8aeae3da79e9fce8074
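
Anyone holding copies of these files can check them against the checksums listed above. A minimal Python sketch, assuming the files sit in the current directory under the names shown; `md5_of` and `verify` are illustrative helper names:

```python
import hashlib
from pathlib import Path

# File names and MD5 sums as listed in the logbook entry.
EXPECTED = {
    "20120802_R1000_CATALOG": "6102fb4c10fa580af4fb0e508cfd4127",
    "20120802_R1000_DFS": "5e61879f066484cacdd1e895cea65b41",
    "20120802_R1000_ENVIRONMENT": "dce495e2b575e8aeae3da79e9fce8074",
}

def md5_of(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, read in chunks."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(directory="."):
    """Return {file name: True if present and the checksum matches}."""
    results = {}
    for name, expected in EXPECTED.items():
        path = Path(directory) / name
        results[name] = path.is_file() and md5_of(path) == expected
    return results

if __name__ == "__main__":
    for name, ok in verify().items():
        print(f"{name}: {'OK' if ok else 'MISSING OR MISMATCH'}")
```

Reading in chunks keeps memory use flat, which matters little for files of this size but costs nothing.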

Many thanks to

  • Erlo Haugen
  • Grady Booch
  • Greg Bek
  • Pierre-Alain Muller
  • Pascal Leroy