Rational/R1000s400/Logbook/2019


2019-12-11

SCSI command 0x0d seems to be explained now:

Søren Roug has spotted SASI command 0x0d in the manual for the Xebec S1410 disk controller:

It returns the length of burst errors corrected with ECC.

That matches the CMD_CORRECTION Tollef found.

Thanks everybody!

2019-12-08

About that 0x0d SCSI command...

My good friend Tollef Fog Heen replied on Twitter that this source file on GitHub calls the command CMD_CORRECTION.

That GitHub repository contains a NeXT Station emulation.

The NeXT Station had a Magneto-Optical drive.

The first two pages of this bunch of papers that came with the machine, marked "aflasting" ("off-loading"), are jumper settings for Fujitsu's M2512A MO drive.

So the best theory now is that 0x0D is a way to probe for a MO drive configured as a disk drive.

I'll leave it at that for now, but I'm still interested to hear from anybody who spots 0x0d in old documents.

/phk

2019-12-07

Now it works!

My hacked-up SCSI2SD firmware provided a log of all SCSI commands during boot, and the two crucial ones turn out to be:

   1a 00 03 00 24 00
   1a 00 04 00 20 00

Those are "MODE SENSE(6)" commands, asking for page 3 and 4 respectively.
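To make the byte layout explicit, here is a minimal sketch that splits the two commands into their fields. It assumes the standard SCSI-2 MODE SENSE(6) layout (page code in the low six bits of byte 2, allocation length in byte 4); the function name is made up for illustration and is not part of the SCSI2SD firmware.

```python
def decode_mode_sense6(cdb: bytes) -> dict:
    """Split a 6-byte MODE SENSE(6) CDB into its fields (SCSI-2 layout assumed)."""
    if len(cdb) != 6 or cdb[0] != 0x1A:
        raise ValueError("not a MODE SENSE(6) CDB")
    return {
        "dbd": bool(cdb[1] & 0x08),      # Disable Block Descriptors bit
        "page_control": cdb[2] >> 6,     # 0 means "current values"
        "page_code": cdb[2] & 0x3F,      # which mode page is requested
        "allocation_length": cdb[4],     # how many bytes the initiator accepts
        "control": cdb[5],
    }


# The two commands from the boot log
for line in ("1a 00 03 00 24 00", "1a 00 04 00 20 00"):
    print(line, "->", decode_mode_sense6(bytes.fromhex(line)))
```

The first command asks for page 3 with room for 0x24 bytes, the second for page 4 with room for 0x20 bytes.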

Page 3 is "Format Device" and the FUJITSU M2266 returns:

   00 0f 00 00 00 00 00 2d 00 2d 04 00 00 01 00 05 00 0b 40 00 00 00
   
   Tracks per Zone:  15
   Alternate Sectors per Zone:  0
   Alternate Tracks per Zone:  0
   Alternate Tracks per Logical Unit:  45
   Sectors per Track:  45
   Data Bytes per Physical Sector:  1024
   Interleave:  1
   Track Skew Factor:  5
   Cylinder Skew Factor:  11
   SSEC:  0
   HSEC:  1
   RMB:  0
   SURF:  0
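The decoded values above can be reproduced with a few lines of Python. This is only a sketch: it assumes the 22 bytes shown are the page parameters with the two-byte page header (page code and length) already stripped, which is what makes the numbers line up, and it follows the SCSI-2 field order for the Format Device page. The function name is hypothetical.

```python
import struct


def decode_format_device(body: bytes) -> dict:
    """Decode the 22 parameter bytes of mode page 3 (page header assumed stripped)."""
    names = (
        "tracks_per_zone", "alt_sectors_per_zone", "alt_tracks_per_zone",
        "alt_tracks_per_lun", "sectors_per_track", "bytes_per_sector",
        "interleave", "track_skew", "cylinder_skew",
    )
    # Nine big-endian 16-bit fields, followed by one byte of flag bits
    fields = dict(zip(names, struct.unpack(">9H", body[:18])))
    flags = body[18]
    fields["ssec"] = (flags >> 7) & 1
    fields["hsec"] = (flags >> 6) & 1
    fields["rmb"] = (flags >> 5) & 1
    fields["surf"] = (flags >> 4) & 1
    return fields


m2266_page3 = bytes.fromhex(
    "00 0f 00 00 00 00 00 2d 00 2d 04 00 00 01 00 05 00 0b 40 00 00 00"
)
for name, value in decode_format_device(m2266_page3).items():
    print(f"{name}: {value}")
```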

Page 4 is "Rigid Disk Geometry" where the FUJITSU M2266 returns:

   00 06 7a 0f 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
   
   Number of Cylinders:  1658
   Number of Heads:  15
   Starting Cylinder-Write Precompensation:  0
   Starting Cylinder-Reduced Write Current:  0
   Drive Step Rate:  0
   Landing Zone Cylinder:  0
   RPL:  0
   Rotational Offset:  0
   Medium Rotation Rate:  0
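For reference, a sketch of how the geometry page decodes into the values listed above. As with page 3, it assumes the 22 bytes are the page parameters without the two-byte page header and uses the SCSI-2 field offsets for the Rigid Disk Geometry page; the function name is hypothetical.

```python
def decode_rigid_geometry(body: bytes) -> dict:
    """Decode the 22 parameter bytes of mode page 4 (page header assumed stripped)."""
    def be(chunk: bytes) -> int:
        return int.from_bytes(chunk, "big")   # all multi-byte fields are big-endian

    return {
        "cylinders": be(body[0:3]),
        "heads": body[3],
        "write_precomp_cylinder": be(body[4:7]),
        "reduced_write_current_cylinder": be(body[7:10]),
        "drive_step_rate": be(body[10:12]),
        "landing_zone_cylinder": be(body[12:15]),
        "rpl": body[15] & 0x03,
        "rotational_offset": body[16],
        "medium_rotation_rate": be(body[18:20]),   # byte 17 is reserved
    }


m2266_page4 = bytes.fromhex("00 06 7a 0f" + " 00" * 18)
print(decode_rigid_geometry(m2266_page4))
```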

After I hacked the SCSI2SD firmware to return those values, the R1000 booted flawlessly on the saved disk-images from PAM's machine.

Conclusion: The IOC bootstrap uses "modern SCSI" with linear block numbers, whereas the real system works in terms of Cylinders, Heads and Sectors.
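To illustrate why the geometry matters, here is the textbook mapping between linear block numbers and Cylinder/Head/Sector addresses, using the 15 heads and 45 sectors per track reported by the M2266. This is a sketch only: the R1000's actual mapping has not been verified, and the code assumes zero-based sector numbers and ignores alternate tracks.

```python
HEADS = 15               # from mode page 4
SECTORS_PER_TRACK = 45   # from mode page 3


def lba_to_chs(lba: int) -> tuple:
    """Convert a linear block number to (cylinder, head, sector), all zero-based."""
    cylinder, rest = divmod(lba, HEADS * SECTORS_PER_TRACK)
    head, sector = divmod(rest, SECTORS_PER_TRACK)
    return cylinder, head, sector


def chs_to_lba(cylinder: int, head: int, sector: int) -> int:
    """Inverse of lba_to_chs."""
    return (cylinder * HEADS + head) * SECTORS_PER_TRACK + sector


print(lba_to_chs(675))            # first block of cylinder 1
print(chs_to_lba(1657, 14, 44))   # last block of the last cylinder
```

A host that computes block numbers this way gets the wrong blocks as soon as the drive reports a different number of heads or sectors per track, which is consistent with the boot failing until the SCSI2SD returned the Fujitsu values.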

/phk

2019-12-05

A little bit more SCSI2SD progress today.

I compiled a custom firmware for the SCSI2SD to get more logging, still sampling, but it should capture all failing commands now.

The picture from the log is the following:

First some SCSI READ(10) commands, which are compatible with the loading of KERNEL, FS and PROGRAM.

Then a SCSI command 0x0d, which is unknown everywhere I have looked. This command sets a flag in the KERNEL if successful; I have not investigated that flag further.

Next a SCSI MODE SENSE (RIGID DISK GEOMETRY)

Finally SCSI READ(6) commands which are past the DFS filesystem.

At the same time the console shows this:

   Disk  0 is ONLINE and WRITE ENABLED
   Disk  1 is ONLINE and WRITE ENABLED
   IOP Kernel is initialized
   Initializing diagnostic file system ... File does not exist ERROR_LOG
   [OK]

Working theory right now is that the R1000 uses the RIGID DISK GEOMETRY data to calculate disk access, and what the SCSI2SD returns does not work for this.

I queried one of the blank Fujitsu disks we got from Terma in our "dumper machine" and it returns:

   00 06 7a 0f 00 00 00 00
   00 00 00 00 00 00 00 00
   00 00 00 00 00 00
   
   Number of Cylinders:  1658
   Number of Heads:  15
   Starting Cylinder-Write Precompensation:  0
   Starting Cylinder-Reduced Write Current:  0
   Drive Step Rate:  0
   Landing Zone Cylinder:  0
   RPL:  0
   Rotational Offset:  0
   Medium Rotation Rate:  0

Whereas the SCSI2SD seems to return:

   00 38 30 06 00 00 00 00
   00 00 00 00 00 00 00 00
   00 00 1c 05 00 00
   
   Number of Cylinders:  14384
   Number of Heads:  6
   Starting Cylinder-Write Precompensation:  0
   Starting Cylinder-Reduced Write Current:  0
   Drive Step Rate:  0
   Landing Zone Cylinder:  0
   RPL:  0
   Rotational Offset:  0
   Medium Rotation Rate:  7173
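A quick way to see exactly where the two responses differ is a byte-by-byte comparison. The sketch below uses the two dumps above verbatim; the helper name is made up.

```python
fujitsu = bytes.fromhex(
    "00 06 7a 0f 00 00 00 00"
    "00 00 00 00 00 00 00 00"
    "00 00 00 00 00 00"
)
scsi2sd = bytes.fromhex(
    "00 38 30 06 00 00 00 00"
    "00 00 00 00 00 00 00 00"
    "00 00 1c 05 00 00"
)


def diff_bytes(a: bytes, b: bytes) -> list:
    """Return (offset, byte_a, byte_b) for every position where a and b differ."""
    return [(i, x, y) for i, (x, y) in enumerate(zip(a, b)) if x != y]


for offset, f, s in diff_bytes(fujitsu, scsi2sd):
    print(f"offset {offset:2d}: fujitsu={f:02x} scsi2sd={s:02x}")
```

The differing offsets fall in the cylinder count, the head count and the medium rotation rate, matching the decoded listings.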

My next experiment will be to make the SCSI2SD firmware emit the same RIGID GEOMETRY as the Fujitsu disk.

2019-11-20

I made a DFS backup from PAM's Fujitsu disks, and read the remaining 8mm tapes.

/phk

2019-11-14

Got a bit further with the SCSI2SD: a 1024-byte sector size helps, but I still get a DFS kernel panic:

   Initializing M400S I/O Processor Kernel 4_2_18
   Disk  0 is ONLINE and WRITE ENABLED
   Disk  1 is ONLINE and WRITE ENABLED
   IOP Kernel is initialized
   Initializing diagnostic file system ... File does not exist ERROR_LOG
   [OK]

   I/O Processor Kernel Crash: error 0614 (hex) at PC=0000849E
   ************************************************

Next, tried to restore the backup of PAM's disks onto the Seagate disks; it looks like a success:

   --- Booting the R1000 Environment ---
     Loading from file M207_44.M200_UCODE  bound on November 5, 1992 at 6:08:17 PM
     Loading Register Files .... [OK]
   Loading : KAB.11.0.1.MLOAD
   Loading : KMI.11.0.0.MLOAD
   Loading : KKDIO.11.0.3.MLOAD
   Loading : KKD.11.0.0S.MLOAD
   Loading : KK.11.5.9K.MLOAD
   Loading : EEDB.11.2.0D.MLOAD
   Loading : UOSU.11.3.0D.MLOAD
   Loading : UED.10.0.0R.MLOAD
   Loading : UM.11.1.5D.MLOAD
   Loading : UAT.11.2.2D.MLOAD
   851/1529 wired/total pages loaded.
       The use of this system is subject to the software license terms and
       conditions agreed upon between Rational and the Customer.
   
                            Copyright 1992 by Rational.
   
                             RESTRICTED RIGHTS LEGEND
   
       Use, duplication, or disclosure by the Government is subject to
       restrictions as set forth in subparagraph (c)(1)(ii) of the Rights in
       Technical Data and Computer Software clause at DoD FAR Supplement
       252.227-7013.
   
                   Rational
                   3320 Scott Blvd.
                   Santa Clara, California 95054-3197
   
   Starting R1000 Environment - it now owns this console.
   
   ====>> Kernel.11.5.8 <<====
   Kernel: CHANGE_GHOST_LOGGING
   WANT TRACING: FALSE
   WANT LOGGING: FALSE
   Kernel: START_VIRTUAL_MEMORY
   ALLOW PAGE FAULTS: YES
   
   ====>> ERROR_LOG <<====
   22:55:05 --- TCP_IP_Driver.Worker.Finalized
   
   ====>> CONFIGURATOR <<====
   starting diagnosis of configuration
   recovery is needed
   WANT TO BUILD NEW SYSTEM (YES/NO): yes
   starting creation of new system
   VOLUME NAME FOR unit 0: volume 1
   VOLUME NAME FOR unit 1: volume 2
   creating root volume: 1
   adding volume 2 to virtual memory system
   creation of new system is complete
   starting virtual memory system
   the virtual memory system is up
   
   ====>> Kernel.11.5.8 <<====
   Kernel: START_NETWORK_IO
   Kernel: START_ENVIRONMENT
   TRACE LEVEL: INFORMATIVE
   Kernel:
   
   ====>> Environment Elaborator <<====
   Elaborating subsystem:  ENVIRONMENT_DEBUGGER
   Elaborating subsystem:  ABSTRACT_TYPES
   Elaborating subsystem:  MISCELLANEOUS
   Elaborating subsystem:  OS_UTILITIES
   
   ====>> Recovery <<====
   Recovery Is Needed, Should I fake it? no
   Please tell me Volume Id for the Backup Index Tape:
   
   ====>> SYSTEM % RECOVERY <<====
   
   Please Load Tape
     (Kind of Tape    => CHAINED_ANSI,
      Direction       => READING)
   Is Tape mounted and ready to read labels? yes
   Info on tape mounted on drive 0 is:
     (Kind of Tape    => CHAINED_ANSI,
      Writeable       => FALSE,
      Volume Id       => 028600,
      Volume Set Name => BACKUP, 09-MAR-01 16:01:45 3)
   OK to read volume? [YES]
   
   ====>> Recovery <<====
   Positioning tape to Backup Index
   Processing Backup Index
      Processing Tape File: Vol Info
      Processing Tape File: VP Info
      Processing Tape File: DB Backups
      Processing Tape File: DB Processors
      Processing Tape File: DB Disk Volumes
      Processing Tape File: DB Tape Volumes
   Positioning tape to Backup Data
   Processing Backup Data
      Processing Tape File: Space Info Vol 1
      Processing Tape File: Block Info Vol 1
      Processing Tape File: Block Data Vol 1
      Processing Tape File: Space Info Vol 2
      Processing Tape File: Block Info Vol 2
      Processing Tape File: Block Data Vol 2
   
   ====>> SYSTEM % RECOVERY <<====
   
   Please Dismount Tape on Drive 0
     (Kind of Tape    => CHAINED_ANSI,
      Volume Id       => 028600,
      Volume Set Name => BACKUP, 09-MAR-01 16:01:45 3)
   
   ====>> Recovery <<====
   Tape Processing Complete.
   Restoring Data
   Restoring Spaces
   Taking 1st snapshot
   
   ====>> CONFIGURATOR <<====
   starting snapshot
   snapshot is finished
   
   ====>> Recovery <<====
   Updating Databases
   Taking 2nd snapshot
   
   ====>> CONFIGURATOR <<====
   starting snapshot
   snapshot is finished
   
   ====>> Recovery <<====
   Recovery Is Complete
   Garbage Collection can't run until the machine is rebooted
   Rebooting to enable Garbage Collection
   
   ====>> CONFIGURATOR <<====
   starting virtual memory shutdown
   starting snapshot
   snapshot is finished
   virtual memory shutdown at ( 3, 26-MAR-01 04:22:46)
   system shutdown is complete
   
   ***************************************
   Sequencer has detected a machine check.
   
   ************************************************
   Booting R1000 IOP after R1000 Halt or Machine Check detected
     Boot Reason code = 0C, from PC 0001ADA2
   
   Restarting R1000-400S March 26th, 1901 at 04:23:01
   
   OPERATOR MODE MENU - options are:
       1 => Change BOOT/CRASH/MAINTENANCE options
       2 => Change IOP CONFIGURATION
       3 => Enable manual crash debugging (EXPERTS ONLY)
       4 => Boot IOP, prompting for tape or disk
       5 => Boot SYSTEM
   
   Enter option [Boot SYSTEM] :

The disks were formatted yesterday, so I do not have a total time for the restore, but reading the backup tape took 7 hours.

2019-11-13

Tried whether the R1000 would accept a SCSI2SD, without much luck.

The RESHA EEPROM issues the usual terse style error message saying simply "SCSI Error", giving no usable details.

Formatted the Seagate disks so they are ready for an attempt to restore from the Backup-Tape.

If the SCSI2SD is in the machine along with disks, some DFS operations work, but since no defect list can be read, preparing the disk fails.

The two Fujitsu disks with the working image were not powered up today.

/phk

2019-10-29

R1000-400 with Facit A-4600 terminal showing the Rational Environment. Note the keyboard overlay that extends a couple of inches beyond the keyboard.


Keyboard overlay for the Rational Environment.


The front plate of a running R1000-400.

2019-10-28

The world has a running Rational R1000/400 computer again:

Thanks a LOT to Pierre-Alain Muller for driving all the way here to help make this happen!

2019-10-24

At Pierre-Alain's suggestion, today's aim was to run the FRU diagnostics.

First the faulty Fujitsu disk was removed and the Seagate promoted to SCSI ID 0.

Then we got a "File does not exist HARDWARE.M200_CONFIG". M200, hmm... - We did the following:

  OPERATOR MODE MENU - options are:
      1 => Change BOOT/CRASH/MAINTENANCE options
      2 => Change IOP CONFIGURATION
      3 => Enable manual crash debugging (EXPERTS ONLY)
      4 => Boot IOP, prompting for tape or disk
      5 => Boot SYSTEM
  
  Enter option [Boot SYSTEM] : 1
  Enable Modem DIALOUT [N] ? 
  Enable Modem ANSWER [N] ? 
  Enable IOP (IOC 68K) Auto Boot [N] ? y
  Enable R1000 CPU Auto Boot [N] ? n
  Enable AUTO CRASH RECOVERY [N] ? y
  Enable CONSOLE BREAK KEY [N] ? y
  Are these new defaults [N] ? y

then

  CLI/CRASH MENU - options are:
    1 => enter CLI
    2 => make a CRASHDUMP tape
    3 => display CRASH INFO
    4 => Boot DDC configuration
    5 => Boot EEDB configuration
    6 => Boot STANDARD configuration
  Enter option [Boot EEDB configuration] : 1
  CLI> x cedit
  Change hardware configuration [N] ? y
  File does not exist HARDWARE.M200_CONFIG
  Does this processor have 32 MB memory boards [Y] ? 
  NOTE: 32 MB boards must be installed as MEM 0 or MEM 2 only.
        8 MB boards cannot be in the same CPU as 32 MB boards.
  Does memory board 0 exist [Y] ? y
  Does memory board 2 exist [Y] ? n
  CLI> x expmon
  0  32MB MEMORY BOARDS IN PROCESSOR - TOTAL OF 0 MEGABYTES. 
  EM> bye
  CLI> quit

At this point we rebooted, but the memory board was still not detected. We then decided to take all 8 boards out of the machine starting with the two MEM 0/2 boards, photographing and then re-seating them. The 2 memory boards looked identical, so we switched their position. At the following boot we got: "Memory 2 exists but is not in configuration. Board will not be used.", so we changed the HW config again and included both MEM boards. We don't believe the changed positions did anything, but perhaps the power-cycle was needed to get them detected right?!

At this point we attempted the FRU-diagnostics:

  CLI> x expmon
  2  32MB MEMORY BOARDS IN PROCESSOR - TOTAL OF 64 MEGABYTES. 
  EM> rd
  ...
  EM> poll_all
  NO MACHINE CHECKS DETECTED 
  EM> poll_all
  SEQ HAS DETECTED A MACHINE CHECK 
  EM> sm
  ...
  UCODE HALT AT 0102 
  ...
  EM> bye
  CLI> 
  CLI> x rdiag
  ...
  DIAG> test/3 all
  Running FRU P1DCOMM
  Running FRU P1IOC
  Running FRU P1VAL
  Running FRU P1TYP
  Running FRU P1SEQ
  Running FRU P1FIU
  Running FRU P1MEM
  Running FRU P1SF
  Running FRU P2IOC
  Running FRU P2VAL
    Loading from file PHASE2_MULT_TEST.M200_UCODE  bound on July 16, 1986  14:31:44
    Loading Register Files and Dispatch Rams .... [OK]
    Loading Control Store  [OK]
  Running FRU P2TYP
  Running FRU P2SEQ
  Running FRU P2FIU
  Running FRU P2MEM
  Running FRU P2UADR
  Running FRU P2FP
    Loading from file FPTEST.M200_UCODE  bound on January 29, 1990 17:26:52
    Loading Register Files and Dispatch Rams .... [OK]
    Loading Control Store  [OK]
  Running FRU P2EVNT
  Running FRU P2STOP
  Running FRU P2ABUS
    Loading from file ABUS_TEST.M200_UCODE  bound on July 22, 1986  13:27:21
    Loading Register Files and Dispatch Rams .... [OK]
    Loading Control Store  [OK]
  Running FRU P2CSA
  Running FRU P2MM
  Running FRU P2COND
  Running FRU P2UCODE
    Loading from file P2UCODE.M200_UCODE  bound on August 6, 1986  16:16:27
    Loading Register Files and Dispatch Rams .... [OK]
    Loading Control Store .......... [OK]
  Running FRU P3RAMS
  Running FRU P3UCODE
    Loading from file P3UCODE.M200_UCODE  bound on August 6, 1986  13:50:42
    Loading Register Files and Dispatch Rams .... [OK]
    Loading Control Store . [OK]
  PASSED

"PASSED"!!!

And a bit later the following line:

  Starting R1000 Environment - it now owns this console.

Cleaned up log from the most interesting part of the session: Fil:20191024 1915 R1000.pdf

2019-10-17

Peter has returned with a working PSU :-) - The R1000 is now up and running again drawing 155 amps.

Almost full log of the session: Fil:20191017 1954 R1000.pdf

The Fujitsu 2266 disk appears to be faulty, R1000 complains about disk errors (our old acquaintance General Error?):

  Options are:                                                                          
      0 => Exit                                                                         
      1 => Initialize disk (for experts only)
      2 => Initialize disk, drop USR defects (internal use only)
      3 => Show MFG and USR bad block locations
      4 => Show only USR bad block locations
      5 => Install new DFS only
      6 => Show bad block count and DOS limits
  Enter option : 3
  Enter unit number of disk to format/build/scan (usually 0) : 0
  CS1=0038 CS2=0040 DS=11C0 ER1=0100 ER2=4000 EC1=0000 EC2=0000 DC=0000 DA=0005
  CS1=4038 CS2=0040 DS=11C0 ER1=0000 ER2=0000 EC1=0000 EC2=0000 DC=0000 DA=0005
  CS1=0038 CS2=0040 DS=11C0 ER1=0100 ER2=4000 EC1=0000 EC2=0000 DC=0000 DA=0005
  CS1=4038 CS2=0040 DS=11C0 ER1=0000 ER2=0000 EC1=0000 EC2=0000 DC=0000 DA=0005
  CS1=4038 CS2=0040 DS=11C0 ER1=0000 ER2=0000 EC1=0000 EC2=0000 DC=0000 DA=0005
  CS1=0038 CS2=0040 DS=11C0 ER1=0100 ER2=4000 EC1=0000 EC2=0000 DC=0000 DA=0005
  CS1=0038 CS2=0040 DS=11C0 ER1=0100 ER2=4000 EC1=0000 EC2=0000 DC=0000 DA=0005
  CS1=4038 CS2=0040 DS=11C0 ER1=0000 ER2=0000 EC1=0000 EC2=0000 DC=0000 DA=0005
  CS1=4038 CS2=0040 DS=11C0 ER1=0000 ER2=0000 EC1=0000 EC2=0000 DC=0000 DA=0005
  CS1=4038 CS2=0040 DS=11C0 ER1=0000 ER2=0000 EC1=0000 EC2=0000 DC=0000 DA=0005
  CS1=0038 CS2=0040 DS=11C0 ER1=0100 ER2=4000 EC1=0000 EC2=0000 DC=0000 DA=0005
  CS1=4038 CS2=0040 DS=11C0 ER1=0000 ER2=0000 EC1=0000 EC2=0000 DC=0000 DA=0005
  CS1=4038 CS2=0040 DS=11C0 ER1=0000 ER2=0000 EC1=0000 EC2=0000 DC=0000 DA=0005
  CS1=4038 CS2=0040 DS=11C0 ER1=0000 ER2=0000 EC1=0000 EC2=0000 DC=0000 DA=0005
  CS1=0038 CS2=0040 DS=11C0 ER1=0100 ER2=4000 EC1=0000 EC2=0000 DC=0000 DA=0005
  CS1=0038 CS2=0040 DS=11C0 ER1=0100 ER2=4000 EC1=0000 EC2=0000 DC=0000 DA=0005
  CS1=4038 CS2=0040 DS=11C0 ER1=0000 ER2=0000 EC1=0000 EC2=0000 DC=0000 DA=0005
  CS1=4038 CS2=0040 DS=11C0 ER1=0000 ER2=0000 EC1=0000 EC2=0000 DC=0000 DA=0005
  CS1=0038 CS2=0040 DS=11C0 ER1=0100 ER2=4000 EC1=0000 EC2=0000 DC=0000 DA=0005
  CS1=0038 CS2=0040 DS=11C0 ER1=0100 ER2=4000 EC1=0000 EC2=0000 DC=0000 DA=0005
  ** ABORT: Can't retrieve labels due to disk errors.

A Seagate ST41200N is now installed as DISK 1, the Fujitsu remains as DISK 0 for now. R1000 recognizes the Seagate but wants to format it:

  Initializing M400S I/O Processor Kernel 4_2_16
  Spinning up disk 1
  Spinning up disk 0
  Disk  1 is ONLINE and WRITE ENABLED
  IOP Kernel is initialized
  Enable line printer for console output [N] ? 
      RECOVERY 14.04 92/09/17 10:00:00\
  Options are:
      0 => Exit
      1 => Initialize disk (for experts only)
      2 => Initialize disk, drop USR defects (internal use only)
      3 => Show MFG and USR bad block locations
      4 => Show only USR bad block locations
      5 => Install new DFS only
      6 => Show bad block count and DOS limits
  Enter option : 3
  Enter unit number of disk to format/build/scan (usually 0) : 1
  ** ABORT: Disk has no labels.
  Options are:
      0 => Exit
      1 => Initialize disk (for experts only)
      2 => Initialize disk, drop USR defects (internal use only)
      3 => Show MFG and USR bad block locations
      4 => Show only USR bad block locations
      5 => Install new DFS only
      6 => Show bad block count and DOS limits
  Enter option : 1
  Enter unit number of disk to format/build/scan (usually 0) : 1
  Disk has no labels.
  Drive types are:
      1 - Fujitsu 2263
      2 - Fujitsu 2266
      3 - SEGATE ST41200N
      0 - Other
  Enter drive type : 3
  Enter HDA serial number : TJ617458
  Disk must be formated.
  Formatting the drive will take about 35 minutes.
  Elapsed time is 00:32:32
  Writing bad block information.
  Writing boot label.
  Writing DFS label.
  Do you want to build a diagnostic file system on this unit [Y] ? 
  Enter last cylinder to be used by the DFS [ Hint => 76 ]:76
  Enter first cylinder to be used for read/write diagnostics [ Hint => 1889 ]:1889
  Writing shared label.
  Constructing free list.
  Writing free list.
  Allocating and initializing directory.
  Creating predefined files.
  KERNEL_0.M200
  KERNEL_1.M200
  KERNEL_2.M200
  FS_0.M200
  FS_1.M200
  FS_2.M200
  PROGRAM_0.M200
  PROGRAM_1.M200
  PROGRAM_2.M200
  DFS_BOOTSTRAP.M200
  ERROR_LOG
  Do you want to load files into the DFS on this unit [Y] ? y
  Tape drive unit number : 0
  Do you want to display filenames as they are loaded [Y] ? y
  Reading -> DFS_BOOTSTRAP.M200
  Reading -> KERNEL_0.M200
  ... 3160 files later ...
  Reading -> DDC.M200_CONFIG
  Elapsed time is 00:10:40
  Options are:
      0 => Exit
      1 => Initialize disk (for experts only)
      2 => Initialize disk, drop USR defects (internal use only)
      3 => Show MFG and USR bad block locations
      4 => Show only USR bad block locations
      5 => Install new DFS only
      6 => Show bad block count and DOS limits
  Enter option : 0
  Boot disk has been rebuilt or the IOP was booted from tape.
  You must crash the machine to exit.

Next week, boot from disk and see how far we get. PSU got *hot*, but survived the 1½ hour session.

2019-10-03

After rummaging through our entire workshop, it transpires that we have no soldering iron with sufficient power to unsolder the capacitors from the thick copper on the PCB.

One of our members, Peter, has offered to attempt the repair in his own workshop, and he picked it up tonight.

2019-09-12

PSU dismantled, and the visibly defective electrolytic capacitor has been desoldered together with two tantalum capacitors. Unfortunately, when the parts were originally mounted, the leads were pinched and cut, which made them difficult to pull through the PCB today, since the holes are almost exactly the size of the leads. The insulation on the wires to the transformer has deteriorated quite a bit and will need some repair.

Some better images of the PSU as a whole:

Overview


One half of the PSU


Other half of the PSU


Defective insulation on some wires


One of the two capacitor boards on the 5V rails.


Backside of PCB carrying the damaged capacitor.



Each of the two PCBs carries 5 tantalum capacitors of 15µF and 5 SXF capacitors of 6800µF (30mm x 18mm, lead spacing 7.5mm).

2019-09-05

We arrived today expecting a fight and armed with various debugging plans, but the Rational just started, booted and was as happy as could be?!? - After some configuration, and booting the kernel "M400S_KERNEL_0.M200" (thanks to Pierre-Alain for supplying that information), the R1000 now responds with:

   R1000-400 IOC SELFTEST 1.3.2 
      512 KB memory ... [OK]
      Memory parity ... [OK]
      I/O bus control ... [OK]
      I/O bus map ... [OK]
      I/O bus map parity ... [OK]
      I/O bus transactions ... [OK]
      PIT ... [OK]
      Modem DUART channel ... [OK]
      Diagnostic DUART channel ... [OK]
      Clock / Calendar ... [OK]
  Checking for RESHA board
      RESHA EEProm Interface ... [OK]
  Downloading RESHA EEProm 0 - TEST
  Downloading RESHA EEProm 1 - LANCE 
  Downloading RESHA EEProm 2 - DISK  
  Downloading RESHA EEProm 3 - TAPE  
      DIAGNOSTIC MODEM ... DISABLED
      RESHA VME sub-tests ... [OK]
      LANCE chip Selftest ... [OK]
      RESHA DISK SCSI sub-tests ... [OK]
      RESHA TAPE SCSI sub-tests ... [OK]
      Local interrupts ... [OK]
      Illegal reference protection ... [OK]
      I/O bus parity ... [OK]
      I/O bus spurious interrupts ... [OK]
      Temperature sensors ... [OK]
      IOC diagnostic processor ... [OK]
      Power margining ... [OK]
      Clock margining ... [OK]
  Selftest passed
  
  Restarting R1000-400S January 14th, 1901 at 22:56:43
  
  OPERATOR MODE MENU - options are:
      1 => Change BOOT/CRASH/MAINTENANCE options
      2 => Change IOP CONFIGURATION
      3 => Enable manual crash debugging (EXPERTS ONLY)
      4 => Boot IOP, prompting for tape or disk
      5 => Boot SYSTEM
  
  Enter option [Boot SYSTEM] : 5
  
  Logical tape drive 0 is an 8mm cartridge tape drive.
  Logical tape drive 1 is declared non-existent.
  Logical tape drive 2 is declared non-existent.
  Logical tape drive 3 is declared non-existent.
  Booting I/O Processor with Bootstrap version 0.4
  
  Boot from (Tn or Dn)  [D0] : T0
  
  Tape_Boot_1.2.0  920401
  Waiting for tape unit ready.
  Strike any key to abort.....................
  End of Tape Reached.rewinding
  
  Select files to boot [D=DEFAULT, O=OPERATOR_SUPPLIED] : [D]
  Skipping..
  Loading FS_0.M200
  
  Loading RECOVERY.M200
  Skipping.................
  Loading M400S_KERNEL_0.M200
  
  Initializing M400S I/O Processor Kernel 4_2_16
  Spinning up disk 0
  IOP Kernel is initialized
  Enable line printer for console output [N] ? 
      RECOVERY 14.04 92/09/17 10:00:00\
  Options are:
      0 => Exit
      1 => Initialize disk (for experts only)
      2 => Initialize disk, drop USR defects (internal use only)
      3 => Show MFG and USR bad block locations
      4 => Show only USR bad block locations
      5 => Install new DFS only
      6 => Show bad block count and DOS limits
  Enter option : 
  *** AC power is L

The last line is probably from the moment I cut the power after seeing significant white-gray smoke coming up from the machine...

The following smell-test suggested that the PSU should be checked:

The capacitors are located between the 5V power rails (the two black blocks on each side of the cap).

Next job: Acquire and exchange 10 x 6800µF 6.3V capacitors (L<31, D<=18)

2019-08-29

Disappointment! - We had hoped to get further in the boot process, but met an "unwilling" machine that did not even present itself. The PSU powered up with its fan, but that was it - no lights, no 5V, -12V or +12V. During power-off, these lights turned on for a very short period, indicating that the PSU is capable but unwilling (Inhibit line active?). We are not completely sure of the reason, but we will debug the issue next week, starting with checking the RESHA diagrams, followed by checking the IOC RTC battery (which was replaced last week).

2019-08-22

After further tests we finally took the leap, grabbed the cutter and soldering iron, and replaced the three suspected memory chips. The boot sequence with the original BIOS now gives:

   R1000-400 IOC SELFTEST 1.3.2 
      512 KB memory ... [OK]
      Memory parity ... [OK]
      I/O bus control ... [OK]
      I/O bus map ... [OK]
      I/O bus map parity ... [OK]
      I/O bus transactions ... [OK]
      PIT ... [OK]
      Modem DUART channel ... [OK]
      Diagnostic DUART channel ... [OK]
      Clock / Calendar ... [OK]
  Checking for RESHA board
    --  Bench mode (ID 7) detected Skipping RESHA tests
      Local interrupts ... [OK]
      Illegal reference protection ... [OK]
      I/O bus parity ... [OK]
      I/O bus spurious interrupts ... [OK]
      Temperature sensors ... [OK]
      IOC diagnostic processor ... [OK]
      Power margining ... [OK]
      Clock margining ... [OK]
  Selftest passed
  
  Restarting R1000-400S January 1st, 1901 at 00:03:56
  
  Logical tape drive 0 is an 8mm cartridge tape drive.
  Logical tape drive 1 is declared non-existent.
  Logical tape drive 2 is declared non-existent.
  Logical tape drive 3 is declared non-existent.
  Booting I/O Processor with Bootstrap version 0.4
  
  Boot from (Tn or Dn)  [D0] : 

Success!

Next step: Mount the board into its rightful place and see how far it gets now...

2019-06-06

Decided to make further measurements with the oscilloscope in order to rule out other causes. Probing all pins on a good and a bad RAM chip did not reveal anything.

Tried to piggyback H11 with a new RAM chip, and the questionable bit went midway between VCC and GND. It could be that the good chip tried to pull down while the bad chip pulled up. Double-checked chip-select on H3, the only other identified chip that may drive the same data line (D30), and chip-select is completely passive during the troublesome period.

Previous tests went between [0x00000000..0x00040000[ and [0x00040000..0x00080000[. Now tried to run the tests between [0x00001000..0x00021000[ and [0x00041000..0x00061000[ as well as between [0x00001200..0x00021200[ and [0x00041200..0x00061200[ to test whether run-length could be a factor. The exact same failing addresses indicate run-length is not a factor.

Tried a reverse scan: Set(addr), Clr(addr), Set(addr) then Get(addr-4). The same data bits are affected in both banks, but the failing addresses are not identical. Some patterns are similar, and some other patterns appear.

Tried another test with: Set(addr), Clr(addr), Set(addr), Get(addr+4), branch-test delay, then another Get(addr+4) - The second Get reads out correctly. This indicates that the chip does hold the right value, but that it is read out incorrectly in some circumstances.

2019-05-23

Previous software tests indicated problems with certain bits at some address patterns: Bits 7 and 23 in the low bank and bit 30 in the high bank showed issues. The fault manifests itself at some addresses when the following memory accesses are done in quick succession: Set(addr), Clr(addr), Set(addr) then Get(addr+4) - The Get(addr+4) returns incorrect values only on these bits.
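The access pattern can be written down as a short sketch. Everything here is an assumption made for illustration: the real test ran on the IOC itself, `Set` is taken to write all ones and `Clr` all zeros to a 32-bit word, the fill value is arbitrary, and `DictRam` is a fault-free stand-in for the actual RAM, so this version reports no failures.

```python
ONES = 0xFFFFFFFF  # assumption: Set() writes all ones, Clr() writes zero


class DictRam:
    """Hypothetical fault-free stand-in for the 32-bit wide RAM."""

    def __init__(self):
        self.words = {}

    def write(self, addr, value):
        self.words[addr] = value & ONES

    def read(self, addr):
        return self.words.get(addr, 0)


def differing_bits(expected, got):
    """List the bit numbers (0 = least significant) where two words differ."""
    diff = expected ^ got
    return [bit for bit in range(32) if (diff >> bit) & 1]


def run_pattern(ram, start, end, fill=0x5A5A5A5A):
    """Set/Clr/Set at addr, then Get(addr+4); return (addr, bad_bits) per failure."""
    for addr in range(start, end, 4):
        ram.write(addr, fill)                 # known content in every word
    failures = []
    for addr in range(start, end - 4, 4):
        ram.write(addr, ONES)                 # Set(addr)
        ram.write(addr, 0)                    # Clr(addr)
        ram.write(addr, ONES)                 # Set(addr)
        got = ram.read(addr + 4)              # Get(addr+4), not yet overwritten
        if got != fill:
            failures.append((addr, differing_bits(fill, got)))
    return failures


print(run_pattern(DictRam(), 0x0000, 0x0100))
```

On the real board, a failure entry for the high bank would list bit 30, and one for the low bank bits 7 or 23.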

Today all RAM chips were checked with an oscilloscope to verify and possibly identify the problem. H11, G10 and G41 showed different behavior on the oscilloscope, and these chips happen to map to the exact bits identified by the software test.

H40 did show a little flickering on the DC levels, but the flanks seemed OK.

The above input to the RAM-chips looks like this:

The output of a healthy chip looks like this:

The output of the sick chips looks like this:


Next step will be to replace them. We have replacements ready, so stay tuned...

2019-03-07

Tried to patch the EEPROM in various ways, and learned a lot more.

If we skip the offending memory check (and the EEPROM checksum, because we're lazy), we get all the way to the boot device prompt (tape/disk).

We got two-way serial connection to the console port: TX is TTL level, RX is RS-232 level.

In trivial homebrew tests, the RAM does not fail, but what we call "ramtest_5" repeatedly does.

Big discovery of the night: The two top address bits of the EEPROM are swapped on the PCB, so the middle two quarters are swapped in the image we try to reverse-engineer. After fixing that, the contents make a lot more sense.
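The fix-up is small enough to sketch. Swapping the two top address bits leaves the first and last quarters of the image in place (top bits 00 and 11) and exchanges the two middle quarters (01 and 10). The function names and the 16-byte demo image are made up for illustration; the real image is of course larger.

```python
def unswap_middle_quarters(image: bytes) -> bytes:
    """Exchange the second and third quarters of a dumped image."""
    quarter, remainder = divmod(len(image), 4)
    if remainder:
        raise ValueError("image length must be a multiple of 4")
    return (image[:quarter]
            + image[2 * quarter:3 * quarter]
            + image[quarter:2 * quarter]
            + image[3 * quarter:])


def swap_top_address_bits(addr: int, addr_bits: int) -> int:
    """Exchange the two most significant of addr_bits address bits."""
    hi, lo = addr_bits - 1, addr_bits - 2
    if ((addr >> hi) & 1) != ((addr >> lo) & 1):
        addr ^= (1 << hi) | (1 << lo)   # flipping both bits swaps them
    return addr


# Tiny demo image with 4 address bits: both views of the swap agree
dump = bytes(range(16))
fixed = unswap_middle_quarters(dump)
print(fixed.hex(" "))
print(all(fixed[a] == dump[swap_top_address_bits(a, 4)] for a in range(16)))
```

Applying the function twice returns the original dump, so the same helper converts in both directions.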

2019-02-28

Managed to power up the IOC board stand-alone. 5V @ 35A required (30A didn't seem to be quite enough).

Procedure:

  1. 5V @ 35A connected to 3 Capacitors at edge (to distribute load).
  2. Reset (GB113) to Ground (page 23 of R1000_SCHEM_IOC.pdf).
  3. CTS# (GB055) to 5V (page 25)
  4. Power On
  5. Release Reset (GB113)

TTL Serial output read from CPDRV0 @N1 (pin 2 or 3).

As expected output is still:

  R1000-400 IOC SELFTEST 1.3.2 
     512 KB memory ... * * * * * * * FAILED

EEPROM 28256 is not compatible with EPROM 27256!