Rational/R1000s400/Logbook/2019
2019-12-11
SCSI command 0x0d seems to be explained now:
Søren Roug has spotted SASI command 0x0d in the manual for the Xebec S1410 disk controller:
It returns the length of burst errors corrected with ECC.
That matches the CMD_CORRECTION Tollef found.
Thanks everybody!
2019-12-08
About that 0x0d SCSI command...
My good friend Tollef Fog Heen replied on Twitter that this source file on GitHub calls the command CMD_CORRECTION.
That GitHub repository contains a NeXT Station emulation.
The NeXT Station had a Magneto-Optical drive.
The first two pages of this bunch of papers that came with the machine, marked "aflasting" ("off-loading"), are jumper settings for Fujitsu's M2512A MO drive.
So the best theory now is that 0x0D is a way to probe for an MO drive configured as a disk drive.
I'll leave it at that for now, but I'm still interested to hear from anybody who spots 0x0d in old documents.
/phk
2019-12-07
Now it works!
My hacked-up SCSI2SD firmware provided a log of all SCSI commands during boot, and the two crucial ones turn out to be:
1a 00 03 00 24 00
1a 00 04 00 20 00
Those are "MODE SENSE(6)" commands, asking for page 3 and 4 respectively.
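For reference, a minimal Python sketch of how those two command blocks split into fields, assuming the SCSI-2 layout of MODE SENSE(6). The function name `decode_mode_sense6` is made up for this illustration and is not part of the SCSI2SD firmware.

```python
def decode_mode_sense6(cdb: bytes) -> dict:
    """Split a 6-byte MODE SENSE(6) command block into its fields (SCSI-2 layout)."""
    if len(cdb) != 6 or cdb[0] != 0x1A:
        raise ValueError("not a MODE SENSE(6) command block")
    return {
        "dbd": bool(cdb[1] & 0x08),      # Disable Block Descriptors bit
        "page_control": cdb[2] >> 6,     # 0 = current values
        "page_code": cdb[2] & 0x3F,      # 3 = Format Device, 4 = Rigid Disk Geometry
        "allocation_length": cdb[4],     # how many bytes the initiator will accept
        "control": cdb[5],
    }


for text in ("1a 00 03 00 24 00", "1a 00 04 00 20 00"):
    print(text, "->", decode_mode_sense6(bytes.fromhex(text)))
```

Both commands ask for the current values (page control 0); the allocation lengths are 0x24 and 0x20 bytes respectively.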
Page 3 is "Format Device" and the FUJITSU M2266 returns:
00 0f 00 00 00 00 00 2d 00 2d 04 00 00 01 00 05 00 0b 40 00 00 00
Tracks per Zone: 15
Alternate Sectors per Zone: 0
Alternate Tracks per Zone: 0
Alternate Tracks per Logical Unit: 45
Sectors per Track: 45
Data Bytes per Physical Sector: 1024
Interleave: 1
Track Skew Factor: 5
Cylinder Skew Factor: 11
SSEC: 0
HSEC: 1
RMB: 0
SURF: 0
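A small sketch of how the 22 parameter bytes above map onto the listed fields, assuming the SCSI-2 "Format Device" page layout with the two-byte page header already stripped (as in the dump). `decode_format_device` is a name invented for this example.

```python
import struct

PAGE3_WORDS = (
    "Tracks per Zone",
    "Alternate Sectors per Zone",
    "Alternate Tracks per Zone",
    "Alternate Tracks per Logical Unit",
    "Sectors per Track",
    "Data Bytes per Physical Sector",
    "Interleave",
    "Track Skew Factor",
    "Cylinder Skew Factor",
)


def decode_format_device(params: bytes) -> dict:
    """Decode the 22 parameter bytes of mode page 3 (page header not included)."""
    if len(params) != 22:
        raise ValueError("expected 22 parameter bytes")
    # Nine big-endian 16-bit words, one flag byte, three reserved bytes.
    *words, flags = struct.unpack(">9HB3x", params)
    fields = dict(zip(PAGE3_WORDS, words))
    fields.update(
        SSEC=(flags >> 7) & 1,
        HSEC=(flags >> 6) & 1,
        RMB=(flags >> 5) & 1,
        SURF=(flags >> 4) & 1,
    )
    return fields


M2266_PAGE3 = "00 0f 00 00 00 00 00 2d 00 2d 04 00 00 01 00 05 00 0b 40 00 00 00"
for name, value in decode_format_device(bytes.fromhex(M2266_PAGE3)).items():
    print(f"{name}: {value}")
```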
Page 4 is "Rigid Disk Geometry" where the FUJITSU M2266 returns:
00 06 7a 0f 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Number of Cylinders: 1658
Number of Heads: 15
Starting Cylinder-Write Precompensation: 0
Starting Cylinder-Reduced Write Current: 0
Drive Step Rate: 0
Landing Zone Cylinder: 0
RPL: 0
Rotational Offset: 0
Medium Rotation Rate: 0
After I hacked the SCSI2SD firmware to return those values, the R1000 booted flawlessly on the saved disk-images from PAM's machine.
Conclusion: The IOC bootstrap uses "modern SCSI" with linear block numbers, whereas the real system works in terms of Cylinders, Heads and Sectors.
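To illustrate why the geometry matters, here is a sketch of the conventional conversion between linear block numbers and Cylinder/Head/Sector, using the M2266 values reported above. This is the textbook formula with zero-based sectors, not the R1000's actual code: how the R1000 numbers sectors, and how it treats the alternate tracks, has not been investigated. The names `Geometry`, `lba_to_chs` and `chs_to_lba` are invented for the example.

```python
from typing import NamedTuple


class Geometry(NamedTuple):
    cylinders: int
    heads: int
    sectors_per_track: int


# Values from mode pages 3 and 4 of the FUJITSU M2266.
M2266 = Geometry(cylinders=1658, heads=15, sectors_per_track=45)


def lba_to_chs(lba: int, geo: Geometry = M2266) -> tuple:
    """Linear block number -> (cylinder, head, sector), all counted from zero."""
    if not 0 <= lba < geo.cylinders * geo.heads * geo.sectors_per_track:
        raise ValueError("block number outside the geometry")
    cylinder, rest = divmod(lba, geo.heads * geo.sectors_per_track)
    head, sector = divmod(rest, geo.sectors_per_track)
    return cylinder, head, sector


def chs_to_lba(cylinder: int, head: int, sector: int, geo: Geometry = M2266) -> int:
    """(cylinder, head, sector) -> linear block number, the inverse of lba_to_chs."""
    return (cylinder * geo.heads + head) * geo.sectors_per_track + sector


print(lba_to_chs(123456))
print(chs_to_lba(*lba_to_chs(123456)))
```

A host that does this arithmetic itself will compute different cylinder and head numbers for the same block as soon as the device reports a different head count, which fits the behaviour seen with the unmodified SCSI2SD.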
/phk
2019-12-05
A little bit more SCSI2SD progress today.
I compiled a custom firmware for the SCSI2SD to get more logging, still sampling, but it should capture all failing commands now.
The picture from the log is the following:
First some SCSI READ(10) commands, which are compatible with the loading of KERNEL, FS and PROGRAM.
Then a SCSI command 0x0d, which is unknown everywhere I have looked. This command sets a flag in the KERNEL if successful; I have not investigated that flag further.
Next a SCSI MODE SENSE (RIGID DISK GEOMETRY)
Finally SCSI READ(6) commands which are past the DFS filesystem.
At the same time the console shows this:
Disk 0 is ONLINE and WRITE ENABLED
Disk 1 is ONLINE and WRITE ENABLED
IOP Kernel is initialized
Initializing diagnostic file system ... File does not exist ERROR_LOG [OK]
Working theory right now is that the R1000 uses the RIGID DISK GEOMETRY data to calculate disk access, and what the SCSI2SD returns does not work for this.
I queried one of the blank Fujitsu disks we got from Terma in our "dumper machine" and it returns:
00 06 7a 0f 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00
Number of Cylinders: 1658
Number of Heads: 15
Starting Cylinder-Write Precompensation: 0
Starting Cylinder-Reduced Write Current: 0
Drive Step Rate: 0
Landing Zone Cylinder: 0
RPL: 0
Rotational Offset: 0
Medium Rotation Rate: 0
Whereas the SCSI2SD seems to return:
00 38 30 06 00 00 00 00 00 00 00 00 00 00 00 00 00 00 1c 05 00 00
Number of Cylinders: 14384
Number of Heads: 6
Starting Cylinder-Write Precompensation: 0
Starting Cylinder-Reduced Write Current: 0
Drive Step Rate: 0
Landing Zone Cylinder: 0
RPL: 0
Rotational Offset: 0
Medium Rotation Rate: 7173
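The two dumps can be decoded and compared with a few lines of Python. This is a sketch assuming the SCSI-2 "Rigid Disk Geometry" page layout with the page header stripped, as in the dumps; `decode_rigid_geometry` is a name made up for the example.

```python
def decode_rigid_geometry(params: bytes) -> dict:
    """Decode the 22 parameter bytes of mode page 4 (page header not included)."""
    if len(params) != 22:
        raise ValueError("expected 22 parameter bytes")

    def be(chunk: bytes) -> int:
        return int.from_bytes(chunk, "big")

    return {
        "Number of Cylinders": be(params[0:3]),
        "Number of Heads": params[3],
        "Starting Cylinder-Write Precompensation": be(params[4:7]),
        "Starting Cylinder-Reduced Write Current": be(params[7:10]),
        "Drive Step Rate": be(params[10:12]),
        "Landing Zone Cylinder": be(params[12:15]),
        "RPL": params[15] & 0x03,
        "Rotational Offset": params[16],
        "Medium Rotation Rate": be(params[18:20]),  # byte 17 is reserved
    }


FUJITSU = bytes.fromhex("00 06 7a 0f" + " 00" * 18)
SCSI2SD = bytes.fromhex("00 38 30 06" + " 00" * 14 + " 1c 05 00 00")

fujitsu = decode_rigid_geometry(FUJITSU)
scsi2sd = decode_rigid_geometry(SCSI2SD)
for name in fujitsu:
    if fujitsu[name] != scsi2sd[name]:
        print(f"{name}: Fujitsu {fujitsu[name]}, SCSI2SD {scsi2sd[name]}")
```

Only the cylinder count, the head count and the rotation rate differ between the two devices.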
My next experiment will be to make the SCSI2SD firmware emit the same RIGID GEOMETRY as the Fujitsu disk.
2019-11-20
I made a DFS backup from PAM's Fujitsu disks, and read the remaining 8mm tapes.
/phk
2019-11-14
Got a bit further with the SCSI2SD: 1024 byte sector size helps, still get a DFS kernel panic though:
Initializing M400S I/O Processor Kernel 4_2_18
Disk 0 is ONLINE and WRITE ENABLED
Disk 1 is ONLINE and WRITE ENABLED
IOP Kernel is initialized
Initializing diagnostic file system ... File does not exist ERROR_LOG [OK]
I/O Processor Kernel Crash: error 0614 (hex) at PC=0000849E
************************************************
Next, tried to restore the backup of PAM's disks onto the Seagate disks, looks like success:
--- Booting the R1000 Environment ---
Loading from file M207_44.M200_UCODE bound on November 5, 1992 at 6:08:17 PM
Loading Register Files .... [OK]
Loading : KAB.11.0.1.MLOAD
Loading : KMI.11.0.0.MLOAD
Loading : KKDIO.11.0.3.MLOAD
Loading : KKD.11.0.0S.MLOAD
Loading : KK.11.5.9K.MLOAD
Loading : EEDB.11.2.0D.MLOAD
Loading : UOSU.11.3.0D.MLOAD
Loading : UED.10.0.0R.MLOAD
Loading : UM.11.1.5D.MLOAD
Loading : UAT.11.2.2D.MLOAD
851/1529 wired/total pages loaded.
The use of this system is subject to the software license terms and conditions agreed upon between Rational and the Customer.
Copyright 1992 by Rational.
RESTRICTED RIGHTS LEGEND
Use, duplication, or disclosure by the Government is subject to restrictions as set forth in subparagraph (c)(1)(ii) of the Rights in Technical Data and Computer Software clause at DoD FAR Supplement 252.227-7013.
Rational
3320 Scott Blvd.
Santa Clara, California 95054-3197
Starting R1000 Environment - it now owns this console.
====>> Kernel.11.5.8 <<====
Kernel: CHANGE_GHOST_LOGGING
WANT TRACING: FALSE
WANT LOGGING: FALSE
Kernel: START_VIRTUAL_MEMORY
ALLOW PAGE FAULTS: YES
====>> ERROR_LOG <<====
22:55:05 --- TCP_IP_Driver.Worker.Finalized
====>> CONFIGURATOR <<====
starting diagnosis of configuration
recovery is needed
WANT TO BUILD NEW SYSTEM (YES/NO): yes
starting creation of new system
VOLUME NAME FOR unit 0: volume 1
VOLUME NAME FOR unit 1: volume 2
creating root volume: 1
adding volume 2 to virtual memory system
creation of new system is complete
starting virtual memory system
the virtual memory system is up
====>> Kernel.11.5.8 <<====
Kernel: START_NETWORK_IO
Kernel: START_ENVIRONMENT
TRACE LEVEL: INFORMATIVE
Kernel:
====>> Environment Elaborator <<====
Elaborating subsystem: ENVIRONMENT_DEBUGGER
Elaborating subsystem: ABSTRACT_TYPES
Elaborating subsystem: MISCELLANEOUS
Elaborating subsystem: OS_UTILITIES
====>> Recovery <<====
Recovery Is Needed, Should I fake it? no
Please tell me Volume Id for the Backup Index Tape:
====>> SYSTEM % RECOVERY <<====
Please Load Tape (Kind of Tape => CHAINED_ANSI, Direction => READING)
Is Tape mounted and ready to read labels? yes
Info on tape mounted on drive 0 is: (Kind of Tape => CHAINED_ANSI, Writeable => FALSE, Volume Id => 028600, Volume Set Name => BACKUP, 09-MAR-01 16:01:45 3)
OK to read volume? [YES]
====>> Recovery <<====
Positioning tape to Backup Index
Processing Backup Index
Processing Tape File: Vol Info
Processing Tape File: VP Info
Processing Tape File: DB Backups
Processing Tape File: DB Processors
Processing Tape File: DB Disk Volumes
Processing Tape File: DB Tape Volumes
Positioning tape to Backup Data
Processing Backup Data
Processing Tape File: Space Info Vol 1
Processing Tape File: Block Info Vol 1
Processing Tape File: Block Data Vol 1
Processing Tape File: Space Info Vol 2
Processing Tape File: Block Info Vol 2
Processing Tape File: Block Data Vol 2
====>> SYSTEM % RECOVERY <<====
Please Dismount Tape on Drive 0 (Kind of Tape => CHAINED_ANSI, Volume Id => 028600, Volume Set Name => BACKUP, 09-MAR-01 16:01:45 3)
====>> Recovery <<====
Tape Processing Complete.
Restoring Data
Restoring Spaces
Taking 1st snapshot
====>> CONFIGURATOR <<====
starting snapshot
snapshot is finished
====>> Recovery <<====
Updating Databases
Taking 2nd snapshot
====>> CONFIGURATOR <<====
starting snapshot
snapshot is finished
====>> Recovery <<====
Recovery Is Complete
Garbage Collection can't run until the machine is rebooted
Rebooting to enable Garbage Collection
====>> CONFIGURATOR <<====
starting virtual memory shutdown
starting snapshot
snapshot is finished
virtual memory shutdown at ( 3, 26-MAR-01 04:22:46)
system shutdown is complete
***************************************
Sequencer has detected a machine check.
************************************************
Booting R1000 IOP after R1000 Halt or Machine Check detected
Boot Reason code = 0C, from PC 0001ADA2
Restarting R1000-400S March 26th, 1901 at 04:23:01
OPERATOR MODE MENU - options are:
1 => Change BOOT/CRASH/MAINTENANCE options
2 => Change IOP CONFIGURATION
3 => Enable manual crash debugging (EXPERTS ONLY)
4 => Boot IOP, prompting for tape or disk
5 => Boot SYSTEM
Enter option [Boot SYSTEM] :
The disks were formatted yesterday, so I do not have a total time for the restore, but reading the backup tape took 7 hours.
2019-11-13
Tried to see whether the R1000 would accept a SCSI2SD, without much luck.
The RESHA EEPROM issues its usual terse error message, saying simply "SCSI Error" and giving no usable details.
Formatted the Seagate disks so they are ready for an attempt to restore from the Backup-Tape.
If the SCSI2SD is in the machine along with disks, some DFS operations work, but since no defect list can be read, preparing the disk fails.
The two Fujitsu disks with the working image were not powered up today.
/phk
2019-10-29
R1000-400 with Facit A-4600 terminal showing the Rational Environment. Note the keyboard overlay that extends a couple of inches beyond the keyboard.
Keyboard overlay for the Rational Environment.
The front plate of a running R1000-400.
2019-10-28
The world has a running Rational R1000/400 computer again:
Thanks a LOT to Pierre-Alain Muller for driving all the way here to help make this happen!
2019-10-24
At Pierre-Alain's suggestion, today's aim was to run the FRU diagnostics.
First the faulty Fujitsu disk was removed and the Seagate promoted to SCSI ID 0.
Then we got a "File does not exist HARDWARE.M200_CONFIG". M200, hmm... - We did the following:
OPERATOR MODE MENU - options are:
1 => Change BOOT/CRASH/MAINTENANCE options
2 => Change IOP CONFIGURATION
3 => Enable manual crash debugging (EXPERTS ONLY)
4 => Boot IOP, prompting for tape or disk
5 => Boot SYSTEM
Enter option [Boot SYSTEM] : 1
Enable Modem DIALOUT [N] ?
Enable Modem ANSWER [N] ?
Enable IOP (IOC 68K) Auto Boot [N] ? y
Enable R1000 CPU Auto Boot [N] ? n
Enable AUTO CRASH RECOVERY [N] ? y
Enable CONSOLE BREAK KEY [N] ? y
Are these new defaults [N] ? y
then
CLI/CRASH MENU - options are:
1 => enter CLI
2 => make a CRASHDUMP tape
3 => display CRASH INFO
4 => Boot DDC configuration
5 => Boot EEDB configuration
6 => Boot STANDARD configuration
Enter option [Boot EEDB configuration] : 1
CLI> x cedit
Change hardware configuration [N] ? y
File does not exist HARDWARE.M200_CONFIG
Does this processor have 32 MB memory boards [Y] ?
NOTE: 32 MB boards must be installed as MEM 0 or MEM 2 only.
8 MB boards cannot be in the same CPU as 32 MB boards.
Does memory board 0 exist [Y] ? y
Does memory board 2 exist [Y] ? n
CLI> x expmon
0 32MB MEMORY BOARDS IN PROCESSOR - TOTAL OF 0 MEGABYTES.
EM> bye
CLI> quit
At this point we rebooted, but the memory board was still not detected. We then decided to take all 8 boards out of the machine starting with the two MEM 0/2 boards, photographing and then re-seating them. The 2 memory boards looked identical, so we switched their position. At the following boot we got: "Memory 2 exists but is not in configuration. Board will not be used.", so we changed the HW config again and included both MEM boards. We don't believe the changed positions did anything, but perhaps the power-cycle was needed to get them detected right?!
At this point we attempted the FRU-diagnostics:
CLI> x expmon
2 32MB MEMORY BOARDS IN PROCESSOR - TOTAL OF 64 MEGABYTES.
EM> rd
...
EM> poll_all
NO MACHINE CHECKS DETECTED
EM> poll_all
SEQ HAS DETECTED A MACHINE CHECK
EM> sm
...
UCODE HALT AT 0102
...
EM> bye
CLI>
CLI> x rdiag
...
DIAG> test/3 all
Running FRU P1DCOMM
Running FRU P1IOC
Running FRU P1VAL
Running FRU P1TYP
Running FRU P1SEQ
Running FRU P1FIU
Running FRU P1MEM
Running FRU P1SF
Running FRU P2IOC
Running FRU P2VAL
Loading from file PHASE2_MULT_TEST.M200_UCODE bound on July 16, 1986 14:31:44
Loading Register Files and Dispatch Rams .... [OK]
Loading Control Store [OK]
Running FRU P2TYP
Running FRU P2SEQ
Running FRU P2FIU
Running FRU P2MEM
Running FRU P2UADR
Running FRU P2FP
Loading from file FPTEST.M200_UCODE bound on January 29, 1990 17:26:52
Loading Register Files and Dispatch Rams .... [OK]
Loading Control Store [OK]
Running FRU P2EVNT
Running FRU P2STOP
Running FRU P2ABUS
Loading from file ABUS_TEST.M200_UCODE bound on July 22, 1986 13:27:21
Loading Register Files and Dispatch Rams .... [OK]
Loading Control Store [OK]
Running FRU P2CSA
Running FRU P2MM
Running FRU P2COND
Running FRU P2UCODE
Loading from file P2UCODE.M200_UCODE bound on August 6, 1986 16:16:27
Loading Register Files and Dispatch Rams .... [OK]
Loading Control Store .......... [OK]
Running FRU P3RAMS
Running FRU P3UCODE
Loading from file P3UCODE.M200_UCODE bound on August 6, 1986 13:50:42
Loading Register Files and Dispatch Rams .... [OK]
Loading Control Store . [OK]
PASSED
"PASSED"!!!
And a bit later the following line:
Starting R1000 Environment - it now owns this console.
Cleaned up log from the most interesting part of the session: Fil:20191024 1915 R1000.pdf
2019-10-17
Peter has returned with a working PSU :-) - The R1000 is now up and running again drawing 155 amps.
Almost full log of the session: Fil:20191017 1954 R1000.pdf
The Fujitsu 2266 disk appears to be faulty, R1000 complains about disk errors (our old acquaintance General Error?):
Options are:
0 => Exit
1 => Initialize disk (for experts only)
2 => Initialize disk, drop USR defects (internal use only)
3 => Show MFG and USR bad block locations
4 => Show only USR bad block locations
5 => Install new DFS only
6 => Show bad block count and DOS limits
Enter option : 3
Enter unit number of disk to format/build/scan (usually 0) : 0
CS1=0038 CS2=0040 DS=11C0 ER1=0100 ER2=4000 EC1=0000 EC2=0000 DC=0000 DA=0005
CS1=4038 CS2=0040 DS=11C0 ER1=0000 ER2=0000 EC1=0000 EC2=0000 DC=0000 DA=0005
CS1=0038 CS2=0040 DS=11C0 ER1=0100 ER2=4000 EC1=0000 EC2=0000 DC=0000 DA=0005
CS1=4038 CS2=0040 DS=11C0 ER1=0000 ER2=0000 EC1=0000 EC2=0000 DC=0000 DA=0005
CS1=4038 CS2=0040 DS=11C0 ER1=0000 ER2=0000 EC1=0000 EC2=0000 DC=0000 DA=0005
CS1=0038 CS2=0040 DS=11C0 ER1=0100 ER2=4000 EC1=0000 EC2=0000 DC=0000 DA=0005
CS1=0038 CS2=0040 DS=11C0 ER1=0100 ER2=4000 EC1=0000 EC2=0000 DC=0000 DA=0005
CS1=4038 CS2=0040 DS=11C0 ER1=0000 ER2=0000 EC1=0000 EC2=0000 DC=0000 DA=0005
CS1=4038 CS2=0040 DS=11C0 ER1=0000 ER2=0000 EC1=0000 EC2=0000 DC=0000 DA=0005
CS1=4038 CS2=0040 DS=11C0 ER1=0000 ER2=0000 EC1=0000 EC2=0000 DC=0000 DA=0005
CS1=0038 CS2=0040 DS=11C0 ER1=0100 ER2=4000 EC1=0000 EC2=0000 DC=0000 DA=0005
CS1=4038 CS2=0040 DS=11C0 ER1=0000 ER2=0000 EC1=0000 EC2=0000 DC=0000 DA=0005
CS1=4038 CS2=0040 DS=11C0 ER1=0000 ER2=0000 EC1=0000 EC2=0000 DC=0000 DA=0005
CS1=4038 CS2=0040 DS=11C0 ER1=0000 ER2=0000 EC1=0000 EC2=0000 DC=0000 DA=0005
CS1=0038 CS2=0040 DS=11C0 ER1=0100 ER2=4000 EC1=0000 EC2=0000 DC=0000 DA=0005
CS1=0038 CS2=0040 DS=11C0 ER1=0100 ER2=4000 EC1=0000 EC2=0000 DC=0000 DA=0005
CS1=4038 CS2=0040 DS=11C0 ER1=0000 ER2=0000 EC1=0000 EC2=0000 DC=0000 DA=0005
CS1=4038 CS2=0040 DS=11C0 ER1=0000 ER2=0000 EC1=0000 EC2=0000 DC=0000 DA=0005
CS1=0038 CS2=0040 DS=11C0 ER1=0100 ER2=4000 EC1=0000 EC2=0000 DC=0000 DA=0005
CS1=0038 CS2=0040 DS=11C0 ER1=0100 ER2=4000 EC1=0000 EC2=0000 DC=0000 DA=0005
** ABORT: Can't retrieve labels due to disk errors.
A Seagate ST41200N is now installed as DISK 1, the Fujitsu remains as DISK 0 for now. R1000 recognizes the Seagate but wants to format it:
Initializing M400S I/O Processor Kernel 4_2_16
Spinning up disk 1
Spinning up disk 0
Disk 1 is ONLINE and WRITE ENABLED
IOP Kernel is initialized
Enable line printer for console output [N] ?
RECOVERY 14.04 92/09/17 10:00:00\
Options are:
0 => Exit
1 => Initialize disk (for experts only)
2 => Initialize disk, drop USR defects (internal use only)
3 => Show MFG and USR bad block locations
4 => Show only USR bad block locations
5 => Install new DFS only
6 => Show bad block count and DOS limits
Enter option : 3
Enter unit number of disk to format/build/scan (usually 0) : 1
** ABORT: Disk has no labels.
Options are:
0 => Exit
1 => Initialize disk (for experts only)
2 => Initialize disk, drop USR defects (internal use only)
3 => Show MFG and USR bad block locations
4 => Show only USR bad block locations
5 => Install new DFS only
6 => Show bad block count and DOS limits
Enter option : 1
Enter unit number of disk to format/build/scan (usually 0) : 1
Disk has no labels.
Drive types are:
1 - Fujitsu 2263
2 - Fujitsu 2266
3 - SEGATE ST41200N
0 - Other
Enter drive type : 3
Enter HDA serial number : TJ617458
Disk must be formated.
Formatting the drive will take about 35 minutes.
Elapsed time is 00:32:32
Writing bad block information.
Writing boot label.
Writing DFS label.
Do you want to build a diagnostic file system on this unit [Y] ?
Enter last cylinder to be used by the DFS [ Hint => 76 ]:76
Enter first cylinder to be used for read/write diagnostics [ Hint => 1889 ]:1889
Writing shared label.
Constructing free list.
Writing free list.
Allocating and initializing directory.
Creating predefined files.
KERNEL_0.M200
KERNEL_1.M200
KERNEL_2.M200
FS_0.M200
FS_1.M200
FS_2.M200
PROGRAM_0.M200
PROGRAM_1.M200
PROGRAM_2.M200
DFS_BOOTSTRAP.M200
ERROR_LOG
Do you want to load files into the DFS on this unit [Y] ? y
Tape drive unit number : 0
Do you want to display filenames as they are loaded [Y] ? y
Reading -> DFS_BOOTSTRAP.M200
Reading -> KERNEL_0.M200
... 3160 files later ...
Reading -> DDC.M200_CONFIG
Elapsed time is 00:10:40
Options are:
0 => Exit
1 => Initialize disk (for experts only)
2 => Initialize disk, drop USR defects (internal use only)
3 => Show MFG and USR bad block locations
4 => Show only USR bad block locations
5 => Install new DFS only
6 => Show bad block count and DOS limits
Enter option : 0
Boot disk has been rebuilt or the IOP was booted from tape.
You must crash the machine to exit.
Next week, boot from disk and see how far we get. PSU got *hot*, but survived the 1½ hour session.
2019-10-03
After rummaging through our entire workshop, it transpires that we have no soldering iron with sufficient power to desolder the capacitors from the thick copper on the PCB.
One of our members, Peter, has offered to attempt the repair in his own workshop, and he picked it up tonight.
2019-09-12
PSU dismantled, and the visibly defective electrolytic capacitor has been desoldered together with two tantalum capacitors. Unfortunately, when the parts were originally mounted, the leads were pinched and cut, which made them difficult to pull through the PCB today, since the holes are almost exactly the size of the leads. The insulation on the wires to the transformer has deteriorated quite a bit and will need some repair.
Some better images of the PSU as a whole:
Overview
One half of the PSU
Other half of the PSU
Defective insulation on some wires
One of the two capacitor boards on the 5V rails.
Backside of PCB carrying the damaged capacitor.
Each of the two PCBs carries 5 tantalum capacitors of 15µF and 5 SXF capacitors of 6800µF (30mm x 18mm, lead spacing 7.5mm).
2019-09-05
We arrived today expecting a fight and armed with various debugging plans, but the Rational just started, booted and was as happy as could be?!? - After some configuration, and booting the kernel "M400S_KERNEL_0.M200" (thanks to Pierre-Alain for supplying that information), the R1000 now responds with:
R1000-400 IOC SELFTEST 1.3.2
512 KB memory ... [OK]
Memory parity ... [OK]
I/O bus control ... [OK]
I/O bus map ... [OK]
I/O bus map parity ... [OK]
I/O bus transactions ... [OK]
PIT ... [OK]
Modem DUART channel ... [OK]
Diagnostic DUART channel ... [OK]
Clock / Calendar ... [OK]
Checking for RESHA board
RESHA EEProm Interface ... [OK]
Downloading RESHA EEProm 0 - TEST
Downloading RESHA EEProm 1 - LANCE
Downloading RESHA EEProm 2 - DISK
Downloading RESHA EEProm 3 - TAPE
DIAGNOSTIC MODEM ... DISABLED
RESHA VME sub-tests ... [OK]
LANCE chip Selftest ... [OK]
RESHA DISK SCSI sub-tests ... [OK]
RESHA TAPE SCSI sub-tests ... [OK]
Local interrupts ... [OK]
Illegal reference protection ... [OK]
I/O bus parity ... [OK]
I/O bus spurious interrupts ... [OK]
Temperature sensors ... [OK]
IOC diagnostic processor ... [OK]
Power margining ... [OK]
Clock margining ... [OK]
Selftest passed
Restarting R1000-400S January 14th, 1901 at 22:56:43
OPERATOR MODE MENU - options are:
1 => Change BOOT/CRASH/MAINTENANCE options
2 => Change IOP CONFIGURATION
3 => Enable manual crash debugging (EXPERTS ONLY)
4 => Boot IOP, prompting for tape or disk
5 => Boot SYSTEM
Enter option [Boot SYSTEM] : 5
Logical tape drive 0 is an 8mm cartridge tape drive.
Logical tape drive 1 is declared non-existent.
Logical tape drive 2 is declared non-existent.
Logical tape drive 3 is declared non-existent.
Booting I/O Processor with Bootstrap version 0.4
Boot from (Tn or Dn) [D0] : T0
Tape_Boot_1.2.0 920401
Waiting for tape unit ready. Strike any key to abort.....................
End of Tape Reached.rewinding
Select files to boot [D=DEFAULT, O=OPERATOR_SUPPLIED] : [D]
Skipping..
Loading FS_0.M200
Loading RECOVERY.M200
Skipping.................
Loading M400S_KERNEL_0.M200
Initializing M400S I/O Processor Kernel 4_2_16
Spinning up disk 0
IOP Kernel is initialized
Enable line printer for console output [N] ?
RECOVERY 14.04 92/09/17 10:00:00\
Options are:
0 => Exit
1 => Initialize disk (for experts only)
2 => Initialize disk, drop USR defects (internal use only)
3 => Show MFG and USR bad block locations
4 => Show only USR bad block locations
5 => Install new DFS only
6 => Show bad block count and DOS limits
Enter option : *** AC power is L
The last line is probably from the moment I cut the power after seeing significant white-gray smoke coming up from the machine...
The following smell-test suggested that the PSU should be checked:
The capacitors are located between the 5V power rails (the two black blocks on each side of the cap).
Next job: Acquire and exchange 10 x 6800µF 6.3V capacitors (L<31, D<=18)
2019-08-29
Disappointment! - We had hoped to get further in the boot process, but met an "unwilling" machine that didn't even present itself. The PSU powered up with its fan, but that was it - no lights, no 5V, -12V or +12V. During power-off, the lights came on for a very short period, indicating the PSU is capable but unwilling (Inhibit line active?). We are not completely sure of the reason, but we will debug the issue next week, starting with checking the RESHA diagrams, followed by checking the IOC RTC battery (which was replaced last week).
2019-08-22
After further tests we finally took the leap, grabbed the cutter and soldering iron, and replaced the three suspected memory chips. The boot sequence with the original BIOS now gives:
R1000-400 IOC SELFTEST 1.3.2
512 KB memory ... [OK]
Memory parity ... [OK]
I/O bus control ... [OK]
I/O bus map ... [OK]
I/O bus map parity ... [OK]
I/O bus transactions ... [OK]
PIT ... [OK]
Modem DUART channel ... [OK]
Diagnostic DUART channel ... [OK]
Clock / Calendar ... [OK]
Checking for RESHA board -- Bench mode (ID 7) detected
Skipping RESHA tests
Local interrupts ... [OK]
Illegal reference protection ... [OK]
I/O bus parity ... [OK]
I/O bus spurious interrupts ... [OK]
Temperature sensors ... [OK]
IOC diagnostic processor ... [OK]
Power margining ... [OK]
Clock margining ... [OK]
Selftest passed
Restarting R1000-400S January 1st, 1901 at 00:03:56
Logical tape drive 0 is an 8mm cartridge tape drive.
Logical tape drive 1 is declared non-existent.
Logical tape drive 2 is declared non-existent.
Logical tape drive 3 is declared non-existent.
Booting I/O Processor with Bootstrap version 0.4
Boot from (Tn or Dn) [D0] :
Success!
Next step: Mount the board into its rightful place and see how far it gets now...
2019-06-06
Decided to make further measurements with the oscilloscope in order to rule out other causes. Probing all pins on a good and a bad RAM chip did not reveal anything.
Tried to piggyback H11 with a new RAM chip, and the questionable bit settled midway between VCC and GND. It could be that the good chip tried to pull down while the bad chip pulled up. Double-checked chip-select on H3, the only other identified chip that may drive the same data line (D30), and chip-select is completely passive during the troublesome period.
Previous tests went between [0x00000000..0x00040000[ and [0x00040000..0x00080000[. Now tried to run the tests between [0x00001000..0x00021000[ and [0x00041000..0x00061000[ as well as between [0x00001200..0x00021200[ and [0x00041200..0x00061200[ to test whether run-length could be a factor. The exact same failing addresses indicate run-length is not a factor.
Tried a reverse scan: Set(addr), Clr(addr), Set(addr) then Get(addr-4). The same data bits are affected in both banks, but the failing addresses are not identical. Some patterns are similar, and some other patterns appear.
Tried another test with: Set(addr), Clr(addr), Set(addr), Get(addr+4), branch-test delay, then another Get(addr+4) - The second Get reads out correctly. This indicates that the chip does hold the right value, but that it is read out incorrectly in some circumstances.
2019-05-23
Previous software tests indicated problems with certain bits at some address patterns: Bits 7 and 23 in the low bank and bit 30 in the high bank showed issues. The fault manifests itself at some addresses when the following memory accesses are done in quick succession: Set(addr), Clr(addr), Set(addr) then Get(addr+4) - The Get(addr+4) returns incorrect values only on these bits.
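As an aside, turning a miscompare from such a test into bit numbers is a one-liner. The sketch below is only an illustration: the helper name `failing_bits` and the example words are made up to match the bits named above, they are not values captured from the IOC.

```python
def failing_bits(expected: int, observed: int) -> list:
    """Return the bit numbers (0 = least significant) where two 32-bit words differ."""
    diff = (expected ^ observed) & 0xFFFFFFFF
    return [bit for bit in range(32) if (diff >> bit) & 1]


# Illustrative words only: all-ones expected, with the reported bits dropped.
print(failing_bits(0xFFFFFFFF, 0xFF7FFF7F))  # bits 7 and 23
print(failing_bits(0xFFFFFFFF, 0xBFFFFFFF))  # bit 30
```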
Today all RAM chips were checked with an oscilloscope to verify and possibly identify the problem. H11, G10 and G41 showed different behavior on the oscilloscope, and these chips happen to map to the exact bits identified in the software tests.
H40 did show a little flickering on the DC levels, but the edges seemed OK.
The above input to the RAM-chips looks like this:
The output of a healthy chip looks like this:
The output of the sick chips looks like this:
Next step will be to replace them. We have replacements ready, so stay tuned...
2019-03-07
Tried to patch the EEPROM in various ways, and learned a lot more.
If we skip the offending memory check, (and the EEPROM checksum because we're lazy) we get all the way to the boot device prompt (tape/disk).
We got two-way serial connection to the console port: TX is TTL level, RX is RS-232 level.
In trivial homebrew tests, the RAM does not fail, but what we call "ramtest_5" repeatedly does.
Big discovery of the night: The two top address bits of the EEPROM are swapped on the PCB, so the middle two quarters are swapped in the image we try to reverse-engineer. After fixing that, the contents make a lot more sense.
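The fix-up is easy to express in code. A minimal sketch, assuming the dump is a power-of-two sized image (32 KiB for a 28256) and that only the two top address bits are exchanged; `swap_top_address_bits` is a name invented here, not the tool we actually used.

```python
def swap_top_address_bits(image: bytes) -> bytes:
    """Return a copy of image with the two highest address bits exchanged."""
    size = len(image)
    if size < 4 or size & (size - 1):
        raise ValueError("image size must be a power of two, at least 4 bytes")
    top = size.bit_length() - 2          # highest address bit, A14 for 32 KiB
    second = top - 1
    mask = (1 << top) | (1 << second)
    out = bytearray(size)
    for addr in range(size):
        a = (addr >> top) & 1
        b = (addr >> second) & 1
        swapped = (addr & ~mask) | (b << top) | (a << second)
        out[swapped] = image[addr]
    return bytes(out)


# Demo on a synthetic image where every byte holds its quarter number.
quarter = 32768 // 4
raw = bytes([0]) * quarter + bytes([1]) * quarter + bytes([2]) * quarter + bytes([3]) * quarter
fixed = swap_top_address_bits(raw)
print([fixed[i * quarter] for i in range(4)])
```

Exchanging the two bits leaves the first and last quarters alone and trades the middle two, and applying the function twice gives back the original image.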
2019-02-28
Managed to power up the IOC board stand-alone. 5V @ 35A required (30A didn't seem to be quite enough).
Procedure:
- 5V @ 35A connected to 3 Capacitors at edge (to distribute load).
- Reset (GB113) to Ground (page 23 of R1000_SCHEM_IOC.pdf).
- CTS# (GB055) to 5V (page 25)
- Power On
- Release Reset (GB113)
TTL Serial output read from CPDRV0 @N1 (pin 2 or 3).
As expected output is still:
R1000-400 IOC SELFTEST 1.3.2
512 KB memory ... * * * * * * * FAILED
EEPROM 28256 is not compatible with EPROM 27256!