Rational/R1000s400/Logbook
2023-03-15 And now for something entirely different…
Wayne Meretsky told us via email that the Oregon Software PASCAL compiler for the IOP programs, and after a number of not very productive searches, we finally found that one of the founders of Oregon Software had put almost exactly the most interesting parts of the compiler up on github:
Some good clues about the peephole-optimizations can be found here:
https://github.com/dhogaza/mc68000_pascal_2/blob/main/cross/sources/code.pas#L639
2023-03-10 No emulating after bedtime
Every so often, in late evening sessions, we would persistently run into this error:
PASCAL error #4 at location 000218D0 Last called from location 000220C8 Last called from location 0002251C Last called from location 00023296
And being late at night, the obvious thing was to go to bed and look at it with fresher eyes later … where it then fails to happen.
We finally dived into the disassembled PASCAL code of LOADER.M200
and found the bug, which was, of course, painfully obvious in the tracefile, if we had not been so sleepy:
Calendar([14 03 08 03 16 13 11]) Calendar([14 03 08 03 16 13 12]) Calendar([14 03 08 03 16 13 13]) Calendar([14 03 08 03 16 14 57]) Calendar([14 03 08 03 17 47 21]) Calendar([14 03 08 03 24 18 28])
These are the timestamps read from the RTC chip by kernel system-call 0x13, and the fields are:
Year-1 (binary) Month (BCD) Day (BCD) Weekday Hour (BCD) Minute (BCD) Second (BCD)
The RTC emulation had a bug relating to the values 0x18, 0x24 and 24.
This is now fixed, and we implemented the missing midnight increment of the day/wday/month fields as well.
2023-02-26 License or not
Having established that the real machine does not assert when the cluster-ID is wrong and the licenses invalid, it is obvious to ask: Does the machine boot faster if those software products are unlicensed?
A quick run this morning confirms that it is so.
An licensed boot takes 1327 seconds, an unlicensed only 1194 seconds.
It is the time
between the "INITIALIZING"
and "username:"
console messages,
which is a bit more than two minutes shorter.
This is good to know, but not actionable until the emulator makes it all the way to
that INITIALIZING
message.
2023-02-25 Different. Still lousy, but different
Well…
ADA_MANAGEMENT.11.50.4D DISK_CLEANER.11.1.3D PARSER.11.50.1D PRETTY_PRINTER.11.50.3D
====>> Environment Log <<==== 09:58:39 !!! EEDB Assert_Failure Unable to elaborate PRETTY_PRINTER.11.50.3D: Unhandled exception - 16#84# at PC 00169C1300007FE0 ====>> Elaborator Database <<==== Unable to elaborate PRETTY_PRINTER.11.50.3D: Unhandled exception - 16#84# at PC& 00169C1300007FE0 EEDB: %
So this run was the "megacomp4" branch where we have attempted to optimize things, so the next logical experiment is to fall back to the "main" branch, which is our ground-truth, using the "much-deleted" disk-images, and see what that does.
At least we will find out in a couple of weeks, rather than months.
We tried booting the real machine with a wrong cluster ID, and it produced the same »Product_Authorization Invalid for« messages, but did not assert, so that is also a solid indication that something is wrong still.
2023-02-24 Good Omens
A certain part of the failing runs look like this:
Elaborating subsystem: ELABORATOR_DATABASE ====>> Elaborator Database <<==== NETWORK.11.1.5D OM_MECHANISMS.11.2.0D BASIC_MANAGERS.11.3.0D
While on the real machine it looks like this:
Elaborating subsystem: ELABORATOR_DATABASE ====>> Elaborator Database <<==== NETWORK.11.1.5D OM_MECHANISMS.11.2.0D ====>> Environment Log <<==== 10:48:37 +++ Low_Level_Action.Manager Restoration_From_Archive ====>> Elaborator Database <<==== BASIC_MANAGERS.11.3.0D
And, as of this morning, so does it in emulator when running with the cluster number from PAM's machine.
2023-02-22 Second long run failed same way
After the three months long run failed in November, we started the exact same run again, to get an idea how reproducible the failure-mode is, and this morning we have the answer: It failed precisely the same way:
Kernel assert failure detected by kernel debugger stub. System shutting down. Exception: !Lrm.System.Assertion_Error, from PC = #F6413, #707
A week ago we got the exact same failure after only 16 days, using the new much-deleted disk-images, so we are very confident that this is something we will be able to track down.
Segment #F6413
is loaded in KK.11.5.9K.MLOAD
where it is commented as "BTREE_MANAGER", which sounds like very fundamental Ada-code.
We are seven days into the run where we patched the cluster-id to match PAM's machine, not out of the "starting VM" phase yet, but getting there, in some days we will know if that solves the "Product_Authorization Invalid for […]" issue, and a couple of weeks from now, we will know if that also solves the BTREE_MANAGER assert.
In the meantime, "mega-component" optimizations continue.
2023-02-14 Patching the cluster-id
We have started a new run identical to the 16 day run which just failed, except for one detail: We have patched the cluster-id of PAM's machine into the IOC_EEPROM with the following CLI command to the emulator:
cli firmware patch IOC_EEPROM 0x7e84 0x00 0x06 0x3b 0x8b
Hopefully that fixes at least the Product_authorization
issues.
2023-02-14 Failing five times faster
The bad news:
The much-deleted-diskimage + megacomp4 run failed after 16 days.
The good news:
It failed precisely the same way (see below), as the three month run on the full diskimage did, back in November 2022.
The product authorization failures is undoubtedly a very strong clue, we just dont know what it means.
The code segment which fails,#F6413, is 1008659.SEG.
In the .MLOAD
file it has the comment -- BTREE_MANAGER
and the instruction at #707 is:
0707 0257 Execute Discrete,Raise
Now we need to find out why…
DISK_CLEANER.11.1.3D PARSER.11.50.1D PRETTY_PRINTER.11.50.3D DIRECTORY.11.4.6D INPUT_OUTPUT.11.7.0D ====>> Environment Log <<==== 05:49:07 !!! Product_Authorization Invalid for Work_Orders 05:49:07 !!! Product_Authorization Invalid for Cmvc 05:49:08 !!! Product_Authorization Invalid for Rci 05:49:08 !!! Product_Authorization Invalid for Tcp/Ip 05:49:08 !!! Product_Authorization Invalid for Rpc 05:49:09 !!! Product_Authorization Invalid for Insight 05:49:09 !!! Product_Authorization Invalid for Rs6000_Aix_Ibm 05:49:09 !!! Product_Authorization Invalid for X Interface 05:49:09 !!! Product_Authorization Invalid for Cmvc.Source_Control 05:49:10 !!! Product_Authorization Invalid for Rcf 05:49:10 !!! Product_Authorization Invalid for Testmate 05:49:10 !!! Product_Authorization Invalid for Fundamental Session 05:49:11 !!! Product_Authorization Invalid for Lrm_Interface 05:49:11 !!! Product_Authorization Invalid for Telnet 05:49:11 !!! Product_Authorization Invalid for Ftp 05:49:11 !!! Product_Authorization Invalid for X_Library 05:49:12 !!! Product_Authorization Invalid for Dtia ====>> Elaborator Database <<==== COMPILER_UTILITIES.11.51.0D SEMANTICS.11.50.3D R1000_DEPENDENT.11.51.0D R1000_CHECKING.11.51.0D R1000_CODE_GEN.11.51.0D IMAGE.11.4.2D CORE_EDITOR.11.6.2D TOOLS.11.5.1D OE_MECHANISMS.11.1.2D OBJECT_EDITOR.11.6.1D MAIL.DELTA OS_COMMANDS.11.6.2D ====>> Kernel.11.5.8 <<==== Kernel: Kernel assert failure detected by kernel debugger stub. System shutting down. Exception: !Lrm.System.Assertion_Error, from PC = #F6413, #707
2023-02-07 Getting somewhere 8x faster
Thanks to the much-stuff-deleted-diskimages, we can now get to the code after the VM initialization in 9 days, instead of 72 days:
rr001 (cyan&green) is main branch and original disk-images
rr007 (purple&orange) is megacomp4 branch and reduced disk-images
2023-01-28 Belated Happy New-year!
(Apologies for the lack of updates this last month, a bacterial pneumonia stole a couple of weeks.)
We have finally found the secret to making the R1000, real or emulated, boot faster: The nightly daemon run.
PAM told us that the way to get faster boot was to "delete stuff, in particular ADA objects".
We have tried that a number of times, but to no avail: The boot time stubbornly took one hour and some minutes.
Then some random experiments made us to run so much out of disk-space, that we had to run the nightly daemon run to proceed.
That took 4+ hours, and then the next boot was 10 minutes faster - Secret sauce detected!
With the pristine disk-image it take 2076 seconds to get from starting virtual memory system
to the virtual memory system is up
After a two week campaign of sessions of wanton destruction, each concluding with daemon.run("daily")
, we have reduced that to 284 seconds.
At the current speed, that should make the emulation of the "main" branch boot 70 days faster: 58 days instead of 128 days.
2022…2012
- 2022 Logbook entries - We are getting somewhere with the emulation
- 2021 Logbook entries - We discover there is no instruction set, and start simulating hardware
- 2020 Logbook entries - Covid-19 happens, and we start to create a software emulation
- 2019 Logbook entries - We fix the IOC RAM error, and Pierre-Alain Muller drops by for startup
- 2014-2018 - The project got stuck for lack of a sufficiently beefy 5V power-supply, and then phk disappeared while he built a new house.
- 2013 Logbook entries - Documentation preservation and we run into the IOC RAM error
- 2012 Logbook entries - Preservation starts and we borrow tapes from Pierre-Alain Muller
Many thanks to
- Erlo Haugen
- Grady Booch
- Grek Bek
- Pierre-Alain Muller
- Michael Druke
- Pascal Leroy