The bits we rescue and curate from old media, be it by reading paper-tapes or scanning manuals, need to be stored reliably and safely.
The "bitstore" project aims to build such a facility at a cost we can afford.
In some senses we are breaking new ground, since few if any museums or collections seem to have thought about how programs can be curated and presented. In many other senses, we can piggy-back on efforts by the "real" digital collections, such as Netarkivet.dk and The Internet Archive.
Some of the challenges we face are variants of known and analyzed problems.
When we need to translate the character set of an old GIER ALGOL program, it is essentially a variant of converting WordPerfect documents to PDF for presentation.
Elsewhere, we may break new ground, for instance by interactively disassembling binary programs with feedback from the user.
Yes, we dream big, and there is so much code to hack (and present) and so little time.
Conceptually, we have partitioned the task into three layers:
We will use the WARC (Web ARChive) file format for storage and preservation. Originally an internal format developed by The Internet Archive, it is now an ISO standard (ISO 28500) specifically for storing "discovered" bits in a collection.
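To illustrate what the storage layer would actually write, here is a minimal Python sketch (standard library only) that serializes one uncompressed WARC/1.0 "resource" record holding a blob of rescued bits. It assumes the WARC 1.0 record layout: a version line, named header fields, a blank line, the content block, and two trailing CRLFs. The function name and the `urn:x-bitstore:...` target URI scheme are hypothetical, and a real bitstore would more likely use an established WARC library, gzip each record, and start each file with a `warcinfo` record.

```python
import base64
import hashlib
import uuid
from datetime import datetime, timezone


def warc_resource_record(target_uri: str, payload: bytes,
                         content_type: str = "application/octet-stream") -> bytes:
    """Serialize one uncompressed WARC/1.0 'resource' record (sketch only)."""
    # Block digest: SHA-1 of the content block, base32-encoded, which is
    # common practice in WARC files (the spec allows other algorithms).
    digest = base64.b32encode(hashlib.sha1(payload).digest()).decode("ascii")
    headers = [
        "WARC/1.0",
        "WARC-Type: resource",
        f"WARC-Record-ID: <urn:uuid:{uuid.uuid4()}>",
        "WARC-Date: " + datetime.now(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ"),
        f"WARC-Target-URI: {target_uri}",  # hypothetical URI scheme
        f"WARC-Block-Digest: sha1:{digest}",
        f"Content-Type: {content_type}",
        f"Content-Length: {len(payload)}",
    ]
    # Header lines end in CRLF; an empty line separates headers from the block.
    head = "\r\n".join(headers).encode("utf-8") + b"\r\n\r\n"
    # Every record is terminated by two CRLFs after the block.
    return head + payload + b"\r\n\r\n"


if __name__ == "__main__":
    # Placeholder bytes standing in for a paper-tape image.
    record = warc_resource_record("urn:x-bitstore:papertape-0001", b"\x12\x34\x56")
    print(record.split(b"\r\n\r\n")[0].decode("utf-8"))
```

Because a WARC file is simply a concatenation of records, the returned bytes can be appended to an existing `.warc` file as each new artifact is ingested.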
Once the bits are safe, we need to find and record their meaning and historical context, the "data-archeology" so to speak. This means capturing and recording the metadata about the bits: what kind of media they came on, where we got them from, and so on.
For metadata, we aim for Dublin Core in some shape or form, but we have yet to figure out how to go about that in a way that is compatible with a volunteer crew who have very limited time in the collection.
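A Dublin Core record can be as small as a handful of the fifteen classic elements, which may suit volunteers with little time. The Python sketch below assumes plain XML using the `dc:` namespace (`http://purl.org/dc/elements/1.1/`) inside a hypothetical `<metadata>` wrapper element. It shows one possible mapping of the questions above onto Dublin Core: the kind of media goes into `dc:format`, and where we got it from goes into `dc:description`. The mapping, the identifier scheme and the example values are illustrative assumptions, not a decided schema.

```python
import xml.etree.ElementTree as ET

# Namespace of the 15 classic Dublin Core elements (DCMES 1.1).
DC_NS = "http://purl.org/dc/elements/1.1/"
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# Serialize the namespace with the conventional "dc" prefix.
ET.register_namespace("dc", DC_NS)


def dublin_core_xml(fields: dict[str, list[str]]) -> str:
    """Build a flat Dublin Core XML record from {element: [values]}.

    The <metadata> wrapper element is a hypothetical choice for this sketch.
    """
    root = ET.Element("metadata")
    for name, values in fields.items():
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name}")
        # Dublin Core elements are repeatable, hence a list of values.
        for value in values:
            ET.SubElement(root, f"{{{DC_NS}}}{name}").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    # Illustrative values only; the identifier scheme is hypothetical.
    print(dublin_core_xml({
        "title": ["GIER ALGOL program"],
        "type": ["Software"],
        "format": ["paper tape"],
        "identifier": ["urn:x-bitstore:papertape-0001"],
        "description": ["Donated by (donor); read on (date)"],
    }))
```

Rejecting unknown element names keeps hastily entered records within the core vocabulary; richer DCMI Terms could be added later without changing the storage layer.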
Research and presentation of historical bits require transformations, and different kinds of bits need different transformations. We will need to map GIER's IBM typewriter character set to Unicode to show the ALGOL programs, and we will need to disassemble binary programs for which no source code exists. It would also be nice to be able to refer directly to parts of scanned manuals, or even to execute programs in simulators. This calls for some kind of extensible, web-based architecture.
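One possible shape for such an extensible architecture is a registry keyed by the kind of bits, where each plug-in transformation turns raw bytes into something presentable; a web front end could then offer whatever `transformations_for(kind)` returns. The Python sketch below is hypothetical throughout: the kind names, the function names and the code table are invented. In particular, the table is a placeholder and not GIER's actual typewriter code, and the hex dump merely stands in for a real disassembler.

```python
from typing import Callable, Dict, List

Transform = Callable[[bytes], str]

# Hypothetical registry: kind of bits -> {transformation name: function}.
_REGISTRY: Dict[str, Dict[str, Transform]] = {}


def transformation(kind: str, name: str):
    """Decorator that registers a transformation for one kind of bits."""
    def register(func: Transform) -> Transform:
        _REGISTRY.setdefault(kind, {})[name] = func
        return func
    return register


def transformations_for(kind: str) -> List[str]:
    """Names of the transformations available for a kind, sorted."""
    return sorted(_REGISTRY.get(kind, {}))


def run(kind: str, name: str, data: bytes) -> str:
    return _REGISTRY[kind][name](data)


# PLACEHOLDER code table: these byte values are invented for illustration
# and are NOT the real GIER typewriter codes.
_PLACEHOLDER_TABLE = {0x01: "a", 0x02: "b", 0x03: "≠", 0x04: "×", 0x05: " "}


@transformation("gier-algol-tape", "unicode-text")
def to_unicode(data: bytes) -> str:
    # Unknown codes become U+FFFD REPLACEMENT CHARACTER.
    return "".join(_PLACEHOLDER_TABLE.get(b, "\ufffd") for b in data)


@transformation("binary", "hexdump")
def hexdump(data: bytes) -> str:
    # Stand-in for a real disassembler: 8 bytes per line, with offsets.
    lines = []
    for off in range(0, len(data), 8):
        chunk = data[off:off + 8]
        lines.append(f"{off:04x}: " + " ".join(f"{b:02x}" for b in chunk))
    return "\n".join(lines)


if __name__ == "__main__":
    print(transformations_for("binary"))
    print(run("binary", "hexdump", bytes(range(10))))
```

New transformations (a real disassembler, a simulator launcher, a link into a scanned manual) would only need to register themselves; nothing in the registry or the web layer has to change.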
As of now, this wiki-page is the most tangible part of the project.