Bitstore/Metadata

From DDHFwiki

Handling of object metadata has been the hardest part of this project.

Official museums struggle with this as well. As a rule, their solutions seem to me to be centered on a workflow where you spend whatever time it takes to register each artifact properly, once and for all. That can mean whole days of research per artifact.

That's a no-go for us: we have neither the time nor the skill, so pretty much all of the nifty metadata models and metadata formats they have invented are beyond our reach.

Instead we go back to basics: Metadata is a UTF-8 text-file, which we validate against a data model.

Here is a mock-up metadata file:

   General:
           User:           Poul-Henning Kamp
           Format:         TIFF
   
   Image:
           Keywords:       RC, Ålborg, Gier
           ?Date:          1970
           Photographer:   RC Intern
           Description:    
                           Billede af GIER i maskinrum.
                           Bufferlager + to tromler.
                           Potteplante i vindueskarm.

The file consists of "stanzas". This example has the minimum of two: "General", which is always mandatory, and "Image", which is used for images, graphics, photos, etc.

Note the question mark before "Date:", which marks the field as uncertain or speculative. It is put on the field name rather than in the value because we want this bit of information kept outside the actual field contents.
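
To make the format concrete, here is a minimal parser sketch in Python. It assumes several rules that the mock-up suggests but does not spell out:

* Stanza headers start in the first column and end in a colon.
* Field lines are indented.
* A more deeply indented line continues the previous field's value, as under "Description:".
* A leading "?" on the field name marks the field as uncertain.

The names `parse_metadata` and `Field` are hypothetical.

```python
import re
from dataclasses import dataclass

# Hypothetical field-line syntax: optional "?" (uncertain), name, colon, value.
FIELD_RE = re.compile(r"(\?)?([A-Za-z][\w-]*):\s*(.*)$")


@dataclass
class Field:
    value: str
    uncertain: bool = False  # True when the name was prefixed with "?"


def parse_metadata(text):
    """Parse metadata text into {stanza name: {field name: Field}}."""
    stanzas = {}
    current = None       # field dict of the stanza being read
    field_indent = None  # indentation of field lines in this stanza
    last = None          # most recent Field, target of continuation lines
    for lineno, raw in enumerate(text.splitlines(), 1):
        line = raw.expandtabs(8).rstrip()
        if not line:
            continue  # blank lines separate stanzas
        body = line.lstrip()
        indent = len(line) - len(body)
        if indent == 0:
            # Stanza header such as "General:"
            if not body.endswith(":"):
                raise ValueError(f"line {lineno}: expected stanza header")
            current = stanzas.setdefault(body[:-1], {})
            field_indent, last = None, None
            continue
        if current is None:
            raise ValueError(f"line {lineno}: field outside any stanza")
        if field_indent is None:
            field_indent = indent  # the first field fixes the indentation
        if indent > field_indent:
            # Deeper indentation continues the previous field's value
            last.value = f"{last.value}\n{body}" if last.value else body
            continue
        m = FIELD_RE.match(body)
        if indent < field_indent or m is None:
            raise ValueError(f"line {lineno}: malformed field line")
        mark, name, value = m.groups()
        last = Field(value, uncertain=mark is not None)
        current[name] = last
    return stanzas
```

Suppose the mock-up above is fed in with its stanza headers in the first column. The result then has `uncertain=True` on "Date", and "Description" holds its three lines joined by newlines.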

When an object is submitted to the bit-store, each stanza of the metadata file will be validated by a Python class. Most stanzas can be handled by a common base class, which reads a couple of simple data structures.

Here is a mock-up description of the "General" stanza:

   #######################################################################
   #
   # General - Mandatory for all artifacts
   #
   #######################################################################
   
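   # Collects the description of every stanza, keyed by stanza name.
   datadict = {}
   
   # Dictionaries of known values for the fields described below.
   General_User = {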
       "Poul-Henning Kamp":        True,
       "Finn Verner Nielsen":      True,
       "Carsten Jensen":           True,
   }
   
   General_Format = {
       "PDF":                      True,
       "ASCII":                    True,
       "BINARY":                   True,
       "TIFF":                     True,
   }
   
   General_Collections = {
       "Sparekassemuseet":         True,
   }
   
   datadict["General"] = {
       "Doc":                      "General metadata for all artifacts",
       "Use":                      "Shall",
       "Fields": {
           "User": {
               "Doc":              "Name of archiving user",
               "Use":              "Shall",
               "Format":           "String",
               "Dictionary":       General_User,
           },
           "Format": {
               "Doc":              "File format",
               "Use":              "Shall",
               "Format":           "String",
               "Dictionary":       General_Format,
           },
           "Collection": {
               "Doc":              "Collection this is part of",
               "Use":              "Should",
               "Format":           "String",
               "Dictionary":       General_Collections,
           },
           "Copyright": {
               "Doc":              "Copyright status",
               "Use":              "Should",
               "Format":           "String",
           },
           "Private": {
               "Doc":              "Object marked private (explain why!)",
               "Use":              "Should",
               "Format":           "String",
           },
       },
   }
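
A base class that interprets such a description could look roughly like the sketch below. `StanzaValidator`, `validate_metadata` and the error messages are hypothetical. So are two readings of the data dictionary:

* A missing "Shall" item is an error, while a missing "Should" item is only a warning.
* A value passes a "Dictionary" check only if it maps to `True`.

The "Format" key is not checked. A trimmed copy of the "General" description is included so that the sketch runs on its own.

```python
# Trimmed copy of the "General" description, so this sketch runs alone.
datadict = {
    "General": {
        "Doc": "General metadata for all artifacts",
        "Use": "Shall",
        "Fields": {
            "User": {
                "Doc": "Name of archiving user",
                "Use": "Shall",
                "Format": "String",
                "Dictionary": {"Poul-Henning Kamp": True},
            },
            "Copyright": {
                "Doc": "Copyright status",
                "Use": "Should",
                "Format": "String",
            },
        },
    },
}


class StanzaValidator:
    """Generic validator for one stanza, driven by its datadict entry."""

    def __init__(self, name, spec):
        self.name = name
        self.spec = spec

    def validate(self, fields):
        """Check a {field name: value string} dict; return (errors, warnings)."""
        errors, warnings = [], []
        known = self.spec["Fields"]
        for fname in fields:
            if fname not in known:
                errors.append(f"{self.name}: unknown field {fname!r}")
        for fname, fspec in known.items():
            if fname not in fields:
                # Missing "Shall" fields are errors, missing "Should" warnings
                target = errors if fspec["Use"] == "Shall" else warnings
                target.append(f"{self.name}: missing field {fname!r}")
                continue
            allowed = fspec.get("Dictionary")
            if allowed is not None and allowed.get(fields[fname]) is not True:
                errors.append(f"{self.name}.{fname}: "
                              f"{fields[fname]!r} not in dictionary")
        self.check(fields, errors, warnings)
        return errors, warnings

    def check(self, fields, errors, warnings):
        """Hook for stanza-specific rules; the base class adds none."""


def validate_metadata(stanzas, validators=None):
    """Validate all stanzas, using a registered subclass where one exists."""
    validators = validators or {}
    errors, warnings = [], []
    for name, spec in datadict.items():
        if name not in stanzas:
            target = errors if spec["Use"] == "Shall" else warnings
            target.append(f"missing stanza {name!r}")
            continue
        cls = validators.get(name, StanzaValidator)
        e, w = cls(name, spec).validate(stanzas[name])
        errors += e
        warnings += w
    for name in stanzas:
        if name not in datadict:
            errors.append(f"unknown stanza {name!r}")
    return errors, warnings
```

A stanza that needs more than table lookups would subclass `StanzaValidator`, override `check()`, and be registered through the `validators` argument.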

Backend code

The backend code will receive the submitted file and metadata, or just metadata referring to an already stored object, and validate it against the data dictionary.

If validation fails, an error is returned immediately.

If validation succeeds, the submission will be put in a queue for approval, and when approved, it will be stored for good.
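
The submit/approve flow can be modelled in a few lines. This is an in-memory sketch only. The `Backend` class, the integer tickets and the dict standing in for permanent storage are all hypothetical. `validate` is any function that returns a list of error strings, for instance one built on the data dictionary.

```python
import itertools


class ValidationError(Exception):
    """Raised when a submission fails validation; reported to the submitter."""


class Backend:
    """In-memory model of the submit / approve / store flow."""

    def __init__(self, validate):
        self.validate = validate            # metadata -> list of error strings
        self.pending = {}                   # ticket -> (metadata, payload)
        self.store = {}                     # ticket -> (metadata, payload)
        self._tickets = itertools.count(1)

    def submit(self, metadata, payload=None):
        """Validate now; queue for approval only if there are no errors.

        payload is None for metadata-only submissions that refer to an
        object already in the store.
        """
        errors = self.validate(metadata)
        if errors:
            raise ValidationError("; ".join(errors))  # fail immediately
        ticket = next(self._tickets)
        self.pending[ticket] = (metadata, payload)
        return ticket

    def approve(self, ticket):
        """Move an approved submission from the queue into the store."""
        self.store[ticket] = self.pending.pop(ticket)

    def reject(self, ticket):
        """Drop a submission from the queue without storing it."""
        del self.pending[ticket]
```

Splitting `submit` from `approve` keeps the fast, automatic check separate from the slower human decision.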

Frontend code

This is basically a stand-alone Python program which performs as much validation against the data dictionary as possible, so that people can work offline.

This will be useful for batch processing, where the metadata files are produced from an existing data source.
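
For batch processing, the main job is writing metadata files in the stanza format. The sketch below assumes a hypothetical CSV export of a photo catalogue with the columns `keywords`, `date`, `date_uncertain`, `photographer` and `description`. The column names, the fixed "TIFF" format and the tab-based layout are illustrative choices, not part of the project.

```python
import csv
import io


def format_metadata(stanzas):
    """Render {stanza: {field name: value}} as stanza-format text.

    A field name may carry the "?" prefix to mark the value uncertain;
    multi-line values are written on deeper-indented continuation lines.
    """
    out = []
    for stanza, fields in stanzas.items():
        out.append(f"{stanza}:")
        for name, value in fields.items():
            lines = str(value).splitlines() or [""]
            if len(lines) == 1:
                out.append(f"\t{name + ':':<16}{lines[0]}".rstrip())
            else:
                out.append(f"\t{name}:")
                out.extend(f"\t\t\t{line}" for line in lines)
        out.append("")  # blank line between stanzas
    return "\n".join(out)


def rows_to_metadata(csv_text, user):
    """Yield one metadata text per row of a (hypothetical) catalogue CSV."""
    for row in csv.DictReader(io.StringIO(csv_text)):
        date_name = "?Date" if row["date_uncertain"] == "yes" else "Date"
        yield format_metadata({
            "General": {"User": user, "Format": "TIFF"},
            "Image": {
                "Keywords": row["keywords"],
                date_name: row["date"],
                "Photographer": row["photographer"],
                "Description": row["description"],
            },
        })
```

Each generated text can then be checked offline before it is submitted.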

UI code

It should be possible to write a Python program with a nice GUI, running on both Windows and UNIX, in which the data dictionary guides input in real time.
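
The core of such real-time guidance need not depend on the GUI toolkit. As a hypothetical sketch, a GUI could call a function like `suggestions` on every keystroke to offer the dictionary values that match what has been typed so far. Like the validation sketch, it assumes a value is offered only when it maps to `True`.

```python
def suggestions(datadict, stanza, field, typed):
    """Return the dictionary values for a field that start with `typed`.

    Matching is case-insensitive; free-text fields (no "Dictionary")
    get no suggestions.
    """
    allowed = datadict[stanza]["Fields"][field].get("Dictionary", {})
    prefix = typed.casefold()
    return sorted(value for value, ok in allowed.items()
                  if ok is True and value.casefold().startswith(prefix))


# Example data mirroring the "General" stanza description.
datadict = {
    "General": {
        "Fields": {
            "User": {"Dictionary": {
                "Poul-Henning Kamp": True,
                "Finn Verner Nielsen": True,
                "Carsten Jensen": True,
            }},
            "Copyright": {},
        },
    },
}

print(suggestions(datadict, "General", "User", "c"))  # ['Carsten Jensen']
```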