⟦af2072e5f⟧

    5264A/rt
FIX/1000/PSP/0038

APE/850529

FIKS SYSTEM
SPECIFICATION








        FIKS

4.3.4    D̲u̲a̲l̲ ̲N̲O̲D̲E̲/̲M̲E̲D̲E̲ ̲D̲e̲s̲i̲g̲n̲

         To make the operation of the FIKS system more independent
         of hardware failures, some of the most failure-sensitive
         hardware elements have been duplicated. If the hardware
         component in use (the ACTIVE) becomes erroneous, then
         its counterpart (the STANDBY) shall be ready to take
         over the operations to be performed. To execute this
         SWITCHOVER of hardware with minimum effort from the
         system operational management and minimum inconvenience
         for the users of the FIKS system, a special design has
         been made to handle these procedures. The design has
         been implemented taking the following as guidelines/requirements:

         -   No narrative message must be lost.

         -   The security operational modes must not be affected.

         -   Internal Node/MEDE routing information shall be
             maintained.

         -   The Data Users must not be affected.

         -   The Node/MEDE shall be inoperable for a minimum of
             time (less than 2 minutes).

4.3.4.1  D̲e̲s̲i̲g̲n̲ ̲O̲v̲e̲r̲v̲i̲e̲w̲

         Fig. 4.3.4.1 shows a simplified Dual Node/MEDE hardware
         configuration with special emphasis on the redundant
         elements. It is noted that:

         -   The CR80-computer hardware, including user- and
             file processors and TDX-hosts, is dualized. Each
             copy is in the following denoted a BRANCH.

         -   The TDX-controllers are dualized.

         -   The system is equipped with a Dual Disk System.
             This can be accessed from both of the branches.

         In case of a fatal error in the active branch, i.e.
         an error which makes it impossible for the branch to
         continue operations, a SWITCHOVER from the active to
         the standby branch has to be performed. The standby
         branch has to be RECOVER'ed: this means that it must
         be brought to the point at which the former active
         branch failed. The standby branch has in advance been
         loaded with all the necessary software modules and is
         ready to start executing instructions. The disk system
         and the TDX-system can be used immediately. The vital
         thing missing to start operations is the data held
         in the CR80-memory of the former active branch. These
         data are recovered by use of CHECKPOINTs. Checkpoints
         are data records that are stored outside the CR80-memory
         and that define the states and substates of the system,
         e.g.

         -   state of message being processed.
         -   state of trunk.
         -   state of terminal.

         In the FIKS-system the records are stored in the disk
         system, from where they can be retrieved by the standby
         branch. After the processes are started in the standby
         branch, the checkpoints are processed in the RESTART
         procedure to reestablish the data structures in the
         CR80-memory as close as possible to the original content
         in the former active branch. Operations can then
         continue.


          Figure 4.3.4.1  Dual Node/MEDE Hardware
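
         The RESTART idea, replaying checkpoint records to rebuild
         the in-memory state, can be sketched in Python as follows.
         This is only an illustration: the record layout (kind, key,
         state), the Checkpoint class and the restart function are
         assumptions, since the specification does not define the
         checkpoint format here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Checkpoint:
    """Hypothetical checkpoint record kept outside CR80 memory."""
    kind: str    # e.g. "message", "trunk", "terminal"
    key: str     # identifies the item within its kind
    state: str   # last recorded state of the item


def restart(checkpoints):
    """Rebuild in-memory state tables by replaying checkpoints in order.

    Later records for the same (kind, key) overwrite earlier ones, so the
    result reflects the last state written before the failure.
    """
    memory = {}
    for cp in checkpoints:
        memory.setdefault(cp.kind, {})[cp.key] = cp.state
    return memory


# Records as the former active branch might have written them to disk.
log = [
    Checkpoint("terminal", "T01", "LOGGED_ON"),
    Checkpoint("message", "M17", "RELEASED"),
    Checkpoint("trunk", "TR3", "OPEN"),
    Checkpoint("message", "M17", "QUEUED"),
]
state = restart(log)
```

         In this sketch message M17 ends up in the state of its
         last checkpoint, which is the behaviour the standby branch
         needs in order to continue from where the active branch
         failed.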

4.3.4.2  W̲a̲t̲c̲h̲d̲o̲g̲

         The Watchdog is a separate microcomputer with the
         capability to switch a relay board and to communicate
         with the elements it controls, namely:

         -   2 x Node/MEDE branches
         -   2 x 2 TDX Bus controllers.

         In addition the Watchdog manages the use of the console,
         ref. fig. 4.3.4.2.1.

         W̲A̲T̲C̲H̲ ̲O̲F̲ ̲N̲o̲d̲e̲/̲M̲E̲D̲E̲-̲b̲r̲a̲n̲c̲h̲e̲s̲.̲

         The control is based upon the following inputs:

         -   Answers (or missing answers) to requests for
             'Alive Status Reports' sent to the operational
             system (ESP) at regular time intervals.

         -   Error reports sent from the FIKS application processes.

         -   Input from the system operator.

         Normally one of the branches will be the ACTIVE and
         the other the STANDBY. If the Watchdog senses a fatal
         error in the active branch, based upon Alive Status
         Reports and error reports, it shall arrange and manage
         switchover of the branches, i.e. let the former standby
         branch become the new active and see to it that the
         former active stops the erroneous processing (is CLOSED).
         To do this, the Watchdog has certain actions at its
         disposal:

         -   Asking the system operator for permission to switch
             over, with the console printout:

             "SWITCHOVER FROM P1 TO P2 ALLOWED?"

             If the operator accepts or does not answer,
             the switchover will be executed.

         -   Issuing of commands to the Node/MEDE operational
             system (ESP).

             The commands are as follows:

             CLOSE                A controlled termination of
                                  the operations in the branch
                                  shall be performed.

             RECOVER              The branch shall prepare to
                                  take over the active operations
                                  and then do it.

             STANDBY ON/          A report is sent to the
             STANDBY OFF          active branch telling that
                                  the standby branch is available/not
                                  available for taking over
                                  operation. This information
                                  is passed on to the SCC.



       Figure 4.3.4.2.1  Watchdog Interface Diagram
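
         The switchover decision for the branches can be sketched
         in Python as follows. The sketch is illustrative only: the
         threshold MISSED_LIMIT, the reply "Y" for acceptance and
         the order in which CLOSE and RECOVER are issued are
         assumptions, not taken from the specification.

```python
from typing import List, Optional, Tuple

MISSED_LIMIT = 3  # assumed threshold; the specification gives no number


def watchdog_step(missed_alive: int, fatal_report: bool,
                  operator_reply: Optional[str],
                  active: str = "P1",
                  standby: str = "P2") -> List[Tuple[str, str]]:
    """Return the (branch, command) pairs the Watchdog would issue."""
    fatal = fatal_report or missed_alive >= MISSED_LIMIT
    if not fatal:
        return []
    # The console asks "SWITCHOVER FROM P1 TO P2 ALLOWED?".
    # Acceptance and no answer at all (None) both lead to switchover.
    if operator_reply is not None and operator_reply.upper() != "Y":
        return []
    return [(active, "CLOSE"), (standby, "RECOVER")]
```

         Note that only an explicit refusal prevents the switchover;
         a missing answer is treated like an acceptance, as stated
         above.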

         C̲o̲n̲t̲r̲o̲l̲l̲i̲n̲g̲ ̲o̲f̲ ̲t̲h̲e̲ ̲T̲D̲X̲-̲c̲o̲n̲t̲r̲o̲l̲l̲e̲r̲s̲

         The control is based upon polling of the TDX-controllers.
         If a poll indicates an error in the active controller,
         then (if possible) switchover to the standby is performed.
         The system operator also has the opportunity to initiate
         the switchover himself.

         M̲a̲n̲a̲g̲e̲m̲e̲n̲t̲ ̲o̲f̲ ̲t̲h̲e̲ ̲C̲o̲n̲s̲o̲l̲e̲

         The Watchdog has the task of managing the console
         as a resource shared between the Watchdog itself and
         the two Node/MEDE-branches. Each of these may in turn
         be connected to the console. By means of control key
         strokes the operator selects which of the three units
         the console shall communicate with:

         CTRL/W: Watchdog monitor mode
         CTRL/O: Transparent mode BRANCH ONE
         CTRL/T: Transparent mode BRANCH TWO

         The transparent modes are used when the console shall
         act as a system console for one of the branches, i.e. when
         bootloading the system, doing offline diagnostics,
         etc. In these modes the branches are not supervised
         by the Watchdog. In the monitor mode the operator can
         issue the following requests by keying in:

         C:  Ask for printout of current Watchdog status.

         R:  Switchover of the red TDX-controllers.

         B:  Switchover of the black TDX-controllers.
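
         The console sharing can be pictured as a small key
         dispatcher. The Python sketch below is an illustration
         under assumptions: the function name route, the initial
         mode and the rule that unknown monitor keys are ignored
         are not stated in the specification. The control codes
         are the ASCII values of CTRL/W, CTRL/O and CTRL/T.

```python
CONTROL_KEYS = {
    "\x17": "WATCHDOG",    # CTRL/W: Watchdog monitor mode
    "\x0f": "BRANCH_ONE",  # CTRL/O: transparent mode, branch one
    "\x14": "BRANCH_TWO",  # CTRL/T: transparent mode, branch two
}
MONITOR_COMMANDS = {
    "C": "print Watchdog status",
    "R": "switch over red TDX-controllers",
    "B": "switch over black TDX-controllers",
}


def route(keys, mode="WATCHDOG"):
    """Return (destination, payload) pairs for a stream of key strokes."""
    out = []
    for ch in keys:
        if ch in CONTROL_KEYS:
            mode = CONTROL_KEYS[ch]        # control key selects the unit
        elif mode == "WATCHDOG":
            if ch in MONITOR_COMMANDS:     # unknown keys are ignored here
                out.append(("WATCHDOG", MONITOR_COMMANDS[ch]))
        else:
            out.append((mode, ch))         # transparent: pass through
    return out
```

         In a transparent mode the key "R" is passed to the branch
         unchanged; only in monitor mode is it interpreted as a
         switchover request.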

         M̲a̲n̲u̲a̲l̲ ̲o̲p̲e̲r̲a̲t̲i̲o̲n̲ ̲o̲f̲ ̲t̲h̲e̲ ̲W̲a̲t̲c̲h̲d̲o̲g̲

         The Watchdog contains a self-checking mechanism,
         which starts a visual alarm on the relay board in case
         of failure in the Watchdog-CPU (ref. fig. 4.3.4.2.2).
         It is then possible to run the system in a manual mode
         by manipulating the switches on the front panel. By
         setting the switch to manual, one can select the wanted
         controller and send "master clear" to the specified
         branch.

         O̲p̲e̲r̲a̲t̲i̲n̲g̲ ̲t̲h̲e̲ ̲s̲y̲s̲t̲e̲m̲ ̲w̲i̲t̲h̲o̲u̲t̲ ̲t̲h̲e̲ ̲W̲a̲t̲c̲h̲d̲o̲g̲

         The operational system (ESP) is indifferent to where
         the Watchdog commands come from. Therefore it
         is possible, by connecting the console directly to the
         SCM-board in a branch, for a system operator to issue
         commands to the branch just as if he were the Watchdog.
         In this way the Node/MEDE can be operational even if
         the Watchdog is missing.

Figure 4.3.4.2.2  Front Panel Layout

4.3.4.3  E̲S̲P̲ ̲S̲y̲s̲t̲e̲m̲

         In the FIKS System the ESP System (ERROR SWITCHOVER
         PROCESS) constitutes the FIKS System Operational Software.
         The ESP has been designed to handle the following tasks:

         -   Interface to the Watchdog and the system operator,
             ref. 4.3.4.2.

         -   CR80 Memory Management.
             As the FIKS system software is in principle loaded
             once and runs forever, there is no need for dynamic
             allocation/deallocation of memory areas. The memory
             management is therefore mostly concerned with utilizing
             the memory in an optimal way (no gaps in memory).
             The system maintainer specifies a strategy for
             laying out the memory. In the initializing phase
             (ref. sec. 4.3.4.4) the allocation of memory is
             carefully logged. It is then left to the system
             maintainer to judge whether the layout is convenient.
             If not, he can change the memory layout.

         -   Supervision/Management of the Disk System, ref.
             sec. 4.3.4.8.

         -   System Command Performance.
             The system is controlled by issuing commands
             to the ESP, which then executes them. The
             commands may originate from different kinds
             of sources:

             -   the Watchdog (ref. sec. 4.3.4.2)
             -   the operator
             -   items in a Job Control File, created and edited
                 offline. The commands in this file are read
                 and executed sequentially. A whole sequence of
                 commands can be executed in this manner with
                 a single command.
             -   internal command sequences. At moments where
                 there is no access to the disk system, Job Control
                 Files cannot be used. Instead command sequences
                 fetched from internal ESP-data are used.
             -   applications. Execution of a command can be
                 initiated by an application in the FIKS-system.

             The commands to be executed are mainly:

             -   LOAD-commands.
                 These are used when modules (programs, processes,
                 critical regions, etc.) are to be loaded into
                 the CR80-memory.

             -   START/STOP/REMOVE process.

             -   Commands concerning the disk system, e.g. FMS-user
                 ON/OFF, ASSIGN/DEASSIGN of devices, MOUNT/DEMOUNT
                 of volumes, UPDATE of volumes etc. (ref. sec.
                 4.3.4.8).

             -   Setting of system time (DTG).

             -   Watchdog commands:
                 CLOSE/RECOVER system (ref. sec. 4.3.4.9), STANDBY
                 ON/OFF (ref. sec. 4.3.4.2).

         -   System Initialization Management. By receiving,
             interpreting and executing commands issued
             by the Watchdog/operator, the ESP is able to perform
             the different kinds of system initializations/changes
             that may be needed (ref. 4.3.4.4 - System Initialization,
             ref. 4.3.4.9 - Switchover of branches).

         -   Background Management.
             The loading and scheduling of background tasks
             is left to the ESP. ref. sec. 4.3.4.5.

         -   System Error Handling.
             Error cases reported by the application processes
             are received by the ESP. It is then up to the ESP
             to report these to the Watchdog/system operator
             and if needed to take proper action upon the reports,
             ref. sec. 4.3.4.6.
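
         The sequential execution of command sequences, from a Job
         Control File when the disk is accessible and from internal
         ESP-data otherwise, can be sketched in Python as follows.
         The command texts, the dictionary INTERNAL_SEQUENCES and
         the function names are hypothetical; the specification
         does not give the command syntax.

```python
# Hypothetical internal fallback sequences held in ESP data.
INTERNAL_SEQUENCES = {"CLOSE": ["STOP ALL", "FMS-USER OFF"]}


def resolve(name, disk_files, disk_available):
    """Pick the command list for `name` from disk or from internal data."""
    if disk_available and name in disk_files:
        return disk_files[name]
    return INTERNAL_SEQUENCES[name]   # KeyError if no internal fallback


def execute(name, disk_files, disk_available, log):
    """Run every command of the sequence in order, recording each one."""
    for command in resolve(name, disk_files, disk_available):
        log.append(command)
    return log
```

         A single call such as execute("ACTIVE", ...) thus stands
         for a whole sequence of commands, which is the point of
         the Job Control File mechanism.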

4.3.4.4  S̲y̲s̲t̲e̲m̲ ̲I̲n̲i̲t̲i̲a̲l̲i̲z̲a̲t̲i̲o̲n̲

         When a branch has been "master cleared", the only active
         process in the CR80-computer is the FIKS BOOT LOADER.
         This is a PROM-resident program, specially implemented
         to handle the security demands in the FIKS system.
         In response to system operator input from the console,
         this process can load a BOOT MODULE into the CR80
         memory of both the user- and file processor. The
         boot modules contain the necessary software modules
         and configuration parameters needed to start up the
         CR80 AMOS operational system.

         The most important items in these boot modules are
         listed in the following. This is also a list of the
         CR80 standard software modules that are used in the
         FIKS-system.

         U̲s̲e̲r̲ ̲P̲r̲o̲c̲e̲s̲s̲o̲r̲ ̲B̲o̲o̲t̲ ̲M̲o̲d̲u̲l̲e̲

         -   CR80 AMOS MONITOR KERNEL.
             This is the lowest level of the CR80 AMOS system.
             The KERNEL implements processes, CPU management,
             inter process communication and the lowest level
             of I/O device handling, i.e. interrupt handling.
             The FIKS system uses a version which includes
             CRITICAL REGIONS and has system data placed in
             page 1 of the CR80-memory.

         -   The ROOT-module.
             This is in the FIKS system equal to the ESP-system.

         -   Declaration of how many CPUs the system is configured
             with (2 CPUs with names CPU000/CPU001) and of the
             time slice values used for the three possible priority
             levels the processes may have.

         -   Declaration of how many processes, AMOS messages
             and critical regions may exist in the system.

         -   CR80 AMOS I/O SYSTEM
             The I/O system is a program module which implements
             a set of procedures, that interfaces the user to
             the peripherals, i.e. in the FIKS system to the
             CR80 File Management System and the CR80 TDX System.

         -   CR80 DMA LINK.
             This process handles the data transfers between
             the user- and file processor (user processor
             version).

         -   CR80 TDX-DRIVER
             The TDX-driver constitutes the interface between
             the CR80 TDX HOST computer and the AMOS I/O System.
             This module is not included in the boot module
             but is loaded later in the initializing phase.

         F̲i̲l̲e̲ ̲P̲r̲o̲c̲e̲s̲s̲o̲r̲ ̲B̲o̲o̲t̲ ̲M̲o̲d̲u̲l̲e̲

         -   CR80 AMOS MONITOR KERNEL.
             A version with no critical regions and which has
             system data placed in page 0 is used.

         -   ROOT-module.
             The standard AMOS ROOT module is used.

         -   Declarations concerning CPU-use.
             One CPU with name CPU000 is used.

         -   Declaration of how many processes, AMOS messages
             that may exist.

         -   CR80 FILE MANAGEMENT SYSTEM.
             This system constitutes the interface between the
             I/O-system and the files placed in the disk system
             (CDC- and FLOPPY-disks).

         -   CR80 CDC DRIVER.
             This is the process that handles the interface
             to the CDC-disks.

         -   CR80 FLOPPY DRIVER
             This is the process that handles the interface
             to the FLOPPY-diskettes.

         -   CR80 DMA LINK
             This process handles the data transfers between
             the user- and file processor (file processor version).

         The boot modules are generated offline by means of
         the CR80 AMOS SYSGEN utility program.

         When the boot loading is finished, the ESP-process
         is started, and thereafter the ESP-system is responsible
         for the further initialization.

         At Node/MEDE installations the system operator has
         to tell the system which state (ACTIVE/STANDBY) the
         branch is going to be and which branch it is (ONE/TWO),
         ref. sec. 4.3.4.1.

         The system time (DTG) must always be specified. As
         it is very important that the DTG is correct, due
         to its use as key-index in the HDB-system (ref. sec.
         4.1.4), thorough checking of the DTG is performed. Besides
         having the correct format, it must not be less than the
         DTG of the youngest message stored on HDB. If it is
         much greater, this could be a mistake by the operator,
         and he is warned, giving him the opportunity to reset
         the DTG before processing is started.
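
         The DTG plausibility check can be sketched in Python as
         follows. The sketch assumes that the DTG has already been
         parsed into a datetime value and that "much greater" means
         more than 24 hours; both the threshold WARN_GAP and the
         function name check_dtg are assumptions, not taken from
         the specification.

```python
from datetime import datetime, timedelta

WARN_GAP = timedelta(hours=24)  # assumed meaning of "much greater"


def check_dtg(entered: datetime, youngest_on_hdb: datetime) -> str:
    """Classify an operator-entered DTG as 'REJECT', 'WARN' or 'OK'."""
    if entered < youngest_on_hdb:
        return "REJECT"            # would break the HDB key ordering
    if entered - youngest_on_hdb > WARN_GAP:
        return "WARN"              # operator may re-enter before start
    return "OK"
```

         A rejected DTG must be re-entered, whereas a warning leaves
         the decision to the operator.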

         The system is then initialized to be ACTIVE/STANDBY.
         This is performed by issuing commands to the ESP.
         The commands are placed in the Job Control Files with
         names "ACTIVE/STANDBY". These files determine the
         FIKS CR80 software configuration (ref. sec.
         4.3.4.3).

         The files contain commands for:

         -   Loading and creating of all critical regions.
         -   Loading and initializing of all monitor procedures.
         -   Initializing of all FIKS data-areas (MTCB-, QACCESS-
             and RDF-areas). This initialization is performed
             by specially implemented processes, which are only
             present in this phase (MTCB_INIT-, QACCESS_INIT-
             and RDF_INIT-process).

         -   The CHECKPOINT-process (ref. sec. 4.3.4.7) is loaded.
             In the ACTIVE-file it is also started, to be used
             for recovery of those data that are independent
             of SWITCHOVER (last used page number, message number,
             etc.). This recovery is performed by the SYSCHP-process.
         -   The TDX-drivers are loaded. In the ACTIVE-file
             they are also started and the TDX-system is initialized,
             i.e. the LTUXs are loaded with configuration parameters.
         -   Then the rest of the FIKS application processes
             are loaded. In the ACTIVE-file they are also started.

4.3.4.5  B̲a̲c̲k̲g̲r̲o̲u̲n̲d̲ ̲P̲r̲o̲c̲e̲s̲s̲i̲n̲g̲

         Some tasks in the FIKS-system may be of low priority
         and some tasks may only be activated periodically or
         very seldom. If nothing else were done, these tasks
         would each occupy their individual part of the CR80-memory.
         They might as well take turns using one designated
         area and thereby share it. In this way the memory
         is much better utilized. This scheme is used in the
         FIKS Background Management.

         O̲u̲t̲l̲i̲n̲e̲

         A fixed amount of memory has been allocated for
         background processing: one area for program and
         one for process. To ensure successful background
         processing, these areas must never be used by more
         than one background task (BGT) at any one moment.
         A BGT is said to be ACTIVATED/DEACTIVATED if any processing
         concerning the BGT is involved/not involved in those
         areas. A deactivated BGT (BGT-A) may then be dumped
         to disk, and another deactivated BGT (BGT-B) loaded and
         activated. After a while (the background processing time
         slice) BGT-B is deactivated and dumped to disk. Then
         BGT-A is loaded and activated, etc.

         Besides the BGT itself, other processes may access the
         areas. Those processes will be defined as DRIVERs.
         Assuming that the concept for standard CR80 software
         drivers is used, no DRIVER access is tantamount to
         no AMOS answer/system answer/path answer being awaited
         from a DRIVER (i.e. no outstanding IO-requests). Refer to
         CR80 AMOS Kernel, C33/302/PSP/0008. Processes may deliver
         AMOS events to a BGT. This involves use of a message
         buffer and some "event-registers" in the BGT-process
         area. If these events are rerouted to areas outside
         the BGT-areas while the BGT is deactivated, then the
         following simple criteria will define when a BGT is
         deactivated:

         A.  The Kernel state of the BGT is STOPPED.

         B.  The Kernel state of the BGT is GOING_TO_BE_STOPPED
             and some Kernel event is awaited.

         C.  There must not exist any outstanding IO-requests.

         The BGT is then DEACTIVATED if

             (A and C) or (B and C) is true.
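
         Expressed as a predicate in Python (a minimal sketch; the
         function name and the argument names are assumptions, and
         the Kernel states are represented as plain strings):

```python
def is_deactivated(kernel_state: str, awaiting_event: bool,
                   outstanding_io: int) -> bool:
    """Evaluate the criterion (A and C) or (B and C) for one BGT."""
    a = kernel_state == "STOPPED"
    b = kernel_state == "GOING_TO_BE_STOPPED" and awaiting_event
    c = outstanding_io == 0            # no outstanding IO-requests
    return (a and c) or (b and c)
```

         The expression is equivalent to (A or B) and C, i.e.
         criterion C must hold in every case.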

         The Kernel-module used in the FIKS system has been
         modified to handle this scheme and to give notice to
         the ESP-process when a BGT is DEACTIVATED or when it
         requests processing (an awaited event occurs).

         I̲m̲p̲l̲e̲m̲e̲n̲t̲a̲t̲i̲o̲n̲

         The BGTs are initially loaded and started as ordinary
         processes. After a certain time slice, or when the current
         BGT (BGT-A) becomes deactivated (no processing demand),
         the ESP is notified and the next BGT (BGT-B) shall be
         loaded and activated. The following processing is performed:

         -   BGT-A is STOPPED.
             (deactivation started).

         -   When BGT-A is deactivated, it is swapped out to
             a disk file. Each BGT has its own distinct area
             in this file. A disk volume with fast access is
             used for this file.

         -   BGT-B is determined by using a "Background Schedule"
             scheme similar to the concept used in the Kernel
             multiprocess scheduling. There are three priorities
             and one idle priority, the latter used when none of
             the others are active.

         -   BGT-B is swapped in from the disk file and activated.

         -   The next event concerning background processing
             is awaited.
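
         The swap cycle can be sketched in Python as follows. The
         sketch is an illustration under assumptions: priorities
         are numbered 0 to 2 with 0 as the highest, idle is modelled
         as level 3, ties are broken by name, and the task names
         are hypothetical. The actual "Background Schedule" scheme
         is not detailed in this section.

```python
IDLE = 3  # levels 0..2 are ordinary priorities (0 highest, assumed)


def pick_next(tasks, current):
    """Choose the next BGT among those requesting processing.

    `tasks` maps BGT name -> (priority, wants_cpu). Idle-priority tasks
    are chosen only when no ordinary task wants the CPU.
    """
    ready = [(prio, name) for name, (prio, wants) in tasks.items()
             if wants and name != current]
    if not ready:
        return None
    return min(ready)[1]   # lowest number wins; ties broken by name


def swap(current, nxt, memory, swap_file):
    """Swap the deactivated BGT out to its own slot and the next one in."""
    if current is not None:
        swap_file[current] = memory.pop(current)
    memory[nxt] = swap_file.pop(nxt)
```

         Because each BGT has its own slot in the swap file, a
         swap-out never overwrites the image of another task.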

         L̲i̲m̲i̲t̲a̲t̲i̲o̲n̲s̲

         The FIKS Background Processing has been designed
         so that the designer of FIKS-applications need not
         care too much about whether a task is loaded
         as a background task or not. Some considerations, however,
         have to be taken into account:

         -   A BGT must not have outstanding IOs for long periods,
             i.e. IO-operations must be expected to finish
             within a reasonable time (2-3 seconds, equal to
             the BGT-time slice). A BGT with outstanding
             IOs will obstruct the other BGTs. A warning
             about 'BGPS DISORDER' will appear on the system
             console as an error report. If the IO has still
             not finished after three warnings, the system
             treats it as a fatal error, involving
             SWITCHOVER etc.

         -   The BGT must be of limited size.
             The BGT-memory areas shall be valid for all BGTs.

         -   Use of processing resources (CPU-time, IOs, etc.)
             shall be limited. The BGTs are loaded with the
             lowest Kernel-priority and shall share the resources
             with all other BGTs, so a risk of bottlenecks may
             arise. An IDLE-priority, to be used for tasks with
             "absolute" low priority (online diagnostic
             programs), has been implemented.
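
         The escalation from 'BGPS DISORDER' warnings to a fatal
         error can be sketched in Python as follows. It is assumed
         here that one warning is raised per elapsed time slice and
         that the counter is reset once the IO completes; neither
         detail is stated in the specification.

```python
MAX_WARNINGS = 3


def supervise(io_pending_per_slice):
    """Return the report raised at the end of each BGT time slice.

    `io_pending_per_slice` holds one boolean per elapsed time slice:
    True if the BGT still has outstanding IO at the end of that slice.
    """
    warnings = 0
    reports = []
    for pending in io_pending_per_slice:
        if not pending:
            warnings = 0             # assumed: reset once the IO completes
            reports.append(None)
        elif warnings < MAX_WARNINGS:
            warnings += 1
            reports.append("BGPS DISORDER")
        else:
            reports.append("FATAL")  # leads to SWITCHOVER
            break
    return reports
```

         With a 2-3 second time slice this gives an application
         roughly 6-9 seconds of outstanding IO before the fatal
         error is raised, under the assumptions above.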

4.3.4.6  E̲r̲r̲o̲r̲ ̲H̲a̲n̲d̲l̲i̲n̲g̲

         Errors in the FIKS system can arise for different
         reasons and be handled in different ways.

         -   Hardware errors.
             If an error occurs in a vital hardware component,
             then the only thing that can be done to recover
             is to switch over to a dualized counterpart, if
             one exists. The error may instead occur in a component
             of non-vital importance (one terminal, one trunk). The
             erroneous component can then be excluded from the system
             and the message traffic concerned rerouted. The system
             will still be operating, but now in a reduced version.

         -   Resource errors.
             Some resources (MTCBs, Queue elements, disk files,
             etc.) are shared between the applications. Because
             of the limited number of the resources and the
             random way they are reserved by the applications,
             there is a certain probability that a lack of
             resources arises. The way to solve this problem
             is to wait for release of the resources, and to see
             to it that those already reserved get released.

         -   Software errors.
             In the developing and debugging phase of an application,
             errors may arise due to errors in the design/code
             or incorrect configuration of the system. The cause
             shall be removed and the system restarted.

         The errors are sensed by processes in the FIKS-system
         and reported by use of standard CR80 system software
         procedures to the ESP. The reports will contain:

         -   name of reporting process
         -   an error code stating the cause of the error.
         -   an error label, stating the stage in the processing
             where the error occurred.

         The above mentioned items shall be sufficient for unique
         determination of all possible error cases. The reporting
         process is stopped when it issues a report.

         The ESP receives the report, analyses it and formats
         a report to be presented to the system operator and
         the Watchdog. They may then take action upon this.
         Based upon the analysis the ESP also takes actions itself.
         This may involve restart of the logging process, which
         then also may take some actions.

         In the following, the layout of the error report and
         the actions to be taken are outlined.

         Error report items:

         -   Watchdog header.
             This contains information to the Watchdog about
             fatal/non fatal error in hardware equipment. The
             Watchdog decision about switchover is based upon
             this.

         -   Indication of time.
         -   Name of logging process.
         -   Error code.
         -   Error label
         -   Action

         The actions are based upon an analysis of the error
         codes and can be:

         -   IGNORED
             The error report is used to pass information to
             the system operator. This may be the result of
             an online diagnostic test, that has no influence
             in other respects. The process is restarted.

         -   LOCAL_FIX_UP
             The error is one of those that may be recovered
             or is allowed to exist (resource errors/single
             device out of use). The ESP restarts the process
             and it is then up to the process to take further
             actions.

         -   DISCARD_DISK
             When using the FIKS Dual Disk system (ref. sec.
             4.3.4.8) a hardware error in one of the two disks may
             occur. This can be recovered by discarding the
             erroneous disk unit. This is performed solely by
             the ESP. The discovery and reporting of the error
             is performed in a way that is transparent
             to the process.

         -   SWITCHOVER
             A fatal error (hardware/software) has occurred
             in the CR80 computer. The Watchdog header will
             contain this information. The Watchdog will then,
             if possible, start a "Switchover of Branches",
             ref. sec. 4.3.4.9.
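
         The report layout and the choice of action can be sketched
         in Python as follows. The error codes in ACTION_BY_CODE are
         hypothetical, as is the rule that unknown codes are treated
         as fatal; the real code table is not given in this section.
         The restart flag follows the text above: the reporting
         process is restarted for IGNORED and LOCAL_FIX_UP.

```python
from dataclasses import dataclass
from enum import Enum


class Action(Enum):
    IGNORED = "IGNORED"
    LOCAL_FIX_UP = "LOCAL_FIX_UP"
    DISCARD_DISK = "DISCARD_DISK"
    SWITCHOVER = "SWITCHOVER"


# Hypothetical error codes, for illustration only.
ACTION_BY_CODE = {
    0x01: Action.IGNORED,
    0x10: Action.LOCAL_FIX_UP,
    0x20: Action.DISCARD_DISK,
}


@dataclass
class ErrorReport:
    fatal: bool        # Watchdog header: fatal/non fatal
    time: str          # indication of time
    process: str       # name of logging process
    code: int          # error code
    label: str         # error label
    action: Action


def analyse(process, code, label, time):
    """Format the report; unknown codes are treated as fatal (assumption)."""
    action = ACTION_BY_CODE.get(code, Action.SWITCHOVER)
    restart = action in (Action.IGNORED, Action.LOCAL_FIX_UP)
    report = ErrorReport(action is Action.SWITCHOVER, time, process,
                         code, label, action)
    return report, restart
```

         The Watchdog only needs the fatal flag of the header to
         decide about switchover; the remaining items are for the
         system operator.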

4.3.4.7  C̲h̲e̲c̲k̲p̲o̲i̲n̲t̲i̲n̲g̲

         A checkpoint represents a state (or part of a state)
         of some data structure in the CR80-memory that shall
         be reestablished in connection with "Switchover of
         Branches", ref. sec. 4.3.4.9. The data structures that
         are checkpointed and recovered in the FIKS system, and
         thereby not lost in case of a fatal error in the CR80-computer,
         are:

         -   Terminal Control Blocks.
             This will ensure that all information concerning
             a terminal and its user will be available also
             after Switchover. I.e. logged on/off users will
             be logged on/off after Switchover and there will
             be no violation against the security procedures
             concerning terminal operations.

         -   Message Preparation Pool.
             If the system breaks down in the middle of a message
             preparation, the preparer does not need to key
             in the whole message once again when Switchover
             is finished.

         -   MTCBs and Queues.
             From the moment a message is released until it is
             printed on the receiver's terminal, it is checkpointed
             in such a way that the message will not be lost, regardless
             of which computer in the FIKS network crashes and at
             which point in the message processing this occurs. This
             is achieved by carefully checkpointing the Message
             Control Block (MTCB) each time it indicates a
             change in the message state, and by checkpointing
             which queue the message is placed in at any given
             moment.

         -   Routing Tables.
             All routing information not already kept in disk
             files is checkpointed, so that all routing/rerouting
             of message traffic is in effect even after Switchover.

         Checkpointing is performed by the application processes
         by sending an AMOS message to the Checkpoint process,
         which then formats the final checkpoint and writes it
         to the disk.

         The AMOS-message contains information about what kind
         of checkpointing is to be executed. When
         the checkpointing is finished, an AMOS answer is returned
         to the application process, which can then proceed with
         its processing. In this way it is exactly controlled
         when and in which sequence the checkpointing is performed.
         Thereby inconsistency between what is checkpointed
         and what has been processed is avoided.
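
         The request/answer discipline can be sketched in Python
         with a worker thread standing in for the Checkpoint process
         and queues standing in for AMOS messages and answers. All
         names and the record format are assumptions made for
         illustration; the sketch does not use the AMOS interface.

```python
import queue
import threading

requests = queue.Queue()
disk = []   # stands in for the checkpoint file


def checkpoint_process():
    """Serve requests one at a time: format, write, then answer."""
    while True:
        item = requests.get()
        if item is None:                # shutdown marker for the sketch
            break
        kind, data, answer = item
        disk.append(f"{kind}:{data}")   # "write to disk"
        answer.put("DONE")              # the answer to the application


def checkpoint(kind, data):
    """Application side: send the request and block until answered."""
    answer = queue.Queue(maxsize=1)
    requests.put((kind, data, answer))
    return answer.get(timeout=5)


worker = threading.Thread(target=checkpoint_process, daemon=True)
worker.start()
checkpoint("MTCB", "M17 state=RELEASED")
checkpoint("QUEUE", "M17 in PRINT-Q")
requests.put(None)
worker.join(timeout=5)
```

         Because the application blocks until the answer arrives,
         the second checkpoint cannot be written before the first,
         which is what keeps the checkpointed state consistent with
         the processing.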

         In the restart phase the checkpoints are retrieved
         for building up the checkpointed data structures. As
         much as possible of this recovery is performed on system
         level, i.e. the application processes do not need to
         care for this processing.

         It shall be noted that the checkpointing is redundant
         if Switchover does not occur. It adds to the overhead
         processing. It is therefore desirable not to checkpoint
         more than needed. Because of this, not all processing
         is checkpointed. The processing/procedures that may
         be easily repeated after Switchover are not reflected
         in checkpoints. In this way a Switchover can also act
         as a "clean up"-procedure: all processing not concerned
         with the data structures mentioned previously is cleared.

4.3.4.8  D̲u̲a̲l̲ ̲D̲i̲s̲k̲ ̲O̲p̲e̲r̲a̲t̲i̲o̲n̲s̲

         The FIKS Dual Disk Hardware configuration is outlined
         in figure 4.3.4.8-1.
         It is noted:

         -   2 disk units (DISK ONE, DISK TWO) are available,
             one in each BRANCH.

         -   Both disk units can be accessed from both branches,
             i.e. a disk unit is not allocated especially
             for use by one branch.

         -   1 floppy disk unit is available. This can only
             be accessed from BRANCH ONE.

         -   A special "File Processor" is allocated to perform
             disk operations.

         -   The File Processor is connected to the User Processor,
             where the application processes are running, via
             a DMA link.

         The idea of this configuration is to make FIKS operations
         more independent of hardware failures in the disk system.
         The two disk units are intended to be copies of each
         other. If one of them fails, the other will still be
         present to carry out the operations, but now alone.

         To support this configuration, the software design
         outlined in the following has been implemented (ref.
         fig. 4.3.4.8-2).

         S̲t̲a̲n̲d̲a̲r̲d̲ ̲D̲i̲s̲k̲ ̲O̲p̲e̲r̲a̲t̲i̲o̲n̲s̲

         When an application process placed in the user processor
         requests a disk read/write operation on a logical file,
         a corresponding command is sent via the IO-system
         and DMA-driver to the File Management System (FMS)
         in the file processor. The FMS translates the command
         to disk-sector read/write commands. These are handed
         over to the disk drivers, one for each disk unit. The
         disk drivers perform the final interface to the disk
         controllers. In this way disk sectors are transferred to/from
         the CR80-memory (disk cache). The data transfer between
         user and file processor is controlled by the FMS and
         executed by the DMA-drivers. When the operation is
         finished, the application is informed about completion
         via the FMS, DMA-drivers and IO-system.

         Figure 4.3.4.8-1  Dual Disk Hardware Configuration

         Figure 4.3.4.8-2  Disk Software Configuration

         D̲u̲a̲l̲ ̲D̲i̲s̲k̲ ̲O̲p̲e̲r̲a̲t̲i̲o̲n̲s̲

         When the Disk System status is DUAL, both disk units
         are available. Disk read operations are performed from
         one of the units, while disk write operations are performed
         on both units. In case of a hardware failure in one
         of the units, that unit is DISCARDED. The remaining
         unit is then used alone for both read and write operations,
         and the Disk System status becomes ONE or TWO, corresponding
         to the unit left. Later on, when the erroneous disk
         has been repaired or exchanged, it can be included in
         the disk system again by performing a DUALIZE_DISKS-procedure.
         After START_DUALIZE all disk write operations are performed
         on both units. Read operations are still performed
         from the unit that has been in the system all the time
         (the old one). Meanwhile all disk sectors are copied
         from the old unit to the new one. When the copying is
         finished, the units are identical. The procedure is
         terminated with FINISH_DUALIZE, after which the Disk
         Status is again DUAL. The whole procedure is performed
         without the Node/MEDE being out of operation at any
         moment. The dualize-procedure is activated by starting
         the background task "DUALIZE_DISKS".

         Both branches can access the disks at the same time,
         but only the ACTIVE branch is allowed to (and can)
         do write operations on the disks.
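
         The handling of the DUAL/ONE/TWO status described above
         can be sketched as a small state machine. This is a
         toy model only: each disk unit is represented by a
         Python dict of sectors, and the class and method names
         are hypothetical, chosen to mirror DISCARD, START_DUALIZE
         and FINISH_DUALIZE. The rule that only the ACTIVE branch
         may write is not modelled.

```python
from enum import Enum


class DiskStatus(Enum):
    DUAL = "DUAL"
    ONE = "ONE"
    TWO = "TWO"


def other(unit):
    """Return the name of the counterpart disk unit."""
    return "TWO" if unit == "ONE" else "ONE"


class DualDiskSystem:
    def __init__(self):
        self.status = DiskStatus.DUAL
        self.units = {"ONE": {}, "TWO": {}}   # sector number -> data
        self.read_unit = "ONE"                # unit serving read operations
        self.dualizing = False                # True between START and FINISH

    def write(self, sector, data):
        # Writes go to both units in DUAL status and while dualizing.
        if self.status is DiskStatus.DUAL or self.dualizing:
            targets = ["ONE", "TWO"]
        else:
            targets = [self.status.value]
        for name in targets:
            self.units[name][sector] = data

    def read(self, sector):
        return self.units[self.read_unit][sector]

    def discard(self, failed):
        """Hardware failure in one unit: continue on the other one alone."""
        self.status = DiskStatus(other(failed))
        self.read_unit = other(failed)
        self.dualizing = False

    def start_dualize(self):
        """The repaired unit rejoins; reads stay on the old unit."""
        if self.status is DiskStatus.DUAL:
            raise RuntimeError("disk system is already DUAL")
        self.units[other(self.read_unit)] = {}   # repaired unit starts empty
        self.dualizing = True

    def copy_all_sectors(self):
        """Copy every sector from the old unit to the new one."""
        self.units[other(self.read_unit)].update(self.units[self.read_unit])

    def finish_dualize(self):
        if not self.dualizing:
            raise RuntimeError("START_DUALIZE has not been performed")
        self.dualizing = False
        self.status = DiskStatus.DUAL
```

         In the sketch, a write issued while the copying is in
         progress reaches both units, so the copy cannot leave
         the new unit with stale data.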

         All dual disk operations are transparent to the application
         processes, i.e. they do not need to care about the disk
         status. This also holds for disk hardware failures
         that can be recovered by discarding a unit: the error
         is discovered in the IO-system, the ESP is notified
         and discards the unit before OK completion is returned
         to the application (ref. figure 4.3.4.8-2).

         The Disk Status is checkpointed (ref. sec. 4.3.4.7)
         each time a change in it occurs. The checkpoint is
         used in the following procedure.

         I̲n̲i̲t̲i̲a̲l̲i̲z̲i̲n̲g̲ ̲o̲f̲ ̲t̲h̲e̲ ̲D̲i̲s̲k̲ ̲S̲y̲s̲t̲e̲m̲

         When the FIKS System is bootloaded, a 'hardware' Disk
         Status is obtained. This status tells which disk units
         can be accessed from a hardware point of view. The
         BRANCH/STATE is then settled. On the basis of this and
         the checkpointed Disk Status, the Disk Status to be
         used in the further processing is determined. The status
         shall be the one used last time the system was ACTIVE,
         or at least it shall not indicate use of a disk unit
         that was not in use last time. This is assured by using
         the largest common Disk Status of the three possible
         disk statuses: one checkpointed on each disk unit and
         the hardware disk status. The resulting disk status
         is then checkpointed. If the branch is going to be ACTIVE,
         permission to do disk writing is given.
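
         One way to read the "largest common Disk Status" is
         as a set intersection: each status is taken as the set
         of disk units it allows (DUAL = both units, ONE or TWO
         = a single unit), and the result is the status that
         allows exactly the units common to all inputs. This
         interpretation, the function name and the error raised
         when no unit remains are assumptions of the sketch
         below, not statements about the FIKS implementation.

```python
# Disk units allowed by each Disk Status (assumed interpretation).
UNITS = {
    "DUAL": frozenset({"ONE", "TWO"}),
    "ONE": frozenset({"ONE"}),
    "TWO": frozenset({"TWO"}),
}


def largest_common_status(hardware, checkpointed):
    """Combine the hardware status with the checkpointed statuses.

    hardware:     status obtained at bootload ("DUAL", "ONE" or "TWO")
    checkpointed: statuses read back from the disk units
    """
    usable = set(UNITS[hardware])
    for status in checkpointed:
        usable &= UNITS[status]       # keep only units allowed by every status
    for name, units in UNITS.items():
        if units == usable:
            return name
    # Not covered by the text: no unit is allowed by all statuses.
    raise RuntimeError("no disk unit is common to all disk statuses")
```

         For example, if DISK TWO was discarded before the restart,
         its own checkpoint may still say DUAL while the one
         on DISK ONE says ONE; the intersection then yields ONE,
         so the stale unit is not brought back into use.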

4.3.4.9  S̲w̲i̲t̲c̲h̲o̲v̲e̲r̲ ̲o̲f̲ ̲B̲r̲a̲n̲c̲h̲e̲s̲

         Suppose that an entire dualized Node/MEDE configuration
         as in figure 4.3.4.1 exists. One branch (BRANCH ONE)
         is ACTIVE and the other branch (BRANCH TWO) is STANDBY.
         The ACTIVE branch is performing all the operations
         and the STANDBY branch is as ready as possible to take
         over the operations. If a fatal error occurs in the
         ACTIVE branch, the STANDBY branch has to take over the
         operations. This happens as follows:

         -   The error is sensed by the Watchdog, either by
             missing answers upon the 'Alive Status Report'
             (ref. sec. 4.3.4.2) to the ESP in the ACTIVE branch
             or at reception of a fatal error report (ref. sec.
             4.3.4.6) from the ACTIVE branch.
             The Watchdog then knows that it has to start the
             Switchover procedure.

         -   The system operator is asked if 'SWITCHOVER FROM
             P1 TO P2 ALLOWED'. The operator has 10 seconds
             to decide. If he answers 'NO', then all further
             processing is cancelled. If he answers 'YES' or the
             time expires, then the Switchover proceeds.

         -   Before the STANDBY branch can take over, possible
             ongoing processing in the ACTIVE branch has to
             be stopped. If both branches were executing active
             operations at the same moment, then this would
             cause severe confusion in the disk/TDX-system.
             The Watchdog issues a CLOSE-command to the ACTIVE
             branch and waits for completion of this command.

         -   The ESP receives the command. It is very important
             that CLOSE of the system is performed in a proper
             manner. It is especially important that the processing
             concerning access to the disk system is terminated
             in such a way that the logical coherence is kept.
             All application processes are terminated so that
             the disk files being accessed are dismantled correctly.
             This cleanup procedure is accomplished by use of
             the Job Control File 'CLOSE'. When finished, the
             ESP takes care that the disks/TDX-hosts used are
             released. Completion of the CLOSE-command is then
             returned to the Watchdog.

         -   When the Watchdog receives CLOSE-completion or
             the wait for it expires, the Watchdog will MASTERCLEAR
             the former ACTIVE branch. In this way it is assured
             that the branch will not in any way interfere with
             the processing in the coming ACTIVE branch. The
             Watchdog will then issue a Recover-command to the
             STANDBY branch.

         -   The ESP receives the command and starts the RESTART-procedure.
             First the disk system is initialized (ref. sec.
             4.3.4.8). This ensures that the same disk configuration
             as used by the former ACTIVE branch will also be
             used by the coming ACTIVE branch. A global semaphore,
             telling all the application processes that they
             have to do their part of the RESTART-procedure,
             is set.

         -   The RECOVER of those CR80-memory data structures
             that can be handled on system level is now performed.
             To do that, the checkpoints stored by the former
             ACTIVE branch are used. Special applications have
             been developed for this purpose. The SYSCHP-process
             recovers 'Terminal Control Blocks' and 'Message
             Preparation Pool', the RECOVM-process recovers
             the MTCBs and Queues, and the RECMES-process recovers
             the messages that were in preparation at the
             moment of Switchover (ref. sec. 4.3.4.7). The
             task of resetting all the Message Preparation Files
             not in use is left to the RESPDB-process.

         -   The TDX-system is initialized once again. All the
             application processes, loaded earlier in the STANDBY
             initializing phase, are started. They will then
             do their part of the RECOVER-procedure.

         -   The execution of the total RESTART-procedure is
             controlled by using the Job Control File 'RECOVER'.
             The whole procedure takes about 2 minutes. This
             is the time the Node/MEDE will be out of operation
             in case of fatal errors. When the procedure is
             finished a message will be sent to the SCC, telling
             that 'Switchover' has occurred. The Watchdog is
             told that the branch is now ACTIVE.
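
         Seen from the Watchdog, the first steps of the sequence
         above can be sketched as follows. The four callables
         are hypothetical stand-ins for the operator dialogue
         and the commands sent to the branches. The 10 second
         operator limit is taken from the text; the CLOSE time
         limit is not stated there, so the value used here is
         purely an assumption.

```python
def run_switchover(ask_operator, send_close, master_clear, send_recover,
                   operator_timeout_s=10, close_timeout_s=30):
    """Return the list of steps taken by this sketch of the Watchdog.

    ask_operator(question, timeout_s) -> "YES", "NO", or None on timeout
    send_close(timeout_s)             -> True if CLOSE completed in time
    close_timeout_s is an assumed value; the text gives no figure.
    """
    steps = []
    answer = ask_operator("SWITCHOVER FROM P1 TO P2 ALLOWED", operator_timeout_s)
    if answer == "NO":
        steps.append("cancelled")     # all further processing is cancelled
        return steps
    steps.append("operator:" + ("YES" if answer == "YES" else "timeout"))

    closed = send_close(close_timeout_s)
    steps.append("close:" + ("completed" if closed else "timed out"))

    # MASTERCLEAR follows in both cases, so that the former ACTIVE
    # branch cannot interfere with the coming ACTIVE branch.
    master_clear()
    steps.append("masterclear")

    send_recover()                    # Recover-command to the STANDBY branch
    steps.append("recover")
    return steps
```

         Note that both an operator timeout and a CLOSE timeout
         let the procedure continue; only an explicit 'NO' stops
         it.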

4.3.4.10 S̲y̲s̲t̲e̲m̲ ̲U̲t̲i̲l̲i̲t̲i̲e̲s̲

         The FIKS operating system (the ESP) offers a wide range
         of utilities to the staff (system operators, technicians,
         programmers and designers) who maintain the system:

         -   Printout of various system states:

             -   The status of the processes.
             -   The disk system status, no. of read/write operations
                 performed/failed, etc.
             -   The TDX-system status, no. of transmissions/retransmissions/errors,
                 etc.
             -   The file processor status.
             -   The status of the critical regions.

             The printouts are meant to be used for diagnostics,
             resource consumption investigations, debugging,
             etc.

         -   Error dumps.
             In error cases, the whole CR80-memory or part of
             it can be dumped to disk files, either by intervention
             from a system operator or automatically, initiated
             by a process error report.

         -   Floppy utilities.
             The error dumps performed can be retrieved from
             the disk files to floppy files while the FIKS system
             is operating online. Minor updates of the software
             configurations may also be executed online. The
             software modules to be updated are loaded from floppy
             to the disk system and a SWITCHOVER-procedure is
             performed. After this the updated modules are included
             in the operating system.

         -   Online inspection utilities.
             The CR80-memory and the disk files can be inspected
             online, while the FIKS system is running. This
             gives very good opportunities in the debugging
             and analysis phase of error cases. Those utilities
             are also meant to be used in test procedures of
             updated/new software modules in the FIKS system.

             The use of some of the utilities has a security
             aspect. Therefore these utilities cannot be activated
             unless a PASSWORD-procedure has been passed.

4.3.4.11 S̲y̲s̲t̲e̲m̲ ̲O̲f̲f̲l̲i̲n̲e̲ ̲S̲t̲a̲t̲u̲s̲

         The maintenance of the FIKS software system is in principle
         performed in an offline mode using the standard CR80
         AMOS Terminal Operating System (TOS). The TOS gives
         access to all needed functions:

         -   generation of programs
         -   editing of files
         -   patching of files
         -   copying of files
         -   disk system recovery
         -   etc.

         The TOS-use is carried out by BOOT-loading the CR80-computer
         with special boot modules.

         Other offline states of the CR80-computer, also started
         by using a special boot-module, may exist:

         -   AMOS Master Clear Utilities.
             (low level CR80-computer testing and analyzing).
         -   Disk Test Utilities
             (special FIKS application).
         -   Backup of disk volumes.

         Uncontrolled access to the start-up of these offline
         states would cause security problems. Therefore the
         boot modules, which are the key to the states, cannot
         normally be used. They have to be enabled first. This
         can only be done in an operating FIKS system by a system
         operator who has passed a PASSWORD-procedure.
