⟦50bdda95b⟧

WangText









CPS/SDS/001

BBC/811020

CAMPS
SYSTEM
DESIGN
SPECIFICATION

ISSUE 1.1   CAMPS

                 T̲A̲B̲L̲E̲ ̲O̲F̲ ̲C̲O̲N̲T̲E̲N̲T̲S̲

   4.7   RECOVERY
     4.7.1 Requirements and General Concepts
       4.7.1.1 Queue Control at Restart
       4.7.1.2 Recovery Situations
       4.7.1.3 Recovery/Restart from Total System Failure
       4.7.1.4 Recovery after Switch-over
       4.7.1.5 Withdrawal of Redundant Items

     4.7.2 Failure Types
       4.7.2.1 Minor Errors
         4.7.2.1.1 Process Failure
         4.7.2.1.2 Process Detected Failure
       4.7.2.2 Single System Failures
       4.7.2.3 Total System Failures
       4.7.2.4 Software System Errors
       4.7.2.5 Disastrous Errors

     4.7.3 Recovery Actions

     4.7.4 Central Recovery Functions
       4.7.4.1 Checkpoint Function
       4.7.4.2 Consistent Checkpoints
       4.7.4.3 Checkpoint Generation
       4.7.4.4 Checkpoint Reception
       4.7.4.5 Checkpoint Restore
       4.7.4.6 Stand-by PU Coordination
       4.7.4.7 Queue Control at Restart
       4.7.4.8 Save System Data
       4.7.4.9 Restore System Data
       4.7.4.10 Disk System Integrity

     4.7.5 Recovery Level
       4.7.5.1 Checkpoints
         4.7.5.1.1 Checkpoint 1
         4.7.5.1.2 Checkpoint 2
         4.7.5.1.3 Checkpoint 3
         4.7.5.1.4 Checkpoint 4
         4.7.5.1.5 Checkpoint 5
         4.7.5.1.6 Checkpoint 6
         4.7.5.1.7 Checkpoint 7
         4.7.5.1.8 Checkpoint 8
         4.7.5.1.9 Checkpoint 9
         4.7.5.1.10 Checkpoint 10
         4.7.5.1.11 Checkpoint 11
         4.7.5.1.12 Checkpoint 12
         4.7.5.1.13 Checkpoint 13
         4.7.5.1.14 Checkpoint 14
         4.7.5.1.15 Checkpoint 15
         4.7.5.1.16 Checkpoint 16
       4.7.5.2 Level 0
       4.7.5.3 Level 1 (Restart after Total System Failure)
       4.7.5.4 Level 2 (Switch-over)
       4.7.5.5 Level 3 (Restart after Close Down)

     4.7.6 Application Recovery Actions
       4.7.6.1 Traffic Handling & Distribution
       4.7.6.2 Terminal System
         4.7.6.2.1 User Functions
         4.7.6.2.2 Supervisor Functions
         4.7.6.2.3 Message Service Function
       4.7.6.3 Storage & Retrieval
       4.7.6.4 Log & Accountability
       4.7.6.5 Statistics

     4.7.7 Performance & Capacity
       4.7.7.1 Checkpoint Rate
       4.7.7.2 Disk Space Requirements
       4.7.7.3 Switch-over Time
       4.7.7.4 Restart Time after Total System Failure
       4.7.7.5 Stand-by PU Restart Time

     Fig. 4.7.5.1    CAMPS System Msg. Flow I
     Fig. 4.7.7.1-1  CAMPS System Msg. Flow II
     Fig. 4.7.7.1-2  CAMPS System Msg. Flow III

4.7      R̲E̲C̲O̲V̲E̲R̲Y̲

4.7.1    R̲e̲q̲u̲i̲r̲e̲m̲e̲n̲t̲s̲ ̲a̲n̲d̲ ̲G̲e̲n̲e̲r̲a̲l̲ ̲C̲o̲n̲c̲e̲p̲t̲s̲

         The activities performed by the system following a
         failure are termed "recovery".  The requirements are
         as follows:

4.7.1.1  Q̲u̲e̲u̲e̲ ̲C̲o̲n̲t̲r̲o̲l̲ ̲a̲t̲ ̲R̲e̲s̲t̲a̲r̲t̲ ̲(̲r̲e̲f̲.̲:̲ ̲S̲R̲S̲ ̲3̲.̲2̲.̲4̲.̲1̲.̲1̲.̲1̲4̲)̲

         a)  During restart, commands shall exist to cancel messages
             and transactions as specified below.

             1)  For each outgoing channel cancel all messages
                 of specified precedence.

             2)  For each reception position cancel all output
                 of specified precedence.

             3)  For each release position cancel all messages
                 of specified precedence awaiting release.

             4)  For each preparation position cancel all suspended
                 transactions.

             5)  Cancel all messages of specified precedence
                 awaiting MDCO processing.

             6)  Cancel all messages of specified precedence
                 awaiting message service.

         b)  Messages for which queue elements have been cancelled
             as specified above will still be available for
             retrieval to the extent specified in section 3.2.7.
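         The precedence-based cancel commands in a) and the retention
         rule in b) can be illustrated with a minimal Python sketch.
         It assumes a hypothetical in-memory model in which each queue
         holds queue elements that reference a stored message by ID and
         carry a precedence; the names QueueElement, Queue and
         cancel_by_precedence, the queue name and the precedence labels
         are illustrative only, not CAMPS interfaces.  Cancelling removes
         only the queue elements, never the stored messages.

```python
from dataclasses import dataclass, field


@dataclass
class QueueElement:
    msg_id: str       # reference to the message stored on disk
    precedence: str   # e.g. "FLASH", "IMMEDIATE", "PRIORITY", "ROUTINE"


@dataclass
class Queue:
    name: str
    elements: list = field(default_factory=list)


def cancel_by_precedence(queue: Queue, precedence: str) -> list:
    """Cancel all queue elements of the specified precedence.

    Only the queue elements are removed; the messages themselves stay in
    storage and therefore remain available for retrieval (item b).
    Returns the IDs of the messages whose queue elements were cancelled.
    """
    cancelled = [e.msg_id for e in queue.elements if e.precedence == precedence]
    queue.elements = [e for e in queue.elements if e.precedence != precedence]
    return cancelled


# Hypothetical message store, untouched by cancellation.
message_store = {"M1": "text 1", "M2": "text 2", "M3": "text 3"}

channel_queue = Queue("CHANNEL-1", [
    QueueElement("M1", "ROUTINE"),
    QueueElement("M2", "PRIORITY"),
    QueueElement("M3", "ROUTINE"),
])

cancelled = cancel_by_precedence(channel_queue, "ROUTINE")
print(cancelled)                                   # ['M1', 'M3']
print([e.msg_id for e in channel_queue.elements])  # ['M2']
print(sorted(message_store))                       # ['M1', 'M2', 'M3']
```

         Command 4) (suspended transactions) is not precedence based and
         would be a plain "remove all" on the preparation position queue.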

4.7.1.2  R̲e̲c̲o̲v̲e̲r̲y̲ ̲S̲i̲t̲u̲a̲t̲i̲o̲n̲s̲ ̲(̲r̲e̲f̲.̲:̲ ̲S̲R̲S̲ ̲3̲.̲2̲.̲8̲.̲4̲)̲ ̲

         Section 3.2.8.4 describes detailed recovery requirements
         in the following situations:

         -   total system failure
         -   failure requiring a switchover

4.7.1.3  R̲e̲c̲o̲v̲e̲r̲y̲/̲R̲e̲s̲t̲a̲r̲t̲ ̲f̲r̲o̲m̲ ̲T̲o̲t̲a̲l̲ ̲S̲y̲s̲t̲e̲m̲ ̲F̲a̲i̲l̲u̲r̲e̲ ̲(̲r̲e̲f̲.̲:̲ ̲S̲R̲S̲
         3̲.̲2̲.̲8̲.̲4̲.̲1̲)̲

         a)  The system shall preserve essential data enabling
             recovery in case of total system failure.

         b)  A total system failure may be caused by:

             1)  simultaneous errors in redundant equipment
             2)  an error in the main power supply
             3)  detection of a program error which cannot
                 be recovered internally.

         c)  No message shall be lost within the network, in particular:

             1)  messages which were only partially transmitted,
                 or for which a FLASH acknowledgement has not
                 been received prior to the error, shall be
                 retransmitted.

             2)  continuity of the reporting system shall be
                 maintained to ensure that messages are not
                 lost within the network as a consequence of
                 the loss of either a report or an automatically
                 generated service message.

         d)  Continuity shall be maintained up to the time of
             the last storage with respect to the following
             data items:

             1)  log records
             2)  statistics
             3)  messages/comments in the HDB
             4)  current system parameter file
             5)  message accounting

         e)  The data (defined above) stored prior to the total
             system failure can be recalled by the use of normal
             commands.

         f)  Integrity of memory shall be maintained to ensure
             that the system will not fail as a consequence
             of operating upon corrupted data generated by a
             preceding error.

         g)  Restart subsequent to recovery shall be based upon
             the system parameter file obtained in 3.2.8.1(e).

         h)  Certain errors may preclude total recovery. In
             these cases the system is recovered so that the
             chance of losing a message is minimized.

         i)  Recovery and restart subsequent to a total system
             failure will be completed within 15 minutes.

4.7.1.4  R̲e̲c̲o̲v̲e̲r̲y̲ ̲a̲f̲t̲e̲r̲ ̲S̲w̲i̲t̲c̲h̲-̲o̲v̲e̲r̲ ̲(̲r̲e̲f̲.̲:̲ ̲S̲R̲S̲ ̲3̲.̲2̲.̲8̲.̲4̲.̲2̲)̲

         a)  Where redundant equipment is provided to meet the
             availability/reliability requirements, the switch-over
             from an active configuration to a standby
             configuration shall not result in loss of messages,
             i.e.:

             1)  The communication protocols shall ensure retransmission
                 of messages that are garbled or partly lost.

             2)  The accountability log shall ensure that the
                 accountability related to messages is not lost.

         b)  Where a transaction at a terminal is in progress,
             the user shall be informed of the state to which
             the transaction has been recovered after switch-over.

         c)  A switch-over initiated by the detection of an
             error shall normally not require operator intervention.

4.7.1.5  W̲i̲t̲h̲d̲r̲a̲w̲a̲l̲ ̲o̲f̲ ̲R̲e̲d̲u̲n̲d̲a̲n̲t̲ ̲I̲t̲e̲m̲s̲ ̲(̲r̲e̲f̲.̲:̲ ̲S̲R̲S̲ ̲3̲.̲4̲.̲4̲.̲5̲.̲1̲)̲

         -   The capability is maintained to re-integrate withdrawn
             redundant items into the operational configuration
             and return them to service within 5 minutes.

         -   Recovery procedures after system failure shall
             include checksumming of the operating system software
             and reloading it if necessary (ref.: SRS 3.4.5.7).

         To meet these requirements, maximize availability and
         minimize inconvenience to the users, an event-based
         checkpoint method is used.  Restart points are defined
         in the message flow to meet these objectives, and
         information is gathered when messages pass these points.

         The number and placement of the checkpoints are
         determined taking into account the following factors:

         -   Availability
         -   User inconvenience
         -   System overhead
         -   System complexity

         The method employed is that checkpoint information,
         consisting partly of message control blocks (MCBs)
         and partly of file directory information, is transferred
         to the stand-by PU for each point passed by the message.
         At certain points the information is recorded on disk
         as well.

         This means that several levels of recovery can be employed
         depending upon the type of failure and the availability
         of stand-by PU and other redundant equipment.
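         The transfer rule can be sketched in Python as below, assuming
         a hypothetical CheckpointRecord type and plain lists standing in
         for the stand-by PU checkpoint pool and the disk checkpoint area.
         Every checkpoint goes to the stand-by PU; the set of points that
         are also written to disk, and the conditions attached to points
         4 and 5, are taken from section 4.7.5.3.  This illustrates the
         rule only, not the actual transfer mechanism.

```python
from dataclasses import dataclass

# Checkpoints whose data are also written to disk (section 4.7.5.3):
# 1, 5, 16 where messages are "created"; 4, 9, 12-15 where they are "finished".
DISK_POINTS = {1, 4, 5, 9, 12, 13, 14, 15, 16}


@dataclass(frozen=True)
class CheckpointRecord:
    point: int              # checkpoint number, 1..16
    msg_id: str             # hypothetical message identifier
    complete: bool = True   # points 4/5 only: distribution/preparation completed


def dispatch(record: CheckpointRecord, standby: list, disk: list) -> None:
    """Transfer one checkpoint to the stand-by PU and, where required, to disk."""
    standby.append(record)  # every point passed is sent to the stand-by PU
    if record.point in DISK_POINTS:
        # Point 4 only when distribution is fully completed,
        # point 5 only when preparation is completed.
        if record.point not in (4, 5) or record.complete:
            disk.append(record)


standby_pool: list = []
disk_area: list = []
for rec in [CheckpointRecord(1, "M1"), CheckpointRecord(2, "M1"),
            CheckpointRecord(3, "M1"),
            CheckpointRecord(4, "M1", complete=False),  # first of several terminals
            CheckpointRecord(4, "M1", complete=True)]:  # distribution completed
    dispatch(rec, standby_pool, disk_area)

print(len(standby_pool))             # 5
print([r.point for r in disk_area])  # [1, 4]
```

         The stand-by PU thus sees every transition (recovery level 2),
         while the disk holds only the "created" and "finished" points
         needed for recovery level 1.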

4.7.2    F̲a̲i̲l̲u̲r̲e̲ ̲T̲y̲p̲e̲s̲

4.7.2.1  M̲i̲n̲o̲r̲ ̲E̲r̲r̲o̲r̲s̲

         These are defined as errors that can be recovered from
         without switch-over to the stand-by PU and without
         system reload.

4.7.2.1.1    P̲r̲o̲c̲e̲s̲s̲ ̲F̲a̲i̲l̲u̲r̲e̲

         -   Illegal operation
         -   Security violation attempt
         -   etc.

4.7.2.1.2    P̲r̲o̲c̲e̲s̲s̲ ̲D̲e̲t̲e̲c̲t̲e̲d̲ ̲F̲a̲i̲l̲u̲r̲e̲

         -   HW error (other than PU failure)
         -   Certain types of file corruption
         -   Resource error
         -   etc.

4.7.2.2  S̲i̲n̲g̲l̲e̲ ̲S̲y̲s̲t̲e̲m̲ ̲F̲a̲i̲l̲u̲r̲e̲s̲

         These are defined as HW errors limited to one PU and
         thus recoverable by switch-over to the stand-by PU:

         -   Any active PU failure.

4.7.2.3  T̲o̲t̲a̲l̲ ̲S̲y̲s̲t̲e̲m̲ ̲F̲a̲i̲l̲u̲r̲e̲s̲

         These are defined as malfunctions of the total dualized
         configuration:

         -   Simultaneous failure of dualized equipment
         -   Single PU failure when the stand-by PU is not available
         -   Power failure

4.7.2.4  S̲o̲f̲t̲w̲a̲r̲e̲ ̲S̲y̲s̲t̲e̲m̲ ̲E̲r̲r̲o̲r̲s̲

         These are defined as programming errors (other than
         minor errors) resulting in system break and recoverable
         by system software reload and restart.

4.7.2.5  D̲i̲s̲a̲s̲t̲r̲o̲u̲s̲ ̲E̲r̲r̲o̲r̲s̲

         These are defined as errors resulting in loss of vital
         system files and thus not completely recoverable with
         respect to:

         -   Messages
         -   Accountability
         -   Log
         -   Statistics
         -   System parameters

4.7.3    R̲e̲c̲o̲v̲e̲r̲y̲ ̲A̲c̲t̲i̲o̲n̲s̲

         1̲.̲  E̲r̲r̲o̲r̲ ̲F̲i̲x̲-̲u̲p̲

         This action is applicable to recovery from minor errors
         (ref. section 4.11).

         2̲.̲  S̲w̲i̲t̲c̲h̲-̲o̲v̲e̲r̲ ̲t̲o̲ ̲S̲t̲a̲n̲d̲-̲b̲y̲ ̲P̲U̲

         This action is taken upon detection of a fault in the
         active PU when the stand-by PU is available.

         3̲.̲  R̲e̲s̲t̲a̲r̲t̲ ̲o̲f̲ ̲S̲t̲a̲n̲d̲-̲b̲y̲ ̲P̲U̲

         This action is required when a PU is returned to the
         system (after repair or off-line operation) for use
         as a stand-by PU.

         4̲.̲  R̲e̲s̲t̲a̲r̲t̲ ̲a̲f̲t̲e̲r̲ ̲C̲l̲o̲s̲e̲ ̲D̲o̲w̲n̲

         This action is taken when the system is brought back
         into operation following an ordered close down.

         5̲.̲  T̲o̲t̲a̲l̲ ̲S̲y̲s̲t̲e̲m̲ ̲R̲e̲l̲o̲a̲d̲ ̲

         This action will be taken upon failure of the total
         system due to HW or SW, after the error condition is
         detected and removed.

         6̲.̲  I̲n̲i̲t̲i̲a̲l̲ ̲S̲t̲a̲r̲t̲-̲u̲p̲

         This action is taken at the first-time load of the system
         (following installation) and is also required to bring the
         system back to operational use following a disastrous
         error.
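         The correspondence between the failure types of section 4.7.2
         and the recovery actions listed above can be summarised as a
         small decision function.  This is a sketch under stated
         assumptions: the enum and function names are hypothetical, and
         mapping software system errors to action 5 is inferred from
         4.7.2.4 ("recoverable by system software reload and restart")
         rather than stated explicitly.  Actions 3 and 4 are not
         triggered by failures and are therefore never returned.

```python
from enum import Enum


class Failure(Enum):
    MINOR = "minor error"                      # 4.7.2.1
    SINGLE_SYSTEM = "single system failure"    # 4.7.2.2
    TOTAL_SYSTEM = "total system failure"      # 4.7.2.3
    SOFTWARE_SYSTEM = "software system error"  # 4.7.2.4
    DISASTROUS = "disastrous error"            # 4.7.2.5


class Action(Enum):
    ERROR_FIX_UP = 1
    SWITCH_OVER = 2
    STANDBY_RESTART = 3
    RESTART_AFTER_CLOSE_DOWN = 4
    TOTAL_SYSTEM_RELOAD = 5
    INITIAL_START_UP = 6


def select_action(failure: Failure, standby_available: bool) -> Action:
    """Map a failure type (4.7.2) to a recovery action (4.7.3)."""
    if failure is Failure.MINOR:
        return Action.ERROR_FIX_UP
    if failure is Failure.SINGLE_SYSTEM:
        # Without a stand-by PU a single PU failure counts as a
        # total system failure (4.7.2.3).
        return Action.SWITCH_OVER if standby_available else Action.TOTAL_SYSTEM_RELOAD
    if failure in (Failure.TOTAL_SYSTEM, Failure.SOFTWARE_SYSTEM):
        return Action.TOTAL_SYSTEM_RELOAD
    return Action.INITIAL_START_UP  # disastrous error


print(select_action(Failure.SINGLE_SYSTEM, standby_available=True).name)   # SWITCH_OVER
print(select_action(Failure.SINGLE_SYSTEM, standby_available=False).name)  # TOTAL_SYSTEM_RELOAD
```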

4.7.4    C̲e̲n̲t̲r̲a̲l̲ ̲R̲e̲c̲o̲v̲e̲r̲y̲ ̲F̲u̲n̲c̲t̲i̲o̲n̲s̲

4.7.4.1  C̲h̲e̲c̲k̲-̲p̲o̲i̲n̲t̲ ̲F̲u̲n̲c̲t̲i̲o̲n̲

         Checkpointing is event-based: checkpoints are defined
         as the points in the message's propagation through the
         system where a transition from one stage to the next
         takes place.  At these points the checkpoint function
         records the information about the message that is
         necessary to recover it to the last state in which it
         was checkpointed.

         As the message itself is stored on disk, the information
         required to recover it is the M̲essage C̲ontrol B̲lock
         (MCB), which identifies the message and its state, plus
         index information pointing to where the message is stored.
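         As an illustration, the checkpoint data for one message could be
         modelled as below.  This is a hypothetical Python layout: the
         field names, the state label, the tuple of disk block addresses
         used as index information and the JSON encoding for transfer are
         all assumptions; the document specifies only that the data
         comprise the MCB (identity and state) plus index information
         locating the message on disk.  The message text itself is not
         part of the record.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class MessageControlBlock:
    msg_id: str   # identifies the message
    state: str    # last checkpointed stage (hypothetical labels)
    queue: str    # queue in which the message is to be recovered


@dataclass(frozen=True)
class CheckpointData:
    mcb: MessageControlBlock
    disk_index: tuple  # where the stored message text is found on disk


def serialize(cp: CheckpointData) -> str:
    """Encode the record for transfer; the message text is not included."""
    return json.dumps(asdict(cp))


cp = CheckpointData(MessageControlBlock("M42", "STORED", "IMQ"), (1024, 1025))
print(serialize(cp))
# {"mcb": {"msg_id": "M42", "state": "STORED", "queue": "IMQ"}, "disk_index": [1024, 1025]}
```

         Keeping the record this small is what makes it cheap to send at
         every checkpoint, whatever the length of the message.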

         The recovery functions are:

4.7.4.2  C̲o̲n̲s̲i̲s̲t̲e̲n̲t̲ ̲C̲h̲e̲c̲k̲p̲o̲i̲n̲t̲s̲

         Central system functions ensuring that updates of messages,
         queues, tables, files etc. are done in such a way that
         related updates are either all performed or none of
         them is performed.
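         One common way to obtain the all-or-none property is to apply
         the related updates to a private copy and publish the copy only
         when every update has succeeded.  The Python sketch below uses
         that copy-and-swap technique purely as an illustration; the
         document does not state how CAMPS implements it, and the class
         ConsistentUpdate with its move, add and commit methods is
         hypothetical.  The example applies it to the indivisible
         operation of checkpoint 10.

```python
import copy


class ConsistentUpdate:
    """All-or-none update of several queues (hypothetical sketch).

    The updates are applied to a private copy of the state; the live state
    is replaced only if every update succeeds, so related updates are
    either all performed or none of them is.
    """

    def __init__(self, state: dict):
        self.state = state
        self.ops = []

    def move(self, item: str, src: str, dst: str) -> "ConsistentUpdate":
        self.ops.append((item, src, dst))
        return self

    def add(self, item: str, dst: str) -> "ConsistentUpdate":
        self.ops.append((item, None, dst))
        return self

    def commit(self) -> bool:
        work = copy.deepcopy(self.state)
        try:
            for item, src, dst in self.ops:
                if src is not None:
                    work[src].remove(item)  # ValueError if the item is absent
                work[dst].append(item)      # KeyError if the queue is unknown
        except (KeyError, ValueError):
            return False                    # live state untouched
        self.state.clear()
        self.state.update(work)             # publish all updates together
        return True


# Checkpoint 10: a rejected message leaves the RLQ and goes to the MDQ
# together with its notification, in one indivisible operation.
queues = {"RLQ": ["M7"], "MDQ": []}
ok = ConsistentUpdate(queues).move("M7", "RLQ", "MDQ").add("N7", "MDQ").commit()
print(ok, queues)      # True {'RLQ': [], 'MDQ': ['M7', 'N7']}

# A batch in which one update fails leaves every queue unchanged.
failed = ConsistentUpdate(queues).add("N8", "MDQ").move("M8", "RLQ", "MDQ").commit()
print(failed, queues)  # False {'RLQ': [], 'MDQ': ['M7', 'N7']}
```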

4.7.4.3  C̲h̲e̲c̲k̲p̲o̲i̲n̲t̲ ̲G̲e̲n̲e̲r̲a̲t̲i̲o̲n̲

         Central system function for generation of checkpoint
         data transparent to the individual sub-systems and
         transferring these data to disk and/or standby PU.

4.7.4.4  C̲h̲e̲c̲k̲p̲o̲i̲n̲t̲ ̲R̲e̲c̲e̲p̲t̲i̲o̲n̲

         Central system function (SS&C) for receiving checkpoint
         data from the active PU and maintaining a pool of "active
         checkpoints".

4.7.4.5  C̲h̲e̲c̲k̲p̲o̲i̲n̲t̲ ̲R̲e̲s̲t̲o̲r̲e̲

         Central system function to enable regeneration of the system
         state from previously generated checkpoint data.  These
         data may come from disk in the case of warm start, or
         from the stand-by PU in the case of switch-over.
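         Because only the origin of the records differs between warm
         start and switch-over, the regeneration itself can be one
         routine.  A minimal Python sketch, assuming each checkpoint
         carries a hypothetical sequence number and the name of the queue
         in which the message is to be recovered, with None marking a
         "finished" checkpoint; Checkpoint and restore are illustrative
         names.

```python
from typing import Iterable, NamedTuple, Optional


class Checkpoint(NamedTuple):
    seq: int              # order in which checkpoints were taken (assumed)
    msg_id: str
    queue: Optional[str]  # queue to recover the message in; None = "finished"


def restore(records: Iterable[Checkpoint]) -> dict:
    """Regenerate queue contents from previously generated checkpoint data.

    Each message is restored to the state of its latest checkpoint;
    messages whose latest checkpoint is "finished" are not restored.
    """
    latest = {}
    for rec in sorted(records, key=lambda r: r.seq):
        latest[rec.msg_id] = rec          # later checkpoints supersede earlier ones
    queues: dict = {}
    for rec in sorted(latest.values(), key=lambda r: r.seq):
        if rec.queue is not None:
            queues.setdefault(rec.queue, []).append(rec.msg_id)
    return queues


# The records may come from disk (warm start) or from the stand-by PU pool.
records = [
    Checkpoint(1, "M1", "IMQ"), Checkpoint(2, "M2", "IMQ"),
    Checkpoint(3, "M1", "MDQ"), Checkpoint(4, "M2", "MDQ"),
    Checkpoint(5, "M2", None),            # M2 finished
    Checkpoint(6, "M3", "RLQ"),
]
print(restore(records))  # {'MDQ': ['M1'], 'RLQ': ['M3']}
```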

4.7.4.6  S̲t̲a̲n̲d̲-̲b̲y̲ ̲P̲U̲ ̲C̲o̲o̲r̲d̲i̲n̲a̲t̲i̲o̲n̲

         Central System Functions (SS&C) for re-establishing
         operation of stand-by PU by transferring programs,
         tables, index, saved checkpoint data etc. to it.

4.7.4.7  Q̲u̲e̲u̲e̲ ̲C̲o̲n̲t̲r̲o̲l̲ ̲a̲t̲ ̲R̲e̲s̲t̲a̲r̲t̲

         Central System Functions to enable the required manipulation
         of queues after restart and before the system becomes
         fully operational.

4.7.4.8  S̲a̲v̲e̲ ̲S̲y̲s̲t̲e̲m̲ ̲D̲a̲t̲a̲

         Central System Functions for storing of queues, tables,
         index etc. on disk prior to ordered close down.

4.7.4.9  R̲e̲s̲t̲o̲r̲e̲ ̲S̲y̲s̲t̲e̲m̲ ̲D̲a̲t̲a̲

         Central System Functions for re-establishing queues,
         tables, index etc. in main memory during restart after
         ordered close down.

4.7.4.10 D̲i̲s̲k̲ ̲S̲y̲s̲t̲e̲m̲ ̲I̲n̲t̲e̲g̲r̲i̲t̲y̲

         The mirrored disk concept protects against "hard" faults
         as well as "soft" faults and thus obviates a special
         "safe update" mechanism.

4.7.5    R̲e̲c̲o̲v̲e̲r̲y̲ ̲L̲e̲v̲e̲l̲

         The level of recovery is defined by the checkpoints
         and depends upon the type of failure.

4.7.5.1  C̲h̲e̲c̲k̲p̲o̲i̲n̲t̲s̲

         The points in the message flow through the system that
         define completion of a defined task are called checkpoints
         and are described individually in this section.  Ref.
         Fig. 4.7.5.1.

Fig. 4.7.5.1  CAMPS SYSTEM MSG FLOW I

4.7.5.1.1    C̲h̲e̲c̲k̲p̲o̲i̲n̲t̲ ̲1̲

         Incoming messages are stored, sent to the IMQ and
         checkpointed.  After analysis, and possibly garble
         correction, "normal" messages are sent to the MDQ and
         service messages to the SVQ.  Recovery at this stage
         will re-establish the IMQ and thus require the actions
         performed before the next checkpoint to be repeated.

         In the case of a discontinuity in the channel sequence
         number, supervisor action is invoked (for sending out
         a service message etc.).

         In the case of a flash message, the flash message receipt
         will be sent to the OMQ (and thus checkpointed) before
         the message itself is sent to the MDQ and checkpointed.

4.7.5.1.2    C̲h̲e̲c̲k̲p̲o̲i̲n̲t̲ ̲2̲

         All messages, comments and release notifications sent
         for distribution will go into the MDQ and be checkpointed.
         They will be distributed to all recipients, possibly
         with assistance from MDCO.

4.7.5.1.3    C̲h̲e̲c̲k̲p̲o̲i̲n̲t̲ ̲3̲

         All messages, comments and release notifications that
         have passed distribution (with or without MDCO interaction)
         are checkpointed, which avoids repeated distribution in
         case of recovery.

         Output to PTP is checkpointed.

4.7.5.1.4    C̲h̲e̲c̲k̲p̲o̲i̲n̲t̲ ̲4̲

         All messages (including those presented for coordinators),
         comments and release notifications presented at a
         terminal are checkpointed.  This means that checkpointing
         is performed on a "presented per terminal" basis, avoiding
         re-presentation after recovery.

         This also applies to printing, where applicable.

         Output to PTP is checkpointed.

4.7.5.1.5    C̲h̲e̲c̲k̲p̲o̲i̲n̲t̲ ̲5̲

         Messages and comments created during Initial Preparation
         are checkpointed and recoverable to the last segment
         stored on disk.  After completion of that phase (with
         possible correction) the message is completely recoverable
         in this "initial version".  After completion of editing,
         a "current version" exists and the corresponding checkpointing
         is done.  After that, this version is recoverable.

         Input from OCR/PTR is recoverable when complete.

4.7.5.1.6    C̲h̲e̲c̲k̲p̲o̲i̲n̲t̲ ̲6̲

         Messages sent for release are sent to the RLQ and
         checkpointed.  When the messages are released, rejected
         or deferred, they are removed from the RLQ; until then,
         they are recovered in the RLQ.

4.7.5.1.7    C̲h̲e̲c̲k̲p̲o̲i̲n̲t̲ ̲7̲

         Messages released are sent to the MRQ for routing and
         checkpointed.  They are thus recovered there.  The
         Release notification (with possible comment) is sent
         to the MDQ and checkpointed there.

4.7.5.1.8    C̲h̲e̲c̲k̲p̲o̲i̲n̲t̲ ̲8̲

         Messages that have passed routing (possibly with Routing
         assistance or Group Count assignment) are sent to the
         OMQ for transmission or MDQ for local distribution
         and checkpointed.

4.7.5.1.9    C̲h̲e̲c̲k̲p̲o̲i̲n̲t̲ ̲9̲

         When a message has been transmitted, a checkpoint meaning
         "message finished" is sent.  Messages for which an
         acknowledgement is expected will remain in the OMQ until
         the acknowledgement is received; the "message finished"
         checkpoint is then sent.  In case of a channel failure
         preventing the message from being sent, it will be
         returned to the MRQ and checkpointed.

         All messages recovered in the OMQ will be retransmitted.
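         The OMQ rules for checkpoint 9 can be summarised as a small
         state model.  In this hypothetical Python sketch the class and
         method names are illustrative, and each checkpoint is reduced to
         a list update (finished collects the "message finished"
         checkpoints).

```python
from dataclasses import dataclass, field


@dataclass
class OutgoingQueues:
    omq: list = field(default_factory=list)       # awaiting transmission/ack
    mrq: list = field(default_factory=list)       # returned for routing
    finished: list = field(default_factory=list)  # "message finished" checkpoints

    def transmitted(self, msg_id: str, ack_expected: bool) -> None:
        if not ack_expected:
            self.omq.remove(msg_id)
            self.finished.append(msg_id)  # checkpoint 9 immediately
        # otherwise the message stays in the OMQ until acknowledged

    def acknowledged(self, msg_id: str) -> None:
        self.omq.remove(msg_id)
        self.finished.append(msg_id)      # checkpoint 9 on acknowledgement

    def channel_failed(self, msg_id: str) -> None:
        self.omq.remove(msg_id)
        self.mrq.append(msg_id)           # returned to the MRQ and checkpointed

    def to_retransmit_after_recovery(self) -> list:
        """All messages recovered in the OMQ are retransmitted."""
        return list(self.omq)


q = OutgoingQueues(omq=["M1", "M2", "M3"])
q.transmitted("M1", ack_expected=False)
q.transmitted("M2", ack_expected=True)   # still awaiting acknowledgement
q.channel_failed("M3")
print(q.finished, q.mrq, q.to_retransmit_after_recovery())  # ['M1'] ['M3'] ['M2']
```

         A message still awaiting acknowledgement at the time of failure
         is therefore sent again, which is why retransmissions may need
         the "suspected duplicate" stamp described in 4.7.6.1.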

4.7.5.1.10   C̲h̲e̲c̲k̲p̲o̲i̲n̲t̲ ̲1̲0̲

         Messages rejected or deferred are removed from the
         RLQ and together with the notification (with comment)
         sent to the MDQ.  They are checkpointed together in
         an indivisible operation.

4.7.5.1.11   C̲h̲e̲c̲k̲p̲o̲i̲n̲t̲ ̲1̲1̲

         Messages directed to the supervisor, and service messages,
         are sent to the SVQ and checkpointed.  Until then, they
         will be recovered from the previous checkpoint.

4.7.5.1.12   C̲h̲e̲c̲k̲p̲o̲i̲n̲t̲ ̲1̲2̲

         Messages presented for the supervisor are checkpointed
         as finished.  Until then, they are recovered in the
         SVQ and re-presented.

4.7.5.1.13   C̲h̲e̲c̲k̲p̲o̲i̲n̲t̲ ̲1̲3̲

         Messages that message service decides shall be stopped
         are checkpointed as finished.

4.7.5.1.14   C̲h̲e̲c̲k̲p̲o̲i̲n̲t̲ ̲1̲4̲

         Messages that MDCO decides shall be stopped are checkpointed
         as finished.

4.7.5.1.15   C̲h̲e̲c̲k̲p̲o̲i̲n̲t̲ ̲1̲5̲

         Messages from CCIS sent for release, rejected or deferred
         by the releasing officer are checkpointed as finished.
         The associated comment is sent to the MRQ and checkpointed.

4.7.5.1.16   C̲h̲e̲c̲k̲p̲o̲i̲n̲t̲ ̲1̲6̲

         Messages prepared by the supervisor are treated as
         described for checkpoint 5.
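         As a reading aid, the sixteen checkpoints can be condensed into
         a lookup table giving where a message is recovered when the given
         checkpoint is the last one it passed.  The Python dictionary
         below is an illustrative summary of sections 4.7.5.1.1 to
         4.7.5.1.16, not a CAMPS data structure; the descriptive strings
         are paraphrases.

```python
# Where a message is recovered, keyed by the last checkpoint it passed.
RECOVERED_IN = {
    1: "IMQ",
    2: "MDQ",
    3: "distribution completed (not repeated)",
    4: "presented at terminal (not re-presented)",
    5: "preparation, last segment saved on disk",
    6: "RLQ",
    7: "MRQ (release notification in MDQ)",
    8: "OMQ (transmission) or MDQ (local distribution)",
    9: "finished",
    10: "MDQ",
    11: "SVQ",
    12: "finished",
    13: "finished",
    14: "finished",
    15: "finished",
    16: "preparation, last segment saved on disk",  # as checkpoint 5
}

FINISHED_POINTS = sorted(p for p, where in RECOVERED_IN.items() if where == "finished")
print(FINISHED_POINTS)  # [9, 12, 13, 14, 15]
```

         The "finished" points coincide with those listed in 4.7.5.3 as
         written to disk, apart from point 4, which is written only once
         distribution is fully completed.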

4.7.5.2  L̲e̲v̲e̲l̲ ̲0̲

         This is the state resulting from cold start.  The system
         is empty (except possibly for the HDB), which means
         that no processing will take place before external
         activity starts (incoming messages, sign-in, retrieval
         etc.).

4.7.5.3  L̲e̲v̲e̲l̲ ̲1̲ ̲(̲R̲e̲s̲t̲a̲r̲t̲ ̲a̲f̲t̲e̲r̲ ̲T̲o̲t̲a̲l̲ ̲S̲y̲s̲t̲e̲m̲ ̲F̲a̲i̲l̲u̲r̲e̲)̲

         This is the state resulting from total system failure,
         and recovery is based upon data on disk.  Disk system
         integrity checks on vital data are performed to ensure
         validity.  Then system data are reloaded, and checkpoint
         data are read from disk and used to restore queues,
         directories, status information etc.  The checkpoint
         data written to disk are from points 1, 5 and 16, where
         messages are "created", and points 4, 9, 12, 13, 14 and
         15, where messages are "finished".  Point 4 is written
         only when distribution is fully completed, and point 5
         only when preparation is completed.  Status reports
         generated but not yet presented will be lost, as their
         content is likely to be out of date.  They can then be
         requested by the users and will cover the time interval
         set in the system parameters (by the supervisor).

         Global No Series r̲e̲l̲a̲t̲e̲d̲ ̲t̲o̲ ̲m̲s̲g̲ will be recovered.

4.7.5.4  L̲e̲v̲e̲l̲ ̲2̲ ̲(̲S̲w̲i̲t̲c̲h̲-̲o̲v̲e̲r̲)̲

         This is the state following switch-over to the standby
         PU.  No disk system integrity checking is required,
         as the mirrored pair of disks is identical.

         Recovery is based upon the checkpoint data collected
         in the standby PU prior to switch-over, and thus the
         recovery level is to the last checkpoint.

         Status reports generated but not yet presented are
         lost.

         Global No Series r̲e̲l̲a̲t̲e̲d̲ ̲t̲o̲ ̲m̲s̲g̲ will be recovered.

4.7.5.5  L̲e̲v̲e̲l̲ ̲3̲ ̲(̲R̲e̲s̲t̲a̲r̲t̲ ̲a̲f̲t̲e̲r̲ ̲C̲l̲o̲s̲e̲ ̲D̲o̲w̲n̲)̲

         This is the state resulting from ordered close down.
         As the close down is performed in such a way that
         all activities are terminated in a well-defined state
         and all queue, message, index etc. information is saved
         on disk, the recovery can be accomplished without repetition
         of manual work and without discontinuities in data.  Ref.
         description of SS&C, section 5.10.

4.7.6    A̲p̲p̲l̲i̲c̲a̲t̲i̲o̲n̲s̲ ̲R̲e̲c̲o̲v̲e̲r̲y̲ ̲A̲c̲t̲i̲o̲n̲s̲

         The principles behind checkpoint/recovery make it possible
         to keep the "local" sub-system recovery actions to
         a minimum.  The application recovery actions can be
         divided into implicit and explicit actions.  The implicit
         "actions" are merely a set of restrictions which the
         applications should adhere to in order to ensure system
         integrity and recoverability to the last checkpoint.  As
         these concern the way central functions are used, the
         rules can be found in the sections describing those
         functions.

         The explicit actions are those required to "clean up"
         before restart of the operation.  Again most "clean
         up" is performed automatically by central system functions,
         but in certain cases, some actions are most conveniently
         done by the applications.  Such actions are described
         in this section.

4.7.6.1  T̲r̲a̲f̲f̲i̲c̲ ̲H̲a̲n̲d̲l̲i̲n̲g̲ ̲&̲ ̲D̲i̲s̲t̲r̲i̲b̲u̲t̲i̲o̲n̲

         a)  A̲n̲a̲l̲y̲s̲i̲s̲

             Incoming messages last checkpointed at point 1
             will be recovered in the IMQ.  To ensure that a flash
             message receipt is not "forgotten", it is sent
             to the OMQ (point 8) before the flash message is
             passed on to the MDQ (point 2).  In the case where
             the flash message is recovered in the IMQ and the
             flash message receipt is recovered in the OMQ, a new
             flash message receipt will be generated, and thus
             t̲w̲o̲ such receipts are sent for the same flash
             message; the latter of the two will be stamped
             "suspected duplicate".  This also means that the
             analysis module must be prepared to cope with
             receiving two such receipts from another site.

             Further, if the flash message receipt reaches the
             sender of the flash message too late (due to
             restart), the analysis module must be prepared to
             receive the same message twice.

         b)  D̲i̲s̲t̲r̲i̲b̲u̲t̲i̲o̲n̲

             N/A.

         c)  R̲o̲u̲t̲i̲n̲g̲

             N/A.

         d)  T̲r̲a̲n̲s̲m̲i̲s̲s̲i̲o̲n̲/̲F̲i̲n̲i̲s̲h̲

             All messages are recovered in the OMQ and will
             be retransmitted.  They will be stamped "suspected
             duplicate" where applicable, and in case of a flash
             message that is repeated, the flash message receipt
             will be duplicated, too.
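         The duplicate handling described under a) can be sketched as
         follows.  The sketch assumes, hypothetically, that the number of
         receipts already generated per flash message is known after
         recovery (in the system this follows from the receipt being
         recovered in the OMQ) and that the receiving analysis module
         copes with a second receipt by ignoring it; the document only
         requires that it be prepared to cope.  The function names and
         the dictionary receipt format are illustrative.

```python
def generate_receipt(flash_id: str, generated: dict) -> dict:
    """Generate a flash message receipt (hypothetical format).

    If a receipt has already been generated for the same flash message,
    the new one is stamped "suspected duplicate".
    """
    count = generated.get(flash_id, 0) + 1
    generated[flash_id] = count
    return {"receipt_for": flash_id, "suspected_duplicate": count > 1}


def analyse_receipt(receipt: dict, acknowledged: set) -> str:
    """Receiving site: a second receipt for the same flash message is tolerated."""
    flash_id = receipt["receipt_for"]
    if flash_id in acknowledged:
        return "duplicate ignored"
    acknowledged.add(flash_id)
    return "flash acknowledged"


generated: dict = {}
first = generate_receipt("F1", generated)   # sent to the OMQ before the failure
second = generate_receipt("F1", generated)  # regenerated after recovery from the IMQ
print(first["suspected_duplicate"], second["suspected_duplicate"])  # False True

acked: set = set()
print(analyse_receipt(first, acked), "/", analyse_receipt(second, acked))
# flash acknowledged / duplicate ignored
```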

4.7.6.2  T̲e̲r̲m̲i̲n̲a̲l̲ ̲S̲y̲s̲t̲e̲m̲

         Generally, the functions in the terminal system are
         recovered to an extent minimizing user inconvenience.
         This means that formats requiring extensive data entry
         (preparation etc.) are recovered per segment saved on
         disk (where "saved" means stored on disk and checkpointed
         with respect to the level of recovery applicable), and
         others are recovered when complete.

4.7.6.2.1    U̲s̲e̲r̲ ̲F̲u̲n̲c̲t̲i̲o̲n̲s̲

         a)  P̲r̲e̲p̲a̲r̲a̲t̲i̲o̲n̲

             Preparation of messages and comments is recovered
             to last saved segment on disk.

             Input completely received from OCR/PTR is recovered
             from disk.

         b)  P̲r̲e̲s̲e̲n̲t̲a̲t̲i̲o̲n̲

             Presentation (display/print) in progress will be
             restarted from the beginning, either on a per-terminal
             basis (recovery level 2) or on a per-document basis
             (recovery level 1).

             Output to PTP will be restarted from the beginning,
             preceded by blank tape.

         c)  R̲e̲t̲r̲i̲e̲v̲a̲l̲

             As for presentation.

         d)  D̲e̲l̲e̲t̲i̲o̲n̲

             Deletion requests deferred to supervisor are recovered
             in the SVQ.  Otherwise, the request will have to
             be repeated, if not complete.

         e)  S̲i̲g̲n̲ ̲I̲n̲/̲O̲u̲t̲ ̲-̲ ̲S̲e̲c̲u̲r̲i̲t̲y̲

             For security reasons terminals will be signed off
             during recovery.  Thus sign-in and security interrogation/warning
             will be repeated.

         f)  S̲t̲a̲t̲u̲s̲

             Outgoing Message Status, Delivery Status and Message
             Release Status requests will have to be repeated,
             as the content is likely to be out of date.  Where
             status information is automatically output at time
             intervals preset by the supervisor, this output will
             be lost, but status requests issued before the next
             automatic output will cover the preset time interval.

         g)  R̲e̲l̲e̲a̲s̲e̲

             Released messages will be recovered in the MRQ, and
             the release notification (possibly with comment)
             sent to the drafter will be recovered in the MDQ.

             Messages rejected or deferred are sent to the MDQ
             together with the release notification and recovered
             there.

4.7.6.2.2    S̲u̲p̲e̲r̲v̲i̲s̲o̲r̲ ̲F̲u̲n̲c̲t̲i̲o̲n̲s̲

         All functions accepted by the system are recoverable.
         Functions not yet accepted will have to be repeated.

         Preparation of service messages is treated as user
         preparation with respect to recovery.

4.7.6.2.3    M̲e̲s̲s̲a̲g̲e̲ ̲S̲e̲r̲v̲i̲c̲e̲ ̲F̲u̲n̲c̲t̲i̲o̲n̲

         a)  G̲a̲r̲b̲l̲e̲ ̲C̲o̲r̲r̲e̲c̲t̲i̲o̲n̲ ̲a̲n̲d̲ ̲G̲r̲o̲u̲p̲ ̲C̲o̲u̲n̲t̲ ̲A̲s̲s̲i̲g̲n̲m̲e̲n̲t̲

             Prior to completion, the message is recovered in
             IMQ (this is the version input to Msg. Serv.) and
             upon completion returned to IMQ and recoverable
             there.

         b)  R̲I̲-̲A̲s̲s̲i̲g̲n̲m̲e̲n̲t̲

             Prior to completion, the message is recovered in
             MRQ (this is the version input to Msg. Serv.) and
             upon completion returned to MRQ and recoverable
             there.

         c)  O̲C̲R̲ ̲I̲n̲p̲u̲t̲

             OCR input directed to message service will be
             checkpointed at point 5 and is recoverable there
             until it is purged.

4.7.6.3  S̲t̲o̲r̲a̲g̲e̲ ̲&̲ ̲R̲e̲t̲r̲i̲e̲v̲a̲l̲ ̲(̲S̲A̲R̲)̲

         a)  Retrievals in progress will be lost and have to
             be repeated by the user and thus no recovery action
             is required by SAR.

         b)  Storage of items in Intermediate Storage is automatically
             recovered as far as SAR is concerned because of
             checkpointing.

         c)  Dump of items from Intermediate Storage to Long
             Term Storage is implemented in 5 steps:

             1)  "Dump Items" command from SAR to CSF.

             2)  "Dump complete" response from CSF to SAR.

             3)  Update of Long Term Storage Catalogue.

             4)  Update of Intermediate Storage Catalogue.

             5)  "Remove dumped items" command from SAR to CSF.

             This ensures that a failure will never result in
             inconsistency between storage and catalogue and
             thus no loss of retrievable items can occur.
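         Why this ordering is safe can be checked with a small
         hypothetical Python model that simulates a failure after each
         step and verifies that (i) no catalogue entry points to an item
         missing from its storage and (ii) every item remains
         retrievable.  Storage and catalogues are modelled as sets and
         the CSF interaction is reduced to set operations; the function
         names are illustrative.  The model accepts the transient state
         after step 3 in which an item is listed in both catalogues.

```python
def run_dump(items: set, steps_done: int, order=(0, 1, 2, 3, 4)) -> dict:
    """Apply the first `steps_done` dump steps in the given order, then "fail"."""
    s = {"is_store": set(items), "is_cat": set(items),
         "lts_store": set(), "lts_cat": set()}
    steps = [
        lambda: s["lts_store"].update(items),            # 1) "Dump Items": CSF copies
        lambda: None,                                    # 2) "Dump complete" response
        lambda: s["lts_cat"].update(items),              # 3) update LTS catalogue
        lambda: s["is_cat"].difference_update(items),    # 4) update IS catalogue
        lambda: s["is_store"].difference_update(items),  # 5) "Remove dumped items"
    ]
    for i in order[:steps_done]:
        steps[i]()
    return s


def consistent(s: dict) -> bool:
    """No catalogue entry points to an item missing from its storage."""
    return s["is_cat"] <= s["is_store"] and s["lts_cat"] <= s["lts_store"]


def retrievable(s: dict) -> set:
    """Items that are catalogued and actually present in that storage."""
    return (s["is_cat"] & s["is_store"]) | (s["lts_cat"] & s["lts_store"])


items = {"A", "B"}
states = [run_dump(items, k) for k in range(6)]  # failure after 0..5 steps
print(all(consistent(s) and retrievable(s) == items for s in states))  # True

# Removing the items before updating the catalogues would break the invariant.
bad = [run_dump(items, k, order=(0, 1, 4, 2, 3)) for k in range(6)]
print(all(consistent(s) and retrievable(s) == items for s in bad))     # False
```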

4.7.6.4  L̲o̲g̲ ̲&̲ ̲A̲c̲c̲o̲u̲n̲t̲a̲b̲i̲l̲i̲t̲y̲

         Log records are stored on disk for every 5 records
         generated or every 30 seconds elapsed.  Log records are
         also sent to the standby PU, so that complete recovery of
         log records can be achieved in the case of switch-over.
         Log records are stored on disk in items (by the Message
         Management System), and thus recovery to the last
         storage can be achieved.
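         A minimal Python sketch of this storage rule, assuming "per 5
         records or 30 seconds" means that a batch is stored as soon as
         5 records are pending or the oldest pending record is 30 seconds
         old, whichever comes first.  The class LogBuffer, the injected
         clock, write and send_to_standby callables and the timer-driven
         tick method are hypothetical; each record is also passed to the
         stand-by PU at once, as the paragraph above requires.

```python
from typing import Callable, List, Optional


class LogBuffer:
    """Store log records on disk per 5 records or after 30 seconds (sketch)."""

    BATCH = 5        # records per disk storage
    MAX_AGE = 30.0   # seconds the oldest pending record may wait

    def __init__(self, clock: Callable[[], float],
                 write: Callable[[List[str]], None],
                 send_to_standby: Callable[[str], None]):
        self.clock = clock
        self.write = write
        self.send_to_standby = send_to_standby
        self.pending: List[str] = []
        self.oldest: Optional[float] = None

    def log(self, record: str) -> None:
        self.send_to_standby(record)      # stand-by PU receives every record at once
        if not self.pending:
            self.oldest = self.clock()
        self.pending.append(record)
        self.tick()

    def tick(self) -> None:
        """Called for each record and periodically from a timer."""
        if not self.pending:
            return
        too_old = self.clock() - self.oldest >= self.MAX_AGE
        if len(self.pending) >= self.BATCH or too_old:
            self.write(list(self.pending))  # store the batch on disk
            self.pending.clear()
            self.oldest = None


now = [0.0]                      # simulated clock, in seconds
disk: list = []
standby: list = []
buf = LogBuffer(lambda: now[0], disk.append, standby.append)
for i in range(7):
    buf.log(f"rec{i}")
print(len(disk), len(buf.pending))  # 1 2
now[0] = 31.0
buf.tick()
print(len(disk), len(standby))      # 2 7
```

         After a failure, at most the records still pending are missing
         from disk, while the stand-by PU already holds all of them.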

         In the case of switch-over, the logging system will
         have checkpointed log records distributed to it and
         will be responsible for completing the log file.

         When recovery necessitates repetition of an event to
         be logged, multiple log records for the same event can
         occur.  Such records will be marked to reflect this.

4.7.6.5  S̲t̲a̲t̲i̲s̲t̲i̲c̲s̲

         Statistics are required to be recovered to the last
         storage on disk and thus this storage becomes the recovery
         point.

4.7.7    P̲e̲r̲f̲o̲r̲m̲a̲n̲c̲e̲ ̲&̲ ̲C̲a̲p̲a̲c̲i̲t̲y̲

4.7.7.1  C̲h̲e̲c̲k̲p̲o̲i̲n̲t̲ ̲R̲a̲t̲e̲

         The required checkpoint rate is calculated from the
         average busy-minute throughput (30 msg/min incoming,
         i.e. 0.5 msg/sec, and 6 msg/min outgoing, i.e. 0.1
         msg/sec), apart from message presentation, where the
         average busy-hour throughput of 10 msg/min incoming
         (0.166 msg/sec) is more appropriate.  The calculation is
         only a gross sizing which takes into account only the
         main paths (occurrences contributing less than 20% are
         not taken into consideration).

         a)  I̲n̲c̲o̲m̲i̲n̲g̲ ̲M̲S̲G̲ ̲M̲a̲i̲n̲ ̲P̲a̲t̲h̲

             1)  R̲e̲c̲o̲v̲e̲r̲y̲ ̲L̲e̲v̲e̲l̲ ̲1̲

                 The msg will pass points 1. and 4. once:

                 CHKPT-IRATE = 1 x 0.5 + 1 x 0.166 = 0̲.̲6̲6̲6̲/̲s̲e̲c̲.̲

             2)  R̲e̲c̲o̲v̲e̲r̲y̲ ̲L̲e̲v̲e̲l̲ ̲2̲

                 The msg will pass points 1., 2., 3. once and
                 point 4. 30 times (15 times for display + 15
                 times for print):

                 CHKPT-IRATE = 3 x 0.5 + 30 x 0.166 = 6̲.̲5̲/̲s̲e̲c̲.̲

         b)  O̲u̲t̲g̲o̲i̲n̲g̲ ̲M̲S̲G̲ ̲M̲a̲i̲n̲ ̲P̲a̲t̲h̲

             1)  R̲e̲c̲o̲v̲e̲r̲y̲ ̲L̲e̲v̲e̲l̲ ̲1̲

                 The msg will pass point 5. 3 times (prep +
                 2 edits), point 9. once, point 4. once
                 (coordination), point 4. once (rel. not. +
                 comment) and point 4. once (local dist.), and
                 generates 2 comments, i.e. passes points 5.
                 and 4. twice:

                 CHKPT-ORATE = (3 + 1 + 1 + 1 + 1 + 2 x 2)
                 x 0.1 = 1̲.̲1̲/̲s̲e̲c̲.̲

             2)  R̲e̲c̲o̲v̲e̲r̲y̲ ̲L̲e̲v̲e̲l̲ ̲2̲

                 The msg will pass point 5. 9 times (3 segments
                 prep + 2 x 3 segments edit), points 6., 7.,
                 8., 9. once, points 2., 3., 4. twice (coordination),
                 points 2., 3., 4. once (rel. not. + comment),
                 points 2., 3. once and point 4. five times
                 (dist. to an average of 5 terminals), and generates
                 2 comments, i.e. passes points 5., 2., 3., 4. twice:

                 CHKPT-ORATE = (9 + 4 + 3 x 2 + 3 + 2 + 5 +
                 4 x 2) x 0.1 = 3̲.̲7̲/̲s̲e̲c̲.̲
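         The arithmetic above can be reproduced with a short Python
         script.  It converts the throughput figures to messages per
         second and multiplies them by the number of checkpoints passed
         per message, term by term as in the formulas; the variable
         names are illustrative.

```python
# Throughput figures from 4.7.7.1, converted to messages per second.
IN_RATE = 30 / 60     # busy-minute incoming: 0.5 msg/sec
PRES_RATE = 10 / 60   # busy-hour incoming, used for presentation: ~0.166 msg/sec
OUT_RATE = 6 / 60     # busy-minute outgoing: 0.1 msg/sec

# Incoming main path
in_level1 = 1 * IN_RATE + 1 * PRES_RATE   # points 1 and 4 once
in_level2 = 3 * IN_RATE + 30 * PRES_RATE  # points 1-3 once, point 4 x 30

# Outgoing main path: checkpoints passed per message times message rate
out_level1 = (3 + 1 + 1 + 1 + 1 + 2 * 2) * OUT_RATE          # 11 checkpoints
out_level2 = (9 + 4 + 3 * 2 + 3 + 2 + 5 + 4 * 2) * OUT_RATE  # 37 checkpoints

for name, rate in [("incoming, level 1", in_level1),
                   ("incoming, level 2", in_level2),
                   ("outgoing, level 1", out_level1),
                   ("outgoing, level 2", out_level2)]:
    print(f"{name}: {rate:.2f} checkpoints/sec")
```

         Using 10/60 instead of the rounded 0.166 gives exactly 6.50/sec
         for the incoming level 2 rate rather than 6.48/sec; both round
         to the 6.5/sec quoted above.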

4.7.7.2  D̲i̲s̲k̲ ̲S̲p̲a̲c̲e̲ ̲R̲e̲q̲u̲i̲r̲e̲m̲e̲n̲t̲s̲

         (Max. 2 Mb for checkpoint area).

4.7.7.3  S̲w̲i̲t̲c̲h̲-̲o̲v̲e̲r̲ ̲T̲i̲m̲e̲

         (Max. allowed for availability requirements 156 secs,
         design value 60 seconds).

4.7.7.4  R̲e̲s̲t̲a̲r̲t̲ ̲T̲i̲m̲e̲ ̲a̲f̲t̲e̲r̲ ̲T̲o̲t̲a̲l̲ ̲S̲y̲s̲t̲e̲m̲ ̲F̲a̲i̲l̲u̲r̲e̲

         (Max. allowed 15 min.)

4.7.7.5  S̲t̲a̲n̲d̲-̲b̲y̲ ̲P̲U̲ ̲R̲e̲s̲t̲a̲r̲t̲ ̲T̲i̲m̲e̲

         (Max. allowed 5 min.)

Fig. 4.7.7.1-1  CAMPS SYSTEM MSG FLOW II: INCOMING MESSAGES

Fig. 4.7.7.1-2  CAMPS SYSTEM MSG FLOW III: OUTGOING MESSAGES
