⟦d918f0e06⟧ Wang Wps File
Length: 29155 (0x71e3)
Types: Wang Wps File
Notes: CPS/SDS/001 ISSUE 1
Names: »0670A «
Derivation
└─⟦f9d7301d0⟧ Bits:30006008 8" Wang WCS floppy, CR 0043A
└─ ⟦this⟧ »0670A «
CPS/SDS/001
BBC/810227
CAMPS SYSTEM DESIGN SPECIFICATION
T̲A̲B̲L̲E̲ ̲O̲F̲ ̲C̲O̲N̲T̲E̲N̲T̲S̲
4.7 RECOVERY
4.7.1 Requirements and General Concepts
4.7.1.1 Queue Control at Restart
4.7.1.2 Recovery Situations
4.7.1.3 Recovery/Restart from Total System Failure
4.7.1.4 Recovery after Switch-over
4.7.1.5 Withdrawal of Redundant Items
4.7.2 Failure Types
4.7.2.1 Minor Errors
4.7.2.1.1 Process Failure
4.7.2.1.2 Process Detected Failure
4.7.2.2 Single System Failures
4.7.2.3 Total System Failures
4.7.2.4 Software System Errors
4.7.2.5 Disastrous Errors
4.7.3 Recovery Actions
4.7.4 Central Recovery Functions
4.7.4.1 Checkpoint Function
4.7.4.2 Consistent Checkpoints
4.7.4.3 Checkpoint Generation
4.7.4.4 Checkpoint Reception
4.7.4.5 Checkpoint Restore
4.7.4.6 Standby PU Coordination
4.7.4.7 Queue Control at Restart
4.7.4.8 Save System Data
4.7.4.9 Restore System Data
4.7.4.10 Disk System Integrity
4.7.5 Recovery Level
4.7.5.1 Checkpoints
4.7.5.1.1 Checkpoint 1
4.7.5.1.2 Checkpoint 2
4.7.5.1.3 Checkpoint 3
4.7.5.1.4 Checkpoint 4
4.7.5.1.5 Checkpoint 5
4.7.5.1.6 Checkpoint 6
4.7.5.1.7 Checkpoint 7
4.7.5.1.8 Checkpoint 8
4.7.5.1.9 Checkpoint 9
4.7.5.1.10 Checkpoint 10
4.7.5.1.11 Checkpoint 11
4.7.5.1.12 Checkpoint 12
4.7.5.1.13 Checkpoint 13
4.7.5.1.14 Checkpoint 14
4.7.5.1.15 Checkpoint 15
4.7.5.1.16 Checkpoint 16
4.7.5.2 Level 0
4.7.5.3 Level 1 (Restart)
4.7.5.4 Level 2 (Switchover)
4.7.5.5 Level 3 (Restart after Closedown)
4.7.6 Application Recovery Actions
4.7.6.1 Traffic Handling & Distribution
4.7.6.2 Terminal System
4.7.6.2.1 User Functions
4.7.6.2.2 Supervisor Functions
4.7.6.2.3 Message Service Function
4.7.6.3 Storage & Retrieval
4.7.6.4 Log & Accountability
4.7.6.5 Statistics
4.7.7 Performance & Capacity
4.7.7.1 Checkpoint Rate
4.7.7.2 Disk Space Requirements
4.7.7.3 Switch Time
4.7.7.4 Restart Time after Total System Failure
4.7.7.5 Standby PU Restart Time
Fig. 4.7.5.1 CAMPS System Msg. Flow I
Fig. 4.7.7.1-1 CAMPS System Msg. Flow II
Fig. 4.7.7.1-2 CAMPS System Msg. Flow III
4.7 R̲E̲C̲O̲V̲E̲R̲Y̲
4.7.1 R̲e̲q̲u̲i̲r̲e̲m̲e̲n̲t̲s̲ ̲a̲n̲d̲ ̲G̲e̲n̲e̲r̲a̲l̲ ̲C̲o̲n̲c̲e̲p̲t̲s̲
The activities performed by the system following a
failure are termed "recovery". The requirements are
as follows:
4.7.1.1 Q̲u̲e̲u̲e̲ ̲C̲o̲n̲t̲r̲o̲l̲ ̲a̲t̲ ̲R̲e̲s̲t̲a̲r̲t̲ ̲(̲r̲e̲f̲.̲:̲ ̲S̲R̲S̲ ̲3̲.̲2̲.̲4̲.̲1̲.̲1̲.̲1̲4̲)̲
a) During restart, commands shall exist to cancel messages
and transactions as specified below.
1) For each outgoing channel cancel all messages
of specified precedence.
2) For each reception position cancel all output
of specified precedence.
3) For each release position cancel all messages
of specified precedence awaiting release.
4) For each preparation position cancel all suspended
transactions.
5) Cancel all messages of specified precedence
awaiting MDCO processing.
6) Cancel all messages of specified precedence
awaiting message service.
b) Messages for which queue elements have been cancelled
as specified above will still be available for
retrieval to the extent specified in section 3.2.7.
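The two requirements above can be illustrated by a minimal sketch, in which cancelling removes queue elements only and leaves the stored message untouched. The class and method names, the queue name and the precedence label "ROUTINE" are hypothetical and chosen for illustration; they are not taken from the specification.

```python
from dataclasses import dataclass, field


@dataclass
class QueueElement:
    msg_id: str
    precedence: str  # e.g. "FLASH"; other labels used here are illustrative


@dataclass
class RestartQueues:
    # queue name -> list of queue elements (hypothetical in-memory model)
    queues: dict = field(default_factory=dict)
    # stands in for the message store on disk; cancel() never touches it
    store: dict = field(default_factory=dict)

    def enqueue(self, queue, msg_id, precedence, text):
        self.store[msg_id] = text
        self.queues.setdefault(queue, []).append(QueueElement(msg_id, precedence))

    def cancel(self, queue, precedence):
        """Remove all elements of the given precedence from one queue."""
        elements = self.queues.get(queue, [])
        cancelled = [e.msg_id for e in elements if e.precedence == precedence]
        self.queues[queue] = [e for e in elements if e.precedence != precedence]
        return cancelled


q = RestartQueues()
q.enqueue("channel-1", "M1", "ROUTINE", "text 1")
q.enqueue("channel-1", "M2", "FLASH", "text 2")
cancelled = q.cancel("channel-1", "ROUTINE")
```

After the cancel command, M1 is no longer queued for the channel but can still be read from the store, which corresponds to requirement b).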
4.7.1.2 R̲e̲c̲o̲v̲e̲r̲y̲ ̲S̲i̲t̲u̲a̲t̲i̲o̲n̲s̲ ̲(̲r̲e̲f̲.̲:̲ ̲S̲R̲S̲ ̲3̲.̲2̲.̲8̲.̲4̲)̲ ̲
Section 3.2.8.4 handles detailed recovery requirements
in the following situations:
- total system failure
- failure requiring a switchover
4.7.1.3 R̲e̲c̲o̲v̲e̲r̲y̲/̲R̲e̲s̲t̲a̲r̲t̲ ̲f̲r̲o̲m̲ ̲T̲o̲t̲a̲l̲ ̲S̲y̲s̲t̲e̲m̲ ̲F̲a̲i̲l̲u̲r̲e̲ ̲(̲r̲e̲f̲.̲:̲ ̲S̲R̲S̲
3̲.̲2̲.̲8̲.̲4̲.̲1̲)̲
a) The system shall preserve essential data enabling
recovery in case of total system failure.
b) A total system failure may be caused by:
1) simultaneous error in redundant equipment
2) error in main power supply
3) detection of a program error, which cannot
be recovered internally.
c) No message must be lost within the network, e.g.
1) messages which were only partially transmitted,
or for which FLASH acknowledgement has not
been received prior to the error, shall be
retransmitted.
2) continuity of the reporting system shall be
maintained to ensure that messages are not
lost within the network as a consequence of
the loss of either a report or an automatically
generated service message.
d) Continuity shall be maintained to the time defined
by the last storage with respect to the following
data items:
1) log records
2) statistics
3) messages/comments in the HDB
4) current system parameter file
5) message accounting
e) The data (defined above) stored prior to the total
system failure can be recalled by the use of normal
commands.
f) Integrity of memory shall be maintained to ensure
that the system will not fail as a consequence
of operating upon corrupted data generated by a
preceding error.
g) Restart subsequent to recovery shall be based upon
the system parameter file obtained in 3.2.8.1(e).
h) Certain errors may preclude total recovery. In
these cases the system is recovered so that the
chance of losing a message is minimized.
i) Recovery and restart subsequent to a total system/restart
failure will be finalized within 15 minutes.
4.7.1.4 R̲e̲c̲o̲v̲e̲r̲y̲ ̲a̲f̲t̲e̲r̲ ̲S̲w̲i̲t̲c̲h̲-̲o̲v̲e̲r̲ ̲(̲r̲e̲f̲.̲:̲ ̲S̲R̲S̲ ̲3̲.̲2̲.̲8̲.̲4̲.̲2̲)̲
a) Where redundant equipment is provided to meet the
availability/reliability requirements, the switch-over
from an active configuration to a standby
configuration shall not result in loss of messages,
i.e.
1) The communication protocols shall ensure retransmission
of messages that are garbled or partly lost.
2) The accountability log shall ensure that the
accountability related to messages is not lost.
b) Where a transaction at a terminal is in progress
the user shall be informed of the state to which
the transaction has been recovered after switch
over.
c) A switch-over initiated by the detection of an
error shall normally not require operator intervention.
4.7.1.5 W̲i̲t̲h̲d̲r̲a̲w̲a̲l̲ ̲o̲f̲ ̲R̲e̲d̲u̲n̲d̲a̲n̲t̲ ̲I̲t̲e̲m̲s̲ ̲(̲r̲e̲f̲.̲:̲ ̲S̲R̲S̲ ̲3̲.̲4̲.̲4̲.̲5̲.̲1̲)̲
- The capability is maintained to re-integrate such
redundant items into the operational configuration
and return them to service within 5 minutes.
- Recovery procedures after system failure shall
include checksumming of the operating system software,
reloading if this is necessary. (ref.: SRS 3.4.5.7).
To meet these requirements, maximize availability and
minimize inconvenience to the users, an event-based
checkpoint method is used. Restart points are defined
in the message flow to meet these objectives, and information
is gathered when messages pass these points.
The number of checkpoints and the placement are determined
taking into account the following factors:
- Availability
- User inconvenience
- System overhead
- System complexity
The actual method employed is that checkpoint information,
consisting partly of message control blocks (MCBs) and
partly of file directory information, is transferred
to the stand-by PU for each point passed by the message.
At certain points the information is recorded on disc
as well.
This means that several levels of recovery can be employed
depending upon the type of failure and the availability
of stand-by PU and other redundant equipment.
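A minimal sketch of this method follows, assuming the set of points recorded on disc is the one listed in section 4.7.5.3. The function name, the record layout and the two lists standing in for the stand-by PU and the disc area are hypothetical, and conditions such as "point 4 only when distribution is fully completed" are omitted.

```python
# Points whose checkpoint data are also written to disc (section 4.7.5.3).
DISK_POINTS = {1, 4, 5, 9, 12, 13, 14, 15, 16}

standby_pool = []  # stands in for the stand-by PU (hypothetical)
disk_area = []     # stands in for the checkpoint area on disc (hypothetical)


def checkpoint(point, mcb):
    """Record a checkpoint: always to the stand-by PU, to disc at certain points."""
    record = {"point": point, "mcb": dict(mcb)}
    standby_pool.append(record)
    if point in DISK_POINTS:
        disk_area.append(record)
    return record


# An incoming message passing points 1 to 4.
for p in (1, 2, 3, 4):
    checkpoint(p, {"msg_id": "M1"})
```

Here the stand-by PU holds all four checkpoints, while the disc holds only those for points 1 and 4, so a switch-over can recover to a later state than a restart from disc.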
4.7.2 F̲a̲i̲l̲u̲r̲e̲ ̲T̲y̲p̲e̲s̲
4.7.2.1 M̲i̲n̲o̲r̲ ̲E̲r̲r̲o̲r̲s̲
These are defined as errors that can be recovered from
without switch-over to stand-by PU and without system
reload.
4.7.2.1.1 P̲r̲o̲c̲e̲s̲s̲ ̲F̲a̲i̲l̲u̲r̲e̲
- Illegal operation
- Security violation attempt
- etc.
4.7.2.1.2 P̲r̲o̲c̲e̲s̲s̲ ̲D̲e̲t̲e̲c̲t̲e̲d̲ ̲F̲a̲i̲l̲u̲r̲e̲
- HW error (other than PU failure)
- Certain types of file corruption
- Resource error
- etc.
4.7.2.2 S̲i̲n̲g̲l̲e̲ ̲S̲y̲s̲t̲e̲m̲ ̲F̲a̲i̲l̲u̲r̲e̲s̲
These are defined as HW errors limited to one PU and thus
recoverable by switch-over to the stand-by PU:
- Any active PU failure.
4.7.2.3 T̲o̲t̲a̲l̲ ̲S̲y̲s̲t̲e̲m̲ ̲F̲a̲i̲l̲u̲r̲e̲s̲
These are defined as malfunction of the total dualized
configuration:
- Simultaneous failure of dualized equipment
- Single PU failure when stand-by PU not available
- Power failure
4.7.2.4 S̲o̲f̲t̲w̲a̲r̲e̲ ̲S̲y̲s̲t̲e̲m̲ ̲E̲r̲r̲o̲r̲s̲
These are defined as programming errors (other than
minor errors) resulting in system break and recoverable
by system software reload and restart.
4.7.2.5 D̲i̲s̲a̲s̲t̲r̲o̲u̲s̲ ̲E̲r̲r̲o̲r̲s̲
These are defined as errors resulting in loss of vital
system files and thus not completely recoverable with
respect to:
- Messages
- Accountability
- Log
- Statistics
- System parameters
4.7.3 R̲e̲c̲o̲v̲e̲r̲y̲ ̲A̲c̲t̲i̲o̲n̲s̲
1̲.̲ E̲r̲r̲o̲r̲ ̲F̲i̲x̲-̲u̲p̲
This action is applicable to recovery from minor errors
(ref. section 4.11).
2̲.̲ S̲w̲i̲t̲c̲h̲-̲o̲v̲e̲r̲ ̲t̲o̲ ̲S̲t̲a̲n̲d̲-̲b̲y̲ ̲P̲U̲
This action is taken upon detection of a fault in the
active PU when the stand-by PU is available.
3̲.̲ R̲e̲s̲t̲a̲r̲t̲ ̲o̲f̲ ̲S̲t̲a̲n̲d̲-̲b̲y̲ ̲P̲U̲
This action is required when a PU is returned to the
system (after repair or off-line operation) for use
as a stand-by PU.
4̲.̲ R̲e̲s̲t̲a̲r̲t̲ ̲a̲f̲t̲e̲r̲ ̲C̲l̲o̲s̲e̲ ̲D̲o̲w̲n̲
This action is taken when the system is brought back
into operation following an ordered close down.
5̲.̲ T̲o̲t̲a̲l̲ ̲S̲y̲s̲t̲e̲m̲ ̲R̲e̲l̲o̲a̲d̲ ̲
This action will be taken upon failure of the total
system due to HW or SW, after the error condition is
detected and removed.
6̲.̲ I̲n̲i̲t̲i̲a̲l̲ ̲S̲t̲a̲r̲t̲-̲u̲p̲
This action is taken at first-time load of the system
(following installation) and is also required to bring the
system back to operational use following a disastrous
error.
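The relation between the failure types of section 4.7.2 and the recovery actions above can be sketched as a simple selection function. This is one reading of the two sections, not a definition from the specification; the enumeration and function names are hypothetical.

```python
from enum import Enum


class Failure(Enum):
    MINOR = "minor error"
    SINGLE_PU = "single system failure"
    TOTAL = "total system failure"
    SOFTWARE = "software system error"
    DISASTROUS = "disastrous error"


def recovery_action(failure, standby_available=True):
    """Pick a recovery action for a failure type (illustrative mapping)."""
    if failure is Failure.MINOR:
        return "error fix-up"
    if failure is Failure.SINGLE_PU:
        # A single PU failure without a stand-by PU counts as a total failure.
        if standby_available:
            return "switch-over to stand-by PU"
        return "total system reload"
    if failure in (Failure.TOTAL, Failure.SOFTWARE):
        return "total system reload"
    return "initial start-up"
```

For example, `recovery_action(Failure.SINGLE_PU, standby_available=False)` selects a total system reload, in line with the list in section 4.7.2.3.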
4.7.4 C̲e̲n̲t̲r̲a̲l̲ ̲R̲e̲c̲o̲v̲e̲r̲y̲ ̲F̲u̲n̲c̲t̲i̲o̲n̲s̲
4.7.4.1 C̲h̲e̲c̲k̲-̲p̲o̲i̲n̲t̲ ̲F̲u̲n̲c̲t̲i̲o̲n̲
Checkpointing is event based: checkpoints are defined
as the points in the message propagation through
the system where a transition from one stage to the
next takes place. The checkpoint function is required
to record, at these points, the information about the message
that is necessary to recover it to the last state in which
it was checkpointed.
As the message itself is stored on disk, the information
required to recover it is the M̲essage C̲ontrol B̲lock
(MCB), which identifies the message and its state, plus index
information pointing to where the message is stored.
The recovery functions are:
4.7.4.2 C̲o̲n̲s̲i̲s̲t̲e̲n̲t̲ ̲C̲h̲e̲c̲k̲p̲o̲i̲n̲t̲s̲
Central system functions ensuring that update of MSG,
queues, tables, files etc. is done in a way that ensures
that related updates are either all performed or none
of them performed.
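A minimal sketch of the all-or-none property, assuming the related updates can be applied to a working copy that is committed only when every update has succeeded. This is a stand-in technique chosen for illustration; the specification does not state how the central functions achieve it, and all names are hypothetical.

```python
import copy


def apply_atomically(state, updates):
    """Apply all updates to a working copy; commit only if every one succeeds."""
    working = copy.deepcopy(state)
    for update in updates:
        update(working)  # may raise, in which case `state` is left untouched
    state.clear()
    state.update(working)


def remove_from_rlq(s):
    s["RLQ"].remove("M1")


def add_to_mdq(s):
    s["MDQ"].append("M1")


def failing(s):
    raise RuntimeError("simulated failure between related updates")


state = {"RLQ": ["M1"], "MDQ": []}

# A failure after the first update leaves both queues as they were.
try:
    apply_atomically(state, [remove_from_rlq, failing])
except RuntimeError:
    pass
after_failure = copy.deepcopy(state)

# Without a failure, both related updates take effect together.
apply_atomically(state, [remove_from_rlq, add_to_mdq])
```

The message is therefore never recovered in both queues or in neither of them.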
4.7.4.3 C̲h̲e̲c̲k̲p̲o̲i̲n̲t̲ ̲G̲e̲n̲e̲r̲a̲t̲i̲o̲n̲
Central system function for generation of checkpoint
data, transparent to the individual sub-systems, and
for transferring these data to disk and/or the stand-by PU.
4.7.4.4 C̲h̲e̲c̲k̲p̲o̲i̲n̲t̲ ̲R̲e̲c̲e̲p̲t̲i̲o̲n̲
Central system function (SS&C) for receiving checkpoint
data from the active PU and maintaining a pool of "active
checkpoints".
4.7.4.5 C̲h̲e̲c̲k̲p̲o̲i̲n̲t̲ ̲R̲e̲s̲t̲o̲r̲e̲
Central system function to enable regeneration of system
state from checkpoint data previously generated. These
data may come from disk in the case of warm start or
from the stand-by PU in the case of switch-over.
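Reception and restore can be pictured with a small sketch in which the pool of "active checkpoints" keeps only the latest checkpoint per message and drops a message once it is checkpointed as finished. The class, its methods and the one-queue-per-message simplification are assumptions made for illustration.

```python
class CheckpointPool:
    """Hypothetical pool of active checkpoints kept by the stand-by PU."""

    def __init__(self):
        self.active = {}  # msg_id -> queue named in the latest checkpoint

    def receive(self, msg_id, queue=None, finished=False):
        if finished:
            self.active.pop(msg_id, None)
        else:
            self.active[msg_id] = queue

    def restore(self):
        """Rebuild the queues from the checkpoints still active."""
        queues = {}
        for msg_id, queue in self.active.items():
            queues.setdefault(queue, []).append(msg_id)
        return queues


pool = CheckpointPool()
pool.receive("M1", "IMQ")
pool.receive("M1", "MDQ")         # M1 has moved on to distribution
pool.receive("M2", "IMQ")
pool.receive("M2", finished=True)  # M2 needs no recovery
restored = pool.restore()
```

In this example only M1 is restored, and it is restored in the MDQ, i.e. at its last checkpoint.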
4.7.4.6 S̲t̲a̲n̲d̲-̲b̲y̲ ̲P̲U̲ ̲C̲o̲o̲r̲d̲i̲n̲a̲t̲i̲o̲n̲
Central System Functions (SS&C) for re-establishing
operation of stand-by PU by transferring programs,
tables, index, saved checkpoint data etc. to it.
4.7.4.7 Q̲u̲e̲u̲e̲ ̲C̲o̲n̲t̲r̲o̲l̲ ̲a̲t̲ ̲R̲e̲s̲t̲a̲r̲t̲
Central System Functions to enable the required manipulation
of queues after restart and before the system becomes
fully operational.
4.7.4.8 S̲a̲v̲e̲ ̲S̲y̲s̲t̲e̲m̲ ̲D̲a̲t̲a̲
Central System Functions for storing of queues, tables,
index etc. on disk prior to ordered close down.
4.7.4.9 R̲e̲s̲t̲o̲r̲e̲ ̲S̲y̲s̲t̲e̲m̲ ̲D̲a̲t̲a̲
Central System Functions for re-establishing queues,
tables, index etc. in main memory during restart after
ordered close down.
4.7.4.10 D̲i̲s̲k̲ ̲S̲y̲s̲t̲e̲m̲ ̲I̲n̲t̲e̲g̲r̲i̲t̲y̲
The mirrored disk concept protects against "hard" faults as
well as "soft" faults and thus obviates a special "safe
update" mechanism.
4.7.5 R̲e̲c̲o̲v̲e̲r̲y̲ ̲L̲e̲v̲e̲l̲
The level of recovery is defined by the checkpoints
and depends upon the type of failure.
4.7.5.1 C̲h̲e̲c̲k̲p̲o̲i̲n̲t̲s̲
The points in the message flow through the system that
define completion of a defined task are called checkpoints
and are described individually in this section. Ref. Fig.
4.7.5.1.
Fig. 4.7.5.1 CAMPS SYSTEM MSG FLOW I
4.7.5.1.1 C̲h̲e̲c̲k̲p̲o̲i̲n̲t̲ ̲1̲
Incoming messages are stored, sent to the IMQ and checkpointed.
After analysis and possibly Garble Correction, "normal"
messages are sent to the MDQ and service messages
to the SVQ. Recovery at this stage will re-establish
the IMQ and thus require the actions prior to the next checkpoint
to be repeated.
In the case of discontinuity in channel seq-no, supervisor
action is invoked (for sending out a service message
etc.).
In the case of a flash message, the flash message receipt
will be sent to the OMQ (and thus checkpointed) before
the message itself is sent to the MDQ and checkpointed.
4.7.5.1.2 C̲h̲e̲c̲k̲p̲o̲i̲n̲t̲ ̲2̲
All messages, comments and release notifications sent
for distribution will go into the MDQ and be checkpointed.
They will be distributed to all recipients possibly
with assistance from MDCO.
4.7.5.1.3 C̲h̲e̲c̲k̲p̲o̲i̲n̲t̲ ̲3̲
All messages, comments and release notifications that
have passed distribution (with or without MDCO interaction)
are checkpointed, avoiding repeated distribution in
case of recovery.
Output to PTP is checkpointed.
4.7.5.1.4 C̲h̲e̲c̲k̲p̲o̲i̲n̲t̲ ̲4̲
All messages (incl. those presented for coordinators),
comments and release notifications presented at a
terminal are checkpointed. This means that checkpointing
is performed on a "presented per terminal" basis, avoiding
re-presentation after recovery.
This applies to printing where applicable.
Output to PTP is checkpointed.
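The "presented per terminal" basis can be sketched as a set of (message, terminal) pairs, so that after recovery only the terminals that have not yet seen the message are served again. The names and the terminal identifiers are hypothetical.

```python
presented = set()  # (msg_id, terminal) pairs checkpointed at point 4


def checkpoint_presented(msg_id, terminal):
    presented.add((msg_id, terminal))


def pending_after_recovery(msg_id, recipients):
    """Terminals at which the message must still be presented."""
    return [t for t in recipients if (msg_id, t) not in presented]


checkpoint_presented("M1", "T1")
remaining = pending_after_recovery("M1", ["T1", "T2", "T3"])
```

Terminal T1 is skipped after recovery because its presentation was already checkpointed.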
4.7.5.1.5 C̲h̲e̲c̲k̲p̲o̲i̲n̲t̲ ̲5̲
Messages and comments created during Initial Preparation
are checkpointed and recoverable to the last segment
stored on disk. After completion of that phase (with
possible correction) the message is completely recoverable
in this "initial version". After completion of editing,
a "current version" exists and the corresponding checkpointing
is done. After that, this version is recoverable.
Input from OCR/PTR is recoverable when complete.
4.7.5.1.6 C̲h̲e̲c̲k̲p̲o̲i̲n̲t̲ ̲6̲
Messages sent for release are sent to the RLQ and checkpointed.
When the messages are released, rejected or deferred,
they are removed from the RLQ; prior to that, they are
recovered in the RLQ.
4.7.5.1.7 C̲h̲e̲c̲k̲p̲o̲i̲n̲t̲ ̲7̲
Messages released are sent to the MRQ for routing and
checkpointed. They are thus recovered there. The
Release notification (with possible comment) is sent
to the MDQ and checkpointed there.
4.7.5.1.8 C̲h̲e̲c̲k̲p̲o̲i̲n̲t̲ ̲8̲
Messages that have passed routing (possibly with Routing
assistance or Group Count assignment) are sent to the
OMQ for transmission or MDQ for local distribution
and checkpointed.
4.7.5.1.9 C̲h̲e̲c̲k̲p̲o̲i̲n̲t̲ ̲9̲
When messages are transmitted, a checkpoint meaning
"message finished" is sent. Messages for which an acknowledgement
is expected will remain in the OMQ until reception
of the acknowledgement; then the above action will be performed.
In case of a channel failure preventing the message
from being sent, it will be returned to the MRQ and checkpointed.
All messages recovered in the OMQ will be retransmitted.
4.7.5.1.10 C̲h̲e̲c̲k̲p̲o̲i̲n̲t̲ ̲1̲0̲
Messages rejected or deferred are removed from the
RLQ and together with the notification (with comment)
sent to the MDQ. They are checkpointed together in
an indivisible operation.
4.7.5.1.11 C̲h̲e̲c̲k̲p̲o̲i̲n̲t̲ ̲1̲1̲
Messages directed to supervisor or service messages
are sent to the SVQ and checkpointed. Prior to that,
they will be recovered from previous checkpoint.
4.7.5.1.12 C̲h̲e̲c̲k̲p̲o̲i̲n̲t̲ ̲1̲2̲
Messages presented for the supervisor are checkpointed
as finished. Prior to that they are recovered in the
SVQ and re-presented.
4.7.5.1.13 C̲h̲e̲c̲k̲p̲o̲i̲n̲t̲ ̲1̲3̲
Messages that message service decides shall be stopped
are checkpointed as finished.
4.7.5.1.14 C̲h̲e̲c̲k̲p̲o̲i̲n̲t̲ ̲1̲4̲
Messages that MDCO decides shall be stopped are checkpointed
as finished.
4.7.5.1.15 C̲h̲e̲c̲k̲p̲o̲i̲n̲t̲ ̲1̲5̲
Messages from CCIS sent for release, rejected or deferred
by the releasing officer are checkpointed as finished.
The associated comment is sent to MRQ and checkpointed.
4.7.5.1.16 C̲h̲e̲c̲k̲p̲o̲i̲n̲t̲ ̲1̲6̲
Messages prepared by the supervisor are treated as
described for point 5.
4.7.5.2 L̲e̲v̲e̲l̲ ̲0̲
This is the state resulting from cold start. The system
is empty (except for possibly the HDB) which means
that no processing will take place before external
activity starts (incoming messages, sign-in, retrieval
etc.).
4.7.5.3 L̲e̲v̲e̲l̲ ̲1̲ ̲(̲R̲e̲s̲t̲a̲r̲t̲ ̲a̲f̲t̲e̲r̲ ̲T̲o̲t̲a̲l̲ ̲S̲y̲s̲t̲e̲m̲ ̲F̲a̲i̲l̲u̲r̲e̲)̲
This is the state resulting from total system failure;
recovery is based upon data on disk. Disk system
integrity checks on vital data are performed to ensure
validity. System data are then reloaded, and checkpoint
data are read from disk and used to restore queues, directories,
status information etc. The checkpoint data written
to disk are from points 1, 5 and 16, where messages are "created",
and points 4, 9, 12, 13, 14 and 15, where messages are "finished":
point 4 only when distribution is fully completed, and
point 5 only when preparation is completed. Status reports
generated and not yet presented will be lost, as their
content is likely to be out of date. They
can then be requested by the users, and they will cover
the time interval set in system parameters (by supervisor).
Global No Series r̲e̲l̲a̲t̲e̲d̲ ̲t̲o̲ ̲m̲s̲g̲ will be recovered.
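A minimal sketch of restoring from the disk checkpoints, assuming each record carries a point number and a message identifier and that a single "finished" record closes a message. The record layout and function name are hypothetical, and the per-point conditions for points 4 and 5 are not modelled.

```python
CREATED = {1, 5, 16}                 # points where messages are "created"
FINISHED = {4, 9, 12, 13, 14, 15}    # points where messages are "finished"


def restore_from_disk(records):
    """records: iterable of (point, msg_id) in the order written to disk."""
    live = {}
    for point, msg_id in records:
        if point in CREATED:
            live[msg_id] = point
        elif point in FINISHED:
            live.pop(msg_id, None)
    return live  # msg_id -> point to which the message is recovered


disk = [(1, "IN-1"), (5, "OUT-1"), (1, "IN-2"), (4, "IN-1")]
recovered = restore_from_disk(disk)
```

Messages with a "created" record but no "finished" record are restored at their creation point, so processing after that point is repeated following the restart.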
4.7.5.4 L̲e̲v̲e̲l̲ ̲2̲ ̲(̲S̲w̲i̲t̲c̲h̲-̲o̲v̲e̲r̲)̲
This is the state following switch-over to the stand-by
PU. No disk system integrity checking is required,
as the two disks of the mirrored pair are identical.
Recovery is based upon the checkpoint data collected
in the stand-by PU prior to switch-over, and thus the recovery
level is to the last checkpoint.
Status reports generated but not yet presented are
lost.
Global No Series r̲e̲l̲a̲t̲e̲d̲ ̲t̲o̲ ̲m̲s̲g̲ will be recovered.
4.7.5.5 L̲e̲v̲e̲l̲ ̲3̲ ̲(̲R̲e̲s̲t̲a̲r̲t̲ ̲a̲f̲t̲e̲r̲ ̲C̲l̲o̲s̲e̲ ̲D̲o̲w̲n̲)̲
This is the state resulting from ordered close down.
As the close down is performed in such a way, that
all activities are terminated in a well defined state
and all queue, message, index etc. information is saved
on disk, the recovery can be accomplished without repetition
of manual work and discontinuities in data. Ref. description
of SS&C section 5.10.
4.7.6 A̲p̲p̲l̲i̲c̲a̲t̲i̲o̲n̲s̲ ̲R̲e̲c̲o̲v̲e̲r̲y̲ ̲A̲c̲t̲i̲o̲n̲s̲
The principles behind checkpoint/recovery make it possible
to keep the "local" sub-system recovery actions to
a minimum. The applications recovery actions can be
divided into implicit and explicit actions. The implicit
"actions" are merely a set of restrictions which the
application should adhere to in order to ensure system
integrity and recoverability to last checkpoint. As
this has to do with the way central functions are used,
the rules can be found in the sections describing these.
The explicit actions are those required to "clean up"
before restart of the operation. Again most "clean
up" is performed automatically by central system functions,
but in certain cases, some actions are most conveniently
done by the applications. Such actions are described
in this section.
4.7.6.1 T̲r̲a̲f̲f̲i̲c̲ ̲H̲a̲n̲d̲l̲i̲n̲g̲ ̲&̲ ̲D̲i̲s̲t̲r̲i̲b̲u̲t̲i̲o̲n̲
a) A̲n̲a̲l̲y̲s̲i̲s̲
Incoming messages checkpointed last at point 1
will be recovered in IMQ. To ensure that a flash
message receipt is not "forgotten", it is sent
to OMQ (point 8) before the message is passed on to MDQ (point 2).
In the case where the flash message is recovered
in IMQ and the flash message receipt is recovered
in OMQ, a new flash message receipt will be generated,
and thus t̲w̲o̲ such messages are sent for the same flash
message; the latter of the two will be stamped
suspected duplicate. This also means that the analysis
module must be prepared to cope with receiving
two such messages from another site.
Further, if the flash message receipt
reaches the sender of the flash message too late
(due to restart), the analysis module must be
prepared to receive the same message twice.
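The tolerance required of the analysis module can be sketched as follows. The choice to ignore a second receipt for the same flash message is an assumption made for illustration, since the specification only requires that the module copes with it; all names are hypothetical.

```python
class Analysis:
    """Hypothetical receipt handling in the analysis module."""

    def __init__(self):
        self.seen_receipts = set()

    def handle_flash_receipt(self, flash_msg_id, suspected_duplicate=False):
        # A second receipt for the same flash message is not an error.
        if flash_msg_id in self.seen_receipts:
            return "ignored-duplicate"
        self.seen_receipts.add(flash_msg_id)
        return "acknowledged"


analysis = Analysis()
first = analysis.handle_flash_receipt("F1")
second = analysis.handle_flash_receipt("F1", suspected_duplicate=True)
```

The same keyed-by-identifier check could be applied to a flash message that arrives twice from a restarted site.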
b) D̲i̲s̲t̲r̲i̲b̲u̲t̲i̲o̲n̲
N/A.
c) R̲o̲u̲t̲i̲n̲g̲
N/A.
d) T̲r̲a̲n̲s̲m̲i̲s̲s̲i̲o̲n̲/̲F̲i̲n̲i̲s̲h̲
All messages are recovered in the OMQ and will
be retransmitted. They will be "stamped" "suspected
duplicate" where applicable and in case of a flash
message that is repeated, the flash message receipt
will be duplicated, too.
4.7.6.2 T̲e̲r̲m̲i̲n̲a̲l̲ ̲S̲y̲s̲t̲e̲m̲
Generally, the functions in the terminal system are
recovered to an extent minimizing user inconvenience.
This means that formats requiring extensive data entry
(prep. etc.) are recovered per segment saved on disk
(where saved means stored on disk and checkpointed
with respect to the level of recovery applicable) and
others recovered when complete.
4.7.6.2.1 U̲s̲e̲r̲ ̲F̲u̲n̲c̲t̲i̲o̲n̲s̲
a) P̲r̲e̲p̲a̲r̲a̲t̲i̲o̲n̲
Preparation of messages and comments is recovered
to last saved segment on disk.
Input completely received from OCR/PTR is recovered
from disk.
b) P̲r̲e̲s̲e̲n̲t̲a̲t̲i̲o̲n̲
Presentation (display/print) in progress will be
restarted from the beginning, either on a per-terminal
basis (recovery level 2) or on a per-document basis
(recovery level 1).
Output to PTP will be restarted from beginning
preceded by blank tape.
c) R̲e̲t̲r̲i̲e̲v̲a̲l̲
As for presentation.
d) D̲e̲l̲e̲t̲i̲o̲n̲
Deletion requests deferred to supervisor are recovered
in the SVQ. Otherwise, the request will have to
be repeated, if not complete.
e) S̲i̲g̲n̲ ̲I̲n̲/̲O̲u̲t̲ ̲-̲ ̲S̲e̲c̲u̲r̲i̲t̲y̲
For security reasons terminals will be signed off
during recovery. Thus sign-in and security interrogation/warning
will be repeated.
f) S̲t̲a̲t̲u̲s̲
Outgoing Message Status, Delivery Status and Message
Release Status requests will have to be repeated
as the content is likely to be out of date. Where
status information is automatically output at time
intervals preset by supervisor this will be lost,
but status requests issued until next automatic
output will cover the preset time interval.
g) R̲e̲l̲e̲a̲s̲e̲
Messages released will be recovered in MRQ and
release notification (possibly with comment) sent
to drafter recovered there.
Messages rejected or deferred are sent to MDQ together
with release notification and recovered there.
4.7.6.2.2 S̲u̲p̲e̲r̲v̲i̲s̲o̲r̲ ̲F̲u̲n̲c̲t̲i̲o̲n̲s̲
All functions accepted by the system are recoverable.
Prior to that, they will have to be repeated.
Preparation of service messages is treated as user
preparation with respect to recovery.
4.7.6.2.3 M̲e̲s̲s̲a̲g̲e̲ ̲S̲e̲r̲v̲i̲c̲e̲ ̲F̲u̲n̲c̲t̲i̲o̲n̲
a) G̲a̲r̲b̲l̲e̲ ̲C̲o̲r̲r̲e̲c̲t̲i̲o̲n̲ ̲a̲n̲d̲ ̲G̲r̲o̲u̲p̲ ̲C̲o̲u̲n̲t̲ ̲A̲s̲s̲i̲g̲n̲m̲e̲n̲t̲
Prior to completion, the message is recovered in
IMQ (this is the version input to Msg. Serv.) and
upon completion returned to IMQ and recoverable
there.
b) R̲I̲-̲A̲s̲s̲i̲g̲n̲m̲e̲n̲t̲
Prior to completion, the message is recovered in
MRQ (this is the version input to Msg. Serv.) and
upon completion returned to MRQ and recoverable
there.
c) O̲C̲R̲ ̲I̲n̲p̲u̲t̲
OCR input directed to msg. serv. will be checkpointed
at point 5 and, prior to purging, is recoverable there.
4.7.6.3 S̲t̲o̲r̲a̲g̲e̲ ̲&̲ ̲R̲e̲t̲r̲i̲e̲v̲a̲l̲ ̲(̲S̲A̲R̲)̲
a) Retrievals in progress will be lost and have to
be repeated by the user and thus no recovery action
is required by SAR.
b) Storage of items in Intermediate Storage is automatically
recovered as far as SAR is concerned because of
checkpointing.
c) Dump of items from Intermediate Storage to Long
Term Storage is implemented in 5 steps:
1) "Dump Items" command from SAR to CSF.
2) "Dump complete" response from CSF to SAR.
3) Update of Long Term Storage Catalogue.
4) Update of Intermediate Storage Catalogue.
5) "Remove dumped items" command from SAR to CSF
This ensures that a failure will never result in
inconsistency between storage and catalogue and
thus no loss of retrievable items can occur.
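The effect of this ordering can be checked with a small simulation that stops after each step in turn. The interpretation of each step (in particular that step 1 copies items to Long Term Storage and leaves them in Intermediate Storage until step 5) is an assumption, and the names are hypothetical.

```python
def dump(items, fail_after=None):
    """Run the 5 steps, stopping after `fail_after` steps; return retrievable items."""
    inter_store, inter_cat = set(items), set(items)
    long_store, long_cat = set(), set()
    steps = [
        lambda: long_store.update(items),              # 1) "Dump Items"
        lambda: None,                                  # 2) "Dump complete"
        lambda: long_cat.update(items),                # 3) update LTS catalogue
        lambda: inter_cat.difference_update(items),    # 4) update IS catalogue
        lambda: inter_store.difference_update(items),  # 5) "Remove dumped items"
    ]
    for n, step in enumerate(steps, start=1):
        if fail_after is not None and n > fail_after:
            break
        step()
    # An item is retrievable if a catalogue lists it and that storage holds it.
    return (inter_cat & inter_store) | (long_cat & long_store)


results = {k: dump(["A", "B"], fail_after=k) for k in range(6)}
```

Under these assumptions every item stays retrievable whichever step the failure follows, because an item is entered in the Long Term Storage Catalogue before it is removed from the Intermediate Storage Catalogue.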
4.7.6.4 L̲o̲g̲ ̲&̲ ̲A̲c̲c̲o̲u̲n̲t̲a̲b̲i̲l̲i̲t̲y̲
Log records are stored on disk each time 5 records have been
generated or 30 seconds have elapsed. Log records are also sent to the
stand-by PU so that complete recovery of log records can
be achieved in the case of switch-over. Log records
are stored on disk in items (by the Message management
system), and thus recovery to the last storage can be
achieved.
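The storage rule for log records can be sketched with a buffer that is flushed on a record count or on elapsed time. The class is hypothetical, the time is passed in explicitly to keep the sketch deterministic, and the time limit is checked only when a record is added, which is a simplification.

```python
class LogBuffer:
    """Hypothetical log buffer flushed per 5 records or 30 seconds."""

    def __init__(self, max_records=5, max_age=30.0):
        self.max_records = max_records
        self.max_age = max_age
        self.pending = []
        self.disk = []       # stands in for the log file on disk
        self.last_flush = 0.0

    def add(self, record, now):
        self.pending.append(record)
        too_many = len(self.pending) >= self.max_records
        too_old = now - self.last_flush >= self.max_age
        if too_many or too_old:
            self.flush(now)

    def flush(self, now):
        self.disk.extend(self.pending)
        self.pending.clear()
        self.last_flush = now


log = LogBuffer()
for t in (1, 2, 3, 4, 5):
    log.add(f"rec-{t}", now=t)   # fifth record triggers a flush
log.add("rec-6", now=10)         # stays pending
log.add("rec-7", now=36)         # more than 30 s since last flush
```

At most the pending records, i.e. fewer than 5 records or less than 30 seconds of logging, are at risk after a total system failure.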
In the case of switch-over, the logging system will
have checkpointed log records distributed to it and
will be responsible for completing the log file.
When recovery necessitates repetition of an event to
be logged, multiple log records for the same event can
occur. Such records will be marked to reflect this.
4.7.6.5 S̲t̲a̲t̲i̲s̲t̲i̲c̲s̲
Statistics are required to be recovered to the last
storage on disk and thus this storage becomes the recovery
point.
4.7.7 P̲e̲r̲f̲o̲r̲m̲a̲n̲c̲e̲ ̲&̲ ̲C̲a̲p̲a̲c̲i̲t̲y̲
4.7.7.1 C̲h̲e̲c̲k̲p̲o̲i̲n̲t̲ ̲R̲a̲t̲e̲
The calculation of the required checkpoint rate is
based upon average busy minute throughput (30 msg/min
incoming, 6 msg/min outgoing), apart from message presentation,
where average busy hour throughput (10 msg/min incoming)
is more appropriate. The calculation is only a gross
sizing which takes into account only main paths (occurrences
contributing less than 20% are not taken into consideration).
a) I̲n̲c̲o̲m̲i̲n̲g̲ ̲M̲S̲G̲ ̲M̲a̲i̲n̲ ̲P̲a̲t̲h̲
1) R̲e̲c̲o̲v̲e̲r̲y̲ ̲L̲e̲v̲e̲l̲ ̲1̲
The msg will pass points 1. and 4. once:
CHKPT-IRATE = 1 x 0.5 + 1 x 0.166 = 0̲.̲6̲6̲6̲/̲s̲e̲c̲.̲
2) R̲e̲c̲o̲v̲e̲r̲y̲ ̲L̲e̲v̲e̲l̲ ̲2̲
The msg will pass points 1., 2., 3. once and
4. 15 times for display + 15 times for print.
CHKPT-IRATE = 3 x 0.5 + 30 x 0.166 = 6̲.̲5̲/̲s̲e̲c̲.̲
b) O̲u̲t̲g̲o̲i̲n̲g̲ ̲M̲S̲G̲ ̲M̲a̲i̲n̲ ̲P̲a̲t̲h̲
1) R̲e̲c̲o̲v̲e̲r̲y̲ ̲L̲e̲v̲e̲l̲ ̲1̲
The msg will pass point 5 three times (prep +
2 edit), point 9 once, point 4 once (coordination),
point 4 once (rel. not. + comment), point 4
once (local dist.), and generates 2 comments, i.e.
passes points 5, 4 twice:
CHKPT-ORATE = (3 + 1 + 1 + 1 + 1 + 2 x 2) x 0.1 = 1̲.̲1̲/̲s̲e̲c̲.̲
2) R̲e̲c̲o̲v̲e̲r̲y̲ ̲L̲e̲v̲e̲l̲ ̲2̲
The msg will pass point 5 nine times (3 segments
prep + 2 x 3 segments edit), points 6, 7,
8, 9 once, points 2, 3, 4 twice (coordination),
points 2, 3, 4 once (rel. not. + comment),
points 2, 3 once and point 4 five times
(dist. to ave. 5 terminals), and generates 2 comments,
i.e. passes points 5, 2, 3, 4 twice:
CHKPT-ORATE = (9 + 4 + 3 x 2 + 3 + 2 + 5 + 4 x 2) x 0.1 = 3̲.̲7̲/̲s̲e̲c̲.̲
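The four rates in this section can be re-derived from the throughput figures given at its start. The sketch below only repeats that arithmetic; the variable names are chosen for illustration.

```python
# Throughput figures from the start of section 4.7.7.1, converted to msg/sec.
IN_BUSY_MINUTE = 30 / 60   # 0.5 msg/sec incoming
IN_BUSY_HOUR = 10 / 60     # about 0.166 msg/sec, used for presentation
OUT_BUSY_MINUTE = 6 / 60   # 0.1 msg/sec outgoing

# Incoming main path.
irate_level_1 = 1 * IN_BUSY_MINUTE + 1 * IN_BUSY_HOUR
irate_level_2 = 3 * IN_BUSY_MINUTE + 30 * IN_BUSY_HOUR

# Outgoing main path: number of checkpoint passes per message.
passes_level_1 = 3 + 1 + 1 + 1 + 1 + 2 * 2
passes_level_2 = 9 + 4 + 3 * 2 + 3 + 2 + 5 + 4 * 2
orate_level_1 = passes_level_1 * OUT_BUSY_MINUTE
orate_level_2 = passes_level_2 * OUT_BUSY_MINUTE
```

The outgoing path gives 11 passes at level 1 and 37 passes at level 2, which at 0.1 msg/sec yields the 1.1/sec and 3.7/sec stated above; the incoming results are approximately 0.67/sec and 6.5/sec.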
4.7.7.2 D̲i̲s̲k̲ ̲S̲p̲a̲c̲e̲ ̲R̲e̲q̲u̲i̲r̲e̲m̲e̲n̲t̲s̲
(Max. 2 Mb for checkpoint area).
4.7.7.3 S̲w̲i̲t̲c̲h̲-̲o̲v̲e̲r̲ ̲T̲i̲m̲e̲
(Max. allowed for availability requirements 156 secs,
design value 60 seconds).
4.7.7.4 R̲e̲s̲t̲a̲r̲t̲ ̲T̲i̲m̲e̲ ̲a̲f̲t̲e̲r̲ ̲T̲o̲t̲a̲l̲ ̲S̲y̲s̲t̲e̲m̲ ̲F̲a̲i̲l̲u̲r̲e̲
(Max. allowed 15 min.)
4.7.7.5 S̲t̲a̲n̲d̲-̲b̲y̲ ̲P̲U̲ ̲R̲e̲s̲t̲a̲r̲t̲ ̲T̲i̲m̲e̲
(Max. allowed 5 min.)
Fig. 4.7.7.1-1 CAMPS SYSTEM MSG FLOW II INCOMING MESSAGES
Fig. 4.7.7.1-2 CAMPS SYSTEM MSG FLOW III OUTGOING MESSAGES