⟦93c72024c⟧ Wang Wps File
Length: 26314 (0x66ca)
Types: Wang Wps File
Notes: CPS/SDS/001
Names: »1363A «
Derivation
└─⟦27ec2f823⟧ Bits:30006056 8" Wang WCS floppy, CR 0088A
└─ ⟦this⟧ »1363A «
WangText
CPS/SDS/001
FH/811020
CAMPS SYSTEM DESIGN SPECIFICATION
ISSUE 1.1 CAMPS
T̲A̲B̲L̲E̲ ̲O̲F̲ ̲C̲O̲N̲T̲E̲N̲T̲S̲
4.10 AVAILABILITY, MAINTAINABILITY AND INTEGRITY OF OPERATION
4.10.1 Availability
4.10.1.1 Definitions
4.10.1.2 Requirements and Verification
4.10.1.3 Unavailability and Switch-over Time
4.10.2 Maintainability
4.10.2.1 Definitions
4.10.2.2 Requirements and Verification
4.10.3 Integrity of Operation
4.10.3.1 Definition
4.10.3.2 Requirements
4.10.3.3 Verification
4.10.1 A̲v̲a̲i̲l̲a̲b̲i̲l̲i̲t̲y̲
4.10.1.1 A̲v̲a̲i̲l̲a̲b̲i̲l̲i̲t̲y̲ ̲D̲e̲f̲i̲n̲i̲t̲i̲o̲n̲s̲
a) A̲v̲a̲i̲l̲a̲b̲i̲l̲i̲t̲y̲.̲ The probability of finding an item
in a functioning condition at a given time.
b) M̲e̲a̲n̲ ̲t̲i̲m̲e̲ ̲b̲e̲t̲w̲e̲e̲n̲ ̲F̲a̲i̲l̲u̲r̲e̲ ̲(̲M̲T̲B̲F̲)̲.̲ The statistical
mean of the functioning time between failures.
For a given interval, the total measured functioning
time of the item divided by the total number of
failures of that item during the interval. Agreed
scheduled preventive maintenance of modules of
the equipment shall not be counted, when estimating
MTBF.
c) M̲e̲a̲n̲ ̲t̲i̲m̲e̲ ̲t̲o̲ ̲R̲e̲p̲a̲i̲r̲ ̲(̲M̲T̲T̲R̲)̲.̲ The statistical mean
of the distribution of times to repair. The summation
of active repair times during a given period of
time divided by the total number of malfunctions
during the same time interval. This repair time
shall include all actions required to detect, locate
and repair the fault.
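The definitions above can be illustrated with a minimal sketch. The function names and sample figures are hypothetical, and the steady-state relation A = MTBF / (MTBF + MTTR) is the standard textbook relation, assumed here and not quoted from this specification.

```python
def mtbf(total_functioning_hours, number_of_failures):
    """Total measured functioning time divided by the number of failures."""
    return total_functioning_hours / number_of_failures


def mttr(active_repair_hours, number_of_malfunctions):
    """Sum of active repair times divided by the number of malfunctions."""
    return sum(active_repair_hours) / number_of_malfunctions


def availability(mtbf_hours, mttr_hours):
    """Steady-state availability (assumed standard relation)."""
    return mtbf_hours / (mtbf_hours + mttr_hours)


# Hypothetical interval: 8760 functioning hours, 4 failures,
# each repair time covering detection, localization and repair.
example_mtbf = mtbf(8760.0, 4)
example_mttr = mttr([1.0, 0.5, 1.5, 1.0], 4)
example_a = availability(example_mtbf, example_mttr)
```

Scheduled preventive maintenance is excluded from the failure count, so it would not appear in the arguments of either function.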
4.10.1.2 A̲v̲a̲i̲l̲a̲b̲i̲l̲i̲t̲y̲ ̲R̲e̲q̲u̲i̲r̲e̲m̲e̲n̲t̲s̲ ̲a̲n̲d̲ ̲V̲e̲r̲i̲f̲i̲c̲a̲t̲i̲o̲n̲
The detailed hardware requirements are defined in CPS/210/SYS/0001
section 3.4.4.4.
Verification that the CAMPS system fulfils the requirements
is given in the R&M Program Plan, CPS/PLN/004.
The CAMPS on-line operations affect the MTTR of equipment
by providing detailed error reports upon detection
of an error. A description of these facilities is given
in section 4.11.
The availability requirements are partitioned into
6 major requirements:
1- Service to individual user connecting points
2- Service to individual external channels
3- Service common to groups of user connecting points
4- Service common to groups of external channels
5- Service common to 75% of user connecting points
6- Service common to all circuits, channels and user
connecting points
Six figures overleaf summarize the availability
requirements.
The figures also indicate the hardware involved (shaded
areas).
C̲o̲m̲m̲e̲n̲t̲s̲ ̲t̲o̲ ̲t̲h̲e̲ ̲F̲i̲g̲u̲r̲e̲ ̲C̲o̲n̲f̲i̲g̲u̲r̲a̲t̲i̲o̲n̲s̲
The following information is given:
- A = availability (fraction)
- MTTR
- MTBF
a) D̲I̲S̲K̲
The configuration contains 2 mirrored disks. The
physical placement of the mirrored disks is determined
at system generation. The third disk is a stand
alone disk (used for e.g. off line retrieval) and
is not included in the availability verification.
b) D̲u̲a̲l̲i̲z̲e̲d̲ ̲E̲q̲u̲i̲p̲m̲e̲n̲t̲
If an MTTR is specified for dualized equipment,
then the MTTR for the single equipment is 2 x
MTTR.
c) W̲a̲t̲c̲h̲d̲o̲g̲ ̲P̲o̲s̲i̲t̲i̲o̲n̲
The Watchdog Processor, the operator VDU and printer
contribute to the unavailability of the shaded
equipment in two cases:
- the Watchdog Processor fails at the time of
an automatic reconfiguration involving the
watchdog (e.g. PU switchover).
- the Watchdog Processor, the operator VDU or
printer fails during execution of M&D software
and thereby prolongs the MTTR.
[6 drawings to be inserted here, 2 per page]
4.10.1.3 U̲n̲a̲v̲a̲i̲l̲a̲b̲i̲l̲i̲t̲y̲ ̲a̲n̲d̲ ̲S̲w̲i̲t̲c̲h̲-̲o̲v̲e̲r̲ ̲T̲i̲m̲e̲
This section defines the sources of unavailability
for the total CAMPS system:
- unavailability of a PU and attached IO-BUS
- unavailability of the redundant DISK system
- unavailability of the redundant TDX system
- unavailability of the watchdog
- switch-over time
The switch-over time is determined from the following
equation:
SWT * PU_IOBUS_ERRORS =
MAX_U - U_WDP - U_PU_IOBUS - U_DISK_TDX
where
SWT = switch-over time in minutes
MAX_U = max allowed unavailability = 26.28 minutes per
year
U_WDP = watchdog unavailability = 4.73 minutes per
year
U_DISK_TDX = redundant DISK + TDX system unavailability
= 2.10 minutes per year
U_PU_IOBUS = redundant PU + IOBUS
unavailability = 0.37 minutes per
year
PU_IOBUS_ERRORS = number of errors in nonredundant
PU + IOBUS equipment = 7.35 per year
This gives SWT = (26.28 - 4.73 - 0.37 - 2.10) / 7.35
= 2.6 minutes
The above calculation is based on the following figures
taken from the R&M plan:
1- system availability required = 0.99995
2- TDX + DISK system availability provided = 0.999996
3- watchdog: 9 errors per million hours
4- PU: 816 errors per million hours
5- IO BUS: 23 errors per million hours
6- MTTR = 1 hour
To provide a reasonable safety factor, a design value
for the switch-over time of 60 seconds is selected.
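The switch-over calculation can be retraced with a short sketch. It assumes a year of 8760 hours. It also assumes that the watchdog unavailability is the yearly error count multiplied by the MTTR, which reproduces the quoted 4.73 minutes. The redundant PU + IOBUS figure of 0.37 minutes per year is taken as given. All function and variable names are illustrative.

```python
HOURS_PER_YEAR = 8760                   # assumption: non-leap year
MINUTES_PER_YEAR = HOURS_PER_YEAR * 60  # 525600
MTTR_MINUTES = 60.0                     # MTTR = 1 hour (R&M plan figure 6)


def unavailability_minutes(availability):
    """Yearly downtime in minutes for a given availability fraction."""
    return (1.0 - availability) * MINUTES_PER_YEAR


def errors_per_year(errors_per_million_hours):
    """Convert an error rate per million hours into errors per year."""
    return errors_per_million_hours * 1e-6 * HOURS_PER_YEAR


def switch_over_time(max_u, u_wdp, u_pu_iobus, u_disk_tdx, pu_iobus_errors):
    """Solve SWT * PU_IOBUS_ERRORS = MAX_U - U_WDP - U_PU_IOBUS - U_DISK_TDX."""
    return (max_u - u_wdp - u_pu_iobus - u_disk_tdx) / pu_iobus_errors


max_u = unavailability_minutes(0.99995)        # about 26.28 minutes per year
u_disk_tdx = unavailability_minutes(0.999996)  # about 2.10 minutes per year
u_wdp = errors_per_year(9) * MTTR_MINUTES      # about 4.73 minutes per year
pu_iobus_errors = errors_per_year(816 + 23)    # about 7.35 errors per year
u_pu_iobus = 0.37                              # taken as given in the text

swt = switch_over_time(max_u, u_wdp, u_pu_iobus, u_disk_tdx, pu_iobus_errors)
print(round(swt, 1))  # 2.6
```

The design value of 60 seconds then leaves a margin of roughly a factor of 2.6 against the computed limit.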
4.10.2 M̲a̲i̲n̲t̲a̲i̲n̲a̲b̲i̲l̲i̲t̲y̲
4.10.2.1 M̲A̲I̲N̲T̲A̲I̲N̲A̲B̲I̲L̲I̲T̲Y̲ ̲D̲E̲F̲I̲N̲I̲T̲I̲O̲N̲S̲
a) C̲o̲r̲r̲e̲c̲t̲i̲v̲e̲ ̲m̲a̲i̲n̲t̲e̲n̲a̲n̲c̲e̲.̲ The maintenance undertaken
to restore an item to a specified condition after
a failure has occurred (the corrective maintenance
aims at reducing the MTTR).
b) P̲r̲e̲v̲e̲n̲t̲i̲v̲e̲ ̲m̲a̲i̲n̲t̲e̲n̲a̲n̲c̲e̲.̲ The maintenance undertaken
systematically with the intention of keeping an
item in a specified condition, reducing the occurrence
of failures, and prolonging the useful life of
the equipment (the effective MTBF is increased).
c) O̲f̲f̲l̲i̲n̲e̲ ̲m̲a̲i̲n̲t̲e̲n̲a̲n̲c̲e̲ ̲a̲n̲d̲ ̲d̲i̲a̲g̲n̲o̲s̲t̲i̲c̲s̲ ̲(̲M̲&̲D̲)̲.̲ The
M&D software contains a set of hardware test programs,
which provides error detection down to module level.
The command interpreter software in the diagnostics
package enables the operator to execute the diagnostic
tests.
The test programs reside either on floppy
disk or on the offline disk.
Test results are printed on the operator printer.
4.10.2.2 R̲e̲q̲u̲i̲r̲e̲m̲e̲n̲t̲s̲ ̲a̲n̲d̲ ̲V̲e̲r̲i̲f̲i̲c̲a̲t̲i̲o̲n̲ ̲o̲f̲ ̲t̲h̲e̲s̲e̲
Requirements for preventive and corrective maintenance,
and the verification of these, are given in:
Maintenance Plan: CPS/PLN/006
Generally the corrective maintenance aims at a detection
of a faulty module by means of
- on-line diagnostics or
- direct operator observation or
- execution of M&D software
and subsequent
- replacement of faulty module or
- reconfiguration
The M&D software is defined in:
Maintenance and diagnostics software, CPS/SDS/XXX
On-line error reporting is described in section 4.11.
4.10.3 I̲n̲t̲e̲g̲r̲i̲t̲y̲ ̲o̲f̲ ̲O̲p̲e̲r̲a̲t̲i̲o̲n̲
4.10.3.1 D̲e̲f̲i̲n̲i̲t̲i̲o̲n̲
Integrity of operation defines the means to limit the
effect of an error through
- timely detection of an error
- error reporting
- corrective actions
Violation of integrity of operation can occur if
- an error is not detected
- an error is not detected in proper time
- an error is detected but not reported
- actions subsequent to an error cannot remedy the
situation.
4.10.3.2 R̲e̲q̲u̲i̲r̲e̲m̲e̲n̲t̲s̲
The probability that a message or internal transaction
is:
- lost wholly or in part or
- misdirected, or
- corrupted
as a result of an equipment error shall be less than
1 in 10^7.
This requirement is interpreted as:
The probability, that a message or comment is misdirected
or corrupted as a result of a hardware error shall
be less than 1 in 10^7.
4.10.3.3 V̲e̲r̲i̲f̲i̲c̲a̲t̲i̲o̲n̲
Section 4.11 defines the CAMPS on-line facilities for
error detection, error reporting and corrective actions.
This section also includes a description of defensive
mechanisms (e.g. validation of data passed between
packages) provided by the CAMPS on-line system.
T̲A̲B̲L̲E̲ ̲O̲F̲ ̲C̲O̲N̲T̲E̲N̲T̲S̲
4.11 ERROR AND BACKLOG HANDLING
4.11.1 Error Processing Mechanisms
4.11.1.1 Error Reception/Reporting
4.11.1.2 Error Display/Printout
4.11.2 Error Detection and Localization
4.11.2.1 Error Detection/Localization Analysis
4.11.2.2 Error Detection by a PU
4.11.2.2.1 PU Hardware Error Detection
4.11.2.2.2 PU Firmware Error Detection
4.11.2.2.3 PU Software Error Detection
4.11.3 Error Fix-up
4.11.3.1 PU or IO Bus Error Fix-up
4.11.3.2 TDX System Error
4.11.3.3 Mirrored Disk System Error
4.11.3.4 Offline or Floppy Disk Error
4.11.3.5 LTU System Error
4.11.3.6 LTUX System Error
4.11.3.7 Watchdog System Error
4.11.3.8 Power Down
4.11.3.9 Hardware Resource Error
4.11.3.10 Security or Access Control Error
4.11.3.11 Software Resource Error
4.11.3.12 Miscellaneous
4.11.4 Backlog Handling
4.11.4.1 Dead Lock
4.11.4.2 Overload
4.11.4.2.1 Queue Overload
4.11.4.2.2 Intermediate Storage
4.11.4.2.3 Short Term Storage
4.11 E̲R̲R̲O̲R̲ ̲A̲N̲D̲ ̲B̲A̲C̲K̲L̲O̲G̲ ̲H̲A̲N̲D̲L̲I̲N̲G̲
This section addresses the processing of technical
errors. Technical errors are hardware errors and the
software errors related to system software use. Errors
due to e.g. ACP127 message format analysis or syntax
errors in user input are not covered.
The section is divided into four subsections, which
describe:
a) The error processing mechanisms provided by the
Kernel
b) Error detection facilities and localization of
an erroneous module
c) Error types and corresponding error fix-up actions
d) Backlog handling facilities
The section only handles the occurrence of a single
error. Multiple errors may imply a total system error,
in which case a WARM2 start-up is to be executed. However,
a total system error may be disastrous if both mirrored
disks are corrupted (e.g. due to head-landing). In this
case a DEAD2 start-up is to be executed.
4.11.1 E̲r̲r̲o̲r̲ ̲P̲r̲o̲c̲e̲s̲s̲i̲n̲g̲ ̲M̲e̲c̲h̲a̲n̲i̲s̲m̲s̲
The Kernel contains a table which maps each error
to an error type. A process can specify the error types
for which it will take over the error handling. The
take-over is implemented via a procedure defined by the
application process, which is automatically invoked
if an error of the specified type occurs. The application
error fix-up procedure has access to extended error
information, which defines the error in detail (e.g.
down to hardware module level).
It is, however, possible for the parent of the process
to inhibit a child from specifying certain error types
(e.g. security error).
Errors not handled by a process are given to the parent
of the process and the process is stopped.
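The take-over mechanism described above can be sketched as follows. This is an illustration only: the class, the method names and the error codes are hypothetical, and the real Kernel interface is not defined in this section.

```python
# Hypothetical error-to-error-type table (the Kernel table in the text).
ERROR_TYPE_TABLE = {
    "PARITY": "hardware",
    "QUEUE_FULL": "resource",
    "ACCESS_VIOLATION": "security",
}


class Process:
    def __init__(self, name, parent=None, inhibited_types=()):
        self.name = name
        self.parent = parent
        # Error types the parent does not allow this child to take over.
        self.inhibited_types = set(inhibited_types)
        self.handlers = {}
        self.stopped = False
        self.received_from_children = []

    def take_over(self, error_type, procedure):
        """Register a fix-up procedure for one error type."""
        if error_type in self.inhibited_types:
            raise PermissionError(error_type + " inhibited by parent")
        self.handlers[error_type] = procedure

    def report_error(self, error, extended_info):
        """Dispatch an error; unhandled errors stop the process and go to the parent."""
        error_type = ERROR_TYPE_TABLE[error]
        procedure = self.handlers.get(error_type)
        if procedure is not None:
            procedure(error, extended_info)
            return "handled"
        self.stopped = True
        if self.parent is not None:
            self.parent.received_from_children.append((self.name, error, extended_info))
        return "passed_to_parent"


fixed = []
parent = Process("parent")
child = Process("child", parent=parent, inhibited_types={"security"})
child.take_over("resource", lambda error, info: fixed.append((error, info)))
```

A child that reports a `QUEUE_FULL` error runs its own procedure, while a `PARITY` error, for which it registered nothing, stops the child and is handed to the parent.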
4.11.1.1 E̲r̲r̲o̲r̲ ̲R̲e̲c̲e̲p̲t̲i̲o̲n̲/̲R̲e̲p̲o̲r̲t̲i̲n̲g̲
COPSY receives error reports subsequent to the detection
of an error from:
- On line diagnostics programs
- Application software, which has not specified an
error fix-up procedure
- PU firmware detected hardware errors
- Kernel detected software errors
- The watchdog having monitored a hardware error
Application processes receive error reports subsequent
to an error from the system software which on behalf
of the application operates on lines, files, queues,
areas, etc.:
- I/O system
- Message Monitor
- Queue Monitor
4.11.1.2 E̲r̲r̲o̲r̲ ̲D̲i̲s̲p̲l̲a̲y̲/̲P̲r̲i̲n̲t̲o̲u̲t̲
A process, which handles an error locally, reports
the result of the error fix-up to the SSC. The SSC
prints the report at the operator printer and if appropriate
updates the system status display at the operator VDU.
If the watchdog fails then error reports are directed
to the supervisor report printer.
4.11.2 E̲r̲r̲o̲r̲ ̲D̲e̲t̲e̲c̲t̲i̲o̲n̲ ̲a̲n̲d̲ ̲L̲o̲c̲a̲l̲i̲z̲a̲t̲i̲o̲n̲
This section is divided into two paragraphs, which
contain:
a) The principles for an analysis, which will be accomplished
during detailed design, of error detection/localization
facilities provided by CAMPS to meet:
- Requirements to integrity of operation
- Error reporting requirements derived from the
MTTR requirements
b) A description of specific PU error detection facilities,
directly required in the CAMPS requirement specification.
4.11.2.1 E̲r̲r̲o̲r̲ ̲D̲e̲t̲e̲c̲t̲i̲o̲n̲/̲L̲o̲c̲a̲l̲i̲z̲a̲t̲i̲o̲n̲ ̲A̲n̲a̲l̲y̲s̲i̲s̲
The CAMPS system will be broken down into hardware
and software subsystems. For each subsystem it will
be described how the subsystem reacts as a result of
an internal error. It will also be described how the
error is detected due to:
- Hardware traps (e.g. parity check)
- Firmware traps (e.g. LTU and TDX system protocols)
- Software traps (e.g. online diagnostics and validity
checks)
- Manual observation (e.g. some VDU errors, LED indication)
- Watchdog monitoring (e.g. power down)
Error detection is either direct (e.g. parity check
during memory access) or indirect (e.g. a CPU calculation
error may be detected via an illegal memory access).
Also, an error can be caused by a number of modules
or by either software or hardware. It is the objective
of the error isolation facilities to isolate an error
to one of the groups defined in section 4.11.3.
4.11.2.2 E̲r̲r̲o̲r̲ ̲D̲e̲t̲e̲c̲t̲i̲o̲n̲ ̲b̲y̲ ̲a̲ ̲P̲U̲
This section describes PU error detection facilities
specifically required in the CAMPS requirement specification.
4.11.2.2.1 P̲U̲ ̲H̲a̲r̲d̲w̲a̲r̲e̲ ̲E̲r̲r̲o̲r̲ ̲D̲e̲t̲e̲c̲t̲i̲o̲n̲
a) Trapping of unassigned instructions. Execution
of any illegal code or bit pattern is detected
and results in an invocation of the Kernel.
b) Instructions are separated into two classes, one
for privileged use and one for application use.
4.11.2.2.2 P̲U̲ ̲F̲i̲r̲m̲w̲a̲r̲e̲ ̲E̲r̲r̲o̲r̲ ̲D̲e̲t̲e̲c̲t̲i̲o̲n̲
The MAP module prevents programs from being able to
write in memory occupied by the operating system or
by other programs.
4.11.2.2.3 P̲U̲ ̲S̲o̲f̲t̲w̲a̲r̲e̲ ̲E̲r̲r̲o̲r̲ ̲D̲e̲t̲e̲c̲t̲i̲o̲n̲
a) At system start-up all programs and data files
loaded into memory will carry a block parity checksum
to allow the detection of corrupted data.
b) On-line diagnostics programs operating as low priority
tasks. A program to check sum the read-only part
of the system software exists. It is executed periodically
and on request from the supervisor.
c) High level external line protocols (e.g. continuity
and self-addressed service messages).
d) Parameter validity check, when receiving data.
e) Security and access control as specified in section
4.9.
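Items a) and b) above can be illustrated with a minimal sketch of a block checksum and a periodic comparison against reference values. The section does not define the checksum algorithm or the block size, so the XOR parity and the 512-byte blocks used here are assumptions, and all names are hypothetical.

```python
def block_parity(block):
    """XOR of all bytes in a block (assumed parity scheme)."""
    parity = 0
    for byte in block:
        parity ^= byte
    return parity


def checksum_image(image, block_size=512):
    """Return one parity value per block of a loaded program or data file."""
    return [
        block_parity(image[offset:offset + block_size])
        for offset in range(0, len(image), block_size)
    ]


def find_corrupted_blocks(image, reference, block_size=512):
    """Indices of blocks whose parity no longer matches the reference."""
    current = checksum_image(image, block_size)
    return [i for i, (now, ref) in enumerate(zip(current, reference)) if now != ref]


# Reference checksums computed at load time for a hypothetical read-only image.
read_only_image = bytes(range(256)) * 8          # 2048 bytes, 4 blocks
reference_sums = checksum_image(read_only_image)

# A later periodic run detects a single flipped bit in block 1.
damaged = bytearray(read_only_image)
damaged[700] ^= 0x01
corrupted = find_corrupted_blocks(bytes(damaged), reference_sums)
```

A simple parity of this kind misses errors that cancel out within a block, which is one reason the scheme here should be read as a placeholder.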
4.11.3 E̲r̲r̲o̲r̲ ̲F̲i̲x̲-̲U̲p̲
Errors are divided into groups. Each group defines a type
of error for which a common error fix-up action exists.
The following hardware error types are foreseen:
- PU or IO BUS error
- TDX-BUS system (TDX-BUS + STI + TDX-CTR) error
- Mirrored disk system (DISK-CTR + DISK DRIVE + VOLUME)
- Off-line or floppy disk system
- LTU system (LTU + external line) error
- LTUX system (LTUX + BSM-X + terminal equipment)
error
- Watchdog or operator VDU or operator printer error
- Resource error (e.g. paper out)
- Power down
All error types except for resource errors and errors
related to the execution of an instruction are non-recoverable.
This is due to the fact that the File Management System
(FMS) and Terminal Handling System (THS) give a high
level interface to peripherals, e.g. they
- repeat an operation (which handles intermittent
errors)
- allocate a new disk sector when a sector is bad.
Instructions which imply an error interrupt will
be repeated several times before being considered
erroneous.
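The repetition of an operation before it is declared erroneous can be sketched as a simple retry loop. The number of repetitions is not stated in this section, so the default of 3 attempts is an assumption, as are the function names.

```python
def repeat_before_failing(operation, attempts=3):
    """Run an operation up to `attempts` times; re-raise the last error if all fail."""
    last_error = None
    for _ in range(attempts):
        try:
            return operation()
        except IOError as exc:
            # Possibly an intermittent error: try again.
            last_error = exc
    raise last_error


def make_flaky_read(failures_before_success):
    """Hypothetical peripheral read that fails a fixed number of times first."""
    state = {"calls": 0}

    def read():
        state["calls"] += 1
        if state["calls"] <= failures_before_success:
            raise IOError("transient device error")
        return "sector data"

    return read
```

With two transient failures the third attempt succeeds and the caller never sees the error; with three or more the error is reported as non-recoverable.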
The following software error types are foreseen:
- Security or access error
- Resource error
- Miscellaneous
4.11.3.1 P̲U̲ ̲o̲r̲ ̲I̲O̲ ̲B̲U̲S̲ ̲E̲r̲r̲o̲r̲ ̲F̲i̲x̲-̲U̲p̲
A PU or IO BUS error will imply switch over to the
stand by PU as described in section 4.3.1.4.
If the stand by PU is unavailable the active PU is
disabled and a WARM2 start-up of the off-line PU can
be performed.
4.11.3.2 T̲D̲X̲ ̲S̲y̲s̲t̲e̲m̲ ̲E̲r̲r̲o̲r̲
A TDX system error is handled by the Terminal Handling
System (THS) in the IOC and by the SSC in common. The
SSC switches LTUXs (via the watchdog) to the appropriate
TDX-BUS and updates the Configuration table. The THS
performs a TDX-system switch over transparent to the
application (i.e. TEP and THP).
If the stand by TDX-BUS is used for off-line operation
a total system error exists, and the PU is disabled.
After insertion of a TDX BUS a WARM2 start-up can be
performed.
4.11.3.3 M̲i̲r̲r̲o̲r̲e̲d̲ ̲D̲i̲s̲k̲ ̲S̲y̲s̲t̲e̲m̲ ̲E̲r̲r̲o̲r̲
An error in one of the mirrored disks is handled by
the FMS in the IOC. The FMS switches to the stand by
disk system transparent to the users. The SSC is notified
and updates the Configuration Table.
If both mirrored disks fail, a total system error exists
and the PUs will be disabled. A WARM2 start-up can
be executed, when the disk system is repaired.
4.11.3.4 O̲f̲f̲-̲L̲i̲n̲e̲ ̲o̲r̲ ̲F̲l̲o̲p̲p̲y̲ ̲D̲i̲s̲k̲ ̲S̲y̲s̲t̲e̲m̲ ̲E̲r̲r̲o̲r̲
An error in the off-line or the floppy disk system
implies a removal of the disk system in question. The
packages involved in an error fix-up are:
a) SSC and TEP during the following operations:
- back-up of system parameter file
- off-loading of messages
- trace information storage
- copying of modified software to off-line disk
b) SSC and TEP and SAR during:
- retrieval of off-loaded messages
c) SSC during:
- load of start-up data
- copying of modified software to the on-line
disks
- memory dump
d) SSP and OLP during:
- off-line operations
During on-line operation, the SSC de-assigns devices
and dismounts the mirrored and the floppy disk
volumes, whereas the TEP dismounts the off-line
disk. Also, the SSC updates the Configuration table.
4.11.3.5 L̲T̲U̲ ̲S̲y̲s̲t̲e̲m̲ ̲E̲r̲r̲o̲r̲
An LTU line error is handled by the THP and the SSC:
The THP closes the line activities, whereas the SSC
deletes the THP instance, which handles the line. Also,
the SSC updates the Configuration table.
An LTU error affects up to 2 lines. Per line the error
fix-up is as described above.
4.11.3.6 L̲T̲U̲X̲ ̲S̲y̲s̲t̲e̲m̲ ̲E̲r̲r̲o̲r̲
An error in the terminal equipment is handled by the
TEP and the SSC. The TEP cancels the ongoing transaction(s),
whereas the SSC deletes the TEP instance and updates
the Configuration Table.
An error in a TRC/point-to-point line is handled by
the THP and the SSC. The THP closes the line activities,
whereas the SSC deletes the THP instance and updates
the Configuration table.
An LTUX error involves the TEP or THP instances using
the LTUX. Per instance the fix-up is as described above.
A BSM error involves the TEP or THP instances using
the two LTUXs, which are controlled by the BSM. Per
terminal instance the fix-up is as described above.
4.11.3.7 W̲a̲t̲c̲h̲d̲o̲g̲ ̲S̲y̲s̲t̲e̲m̲ ̲E̲r̲r̲o̲r̲
If the operator VDU fails, then watchdog operation
not involving the VDU and printer continues.
If the operator printer fails, then the SSC directs
active PU error messages to the supervisor report printer.
The non-active PU will not be able to perform print-out.
The watchdog continues operations not involving the
printer and the VDU.
If the Watchdog Processor (WDP) fails, then the SSC directs
active PU error messages to the supervisor report printer.
If a reconfiguration, which involves the WDP, has to
take place, then a total system error exists.
4.11.3.8 P̲o̲w̲e̲r̲ ̲D̲o̲w̲n̲
If a power down is detected in the active PU a switch-over
to the stand by PU is automatically executed.
If a power down is detected in the non-active PU, then
the PU is disabled.
The IO-crate contains two power supplies:
- one supply per IO BUS
- dualized supply per IO BUS device
A single power down has effects identical to a PU power
down.
A power down in a TDX crate implies that the two LTUXs
in the crate are taken out of service.
A power down in the watchdog implies that the watchdog
is taken out of service. Refer to section 4.11.3.7.
4.11.3.9 H̲a̲r̲d̲w̲a̲r̲e̲ ̲R̲e̲s̲o̲u̲r̲c̲e̲ ̲E̲r̲r̲o̲r̲
Hardware resource errors include:
- Paper out
- No more file space
This error type may be handled by an application. Further
discussion is deferred to detailed design.
4.11.3.10 S̲e̲c̲u̲r̲i̲t̲y̲ ̲o̲r̲ ̲A̲c̲c̲e̲s̲s̲ ̲C̲o̲n̲t̲r̲o̲l̲ ̲E̲r̲r̲o̲r̲
These errors are due to a programming error.
Reactions subsequent to the detection of an error during
security or access control are defined in section 4.9.
4.11.3.11 S̲o̲f̲t̲w̲a̲r̲e̲ ̲R̲e̲s̲o̲u̲r̲c̲e̲ ̲E̲r̲r̲o̲r̲
Software resource errors include:
- Queue full
- No more file control blocks (FCBs)
This type of error may be handled by an application,
which for example may:
- wait until resource is free
- wait specified period for resource to become free
- stop input
Refer to section 4.11.4, where some overload situations
are handled.
4.11.3.12 M̲i̲s̲c̲e̲l̲l̲a̲n̲e̲o̲u̲s̲
This type of error includes:
- Semantic error in input parameter
- Time out during process communication
- Backlog (refer section 4.11.4)
These errors are due to a programming error. Error
reactions are provided during detailed design.
4.11.4 B̲a̲c̲k̲l̲o̲g̲ ̲H̲a̲n̲d̲l̲i̲n̲g̲
Backlog handling refers to actions to
- avoid dead lock
- avoid system overload
4.11.4.1 D̲e̲a̲d̲ ̲L̲o̲c̲k̲
Dead lock refers to a situation, where processes demand
a number of shared resources to perform a function.
If for instance processes A and B both require a reader
and a printer, and A has reserved the reader and B
the printer, then a dead lock situation exists if neither
A nor B relinquishes its reserved resource.
It is a design aim to prevent dead locks. However,
this will introduce an overhead. During detailed design
it will be decided whether this overhead can be tolerated
or supervisor commands shall be defined to handle dead
locks, which
- are unlikely to occur and which
- will imply a considerable overhead to avoid
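One common way to prevent the reader/printer dead lock described above is to make every process reserve shared resources in the same fixed order. The sketch below illustrates that technique. It is not a method prescribed by this specification, and the resource names and functions are hypothetical.

```python
import threading

# Fixed global reservation order (assumption): reader before printer.
RESOURCE_ORDER = {"reader": 0, "printer": 1}


def reserve_and_run(locks, needed, work):
    """Reserve all needed resources in the global order, run, then release."""
    ordered = sorted(needed, key=RESOURCE_ORDER.__getitem__)
    for name in ordered:
        locks[name].acquire()
    try:
        work()
    finally:
        for name in reversed(ordered):
            locks[name].release()


def run_demo():
    """Processes A and B request the same two resources in opposite order."""
    locks = {name: threading.Lock() for name in RESOURCE_ORDER}
    completed = []
    a = threading.Thread(
        target=reserve_and_run,
        args=(locks, ["reader", "printer"], lambda: completed.append("A")),
    )
    b = threading.Thread(
        target=reserve_and_run,
        args=(locks, ["printer", "reader"], lambda: completed.append("B")),
    )
    a.start()
    b.start()
    a.join(timeout=5)
    b.join(timeout=5)
    return sorted(completed)
```

The ordering costs a sort per reservation and forces processes to declare all resources up front, which is the kind of overhead the text weighs against supervisor commands.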
4.11.4.2 O̲v̲e̲r̲l̲o̲a̲d̲
Overload will be prevented by the following general
method:
a) Issue a warning to the supervisor, when a resource
"warning threshold" is exceeded (refer figure 4.11.4.2-1).
A new warning will not be issued until the resource
consumption is below the "warning enable threshold".
b) Provide the supervisor with commands, which can
remedy the situation.
c) If the supervisor does not use his commands, then
an automatic ordered close down, where input is
inhibited, and where the system slowly dies out,
will be performed, when the "critical threshold"
is exceeded.
During start-up subsequent to the ordered close-down,
the supervisor is the first to be allowed to sign-in.
He will be provided with status describing various
resource consumptions and he can by means of the
commands in b) remove an overload situation prior
to start-up.
[Figure: vertical scale of resource consumption from 0% to
100%, marked in ascending order with the warning enable
threshold, the warning threshold and the critical threshold]
FIGURE 4.11.4.2-1 RESOURCE THRESHOLDS
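The three thresholds behave as a hysteresis: after a warning, no further warning is issued until consumption has fallen below the warning enable threshold. A minimal sketch follows. The percentages are hypothetical, since the actual threshold values are not given in this section, and the class and return values are illustrative.

```python
class ResourceMonitor:
    """Tracks one resource against the three thresholds of figure 4.11.4.2-1."""

    def __init__(self, warning_enable, warning, critical):
        if not warning_enable < warning < critical:
            raise ValueError("thresholds must be in ascending order")
        self.warning_enable = warning_enable
        self.warning = warning
        self.critical = critical
        self.warning_armed = True

    def update(self, consumption):
        """Return 'close_down', 'warning' or None for a consumption sample (percent)."""
        if consumption > self.critical:
            return "close_down"          # automatic ordered close-down
        if consumption < self.warning_enable:
            self.warning_armed = True    # warnings are enabled again
        if consumption > self.warning and self.warning_armed:
            self.warning_armed = False   # suppress repeats until re-armed
            return "warning"
        return None


monitor = ResourceMonitor(warning_enable=60, warning=70, critical=90)
samples = [50, 75, 80, 72, 78, 55, 75, 95]
events = [monitor.update(sample) for sample in samples]
```

In this trace the second excursion above 70% raises a new warning only because consumption dropped to 55% in between.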
Overload situations will be described during detailed
design. At this level the following overload situations
are foreseen:
- queues
- intermediate storage
- short term storage
4.11.4.2.1 Q̲u̲e̲u̲e̲ ̲O̲v̲e̲r̲l̲o̲a̲d̲
Queues are allowed to expand on disks. However, the
thresholds in figure 4.11.4.2-1 are applied.
The supervisor will be given commands to remove the
queue overload situation.
4.11.4.2.2 I̲n̲t̲e̲r̲m̲e̲d̲i̲a̲t̲e̲ ̲S̲t̲o̲r̲a̲g̲e̲
The general overload concept is followed. The supervisor
can offload items to the off-line disk.
4.11.4.2.3 S̲h̲o̲r̲t̲ ̲T̲e̲r̲m̲ ̲S̲t̲o̲r̲a̲g̲e̲
The following short term storage resources are limited:
- a maximum of 250 messages in preparation which have
not yet received release authorization
- a maximum of 250 non-delivered comments
- a maximum of 100 non-delivered notifications of
release
- a common maximum of non-delivered items. This maximum
will be reached when incoming traffic exceeds
the delivery capacity for a longer period. The
maximum is defined at system generation.
The general overload concept is followed. The supervisor
will be given commands to delete short term storage
messages, comments and release notifications.
4.12 S̲Y̲S̲T̲E̲M̲ ̲T̲E̲S̲T̲I̲N̲G̲
In order to perform the factory test a Test Drive System
(TDS) shall be developed.
The requirements to be met by the TDS are as specified
in CPS/210/SYS/0001 section 3.5.11.5.2.
A separate TDS specification CPS/SDS/008 (Contract
line item 4.5.1, sequence number 18) will describe
the TDS capabilities and design.
System tests such as the DSMT test, functional and
operational test are all described in CAMPS Acceptance
Plan CPS/PLN/012. System testing will therefore not
be discussed further in this document.