CPS/SDS/001

FH/810227
CAMPS SYSTEM DESIGN SPECIFICATION

                 T̲A̲B̲L̲E̲ ̲O̲F̲ ̲C̲O̲N̲T̲E̲N̲T̲S̲

     4.11  ERROR AND BACKLOG HANDLING
       4.11.1  Error Processing Mechanisms
         4.11.1.1  Error Reception/Reporting
         4.11.1.2  Error Display/Printout

       4.11.2  Error Detection and Localization
         4.11.2.1  Error Detection/Localization Analysis
         4.11.2.2  Error Detection by a PU
           4.11.2.2.1  PU Hardware Error Detection
           4.11.2.2.2  PU Firmware Error Detection
           4.11.2.2.3  PU Software Error Detection

       4.11.3  Error Fix-Up
         4.11.3.1  PU or IO BUS Error Fix-Up
         4.11.3.2  TDX System Error
         4.11.3.3  Mirrored Disk System Error
         4.11.3.4  Off-Line or Floppy Disk System Error
         4.11.3.5  LTU System Error
         4.11.3.6  LTUX System Error
         4.11.3.7  Watchdog System Error
         4.11.3.8  Power Down
         4.11.3.9  Hardware Resource Error
         4.11.3.10 Security or Access Control Error
         4.11.3.11 Software Resource Error
         4.11.3.12 Miscellaneous

       4.11.4  Backlog Handling
         4.11.4.1  Dead Lock
         4.11.4.2  Overload
           4.11.4.2.1  Queue Overload
           4.11.4.2.2  Intermediate Storage
           4.11.4.2.3  Short Term Storage

4.11     E̲R̲R̲O̲R̲ ̲A̲N̲D̲ ̲B̲A̲C̲K̲L̲O̲G̲ ̲H̲A̲N̲D̲L̲I̲N̲G̲

         This section addresses the processing of technical
         errors. Technical errors are hardware errors and the
         software errors related to system software use. Errors
         due to e.g. ACP127 message format analysis or syntax
         errors in user input are not covered.

         The section is divided into four subsections, which
         describe:

         a)  The error processing mechanisms provided by the
             Kernel

         b)  Error detection facilities and localization of
             an erroneous module

         c)  Error types and corresponding error fix-up actions

         d)  Backlog handling facilities

         This section only handles the occurrence of a single
         error. Multiple errors may imply a total system error,
         in which case a WARM2 start-up is to be executed. However,
         a total system error may be disastrous if both mirrored
         disks are corrupted (due to head-landing). In this
         case a DEAD2 start-up is to be executed.

4.11.1   E̲r̲r̲o̲r̲ ̲P̲r̲o̲c̲e̲s̲s̲i̲n̲g̲ ̲M̲e̲c̲h̲a̲n̲i̲s̲m̲s̲

         The Kernel contains a table which relates each error
         to an error type. A process can specify the error types
         for which it will take over the error handling. The
         take-over is implemented via a procedure defined by the
         application process, which is automatically invoked if
         an error of the specified type occurs. The application
         error fix-up procedure has access to extended error
         information, which defines the error in detail (e.g.
         to hardware module level).

         It is, however, possible for the parent of the process
         to inhibit a child from specifying certain error types
         (e.g. security error).

         Errors not handled by a process are given to the parent
         of the process and the process is stopped.
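
         The take-over, inhibit and hand-over rules above can
         be pictured with a small model. The Python sketch below
         is illustrative only: the class Process, its methods
         specify_handler, inhibit and raise_error, and the
         contents of the error-to-type table are hypothetical
         names chosen for this example and are not Kernel
         interfaces defined by this specification.

```python
# Illustrative model only; all names are hypothetical, not CAMPS Kernel calls.

# Kernel table relating an error to an error type (contents invented).
ERROR_TYPE = {
    "PARITY": "hardware",
    "QUEUE_FULL": "resource",
    "ACCESS_VIOLATION": "security",
}


class Process:
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.handlers = {}      # error type -> fix-up procedure
        self.inhibited = set()  # error types this process may not take over
        self.stopped = False
        self.received = []      # errors handed up from child processes

    def inhibit(self, child, error_type):
        """Parent forbids a child to take over a given error type."""
        child.inhibited.add(error_type)

    def specify_handler(self, error_type, procedure):
        """Ask to handle an error type locally; returns False if inhibited."""
        if error_type in self.inhibited:
            return False
        self.handlers[error_type] = procedure
        return True

    def raise_error(self, error, info=None):
        """Invoke the fix-up procedure if one was specified; otherwise
        stop this process and give the error to its parent."""
        procedure = self.handlers.get(ERROR_TYPE[error])
        if procedure is not None:
            return procedure(error, info)
        self.stopped = True
        if self.parent is not None:
            self.parent.received.append((self.name, error, info))
        return None
```

         In this model a child that has been inhibited from
         specifying the "security" type is stopped on an
         ACCESS_VIOLATION, and the error, together with its
         extended information, appears in the parent's list.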

4.11.1.1 E̲r̲r̲o̲r̲ ̲R̲e̲c̲e̲p̲t̲i̲o̲n̲/̲R̲e̲p̲o̲r̲t̲i̲n̲g̲

         COPSY receives error reports subsequent to the detection
         of an error from:

         -   On-line diagnostics programs

         -   Application software, which has not specified an
             error fix-up procedure

         -   PU firmware detected hardware errors

         -   Kernel detected software errors

         -   The watchdog having monitored a hardware error

         Application processes receive error reports subsequent
         to an error from the system software which on behalf
         of the application operates on lines, files, queues,
         areas, etc.:

         -   I/O system

         -   Message Monitor

         -   Queue Monitor

4.11.1.2 E̲r̲r̲o̲r̲ ̲D̲i̲s̲p̲l̲a̲y̲/̲P̲r̲i̲n̲t̲o̲u̲t̲

         A process which handles an error locally reports
         the result of the error fix-up to the SSC. The SSC
         prints the report at the operator printer and, if appropriate,
         updates the system status display at the operator VDU.
         If the watchdog fails, error reports are directed
         to the supervisor report printer.

4.11.2   E̲r̲r̲o̲r̲ ̲D̲e̲t̲e̲c̲t̲i̲o̲n̲ ̲a̲n̲d̲ ̲L̲o̲c̲a̲l̲i̲z̲a̲t̲i̲o̲n̲

         This section is divided into two paragraphs, which
         contain:

         a)  The principles for an analysis, which will be accomplished
             during detailed design, of error detection/localization
             facilities provided by CAMPS to meet:

             -   Requirements for integrity of operation

             -   Error reporting requirements derived from the
                 MTTR requirements

         b)  A description of specific PU error detection facilities,
             directly required in the CAMPS requirement specification.

4.11.2.1 E̲r̲r̲o̲r̲ ̲D̲e̲t̲e̲c̲t̲i̲o̲n̲/̲L̲o̲c̲a̲l̲i̲z̲a̲t̲i̲o̲n̲ ̲A̲n̲a̲l̲y̲s̲i̲s̲

         The CAMPS system will be broken down into hardware
         and software subsystems. For each subsystem it will
         be described how the subsystem reacts to an internal
         error. It will also be described how the error is
         detected by means of:

         -   Hardware traps (e.g. parity check)

         -   Firmware traps (e.g. LTU and TDX system protocols)

         -   Software traps (e.g. online diagnostics and validity
             checks)

         -   Manual observation (e.g. some VDU errors, LED indication)

         -   Watchdog monitoring (e.g. power down)

         Error detection is either direct (e.g. parity check
         during memory access) or indirect (e.g. a CPU calculation
         error may be detected via an illegal memory access).
         Also, an error can be caused by a number of modules
         or by either software or hardware. It is the objective
         of the error isolation facilities to isolate an error
         to one of the groups defined in section 4.11.3.

4.11.2.2 E̲r̲r̲o̲r̲ ̲D̲e̲t̲e̲c̲t̲i̲o̲n̲ ̲b̲y̲ ̲a̲ ̲P̲U̲

         This section describes PU error detection facilities
         specifically required in the CAMPS requirement specification.

4.11.2.2.1   P̲U̲ ̲H̲a̲r̲d̲w̲a̲r̲e̲ ̲E̲r̲r̲o̲r̲ ̲D̲e̲t̲e̲c̲t̲i̲o̲n̲

         a)  Trapping of unassigned instructions. Execution
             of any illegal code or bit pattern is detected
             and results in an invocation of the Kernel.

         b)  Instructions are separated into two classes, one
             for privileged use and one for application use.

4.11.2.2.2   P̲U̲ ̲F̲i̲r̲m̲w̲a̲r̲e̲ ̲E̲r̲r̲o̲r̲ ̲D̲e̲t̲e̲c̲t̲i̲o̲n̲

         The MAP module prevents programs from being able to
         write in memory occupied by the operating system or
         by other programs.

4.11.2.2.3   P̲U̲ ̲S̲o̲f̲t̲w̲a̲r̲e̲ ̲E̲r̲r̲o̲r̲ ̲D̲e̲t̲e̲c̲t̲i̲o̲n̲

         a)  At system start-up all programs and data files
             loaded into memory will carry a block parity checksum
             to allow the detection of corrupted data.

         b)  On-line diagnostics programs operating as low priority
             tasks. A program to checksum the read-only part
             of the system software exists. It is executed periodically
             and on request from the supervisor.

         c)  High level external line protocols (e.g. continuity
             and self-addressed service messages).

         d)  Parameter validity check, when receiving data.

         e)  Security and access control as specified in section
             4.9.
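
         The checksum facilities in a) and b) can be sketched
         as follows. This is a minimal Python illustration under
         stated assumptions: the specification does not name the
         checksum algorithm, so a 16-bit additive checksum is
         assumed here, and the function names block_checksum,
         record_checksums and find_corrupted are hypothetical.

```python
# Illustrative sketch; the checksum algorithm and all names are assumptions.


def block_checksum(block: bytes) -> int:
    """16-bit additive checksum of one block (algorithm assumed)."""
    return sum(block) & 0xFFFF


def record_checksums(blocks):
    """Compute the reference checksums, e.g. when software is loaded."""
    return [block_checksum(block) for block in blocks]


def find_corrupted(blocks, reference):
    """Return the indices of blocks whose checksum no longer matches.

    This is the kind of check an on-line diagnostics program could run
    periodically, or on request, over the read-only part of the software.
    """
    return [
        index
        for index, (block, expected) in enumerate(zip(blocks, reference))
        if block_checksum(block) != expected
    ]
```

         A simple additive checksum does not detect every
         corruption (e.g. two bytes swapped within a block), so
         a real implementation may prefer a stronger check.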

4.11.3   E̲r̲r̲o̲r̲ ̲F̲i̲x̲-̲U̲p̲

         Errors are divided into groups, each of which defines
         a type of error for which a common error fix-up action
         exists. The following hardware error types are foreseen:

         -   PU or IO BUS error

         -   TDX-BUS system (TDX-BUS + STI + TDX-CTR) error

         -   Mirrored disk system (DISK-CTR + DISK DRIVE + VOLUME)
             error

         -   Off-line or floppy disk system error

         -   LTU system (LTU + external line) error

         -   LTUX system (LTUX + BSM-X + terminal equipment)
             error

         -   Watchdog or operator VDU or operator printer error

         -   Resource error (e.g. paper out)

         -   Power down

         All error types except for resource errors and errors
         related to the execution of an instruction are non-recoverable.
         This is due to the fact that the File Management System
         (FMS) and Terminal Handling System (THS) provide a high
         level interface to peripherals which, e.g.:

         -   performs repetition of operations (handles intermittent
             errors)

         -   allocates a new disk sector when a sector is bad

         Instructions which imply an error interrupt will
         be repeated several times before being considered
         erroneous.
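
         The repetition principle can be sketched as below. The
         Python fragment is an illustration only: the number of
         repetitions is not given in this specification (it says
         "several times"), so MAX_ATTEMPTS is an assumed value,
         and perform_with_repetition and NonRecoverableError are
         hypothetical names.

```python
# Illustrative sketch; MAX_ATTEMPTS and all names are assumptions.

MAX_ATTEMPTS = 3  # the specification only says "several times"


class NonRecoverableError(Exception):
    """Raised when an operation still fails after all repetitions."""


def perform_with_repetition(operation, attempts=MAX_ATTEMPTS):
    """Repeat `operation` until it succeeds or the attempts are used up.

    An IOError stands in for an error interrupt. Intermittent errors are
    absorbed here; only a persistent error reaches the caller, which is
    why errors seen above this level are treated as non-recoverable.
    """
    last_error = None
    for _ in range(attempts):
        try:
            return operation()
        except IOError as error:
            last_error = error
    raise NonRecoverableError(str(last_error))
```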

         The following software error types are foreseen:

         -   Security or access error

         -   Resource error

         -   Miscellaneous

4.11.3.1 P̲U̲ ̲o̲r̲ ̲I̲O̲ ̲B̲U̲S̲ ̲E̲r̲r̲o̲r̲ ̲F̲i̲x̲-̲U̲p̲

         A PU or IO BUS error will imply switch over to the
         stand by PU as described in section 4.3.1.4.

         If the stand by PU is unavailable the active PU is
         disabled and a WARM2 start-up of the off-line PU can
         be performed.

4.11.3.2 T̲D̲X̲ ̲S̲y̲s̲t̲e̲m̲ ̲E̲r̲r̲o̲r̲

         A TDX system error is handled jointly by the Terminal
         Handling System (THS) in the IOC and by the SSC. The
         SSC switches LTUXs (via the watchdog) to the appropriate
         TDX-BUS and updates the Configuration Table. The THS
         performs a TDX-system switch-over transparently to the
         application (i.e. TEP and THP).

         If the stand by TDX-BUS is used for off-line operation
         a total system error exists, and the PU is disabled.
         After insertion of a TDX BUS a WARM2 start-up can be
         performed.

4.11.3.3 M̲i̲r̲r̲o̲r̲e̲d̲ ̲D̲i̲s̲k̲ ̲S̲y̲s̲t̲e̲m̲ ̲E̲r̲r̲o̲r̲

         An error in one of the mirrored disks is handled by
         the FMS in the IOC. The FMS switches to the stand by
         disk system transparent to the users. The SSC is notified
         and updates the Configuration Table.

         If both mirrored disks fail, a total system error exists
         and the PUs will be disabled. A WARM2 start-up can
         be executed, when the disk system is repaired.

4.11.3.4 O̲f̲f̲-̲L̲i̲n̲e̲ ̲o̲r̲ ̲F̲l̲o̲p̲p̲y̲ ̲D̲i̲s̲k̲ ̲S̲y̲s̲t̲e̲m̲ ̲E̲r̲r̲o̲r̲

         An error in the off-line or the floppy disk system
         implies a removal of the disk system in question. The
         packages involved in an error fix-up are:

         a)  SSC and TEP during the following operations:

             -   back-up of system parameter file

             -   off-loading of messages

             -   trace information storage

             -   copying of modified software to off-line disk

         b)  SSC and TEP and SAR during:

             -   retrieval of off-loaded messages

         c)  SSC during:

             -   load of start-up data

             -   copying of modified software to the on-line
                 disks

             -   memory dump

         d)  SSP and OLP during:

             -   off-line operations

         During on-line operation, the SSC de-assigns devices
         and dismounts the mirrored and the floppy disk
         volumes, whereas the TEP dismounts the off-line
         disk. Also, the SSC updates the Configuration table.

4.11.3.5 L̲T̲U̲ ̲S̲y̲s̲t̲e̲m̲ ̲E̲r̲r̲o̲r̲

         An LTU line error is handled by the THP and the SSC.
         The THP closes the line activities, whereas the SSC
         deletes the THP instance which handles the line. Also,
         the SSC updates the Configuration table.

         An LTU error affects up to 2 lines. Per line the error
         fix-up is as described above.

4.11.3.6 L̲T̲U̲X̲ ̲S̲y̲s̲t̲e̲m̲ ̲E̲r̲r̲o̲r̲

         An error in the terminal equipment is handled by the
         TEP and the SSC. The TEP cancels the ongoing transaction(s),
         whereas the SSC deletes the TEP instance and updates
         the Configuration Table.

         An error in a TRC/point-to-point line is handled by
         the THP and the SSC. The THP closes the line activities,
         whereas the SSC deletes the THP instance and updates
         the Configuration table.

         An LTUX error involves the TEP or THP instances using
         the LTUX. Per instance the fix-up is as described above.

         A BSM error involves the TEP or THP instances using
         the two LTUXs, which are controlled by the BSM. Per
         terminal instance the fix-up is as described above.

4.11.3.7 W̲a̲t̲c̲h̲d̲o̲g̲ ̲S̲y̲s̲t̲e̲m̲ ̲E̲r̲r̲o̲r̲

         If the operator VDU fails, then watchdog operation
         not involving the VDU and printer continues.

         If the operator printer fails, then the SSC directs
         active PU error messages to the supervisor report printer.
         The non-active PU will not be able to perform print-out.
         The watchdog continues operations not involving the
         printer and the VDU.

         If the Watchdog Processor (WDP) fails, then the SSC directs
         active PU error messages to the supervisor report printer.
         If a reconfiguration which involves the WDP has to
         take place, then a total system error exists.

4.11.3.8 P̲o̲w̲e̲r̲ ̲D̲o̲w̲n̲

         If a power down is detected in the active PU a switch-over
         to the stand by PU is automatically executed.

         If a power down is detected in the non-active PU, then
         the PU is disabled.

         The IO-crate contains two power supplies:

         -   one supply per IO BUS

         -   dualized supply per IO BUS device

         A single power down has effects identical to a PU power
         down.

         A power down in a TDX crate implies that the two LTUXs
         in the crate are taken out of service.

         A power down in the watchdog implies that the watchdog
         is taken out of service. Refer to section 4.11.3.7.

4.11.3.9 H̲a̲r̲d̲w̲a̲r̲e̲ ̲R̲e̲s̲o̲u̲r̲c̲e̲ ̲E̲r̲r̲o̲r̲

         Hardware resource errors include:

         -   Paper out

         -   No more file space

         This error type may be handled by an application. Further
         discussion is deferred to detailed design.

4.11.3.10    S̲e̲c̲u̲r̲i̲t̲y̲ ̲o̲r̲ ̲A̲c̲c̲e̲s̲s̲ ̲C̲o̲n̲t̲r̲o̲l̲ ̲E̲r̲r̲o̲r̲

         These errors are due to a programming error.

         Reactions subsequent to the detection of an error during
         security or access control are defined in section 4.9.

4.11.3.11    S̲o̲f̲t̲w̲a̲r̲e̲ ̲R̲e̲s̲o̲u̲r̲c̲e̲ ̲E̲r̲r̲o̲r̲

         Software resource errors include:

         -   Queue full

         -   No more file control blocks (FCBs)

         This type of error may be handled by an application,
         which for example may:

         -   wait until resource is free

         -   wait specified period for resource to become free

         -   stop input

         Refer to section 4.11.4, where some overload situations
         are handled.
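
         The three application reactions listed above can be
         illustrated with a bounded queue. The sketch below uses
         the Python standard library queue module as a stand-in
         for a CAMPS queue; the function enqueue and its policy
         names are hypothetical and only serve the example.

```python
# Illustrative sketch; `enqueue` and the policy names are assumptions.
import queue


def enqueue(q, item, policy, period=None):
    """Try to queue `item`; return False if the caller should stop input.

    policy "wait"        : wait until the resource is free
    policy "wait_period" : wait at most `period` seconds
    policy "stop_input"  : do not wait at all
    """
    try:
        if policy == "wait":
            q.put(item)
        elif policy == "wait_period":
            q.put(item, timeout=period)
        elif policy == "stop_input":
            q.put_nowait(item)
        else:
            raise ValueError(policy)
        return True
    except queue.Full:
        return False
```

         With a queue created as queue.Queue(maxsize=1), a second
         enqueue under the "stop_input" policy returns False at
         once, whereas under "wait" it blocks until an item has
         been removed.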

4.11.3.12    M̲i̲s̲c̲e̲l̲l̲a̲n̲e̲o̲u̲s̲

         This type of error includes:

         -   Semantic error in input parameter

         -   Time out during process communication

         -   Backlog (refer to section 4.11.4)

         These errors are due to a programming error. Error
         reactions are provided during detailed design.

4.11.4   B̲a̲c̲k̲l̲o̲g̲ ̲H̲a̲n̲d̲l̲i̲n̲g̲

         Backlog handling refers to actions to:

         -   avoid dead lock

         -   avoid system overload

4.11.4.1 D̲e̲a̲d̲ ̲L̲o̲c̲k̲

         Dead lock refers to a situation where processes each
         demand a number of shared resources to perform a function.
         If, for instance, processes A and B both require a reader
         and a printer, and A has reserved the reader and B
         the printer, then a dead lock exists if neither
         A nor B relinquishes its reserved resource.

         It is a design aim to prevent dead locks. However,
         prevention introduces an overhead. During detailed design
         it will be decided whether this overhead can be tolerated
         or whether supervisor commands shall be defined to handle
         dead locks, which

         -   are unlikely to occur and which

         -   will imply a considerable overhead to avoid
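
         One well-known way to prevent the reader/printer dead
         lock of the example is to let every process reserve its
         resources in the same fixed order. This technique is not
         prescribed by this specification; the Python sketch
         below merely illustrates it, and the names ORDER, LOCKS,
         reserve_all, release_all and job are hypothetical.

```python
# Illustrative sketch of fixed-order reservation; all names are assumptions.
import threading

# Fixed global reservation order (assumed): reader before printer.
ORDER = {"reader": 0, "printer": 1}
LOCKS = {name: threading.Lock() for name in ORDER}


def reserve_all(names):
    """Reserve resources in the global order, whatever order was asked."""
    ordered = sorted(names, key=ORDER.__getitem__)
    for name in ordered:
        LOCKS[name].acquire()
    return ordered


def release_all(ordered):
    """Release the resources in the reverse of the reservation order."""
    for name in reversed(ordered):
        LOCKS[name].release()


def job(names, log, tag):
    """A process needing several resources to perform one function."""
    held = reserve_all(names)
    log.append(tag)
    release_all(held)
```

         Because A and B both reserve the reader first, the
         situation where A holds the reader and B the printer
         cannot arise. The overhead is that a process may hold a
         resource earlier than it strictly needs it.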

4.11.4.2 O̲v̲e̲r̲l̲o̲a̲d̲

         Overload will be prevented by the following general
         method:

         a)  Issue a warning to the supervisor when a resource
             "warning threshold" is exceeded (refer to figure 4.11.4.2-1).
             A new warning will not be issued until the resource
             consumption is below the "warning enable threshold".

         b)  Provide the supervisor with commands, which can
             remedy the situation.

         c)  If the supervisor does not use these commands and
             the "critical threshold" is exceeded, an automatic
             ordered close-down is performed, during which input
             is inhibited and the system gradually runs down.

             During start-up subsequent to the ordered close-down,
             the supervisor is the first to be allowed to sign-in.
             He will be provided with status describing various
             resource consumptions and he can by means of the
             commands in b) remove an overload situation prior
             to start-up.

             Resource Consumption

                     100%

                     Critical threshold

                     Warning threshold

                     Warning enable threshold

                     0%

          FIGURE 4.11.4.2-1  RESOURCE THRESHOLDS
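
         The three thresholds act as a hysteresis on resource
         consumption. A minimal Python sketch follows, assuming
         consumption is reported as a percentage; the class
         ResourceMonitor, its update method and the returned
         strings are hypothetical, and actual threshold values
         are not defined at this level.

```python
# Illustrative sketch; class name, method name and return values are assumed.


class ResourceMonitor:
    def __init__(self, warning_enable, warning, critical):
        if not 0 <= warning_enable < warning < critical <= 100:
            raise ValueError("thresholds must be ordered within 0..100")
        self.warning_enable = warning_enable
        self.warning = warning
        self.critical = critical
        self.armed = True  # True while a new warning may be issued

    def update(self, consumption):
        """Evaluate a new consumption reading (percent).

        Returns "close_down" above the critical threshold, "warning" when
        the warning threshold is exceeded and a warning is allowed, and
        None otherwise.
        """
        if consumption > self.critical:
            return "close_down"
        if consumption < self.warning_enable:
            self.armed = True  # warnings are enabled again
        if consumption > self.warning and self.armed:
            self.armed = False  # no repeat until below warning enable
            return "warning"
        return None
```

         With thresholds of 50, 70 and 90, for example, a reading
         of 75 gives a warning, later readings of 80, 65 and 75
         give none, and a new warning is only possible after a
         reading below 50.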

         Overload situations will be described during detailed
         design. At this level the following overload situations
         are foreseen:

         -   queues

         -   intermediate storage

         -   short term storage

4.11.4.2.1   Q̲u̲e̲u̲e̲ ̲O̲v̲e̲r̲l̲o̲a̲d̲

         Queues are allowed to expand on disks. However, the
         thresholds in figure 4.11.4.2-1 are applied.

         The supervisor will be given commands to remove the
         queue overload situation.

4.11.4.2.2   I̲n̲t̲e̲r̲m̲e̲d̲i̲a̲t̲e̲ ̲S̲t̲o̲r̲a̲g̲e̲

         The general overload concept is followed. The supervisor
         can offload items to the off-line disk.

4.11.4.2.3   S̲h̲o̲r̲t̲ ̲T̲e̲r̲m̲ ̲S̲t̲o̲r̲a̲g̲e̲

         The following short term storage resources are limited:

         -   a maximum of 250 messages in preparation which have
             not yet received release authorization

         -   a maximum of 250 non-delivered comments

         -   a maximum of 100 non-delivered notifications of release

         -   a common maximum of non-delivered items. This maximum
             will be reached, when incoming traffic exceeds
             the delivery capacity for a longer period. The
             maximum is defined at system generation.

         The general overload concept is followed. The supervisor
         will be given commands to delete short term storage
         messages, comments and release notifications.
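
         The limits above can be modelled as bounded counters.
         The Python sketch below is illustrative only: it treats
         the four limits as independent counters, which is an
         assumption about how the common maximum relates to the
         other three, and the class ShortTermStorage with its
         store and delete methods is hypothetical.

```python
# Illustrative sketch; names and the independence of the limits are assumed.


class ShortTermStorage:
    def __init__(self, common_maximum):
        # The first three limits are quoted from the list above; the
        # common maximum is defined at system generation.
        self.limits = {
            "message_in_preparation": 250,
            "comment": 250,
            "release_notification": 100,
            "non_delivered_item": common_maximum,
        }
        self.counts = {kind: 0 for kind in self.limits}

    def store(self, kind):
        """Count one new item; return False if its limit is reached."""
        if self.counts[kind] >= self.limits[kind]:
            return False
        self.counts[kind] += 1
        return True

    def delete(self, kind):
        """Model a supervisor command deleting one item of `kind`."""
        if self.counts[kind] > 0:
            self.counts[kind] -= 1
```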

4.12     S̲Y̲S̲T̲E̲M̲ ̲T̲E̲S̲T̲I̲N̲G̲

         In order to perform the factory test a Test Drive System
         (TDS) shall be developed.

         The requirements to be met by the TDS are as specified
         in CPS/210/SYS/0001 section 3.5.11.5.2.

         A separate TDS specification CPS/SDS/008 (Contract
         line item 4.5.1, sequence number 18) will describe
         the TDS capabilities and design.

         System tests such as the DSMT test, functional and
         operational test are all described in CAMPS Acceptance
         Plan CPS/PLN/012. System testing will therefore not
         be discussed further in this document.
