CPS/SDS/001

FH/810115
CAMPS SYSTEM DESIGN SPECIFICATION
CAMPS

                 T̲A̲B̲L̲E̲ ̲O̲F̲ ̲C̲O̲N̲T̲E̲N̲T̲S̲

     4.11  ERROR AND BACKLOCK HANDLING

       4.11.1  Error Processing Mechanisms

         4.11.1.1  Error Reception/Reporting

         4.11.1.2  Error Display/Printout


       4.11.2  Error Detection and Localization

         4.11.2.1  Error Detection/Localization Analysis

         4.11.2.2  Error Detection by a PU

           4.11.2.2.1  PU Hardware Error Detection

           4.11.2.2.2  PU Firmware Error Detection

           4.11.2.2.3  PU Software Error Detection


       4.11.3  Error Fix-Up

         4.11.3.1  PU or IO BUS Error Fix-Up

         4.11.3.2  TDX System Error

         4.11.3.3  Mirrored Disk System Error

         4.11.3.4  Off-Line or Floppy Disk System Error

         4.11.3.5  LTU System Error

         4.11.3.6  LTUX System Error

         4.11.3.7  Watchdog System Error

         4.11.3.8  Power Down

         4.11.3.9  Hardware Resource Error

         4.11.3.10 Security or Access Control Error

         4.11.3.11 Software Resource Error

         4.11.3.12 Miscellaneous


       4.11.4  Back Lock Handling

         4.11.4.1  Dead Lock

         4.11.4.2  Overload

           4.11.4.2.1  Queue Overload

           4.11.4.2.2  Intermediate Storage

           4.11.4.2.3  Short Term Storage

4.11     E̲R̲R̲O̲R̲ ̲A̲N̲D̲ ̲B̲A̲C̲K̲L̲O̲C̲K̲ ̲H̲A̲N̲D̲L̲I̲N̲G̲

         This section addresses the processing of technical
         errors. Technical errors are hardware errors and the
         software errors related to system software use. Errors
         due to e.g. ACP127 message format analysis or syntax
         errors in user input are not handled.

         The section is divided into four subsections, which
         describe:

         a)  The error processing mechanisms provided by the
             Kernel

         b)  Error detection facilities and localization of
             an erroneous module

         c)  Errortypes and corresponding error fix-up actions

         d)  Backlock handling facilities

         The section only handles the occurrence of a single
         error. Multiple errors may imply a total system error,
         in which case a WARM2 start-up is to be executed. However,
         a total system error may be disastrous if both invoked
         disks are corrupted (due to head-landing). In this
         case a DEAD2 start-up is to be executed.

4.11.1   E̲r̲r̲o̲r̲ ̲P̲r̲o̲c̲e̲s̲s̲i̲n̲g̲ ̲M̲e̲c̲h̲a̲n̲i̲s̲m̲s̲

         The Kernel contains a table which defines an error
         to error type relation. A process can specify the
         error types for which it will take over the error
         handling. The take-over is implemented via a procedure
         defined by the application process, which is
         automatically invoked if an error of the specified
         type occurs. The application error fix-up process has
         access to extended error information, which defines
         the error in detail (e.g. to hardware module level).

         It is, however, possible for the parent of the process
         to inhibit a child from specifying certain error types
         (e.g. security error).

         Errors not handled by a process are given to the parent
         of the process and the process is stopped.
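
         The take-over mechanism described above can be illustrated
         by the following minimal sketch. It is not the Kernel
         interface: the table contents, the class Process and the
         functions specify and raise_error are hypothetical names
         chosen for illustration only.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Optional, Set

# Hypothetical error-to-error-type table; the real Kernel table is not given here.
ERROR_TYPE = {
    "PARITY": "hardware",
    "ACCESS_VIOLATION": "security",
    "QUEUE_FULL": "resource",
}


@dataclass
class Process:
    name: str
    parent: Optional["Process"] = None
    # Error types this process forbids its children to take over.
    inhibited_for_children: Set[str] = field(default_factory=set)
    # Error type -> fix-up procedure defined by this process.
    handlers: Dict[str, Callable[[str, dict], None]] = field(default_factory=dict)
    stopped: bool = False
    # Errors handed up by children that did not handle them.
    received: List[tuple] = field(default_factory=list)

    def specify(self, error_type: str, handler: Callable[[str, dict], None]) -> bool:
        """Register a fix-up procedure unless the parent has inhibited the type."""
        if self.parent is not None and error_type in self.parent.inhibited_for_children:
            return False
        self.handlers[error_type] = handler
        return True


def raise_error(process: Process, error: str, info: dict) -> Optional[Process]:
    """Deliver an error; return the process that ends up responsible for it."""
    handler = process.handlers.get(ERROR_TYPE[error])
    if handler is not None:
        handler(error, info)          # info stands in for the extended error information
        return process
    # Not handled locally: stop the process and give the error to its parent.
    process.stopped = True
    if process.parent is not None:
        process.parent.received.append((process.name, error, info))
    return process.parent
```

         In this sketch a child whose parent inhibits "security"
         cannot register a handler for that type, so a security
         error stops the child and is handed to the parent.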

4.11.1.1 E̲r̲r̲o̲r̲ ̲R̲e̲c̲e̲p̲t̲i̲o̲n̲/̲R̲e̲p̲o̲r̲t̲i̲n̲g̲

         COPSY receives error reports subsequent to the detection
         of an error from:

         -   On-line diagnostics programs

         -   Application software, which has not specified an
             error fix-up procedure

         -   PU firmware detected hardware errors

         -   Kernel detected software errors

         -   The watchdog having monitored a hardware error

         Application processes receive error reports subsequent
         to an error from the system software which on behalf
         of the application operates on lines, files, queues,
         areas, etc.:

         -   I/O system

         -   Message Monitor

         -   Queue Monitor

4.11.1.2 E̲r̲r̲o̲r̲ ̲D̲i̲s̲p̲l̲a̲y̲/̲P̲r̲i̲n̲t̲o̲u̲t̲

         A process which handles an error locally reports
         the result of the error fix-up to the SSC. The SSC
         prints the report at the operator printer and, if appropriate,
         updates the system status display at the operator VDU.
         If the watchdog fails, then error reports are directed
         to the supervisor report printer.

4.11.2   E̲r̲r̲o̲r̲ ̲D̲e̲t̲e̲c̲t̲i̲o̲n̲ ̲a̲n̲d̲ ̲L̲o̲c̲a̲l̲i̲z̲a̲t̲i̲o̲n̲

         This section is divided into two subsections, which
         contain:

         a)  The principles for an analysis, which will be accomplished
             during detailed design, of error detection/localization
             facilities provided by CAMPS to meet:

             -   Requirements to integrity of operation

             -   Error reporting requirements derived from the
                 MTTR requirements

         b)  A description of specific PU error detection facilities,
             directly required in the SRS.

4.11.2.1 E̲r̲r̲o̲r̲ ̲D̲e̲t̲e̲c̲t̲i̲o̲n̲/̲L̲o̲c̲a̲l̲i̲z̲a̲t̲i̲o̲n̲ ̲A̲n̲a̲l̲y̲s̲i̲s̲

         The CAMPS system will be broken down into hardware
         and software subsystems. For each subsystem it will
         be described how the subsystem reacts as a result of
         an internal error. It will also be described how the
         error is detected due to:

         -   Hardware traps (e.g. parity check)

         -   Firmware traps (e.g. LTU and TDX system protocols)

         -   Software traps (e.g. online diagnostics and validity
             checks)

         -   Manual observation (e.g. some VDU errors, LED indication)

         -   Watchdog monitoring (e.g. power down)

         Error detection is either direct (e.g. parity check
         during memory access) or indirect (e.g. a CPU calculation
         error may be detected via an illegal memory access).
         Also, an error can be caused by a number of modules
         or by either software or hardware. It is the objective
         of the error isolation facilities to isolate an error
         to one of the groups defined in section 4.11.3.

4.11.2.2 E̲r̲r̲o̲r̲ ̲D̲e̲t̲e̲c̲t̲i̲o̲n̲ ̲b̲y̲ ̲a̲ ̲P̲U̲

         This section describes PU error detection facilities
         specifically required in the SRS.

4.11.2.2.1   P̲U̲ ̲H̲a̲r̲d̲w̲a̲r̲e̲ ̲E̲r̲r̲o̲r̲ ̲D̲e̲t̲e̲c̲t̲i̲o̲n̲

         a)  Trapping of unassigned instructions. Execution
             of any illegal code or bit pattern is detected
             and results in an invocation of the Kernel.

         b)  The MAP module prevents programs from being able
             to write in memory occupied by the operating system
             or by other programs.

         c)  Instructions are separated into two classes, one
             for privileged use and one for application use.

4.11.2.2.2   P̲U̲ ̲F̲i̲r̲m̲w̲a̲r̲e̲ ̲E̲r̲r̲o̲r̲ ̲D̲e̲t̲e̲c̲t̲i̲o̲n̲

         The CPUs and the MAP contain a built-in test program
         (BITE).

4.11.2.2.3   P̲U̲ ̲S̲o̲f̲t̲w̲a̲r̲e̲ ̲E̲r̲r̲o̲r̲ ̲D̲e̲t̲e̲c̲t̲i̲o̲n̲

         a)  At system start-up all programs and data files
             loaded into memory will carry a block parity check
             sum to allow the detection of corrupted data.

         b)  The memory resident read only part of the system
             software will be checked (via check sum) on the
             supervisor's request and periodically.

         c)  High level external line protocols (e.g. continuity
             and self-addressed service messages).

         d)  Parameter validity check, when receiving data.

         e)  Security and access control as specified in section
             4.9.

         f)  On-line diagnostics programs. The programs execute
             as low priority tasks or based upon built-in tests.
             The on-line diagnostics programs will detect an
             error either before the error has had any effect
             or afterwards. In the latter case the
             on-line diagnostics programs may prevent (if the
             error is not detected by other means) an avalanche
             of corrupted messages or transactions.
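
         Items a) and b) above rely on check sums. The following
         is a minimal sketch of a per-block check, assuming a
         longitudinal (XOR) parity and a block size of 512 bytes.
         Neither the algorithm nor the block size is stated in this
         specification, and the function names are illustrative.

```python
from functools import reduce
from typing import List, Tuple

BLOCK_SIZE = 512  # assumed block size; the specification does not state one


def block_checksum(block: bytes) -> int:
    """Longitudinal parity: XOR of all bytes in the block (an assumed scheme)."""
    return reduce(lambda acc, b: acc ^ b, block, 0)


def seal(image: bytes) -> List[Tuple[bytes, int]]:
    """Split a program or data image into blocks and attach a check sum to each."""
    blocks = [image[i:i + BLOCK_SIZE] for i in range(0, len(image), BLOCK_SIZE)]
    return [(blk, block_checksum(blk)) for blk in blocks]


def corrupted_blocks(sealed: List[Tuple[bytes, int]]) -> List[int]:
    """Return the indices of blocks whose stored check sum no longer matches."""
    return [i for i, (blk, chk) in enumerate(sealed) if block_checksum(blk) != chk]
```

         A simple XOR parity detects any single-bit error in a
         block but not all multiple-bit errors; a stronger check
         sum could be substituted without changing the structure.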

4.11.3   E̲r̲r̲o̲r̲ ̲F̲i̲x̲-̲U̲p̲

         Errors are divided into groups, each of which defines a
         type of error for which a common error fix-up action exists.
         The following hardware error types are foreseen:

         -   PU or IO BUS error

         -   TDX-BUS system (TDX-BUS + TDX-I/F + TDX-CTR) error

         -   Mirrored disk system (DISK-CTR + DISK DRIVE + VOLUME)
             error

         -   Off-line or floppy disk system error

         -   LTU system (LTU + external line) error

         -   LTUX system (LTUX + BSM-X + terminal equipment)
             error

         -   Watchdog or operator VDU or operator printer error

         -   Resource error (e.g. paper out)

         -   Power down

         All error types except for resource errors are non-recoverable.
         This is because the FMS and THS give a high-level interface
         to peripherals, which already handles recoverable errors.
         For example, the interface:

         -   repeats an operation (which handles intermittent
             errors)

         -   allocates a new disk sector when a sector is bad
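
         The repetition of operations mentioned above can be
         sketched as follows. The exception classes, the function
         with_repetition and the attempt count of 3 are assumptions
         for illustration; the FMS and THS interfaces are not
         defined in this section.

```python
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class IntermittentError(Exception):
    """Hypothetical: a transient device error that is worth repeating."""


class NonRecoverableError(Exception):
    """Hypothetical: reported upward once repetition has been exhausted."""


def with_repetition(operation: Callable[[], T], attempts: int = 3) -> T:
    """Repeat an operation on intermittent errors; escalate if it keeps failing."""
    last: Optional[Exception] = None
    for _ in range(attempts):
        try:
            return operation()
        except IntermittentError as exc:
            last = exc                 # remember the cause and try again
    raise NonRecoverableError(f"failed after {attempts} attempts") from last
```

         Only an error that survives the repetition is reported to
         the caller, which is why the caller may treat it as
         non-recoverable.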

         The following software error types are foreseen:

         -   Security or access error

         -   Resource error

         -   Miscellaneous

4.11.3.1 P̲U̲ ̲o̲r̲ ̲I̲O̲ ̲B̲U̲S̲ ̲E̲r̲r̲o̲r̲ ̲F̲i̲x̲-̲U̲p̲

         A PU or IO BUS error will imply an ordered or an emergency
         switch over to the stand by PU as described in section
         4.3.1.4.

         If the stand by PU is unavailable, the active PU is
         disabled and a WARM2 start-up of the off-line PU can
         be performed.

4.11.3.2 T̲D̲X̲ ̲S̲y̲s̲t̲e̲m̲ ̲E̲r̲r̲o̲r̲

         A TDX system error is handled jointly by the Terminal
         Handling System (THS) in the IOC and by the SSC. The
         SSC switches LTUXs (via the watchdog) to the appropriate
         TDX-BUS and updates the Configuration table. The THS
         performs a TDX-system switch over transparent to the
         application (i.e. TEP and THP).

         If the stand by TDX-BUS is used for off-line operation
         a total system error exists, and the PU is disabled.
         After insertion of a TDX BUS a WARM2 start-up can be
         performed.

4.11.3.3 M̲i̲r̲r̲o̲r̲e̲d̲ ̲D̲i̲s̲k̲ ̲S̲y̲s̲t̲e̲m̲ ̲E̲r̲r̲o̲r̲

         An error in one of the mirrored disks is handled by
         the FMS in the IOC. The FMS switches to the stand by
         disk system transparent to the users. The SSC is notified
         and updates the Configuration Table.

         If both mirrored disks fail, a total system error exists
         and the PUs will be disabled. A WARM2 start-up can
         be executed, when the disk system is repaired.
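
         The switch-over rule above can be summarised in a short
         sketch. The class MirroredDisk, the drive names and the
         notifications list (standing in for the notification to
         the SSC) are hypothetical and do not describe the FMS
         implementation.

```python
class TotalSystemError(Exception):
    """Hypothetical: raised when both mirrored disks have failed."""


class MirroredDisk:
    def __init__(self) -> None:
        self.healthy = {"disk_a": True, "disk_b": True}  # hypothetical drive names
        self.active = "disk_a"
        self.notifications = []  # stands in for the notification sent to the SSC

    def report_error(self, disk: str) -> str:
        """Mark a disk as failed and switch to the other disk if it was active."""
        self.healthy[disk] = False
        survivors = [d for d, ok in self.healthy.items() if ok]
        if not survivors:
            # Both disks failed: total system error, WARM2 start-up after repair.
            raise TotalSystemError("both mirrored disks failed")
        if self.active == disk:
            self.active = survivors[0]   # switch-over, not visible to the users
        self.notifications.append((disk, self.active))
        return self.active
```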

4.11.3.4 O̲f̲f̲-̲L̲i̲n̲e̲ ̲o̲r̲ ̲F̲l̲o̲p̲p̲y̲ ̲D̲i̲s̲k̲ ̲S̲y̲s̲t̲e̲m̲ ̲E̲r̲r̲o̲r̲

         An error in the off-line or the floppy disk system
         implies a removal of the disk system in question. The
         packages involved in an error fix-up are:

         a)  SSC and TEP during the following operations:

             -   back-up of system parameter file

             -   off-loading of messages

             -   trace information storage

             -   copying of modified software to off-line disk

         b)  SSC and TEP and SAR during:

             -   retrieval of off-loaded messages

         c)  SSC during:

             -   load of start-up data

             -   copying of modified software to the on-line
                 disks

             -   memory dump

         d)  SSP during:

             -   off-line operations

             During on-line operation, the SSC de-assigns devices
             and dismounts the mirrored and the floppy disk
             volumes, whereas the TEP dismounts the off-line
             disk. Also, the SSC updates the Configuration table.

4.11.3.5 L̲T̲U̲ ̲S̲y̲s̲t̲e̲m̲ ̲E̲r̲r̲o̲r̲

         An LTU line error is handled by the THP and the SSC:
         The THP closes the line activities, whereas the SSC
         deletes the THP instance, which handles the line. Also,
         the SSC updates the Configuration table.

         An LTU error affects up to two lines. Per line the
         error fix-up is as described above.

4.11.3.6 L̲T̲U̲X̲ ̲S̲y̲s̲t̲e̲m̲ ̲E̲r̲r̲o̲r̲

         An error in the terminal equipment is handled by the
         TEP and the SSC. The TEP cancels the ongoing transaction(s),
         whereas the SSC deletes the TEP instance and updates
         the Configuration Table.

         An error in a TRC line is handled by the THP and the
         SSC. The THP closes the line activities, whereas the
         SSC deletes the THP instance and updates the Configuration
         table.

         A spare LTUX exists to enable a hardware patch of line
         equipment. The SSC controls the switching of assignment
         of LTUX lines to the spare LTUX. The control is initiated
         by an operator command. After a switching of terminal
         equipment users have to sign-in.

         An LTUX error involves the TEP or THP instances using
         the LTUX. Per instance the fix-up is as described above.

         A BSM error involves the TEP or THP instances using
         the two LTUXs, which are controlled by the BSM. Per
         terminal instance the fix-up is as described above.

4.11.3.7 W̲a̲t̲c̲h̲d̲o̲g̲ ̲S̲y̲s̲t̲e̲m̲ ̲E̲r̲r̲o̲r̲

         If the watchdog fails, the operator VDU and printer
         can be directly connected to one of the PUs for continuing
         operator control.

         When an active PU has no operator position, then the
         SSC directs error messages to the supervisor printer.

         If the operator VDU fails, then watchdog operation
         not involving the VDU continues.

         If the operator printer fails, then the SSC directs
         active PU error messages to the supervisor printer.
         The non-active PU will not be able to perform print-out.
         The watchdog continues operations not involving the
         printer.

4.11.3.8 P̲o̲w̲e̲r̲ ̲D̲o̲w̲n̲

         If a power down is detected in the active PU an emergency
         switch-over to the stand by PU is automatically executed.

         If a power down is detected in the non-active PU, then
         the PU is disabled.

         The IO-crate contains two power supplies:

         -   one supply per IO BUS

         -   dualized supply per IO BUS device

         A single power down has effects identical to a PU power
         down.

         A power down in a TDX crate implies that the two LTUXs
         in the crate are taken out of service.

         A power down in the watchdog implies that the watchdog
         is taken out of service. Refer to section 4.11.3.7.

4.11.3.9 H̲a̲r̲d̲w̲a̲r̲e̲ ̲R̲e̲s̲o̲u̲r̲c̲e̲ ̲E̲r̲r̲o̲r̲

         Hardware resource errors include:

         -   Paper out

         -   No more file space

         This error type may be handled by an application. Further
         discussion is deferred to detailed design.

4.11.3.10    S̲e̲c̲u̲r̲i̲t̲y̲ ̲o̲r̲ ̲A̲c̲c̲e̲s̲s̲ ̲C̲o̲n̲t̲r̲o̲l̲ ̲E̲r̲r̲o̲r̲

         Reactions subsequent to the detection of an error during
         security or access control are defined in section 4.9.

4.11.3.11    S̲o̲f̲t̲w̲a̲r̲e̲ ̲R̲e̲s̲o̲u̲r̲c̲e̲ ̲E̲r̲r̲o̲r̲

         Software resource errors include:

         -   Queue full

         -   No more file control blocks (FCBs)

         This type of error may be handled by an application,
         which for example may:

         -   wait until the resource is free

         -   wait a specified period for the resource to become free

         -   stop input

         Further details are deferred to detailed design. Refer
         to section 4.11.4, where some overload situations are
         handled.
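
         The three application reactions listed above can be
         sketched for the "queue full" case using the Python
         standard library queue module. The function enqueue, the
         policy names and the default period are assumptions; the
         Queue Monitor interface is left to detailed design.

```python
import queue


def enqueue(q: queue.Queue, item, policy: str = "wait", period: float = 0.05) -> bool:
    """Return True if the item was queued, False if input should be stopped."""
    if policy == "wait":
        # Wait until the resource is free (blocks for as long as necessary).
        q.put(item)
        return True
    if policy == "wait_period":
        # Wait a specified period for the resource to become free.
        try:
            q.put(item, timeout=period)
            return True
        except queue.Full:
            return False
    if policy == "stop_input":
        # Do not wait at all; the caller stops accepting input on False.
        try:
            q.put_nowait(item)
            return True
        except queue.Full:
            return False
    raise ValueError(f"unknown policy: {policy}")
```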

4.11.3.12    M̲i̲s̲c̲e̲l̲l̲a̲n̲e̲o̲u̲s̲

         This type of error includes:

         -   Semantic error in input parameter

         -   Time out during process communication

         -   Back lock (refer to section 4.11.4)

         These errors are due to a programming error. Further
         details are provided during detailed design.

4.11.4   B̲a̲c̲k̲ ̲L̲o̲c̲k̲ ̲H̲a̲n̲d̲l̲i̲n̲g̲

         Back lock handling refers to actions to

         -   avoid system overload

         -   avoid dead lock

4.11.4.1 D̲e̲a̲d̲ ̲L̲o̲c̲k̲

         Dead lock refers to a situation where processes demand
         a number of shared resources to perform a function and
         each holds a resource that another is waiting for.
         If, for instance, processes A and B both require a reader
         and a printer, and A has reserved the reader and B
         the printer, then a dead lock situation exists if neither
         A nor B relinquishes its reserved resource.

         It is a design aim to prevent dead locks. However,
         this will introduce an overhead. During detailed design
         it will be decided whether this overhead can be tolerated,
         or whether supervisor commands shall be defined to handle
         dead locks which

         -   are unlikely to occur, and which

         -   would imply a considerable overhead to avoid
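
         One well-known way to prevent the reader/printer dead
         lock of the example is to reserve shared resources in a
         single global order. The sketch below illustrates that
         technique only; this specification does not select a
         prevention method, and the names RESOURCES, reserve_all,
         release_all and run are hypothetical.

```python
import threading

# Hypothetical shared resources from the example in the text.
RESOURCES = {"printer": threading.Lock(), "reader": threading.Lock()}


def reserve_all(names):
    """Reserve resources in one global (alphabetical) order, whatever was requested."""
    ordered = sorted(names)
    for name in ordered:
        RESOURCES[name].acquire()
    return ordered


def release_all(ordered):
    """Release the resources in the reverse order of reservation."""
    for name in reversed(ordered):
        RESOURCES[name].release()


def run(process_name, wanted, done):
    """Reserve every wanted resource, perform the function, then release."""
    held = reserve_all(wanted)
    try:
        done.append(process_name)  # the function needing both devices would run here
    finally:
        release_all(held)
```

         Because A and B both take the printer before the reader,
         neither can hold one device while waiting for the other.
         The overhead is that a process may hold a resource
         earlier than it strictly needs it.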

4.11.4.2 O̲v̲e̲r̲l̲o̲a̲d̲

         Overload will be prevented by the following general
         method:

         a)  Issue a warning to the supervisor when a resource
             "warning threshold" is exceeded (refer to figure
             4.11.4.2-1). A new warning will not be issued until
             the resource consumption has been below the "warning
             enable threshold".

         b)  Provide the supervisor with commands, which can
             remedy the situation.

         c)  If the supervisor does not use his commands, then
             an automatic ordered close down, where input is
             inhibited, and where the system slowly dies out,
             will be performed, when the "critical threshold"
             is exceeded.

             During start-up subsequent to the ordered close-down,
             the supervisor is the first to be allowed to sign-in.
             He will be provided with status describing various
             resource consumptions and he can by means of the
             commands in b) remove an overload situation prior
             to start-up.

             Resource Consumption

                     100%

                     Critical threshold

                     Warning threshold

                     Warning enable threshold

                     0%

          FIGURE 4.11.4.2-1  RESOURCE THRESHOLDS
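
         The three thresholds of the figure can be read as a small
         state machine with hysteresis. The sketch below assumes
         threshold values of 50%, 70% and 90%, which are
         illustrative only; the specification does not give
         values, and the class ThresholdMonitor is hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ThresholdMonitor:
    warning_enable: float = 0.50   # assumed values; the specification gives none
    warning: float = 0.70
    critical: float = 0.90
    armed: bool = True             # True when a new warning may be issued

    def update(self, consumption: float) -> str:
        """Return 'close_down', 'warning' or 'none' for a consumption in 0.0..1.0."""
        if consumption > self.critical:
            return "close_down"    # automatic ordered close down
        if consumption < self.warning_enable:
            self.armed = True      # re-enable warnings only below this level
        if consumption > self.warning and self.armed:
            self.armed = False     # suppress repeats until re-armed
            return "warning"
        return "none"
```

         With these values a consumption that oscillates between
         65% and 75% produces a single warning; a second warning
         is issued only after the consumption has dropped below
         50% and risen above 70% again.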

         Overload situations will be described during detailed
         design. At this level the following overload situations
         are foreseen:

         -   queues

         -   intermediate storage

         -   short term storage

4.11.4.2.1   Q̲u̲e̲u̲e̲ ̲O̲v̲e̲r̲l̲o̲a̲d̲

         Queues are allowed to expand on disks. However, the
         thresholds in figure 4.11.4.2-1 are applied. Also a
         warning is provided, when a queue element has resided
         in a queue longer than a specified time.

         The supervisor will be given commands to move queues
         to the MDCO.
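
         The residence-time warning can be sketched as a periodic
         scan of the queue. The function overdue_elements, the
         (element, enqueue time) representation and the use of
         seconds are assumptions for illustration.

```python
from typing import Iterable, List, Tuple


def overdue_elements(
    entries: Iterable[Tuple[str, float]], now: float, max_age: float
) -> List[str]:
    """Return the elements that have resided in the queue longer than max_age.

    entries holds (element name, enqueue time) pairs; times are in seconds.
    """
    return [name for name, enqueued_at in entries if now - enqueued_at > max_age]
```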

4.11.4.2.2   I̲n̲t̲e̲r̲m̲e̲d̲i̲a̲t̲e̲ ̲S̲t̲o̲r̲a̲g̲e̲

         The general overload concept is followed. The supervisor
         can offload messages/areas to the off-line disk.

4.11.4.2.3   S̲h̲o̲r̲t̲ ̲T̲e̲r̲m̲ ̲S̲t̲o̲r̲a̲g̲e̲

         The general overload concept is followed. Also, a warning
         is provided when a message has resided in the short
         term storage longer than a specified period.

         The supervisor will be given commands to delete old
         messages.
