CPS/PLN/004
810527
R&M PROGRAM PLAN
ISSUE 2    CAMPS
TABLE OF CONTENTS
1      SCOPE
1.1    APPLICABLE DOCUMENTS
1.2    REFERENCED DOCUMENTS
2      TERMS AND DEFINITIONS
2.1    ACRONYMS
3      PLANNING SUMMARY
3.1    R&M PROGRAM TASKS
3.1.1    R&M Analysis
3.1.2    RMA Reports
3.1.3    Failure Reporting and Control
3.1.4    Failure Analysis
3.1.5    Monitoring of Software Defects
3.2    ORGANIZATION
3.3    SCHEDULES
4      CAMPS R&M ANALYSIS
4.1    GENERAL
4.1.1    Failure Definition
4.1.2    Equipment Functions
4.1.3    Failure Independence
4.1.4    Failure Detection and Localization
4.1.5    Module Replacement
4.1.5.1    Processor Assembly
4.1.5.2    Channel Assembly
4.1.5.3    TDX Assembly
4.1.5.4    Watchdog Processor Assembly
4.1.6    Applied Redundancy
4.1.7    Stress Parameters
4.1.8    CAMPS Modes of Operation
4.1.9    Equipment Meantime Between Failure (MTBF)
4.1.10   Equipment Maintainability (MTTR)
4.1.11   Predicted Availability
4.1.11.1   System Status and Control, Mathematical Model
4.2    R&M MODELING
4.2.1    Availability and Reliability Requirements
4.2.1.1    Individual User Connecting Points
4.2.1.2    Groups of User Connecting Points
4.2.1.3    User Connecting Points to Supervisory and Service Terminals
4.2.1.4    External Channels and Circuits
4.2.1.5    Individual Channels
4.2.1.6    Groups of Channels
4.2.1.7    All Circuits, Channels and User Connecting Points
4.2.2    R&M Models
4.2.2.1    Model 1: Central Processor Assy
4.2.2.2    Model 2: Channel Unit Assy and Storage
4.2.2.3    Model 3A: TDX Bus System
4.2.2.4    Model 3B: "LTUX-Chain"
4.2.2.5    Model 4A: Individual User Connecting Points
4.2.2.6    Model 4B: Individual External Channels
4.2.2.7    Model 5A: Service Common to Groups of User Connecting Points
4.2.2.8    Model 5B: Service Common to Groups of External Channels
4.2.2.9    Model 6A: Service Common to 75% of User Connecting Points
4.2.2.10   Model 7: Service to All Circuits, Channels, and User Connecting Points
4.2.2.11   Model 8: SS&C System
4.2.2.12   General Comments to Tables 4.2.2-1 to 4.2.2-12
4.3    DESIGN CHANGES
4.4    R&M TESTING
5      RMA REPORTS
6      FAILURE REPORTING AND CONTROL
7      FAILURE ANALYSIS
8      MONITORING OF SOFTWARE DEFECTS
1̲ ̲ ̲S̲C̲O̲P̲E̲
This document describes the activities which have to
be instituted and reviewed throughout the implementation
of the CAMPS program.
The objectives of the reliability program for the CAMPS
system are:
1) To ensure compliance with the reliability requirement
as stated in CPS/210/SYS/0001 sec. 3.4.4.
2) To assist design configuration selection decisions.
3) To give early identification of potential reliability
problem areas and guidelines to their solution.
4) To provide the basis for the availability and maintainability
verification.
The R&M Program Plan provides a planning summary with
a description of all the R&M Tasks in the form of Work
Packages and task schedules.
In addition, the plan provides an R&M model for each
of the reliability requirements specified in the System
Requirements Specification. The models are updated versions
of the models presented in the CAMPS proposals.
The models use predicted values of MTBF and MTTR for
the CAMPS hardware modules, where a module is the lowest
replaceable unit at site level.
1.1 A̲P̲P̲L̲I̲C̲A̲B̲L̲E̲ ̲D̲O̲C̲U̲M̲E̲N̲T̲S̲
The cited documents are applicable only to the extent
specified in the text of the plan.
1) CPS/SDS/001, CAMPS SYSTEM DESIGN SPECIFICATION,
Preliminary Issue, 810115
2) CPS/210/SYS/0001, CAMPS SYSTEM REQUIREMENTS,
Issue 2, 801124.
1.2 R̲E̲F̲E̲R̲E̲N̲C̲E̲D̲ ̲D̲O̲C̲U̲M̲E̲N̲T̲S̲
MIL-HDBK-217C         Reliability Design Handbook
MIL-STD-1521A (USAF)  Technical Reviews and Audits for Systems,
                      Equipments, and Computer Programs
MIL-STD-781B          Reliability Tests: Exponential Distribution
MIL-HDBK-472          Maintainability Prediction
CPS/PLN/006           Maintenance Plan for CAMPS
MIL-STD-883, Level B  Test Methods and Procedures for Microelectronics
2̲ ̲ ̲T̲E̲R̲M̲S̲ ̲A̲N̲D̲ ̲D̲E̲F̲I̲N̲I̲T̲I̲O̲N̲S̲
In interpreting the specification and verification sections
of this plan on reliability and availability, the following
terms shall apply:
a) A̲v̲a̲i̲l̲a̲b̲i̲l̲i̲t̲y̲. The probability of finding an item
(system, module, unit and part thereof) in a functioning
condition at a given time.
b) C̲o̲r̲r̲e̲c̲t̲i̲v̲e̲ ̲M̲a̲i̲n̲t̲e̲n̲a̲n̲c̲e̲. The maintenance undertaken
to restore an item to a specified condition after
a failure has occurred.
c) D̲o̲w̲n̲ ̲T̲i̲m̲e̲. The time during which any of the facilities
or functions to be provided by the item is not
available, for whatever reason.
d) F̲a̲i̲l̲u̲r̲e̲. The inability of any item to carry out
its specified function within the tolerance allowed
under its normal operating conditions.
The following failure situations shall be disregarded
in availability calculation:
1) The item is or has been exposed to conditions,
which are not within the tolerances allowed
under its normal operating conditions.
2) The item is or has been exposed to violence.
e) I̲t̲e̲m̲. An item is defined as system, module, unit
and part thereof.
f) M̲D̲S̲D̲. Meantime between Detection of Software
Defects.
g) M̲e̲a̲n̲ ̲T̲i̲m̲e̲ ̲B̲e̲t̲w̲e̲e̲n̲ ̲F̲a̲i̲l̲u̲r̲e̲ ̲(̲M̲T̲B̲F̲)̲. The statistical
mean of the functioning time between failures.
For a given interval, the total measured functioning
time of the item divided by the total number of
failures of that item during the interval. Agreed
scheduled preventive maintenance of modules of
the equipment shall not be counted when estimating
mean time between failure of such modules.
h) M̲e̲a̲n̲ ̲T̲i̲m̲e̲ ̲t̲o̲ ̲R̲e̲p̲a̲i̲r̲ ̲(̲M̲T̲T̲R̲)̲. The statistical mean
of the distribution of times-to-repair. The summation
of active repair times during a given period of
time divided by the total number of malfunctions
during the same time interval. This repair time
shall include all actions required to detect, locate
and repair the fault.
i) P̲r̲e̲v̲e̲n̲t̲i̲v̲e̲ ̲M̲a̲i̲n̲t̲e̲n̲a̲n̲c̲e̲. The maintenance undertaken
systematically with the intention of keeping an
item in a specified condition, reducing the occurrence
of failures, and prolonging the useful life of
the equipment.
j) R̲e̲l̲i̲a̲b̲i̲l̲i̲t̲y̲. The probability that an item will
perform a required function under stated conditions
for a stated period of time.
k) R̲e̲p̲a̲i̲r̲. A repair is the restoration of an item
to the state in which it can provide its specified
functions.
When the item is a replaceable module or includes
replaceable modules, the exchange operation is
considered as the repair operation.
l) M̲o̲d̲u̲l̲e̲. A collection of one or more units as defined
in this section which satisfy the following conditions:
1) It has a functional significance in the context
of R&M.
2) Individual failures can be localised to the
specific module.
3) The module is capable of removal and replacement.
4) The module operational condition has a simple
two state classification (operative or inoperative)
in the availability calculation.
m) U̲n̲i̲t̲ is a component contained in a module.
n) S̲y̲s̲t̲e̲m̲ ̲D̲o̲w̲n̲ ̲T̲i̲m̲e̲ (D_s) is the down time of the dualized
system or module (like MTTR for non dualized).
o) S̲y̲s̲t̲e̲m̲ ̲U̲p̲t̲i̲m̲e̲ (U_s) is the uptime of the dualized
system or module (like MTBF for non dualized).
2.1 A̲C̲R̲O̲N̲Y̲M̲S̲
CCA Crate Configuration Adaptor
CIA Crate Interface Adaptor
CPU Central Processor Module
DCA Disk Controller Adaptor
DISK Mass Storage
DISK CRTL Disk Controller/Formatter Module
FAN Ventilator
FLOP CRTL Floppy Disk Controller Module
LIA Line Interface Adaptor
LTU Line Termination Module
LTUX Line Termination Module for TDX Bus
MAP Mapping Module
MBT Main Bus Termination
MF&D Mains Filter and Power Distribution Panel
MIA MAP to I/O (in/out) Channel Adaptor
PS Power Supply
RAM Memory Module
SFA Single Floppy Disk Adaptor
SSC System Status and Control System
SS&C System Status and Control System
TDX Telecommunication Data Exchange
TDX CRTL TDX Controller Module
TDX IF TDX Interface Module
TIA TDX Interface Adaptor
V24 Standard V24 Terminal Interface
WCA Watchdog CPU Adaptor
WD Watchdog System
WPU Watchdog CPU
Notice that on some drawings, and in some CR-documents,
the SS&C System is called "Watch Dog System" (WD).
In this context the two names cover the same thing.
3̲ ̲ ̲P̲L̲A̲N̲N̲I̲N̲G̲ ̲S̲U̲M̲M̲A̲R̲Y̲
This Reliability and Maintainability Program Plan is
based on the CAMPS System Design Specification.
During the proposal and contract negotiation phases
a number of R&M models were prepared to demonstrate that
the reliability, maintainability and availability requirements
of the system would be fulfilled.
The objective of the R&M Program is therefore to follow
up on these models during the design and development
phase and to complement these mathematical analyses
with a failure analysis and inspection of the actual
equipment.
As soon as the final MTBF value of a module has been
satisfactorily established, either analytically or by testing,
it will, together with the final MTBF values of the other
modules, provide the baseline for a final availability
figure for the CAMPS System.
During the subsequent fabrication, the R&M program
shall if necessary perform the required R&M test in
accordance with the System Requirements Specification
para 3.4.4.2. Furthermore, the R&M program shall set
up a reporting system to be used in the warranty phases
for accumulation of failure reports and analysis of
the detected failures.
The R&M Program Tasks have been assigned to a number
of Work Packages described in section 3.1.
3.1 R̲&̲M̲ ̲P̲R̲O̲G̲R̲A̲M̲ ̲T̲A̲S̲K̲S̲
The Reliability and Maintainability Program Tasks are
performed in the following WPs:
WP 2.7.1 R&M Plan containing an outline of the planned
R&M Program. This WP is contained in section
3.
WP 2.7.2 R&M Analysis which contains an analysis
of the hardware equipment configuration.
The analysis will describe the equipment
partitioning and the applied redundancy
on which the derived R&M Models are based.
From the R&M Models the equipment availability
figures are calculated. This WP is contained
in section 4.
WP 2.7.3 RMA Reports. The RMA Reports will contain
the calculations on which the individual
module MTBF figures are based. These reports
will be submitted after establishment of
the hardware baseline. The contents of
the RMA Reports are described in section
5.
WP 2.7.4 Failure Reporting and Control. This WP
will contain all activities involved in
reporting and control of all detected hardware
failures. The planned procedures for Failure
Reporting and Control are described in Section
6.
WP 2.7.5 Failure Analysis. This WP will contain
the activities involved in the analysis
of detected hardware failures. Each failure
will be analysed in order to determine
the consequences for system performance.
The analysis methods are described in section
7.
WP 2.7.6 Monitoring of Software Defects. During
the software integration period and the
In-Plant Software Verification (ref. SRS
para 4.2.2.2), the occurrence of Software
Defects will be monitored. The planned
procedures are described in section 8.
The following gives an outline of the contents
of WP 2.7.2 to WP 2.7.6.
3.1.1 R̲ ̲&̲ ̲M̲ ̲A̲n̲a̲l̲y̲s̲i̲s̲
The R&M Analysis will specify the mathematical
methods and techniques to be utilized in demonstrating
that the availability requirements stated in the SRS
section 3.4.4.4 are met.
The R&M analysis of the equipment including environmental
control and Power Supply equipment will be an integral
part of the program. The R&M analysis will include
the following:
a) Develop a complete and thorough definition of "failure"
as applied to the equipment. Define categories
of failures that apply to the equipment, such as
catastrophic, critical and non-critical.
b) Identify equipment functions and assign R&M values
to appropriate equipment elements.
c) Proof of failure independence of hardware units
and modules, including the assumptions involved.
d) Determination of failure detection and localization.
e) Proof of the capability for on-line removal and
replacement of modules (or units) during system
operation.
f) Identify and analyse redundancies applied.
g) Identify the stress parameters under which equipment
elements must operate.
h) Analyse and apply operational and maintenance concepts
(ref. SDS section 4.3).
i) Determination of MTBF values for each module.
j) Determination of MTTR values for each module.
k) Predict overall R&M values for the equipment including
environmental control and power supply equipment.
l) Compare predicted values with specified requirements.
m) Identify problem areas and recommend corrective
action.
n) Assure that any proposed design changes are analysed
and the effects on equipment reliability and maintainability
are properly reported.
o) Update R&M predictions using test data, where appropriate.
The Analysis shall include an R&M system model which
partitions the equipment into HW modules and units.
The R&M model shall include block diagrams for each
of the 6 major reliability requirements specified in
the SRS section 3.4.4.4. These are:
1) Service to individual user connecting points
2) Service common to 75% of the user connecting points
3) Service to supervisor and message service terminals
4) Service to individual external channels
5) Service common to groups of external channels
6) Service to all circuits, channels and user connecting
Points.
These block-diagrams shall include all the HW units
and modules affecting the particular performance parameter.
For each HW unit and module, the related MTBF and
MTTR values shall be specified.
The initial values for each hardware unit and module
shall be justified from actual test data on "like"
or similar equipment, or from a component analysis,
utilizing values taken from recognized sources, e.g.
reliability and maintainability handbooks maintained
by National Governments and by Industry.
The analysis of the R&M model shall include an evaluation
of the following:
a) Identification of equipment items and quantities,
constituting a system.
b) Identification of redundant hardware units
and modules.
c) The completeness of the required partitioning,
including proper accounting of all interconnecting
cabling, switching etc.
The model, analysis, and calculations shall be refined
and changed, as necessary, as system design progresses.
3.1.2 R̲M̲A̲ ̲R̲e̲p̲o̲r̲t̲s̲
The RMA Reports will contain the calculations on which
the individual module MTBF figures are based. The reliability
data for each of the components of the module will
be included in the reports, and the module MTBF calculations
will be based on the methods described in MIL-HDBK-217C.
3.1.3 F̲a̲i̲l̲u̲r̲e̲ ̲R̲e̲p̲o̲r̲t̲i̲n̲g̲ ̲a̲n̲d̲ ̲C̲o̲n̲t̲r̲o̲l̲
CR shall develop and utilize a system for collecting,
analyzing and reporting all failure and maintenance
data that is generated during the test and warranty
phases of the CAMPS contract.
The analysis and reporting of a failure and subsequent
maintenance actions shall differentiate between, but
not be restricted to, those failures that are due to
either equipment failure or human error.
CR's failure reporting system shall include provisions
to assure that effective corrective actions are taken
on a timely basis to reduce or prevent repetition of
the failures.
Any operating discrepancy that requires an unscheduled
adjustment or calibration to be made (except normal
operating adjustment or scheduled maintenance procedure),
after initial satisfactory operation of the affected
equipment, shall be reported.
3.1.4 F̲a̲i̲l̲u̲r̲e̲ ̲A̲n̲a̲l̲y̲s̲i̲s̲
CR shall perform an analysis of each actual failure
and all known potential failures to determine the effect
on the system performance and to recommend remedial
actions. Each failure and each identified potential
failure shall be ranked according to criticality, i.e.
its effect on overall performance. All failures identified
as critical failures shall receive further investigation
to identify the exact modes and causes of failure.
3.1.5 M̲o̲n̲i̲t̲o̲r̲i̲n̲g̲ ̲o̲f̲ ̲S̲o̲f̲t̲w̲a̲r̲e̲ ̲D̲e̲f̲e̲c̲t̲s̲
During the software integration period and the In-Plant
Software Verification (ref. SRS para 4.2.2.2) the occurrence
of software defects will be monitored.
From the observed Software Defect occurrences the Meantime
between Detection of Software Defects (MDSD) can be
calculated.
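As an illustration only (not part of the planned procedures), the sketch below shows one way the MDSD figure could be derived from a defect log, assuming MDSD is taken per category as the accumulated run-time divided by the number of defects detected in that category; the category names, run-time figures and log format are hypothetical.

    # Illustrative sketch: MDSD per defect category, assuming
    # MDSD = accumulated run-time / number of detected defects in the category.
    # The log entries (category, run-time at detection) are hypothetical.
    from collections import Counter

    defect_log = [                   # (category, accumulated run-time in hours)
        ("A", 120.0), ("B", 250.0), ("A", 610.0), ("A", 940.0),
    ]
    total_runtime_hours = 1000.0     # accumulated run-time at end of observation

    counts = Counter(category for category, _ in defect_log)
    for category, n_defects in sorted(counts.items()):
        mdsd = total_runtime_hours / n_defects
        print(f"Category {category}: {n_defects} defects, MDSD = {mdsd:.0f} hours")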
3.2 O̲R̲G̲A̲N̲I̲Z̲A̲T̲I̲O̲N̲
The RMA activities are the responsibility of the Systems
Engineering group within the CAMPS organization.
3.3 S̲C̲H̲E̲D̲U̲L̲E̲S̲
The schedule for the R&M activities is shown overleaf.
FIGURE (SCHEDULE)
4̲ ̲ ̲C̲A̲M̲P̲S̲ ̲R̲&̲M̲ ̲A̲N̲A̲L̲Y̲S̲I̲S̲
4.1 G̲E̲N̲E̲R̲A̲L̲
This section contains an R&M analysis of the CAMPS
hardware equipment. Several of the items to be addressed
in this section are contained in the SDS and will thus
be treated by proper reference.
4.1.1 F̲a̲i̲l̲u̲r̲e̲ ̲D̲e̲f̲i̲n̲i̲t̲i̲o̲n̲s̲
The following definitions shall apply for categorization
of hardware failures:
a) C̲a̲t̲a̲s̲t̲r̲o̲p̲h̲i̲c̲ ̲F̲a̲i̲l̲u̲r̲e̲
The following types of failures are contained in
this category.
1) Simultaneous occurrence of failures in dualized
equipment.
2) Single Processor Assembly failure when the
stand-by Processor Assembly is not available.
3) Drop out of external power supply.
b) C̲r̲i̲t̲i̲c̲a̲l̲ ̲F̲a̲i̲l̲u̲r̲e̲s̲
Failures of this category cause a switch-over to
redundant stand-by equipment and are thus single
failures in redundant equipment.
c) N̲o̲n̲-̲C̲r̲i̲t̲i̲c̲a̲l̲ ̲F̲a̲i̲l̲u̲r̲e̲s̲
Failures of this category do not cause a switch-over
and are failures that only affect:
1) A maximum of 8 user connecting points.
2) A maximum of 2 external channels.
4.1.2 E̲q̲u̲i̲p̲m̲e̲n̲t̲ ̲F̲u̲n̲c̲t̲i̲o̲n̲s̲
CAMPS hardware equipment can be divided into 4 groups
each providing separate functions:
a) P̲r̲o̲c̲e̲s̲s̲o̲r̲ ̲A̲s̲s̲e̲m̲b̲l̲y̲, which provides the processing-function.
b) C̲h̲a̲n̲n̲e̲l̲ ̲A̲s̲s̲e̲m̲b̲l̲y̲, which provides the data-base
function and the external channel interface.
c) T̲D̲X̲ ̲A̲s̲s̲e̲m̲b̲l̲y̲, which provides the user interface.
d) W̲a̲t̲c̲h̲d̲o̲g̲ ̲P̲r̲o̲c̲e̲s̲s̲o̲r̲, which provides the system status
and control functions.
Later in this section, R&M models are elaborated for
the modules contained in each of the 4 equipment groups
and the corresponding MTBF values are calculated.
4.1.3 F̲a̲i̲l̲u̲r̲e̲ ̲I̲n̲d̲e̲p̲e̲n̲d̲e̲n̲c̲e̲
The CAMPS hardware equipment is partitioned into individual
modules. Individual modules are interconnected through
common bus structures. The bus structure provides all
module interconnections which consist of:
a) Power supply lines
b) Lines for exercise of module control
c) Lines for data exchange
Power supply modules are equipped with over-voltage
protection on the outlet. This measure prevents a power
supply over-voltage failure from causing damage to all
connected modules.
All other modules are equipped with fuses in the power
supply lead. This measure ensures that a failure on an
individual module which causes an excessive supply current
will not affect other modules.
Due to the described measures, no failure in an individual
module, except a power drop-out, will influence the individual
functions of other modules.
Any individual module failure which causes a clamping
of any of the control or data lines will of course
cause malfunction of that part of the equipment which
is connected to the same bus. However, normal operation
will be restored by replacement of the failed module.
4.1.4 F̲a̲i̲l̲u̲r̲e̲ ̲D̲e̲t̲e̲c̲t̲i̲o̲n̲ ̲a̲n̲d̲ ̲L̲o̲c̲a̲l̲i̲z̲a̲t̲i̲o̲n̲
An extensive analysis of this subject is given in the
SDS section 4.11.
4.1.5 M̲o̲d̲u̲l̲e̲ ̲R̲e̲p̲l̲a̲c̲e̲m̲e̲n̲t̲
After failure localization normal system operation
is restored by replacement of the failed module.
Module replacement is accomplished by 4 different procedures
depending on which type of assembly the module is mounted
in.
The different procedures apply to modules mounted in
the:
a) Processor Assembly
b) Channel Assembly
c) TDX Assembly
d) Watchdog Processor Assembly
From the described procedures it is seen that users
not directly affected by the failed module will experience
no departure from normal system operation.
Thus the chosen equipment design provides the capability
for on-line removal and replacement of modules during
system operation.
4.1.5.1 P̲r̲o̲c̲e̲s̲s̲o̲r̲ ̲A̲s̲s̲e̲m̲b̲l̲y̲
For replacement of modules mounted in the Processor
Assembly the following procedure applies:
a) The relevant Processor Assembly shall be taken
off-line (ref. SDS section 4.3).
b) The Processor Assembly power supply is switched
off.
c) The relevant module is replaced.
d) The power supply is switched on.
e) The Processor Assembly is brought back to the mode
of operation which was executed prior to failure
detection (ref. SDS section 4.3).
4.1.5.2 C̲h̲a̲n̲n̲e̲l̲ ̲A̲s̲s̲e̲m̲b̲l̲y̲
Replacement of modules in the Channel Assembly is executed
without switch-off of the connected power supply.
4.1.5.3 T̲D̲X̲ ̲A̲s̲s̲e̲m̲b̲l̲y̲
For replacement of modules mounted in the TDX Assembly
the following procedure applies:
a) Users connected to the TDX Assembly shall be requested
to terminate operation.
b) When all users connected to the TDX Assembly in
question have terminated operation, the TDX Assembly
power supply is switched off.
c) The relevant module is replaced.
d) The power supply is switched on.
e) All users connected to the TDX Assembly in question
are notified to resume operation.
4.1.5.4 W̲a̲t̲c̲h̲d̲o̲g̲ ̲P̲r̲o̲c̲e̲s̲s̲o̲r̲ ̲A̲s̲s̲e̲m̲b̲l̲y̲
For replacement of the Watchdog Processor module the
following procedure applies:
a) The Watchdog Assembly power supply is switched
off.
b) The Watchdog Processor module is replaced.
c) The power supply is switched on.
Replacement of the Watchdog Processor does not affect
the operation of the remaining CAMPS equipment.
The Watchdog Processor functions are only required
for switch over between redundant equipment items,
for equipment reconfiguration and for execution of
maintenance and diagnostics tasks.
4.1.6 A̲p̲p̲l̲i̲e̲d̲ ̲R̲e̲d̲u̲n̲d̲a̲n̲c̲y̲
The principles of redundancy applied in the CAMPS hardware
design are shown in figure 4.1.6-1. From the figure
it is seen that the following equipment items are dualized:
a) Processor Assembly
b) I/O Bus
c) TDX Bus
Under normal mode of operation one part of the dualized
equipment is characterized as: A̲c̲t̲i̲v̲e̲ while the other
part is characterized as: S̲t̲a̲n̲d̲-̲B̲y̲.
Control of the active as well as the stand-by equipment
configuration is exercised by the Watchdog equipment.
Switch-over from the active mode to stand-by mode of
the Processor Assembly will take place after detection
of a failure in:
a) The active Processor Assembly
b) The active I/O Bus
A failure in the active TDX Bus will result in a switch
over to the standby TDX Bus.
The equipment (LTUXs) interconnecting the individual
users to the TDX-Bus is not dualized. However, the
equipment partitioning is so designed that a maximum
of 8 connected users will be affected by a failure
in this part of the equipment. The equipment (LTUs)
interconnecting external channels to the Channel Bus
is not dualized either. A maximum of 2 external channels
is connected to each LTU which means that a single
LTU-failure can affect at most 2 external channels.
For a detailed analysis of System Supervision please
refer to the SDS section 4.3.
Figure 4.2.2-9 and Figure 4.1.6-1
4.1.7 S̲t̲r̲e̲s̲s̲ ̲P̲a̲r̲a̲m̲e̲t̲e̲r̲s̲
A detailed description of the power input tolerances,
environmental conditions and electromagnetic compatibility
under which the equipment shall be able to operate
is given in the SRS sections 3.4.2.2, 3.4.3, and 3.5.2.10.
4.1.8 C̲A̲M̲P̲S̲ ̲M̲o̲d̲e̲s̲ ̲o̲f̲ ̲O̲p̲e̲r̲a̲t̲i̲o̲n̲
An extensive analysis of CAMPS operational and maintenance
modes of operation is given in the SDS section 4.3.
4.1.9 E̲q̲u̲i̲p̲m̲e̲n̲t̲ ̲M̲e̲a̲n̲t̲i̲m̲e̲ ̲B̲e̲t̲w̲e̲e̲n̲ ̲F̲a̲i̲l̲u̲r̲e̲ ̲(̲M̲T̲B̲F̲)̲
This section provides the initial values of MTBF for
the CAMPS hardware modules.
The predicted MTBF values are derived from analysis
of the module layout and the MTBFs of the individual
components used.
Most of the integrated circuit components are tested
to the requirements of MIL-STD-883, level B.
Table 4.1 overleaf gives the values of the MTBF and
the reciprocal value lambda representing the failure
rate per million hours for each of the individual CAMPS
modules.
Racks and crates have a failure rate which is very
small (0.001 fpmh) and these items are not incorporated
in the models.
For modules with an MTBF above 3500 hours the MTBF
values will be justified by analytical calculation
during the contract phase as part of the R&M program.
For modules with a predicted MTBF equal to or below
3500 hours the lower confidence limits will be verified
as part of the R&M program:
- A 70% lower confidence limit value will be established
  at the time of Factory Acceptance Verification (site 1).
- An 80% lower confidence limit value will be established
  at the time of SPA of the first installation.
The lower confidence limit values of MTBF shall be
determined from the test data using the following equation:
    MTBF_LCL = 2 T_M / χ²(α, f)

where

    T_M      = the total cumulative operating time obtained from the test
               program

    χ²(α, f) = chi-square value with probability α and f degrees of freedom

    α        = (100 - LCL)/100

    LCL      = specified Lower Confidence Level, expressed as a percentage

    f        = 2 x (N + 1), where N is the number of test measurements
               (sample size)

    N_p      = the actual number of valid module or unit failures observed
               during the period T_M.
DESCRIPTION                              LAMBDA (fpmh)   MTBF (hours)
---------------------------------------------------------------------
SS AND C WATCHDOG SUB-SYSTEM 110 9091
CR8003D/040PC/00 CPU + CACHE 75 13333
CR8016D/032PC/00 RAM 32K 65 15384
CR8016D/064PC/00 RAM 64K 110 9091
CR8016D/128PC/00 RAM 128K 200 5000
CR8020D/000PC/00 MAP+IO 60 16667
CR8071D/010--/00 MIA 25 40000
CR8021D/040PC/00 TDX IF 75 13333
CR8073D/010--/00 TIA 25 40000
CR8081D/010A-/00 CIA A 20 50000
CR8081D/010-B/00 CIA B 20 50000
CR8044D/040AB/00 DISK CTRL 75 13333
CR8084D/010--/00 DCA 20 50000
CR8047D/010A-/00 ST FLOP CRTL 45 22222
CR8087D/010--/00 SFA 20 50000
CR8050D/010PC/00 POWER SUPPLY 20 50000
CR8050D/020AB/00 POWER SUPPLY 20 50000
CR8055D/010PC/00 MBT 5 200000
CR1070S/000--/00 TDX CTRL 50 20000
CR8066D/010AB/00 LTU, 1xHDLC 55 18182
CR1060S/000--/00 LTUXS 30 33333
CR8082D/010--/00 LIA, N 3 333333
CR8105D/020--/00 VENTILATOR 20 50000
CR8105S/010--/00 FAN LTUX-SSC 20 50000
CR8022S/000--/00 POWER SUPPLY 19 52632
CR1082S/000--/00 BTM-X TERM 2 5000000
CR2510-/000--/00 TDX OUTLET 2 5000000
OPTICAL MUX/DEMUX (4 CHANNELS) 15 66667
CR8106D/220--/00 MAINS FILTER 5 200000
CR8170D/010--/00 POW DIST PAN 0
OPTICAL MUX/DEMUX (1 CHANNEL) 4 250000
TABLE 4.1  MODULE MTBFs
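As a cross-check of Table 4.1, the failure rate and MTBF columns are simply reciprocals of one another (lambda in failures per 10^6 hours); a one-line sketch, for illustration only:

    # Reciprocal relation used in Table 4.1: MTBF [hours] = 10**6 / lambda [fpmh].
    def mtbf_from_lambda(lambda_fpmh):
        return 1e6 / lambda_fpmh

    print(round(mtbf_from_lambda(110)))   # 9091  (SS and C watchdog sub-system)
    print(round(mtbf_from_lambda(75)))    # 13333 (CPU + cache)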
4.1.10 E̲q̲u̲i̲p̲m̲e̲n̲t̲ ̲M̲a̲i̲n̲t̲a̲i̲n̲a̲b̲i̲l̲i̲t̲y̲ ̲(̲M̲T̲T̲R̲)̲
Software and firmware will pinpoint faulty modules,
and replacement of modular units results in a low repair
time for CAMPS equipment.
In terminal and peripheral devices the fault detection
and isolation is accomplished by a combination of built-in
tests, software, and operator observations.
The off-line diagnostic software package contains a
set of hardware test programs which provide fault detection
down to module level within the CAMPS equipment. The
command interpreter module of the diagnostic package
enables the operator to initiate any or all of the
test programs for the specified system or module off
line to assist in trouble shooting and to verify the
repair. The diagnostic package shall also assist in
fault isolation of the terminals and peripherals.
The Mean-Time-To-Repair (MTTR) for the CAMPS equipment
will be derived from two sources. The first is actual experience
data on the equipment proposed for the CAMPS system.
The other source is predictions generated in accordance
with MIL-HDBK-472 or similar documents. As an example,
the MTTR for the Disc Storage Unit is calculated from
repair times measured by the supplier. The repair times
of other equipment are derived by an analysis of the
tasks associated with fault detection, isolation, repair,
and verification.
The predicted MTTR values are from experience with
modules of a similar type from the NICS-TARE and FIKS
programs. For those modules newly developed for CAMPS,
the MTTR is estimated based on experience with similar
equipment. The predicted MTTR assumes that all tools,
repair parts, manpower, etc. required for the maintenance
are continuously available.
For all the R&M models elaborated in para 4.2 an MTTR
of 1 hour is used. This value is considered to be a
conservative prediction, and in practice a lower value
will apply for most of the modules. However, by choosing
MTTR = 1 hour for all modules, the calculated availability
figures represent worst cases.
In accordance with the System Requirements Specification
section 4.2.1.7.2 MTTR values equal to or less than
45 min. will be verified by test as part of the R&M
program. The test shall be conducted in a way that
provides an upper confidence limit value in accordance
with the following schedule:
1) 70% upper confidence limit value at the time of
the Factory Acceptance Test of Site 1.
2) 80% upper confidence limit value at the time of
SPA at site 1.
The upper confidence limit values will be established
from the test data by the following equation.
    MTTR_UCL = X̄ + t(α, f) x S/√N

where

    MTTR_UCL = upper confidence limit MTTR

    X̄        = the observed statistical mean of the measured maintenance
               action periods:  X̄ = (1/N) Σ X_i,
               where X_i are the individual observations

    t(α, f)  = Student's t-value with probability α and f degrees of freedom

    α        = (100 - UCL)/100

    UCL      = specified Upper Confidence Limit, expressed as a percentage

    f        = (N - 1), where N is the number of test measurements
               (sample size)

    S        = observed standard deviation of the measured maintenance
               action periods:  S² = Σ (X_i - X̄)² / (N - 1)
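The corresponding calculation, shown below as an illustrative sketch only, assumes the conventional one-sided upper confidence bound on the mean, MTTR_UCL = X̄ + t(α, N-1) x S/√N; the repair-time observations are hypothetical.

    # Sketch of the MTTR upper confidence limit, assuming the standard one-sided
    # bound on the mean: MTTR_UCL = mean + t(alpha, N-1) * S / sqrt(N).
    import math
    from statistics import mean, stdev
    from scipy.stats import t as student_t

    def mttr_ucl(repair_times_hours, ucl_percent):
        n = len(repair_times_hours)
        alpha = (100.0 - ucl_percent) / 100.0
        x_bar = mean(repair_times_hours)        # observed mean repair time
        s = stdev(repair_times_hours)           # sample standard deviation (N - 1)
        t_value = student_t.isf(alpha, n - 1)   # upper-tail alpha point, f = N - 1
        return x_bar + t_value * s / math.sqrt(n)

    # Hypothetical repair-time observations in hours.
    observations = [0.50, 0.75, 0.40, 0.65, 0.55]
    print(mttr_ucl(observations, 70))           # 70% upper confidence limit
    print(mttr_ucl(observations, 80))           # 80% upper confidence limit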
4.1.11 P̲r̲e̲d̲i̲c̲t̲e̲d̲ ̲A̲v̲a̲i̲l̲a̲b̲i̲l̲i̲t̲y̲
The reliability model block diagrams are structured
such that the availability for the individual and group
connecting points for users and channels can easily
be calculated.
Eleven models are developed:
model 1 Central Processor assy.
model 2 Channel Unit Assy. (mass storage included)
model 3A TDX bus system
model 3B LTUX Chain
model 4A individual user connecting points
model 4B individual channel connecting points
model 5A group of user connecting points
model 5B group of channel connecting points
model 6 75% or more user connecting points
model 7 Service to all circuits, channels, and
user connecting points.
model 8 SS&C System
The reliability models used for performing the R&M
analysis are included in section 4.2.
The availability calculation associated with each model
is presented in tabular form using MTBF/lambda and
MTTR values.
T̲h̲e̲ ̲F̲l̲o̲p̲p̲y̲ ̲D̲i̲s̲k̲ ̲D̲r̲i̲v̲e̲ and t̲h̲e̲ ̲F̲l̲o̲p̲p̲y̲ ̲D̲i̲s̲k̲ ̲C̲o̲n̲t̲r̲o̲l̲l̲e̲r̲
have not been incorporated in the calculations, because
this equipment has no direct influence on system availability.
However, the failure rate for the controller is stated
in sec. 4.1.9 (CR 8047D).
T̲h̲e̲ ̲F̲r̲e̲q̲u̲e̲n̲c̲y̲ ̲S̲t̲a̲b̲i̲l̲i̲z̲e̲r̲s̲ have come into consideration
recently and are not yet introduced into the models.
Information from the supplier states an MTBF of 10,000
hours.
In dualized mode we then have:
    U_s = 50 x 10^6 hours with an MTTR = 1 hour
The System Status and Control (SSC) system is not essential
to continuous operation (refer to section 4.1.11.1).
Examination of the block diagram shows that the model
provides sufficient equipment success paths to satisfy
the operational requirements.
The equipment quantities and redundancies used in the
R&M models represent the worst case values for each
availability case.
The availability figures for the required cases are
derived through systematic calculations using the appropriate
model. Each item, or group of items, associated with
the model is tabulated along with the MTBF, MTTR, and
the quantities needed for successful operation, identifying
M of N equipment items as required for success.
Where redundancy is applied, the uptime U_s and downtime
D_s of the connected redundant system (group of redundant
items) are calculated. The formula for 1 of 2 redundant
items with on-line repair is:
    U_s = (U² + 2UD) / (2D)        D_s = D/2

where U is the item MTBF and D is the item MTTR. The
formula for 2 of 3 redundant items with on-line repair
is:

    U_s = (U² + 3UD) / (3D)        D_s = D/2
Higher levels of redundancy invariably result in negligible
failure rate, when repair is allowed. These formulae
are from Einhorn and Plotkin, "Reliability Prediction
for Degradable and Non-Degradable Systems", United
States Air Force Report ESD-TDR-63-642, November 1963.
The availability for each element is then obtained
as:

    A = MTBF / (MTBF + MTTR) = U / (U + D)
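For illustration, a small sketch of the 1-of-2 redundancy formula and the availability expression above; with the supplier figure of MTBF = 10,000 hours and MTTR = 1 hour quoted earlier for the frequency stabilizers, it reproduces an up-time U_s of roughly 50 x 10^6 hours. The function names are of course not part of the plan.

    # Sketch of the 1-of-2 redundancy formula with on-line repair and the
    # resulting availability (U = item MTBF, D = item MTTR), as given above.
    def dualized_uptime(u, d):
        return (u**2 + 2 * u * d) / (2 * d)     # U_s for 1 of 2 redundant items

    def availability(u, d):
        return u / (u + d)                      # A = U / (U + D)

    # Frequency stabilizer figures quoted in this section: MTBF = 10,000 h, MTTR = 1 h.
    u_s = dualized_uptime(10_000, 1)
    print(u_s)                                  # ~5.0e7 hours, i.e. 50 x 10^6 hours
    print(availability(u_s, 1.0))               # plan pessimistically keeps MTTR = 1 hour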
4.1.11.1 S̲y̲s̲t̲e̲m̲ ̲S̲t̲a̲t̲u̲s̲ ̲a̲n̲d̲ ̲C̲o̲n̲t̲r̲o̲l̲ ̲R̲e̲l̲i̲a̲b̲i̲l̲i̲t̲y̲ ̲M̲o̲d̲e̲l̲
The RMA values for the SS&C system are summarized below:
Lambda = 110 fpmh
MTBF   = 9090 hours
MTTR   = 1 hour
A      = 0.99990
Lambda for the operator VDU     = 200 fpmh
Lambda for the operator printer = 333 fpmh
The failure rate for the SS&C system, as it appears
in the overall system model, is only a fraction of the
entire failure rate of the SS&C system. This is due to
the fact that the SS&C function is only called upon in
a failure situation to:
a) reconfigure the system by
1) automatic switch-over to dualized equipment
(if existing)
2) isolation of faulty device
3) connection of faulty device to off-line PU
b) execute M&D programs to localize the failure
A system failure exists, if the SS&C system is unavailable,
when required to perform one of the above functions.
Per failure, the SS&C is needed during the repair time
of 1 hour.
As lambda for the SS&C module, operator VDU, and printer
is 110 + 200 + 333 = 643 fpmh, the probability that the
SS&C chain is non-operational during a one-hour period
concurrent with a failure is:

    643/10^6 = 0.000643
The serial equivalent failure rate for a CAMPS system
equipment is:
    Central Processors                   2 x 851  = 1702 fpmh
    Channel unit (incl. mass storage)    2 x 418  =  936
    TDX Bus                              2 x  78  =  156
    LTUX Chains                         10 x  77  =  770
    LTU Chains                           8 x  60  =  480
    VDUs                                24 x 200  = 4800
    ROPs                                 8 x 333  = 2664
    Low speed teleprinters              24 x 200  = 4800
    ---------------------------------------------------------
    Total failure rate                            = 16308 fpmh
The figures are taken from the models in section 4.2.2.
A serial equivalent failure rate for the SS&C system
is obtained by multiplying the probability of occurrence
0.000643 by the total failure rate of 16308 fpmh. The
resulting rate is 10.5 fpmh.
A conservative failure rate of 15 fpmh is applied in
the models.
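The same derivation can be written out as a short sketch (illustrative only); the per-item serial failure rates are those listed in the table above, and the SS&C chain rate of 643 fpmh is scaled into a serial equivalent via the probability of the chain being down during the one-hour repair window.

    # Sketch of the serial-equivalent failure rate for the SS&C system,
    # using the serial failure rates (fpmh) tabulated above.
    serial_rates_fpmh = {
        "Central processors (2 off)":              1702,
        "Channel unit incl. mass storage (2 off)":  936,
        "TDX bus (2 off)":                          156,
        "LTUX chains (10 off)":                     770,
        "LTU chains (8 off)":                       480,
        "VDUs (24 off)":                           4800,
        "ROPs (8 off)":                            2664,
        "Low speed teleprinters (24 off)":         4800,
    }
    total_fpmh = sum(serial_rates_fpmh.values())   # 16308 fpmh

    ssc_chain_fpmh = 110 + 200 + 333               # SS&C module + operator VDU + printer
    p_ssc_down_1h = ssc_chain_fpmh / 1e6           # probability of being down in a
                                                   # one-hour window per failure
    print(total_fpmh)                              # 16308
    print(round(p_ssc_down_1h * total_fpmh, 1))    # approx. 10.5 fpmh serial equivalent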
4.2 R̲&̲M̲ ̲M̲O̲D̲E̲L̲I̲N̲G̲
A number of R&M models have been elaborated for each
of the requirements stated below.
4.2.1 A̲v̲a̲i̲l̲a̲b̲i̲l̲i̲t̲y̲ ̲a̲n̲d̲ ̲R̲e̲l̲i̲a̲b̲i̲l̲i̲t̲y̲ ̲R̲e̲q̲u̲i̲r̲e̲m̲e̲n̲t̲s̲
The availability and reliability requirements are stated
in the form of service to user and channel connecting points.
The demarcation point between terminals and the equipment
is the user connecting point. The availability of service
shall be measured at the user connecting point of the
equipment, to which each terminal is attached.
4.2.1.1 I̲n̲d̲i̲v̲i̲d̲u̲a̲l̲ ̲U̲s̲e̲r̲ ̲C̲o̲n̲n̲e̲c̲t̲i̲n̲g̲ ̲P̲o̲i̲n̲t̲s̲:̲
a) The availability of the subset of the equipment
which provides service to each user connecting
point shall be at least .9995
b) The MTBF of a failure which causes loss of service
to a single user connecting point shall be at least
3 months with an MTTR not to exceed 40 mins.
Refer to model 4A.
4.2.1.2 G̲r̲o̲u̲p̲s̲ ̲o̲f̲ ̲U̲s̲e̲r̲ ̲C̲o̲n̲n̲e̲c̲t̲i̲n̲g̲ ̲P̲o̲i̲n̲t̲s̲
It shall be possible to select user connecting points
in groups in such a way that no single failure shall
cause loss of service to more than one such group.
Within the expansion capacity specified in this document,
the maximum number of user connecting points in such
a group is 8.
The equipment shall be designed such that no single
failure can cause loss of service to 25% or more user
connecting points.
a) The availability of the subset of the equipment
at the maximum expanded configuration which provides
service to 75% or more of the user connecting points
shall be at least 0.9999.
b) The MTBF of a failure which causes loss of service
to 25% or more user connecting points of the maximum
expanded configuration shall be at least 1 year
with an MTTR not to exceed 1 hour.
Refer to model 5A.
4.2.1.3 U̲s̲e̲r̲ ̲C̲o̲n̲n̲e̲c̲t̲i̲n̲g̲ ̲P̲o̲i̲n̲t̲s̲ ̲t̲o̲ ̲S̲u̲p̲e̲r̲v̲i̲s̲o̲r̲y̲ ̲a̲n̲d̲ ̲S̲e̲r̲v̲i̲c̲e̲
̲T̲e̲r̲m̲i̲n̲a̲l̲s̲
It shall be possible to divide the connecting points
providing service to the terminals of the supervisory
and service positions into more than one group.
a) The availability of the subset of the equipment
which provides service to any such group shall
be at least 0.9999.
b) The MTBF of a failure which causes loss of service
to such a group shall be at least 1 year with an
MTTR not to exceed 1 hour.
Refer to model 5A.
4.2.1.4 E̲x̲t̲e̲r̲n̲a̲l̲ ̲C̲h̲a̲n̲n̲e̲l̲s̲ ̲a̲n̲d̲ ̲C̲i̲r̲c̲u̲i̲t̲s̲
Availability of service to external channels and circuits
shall be measured at the connection point of the equipment,
to which each circuit is attached.
4.2.1.5 I̲n̲d̲i̲v̲i̲d̲u̲a̲l̲ ̲C̲h̲a̲n̲n̲e̲l̲s̲
1) The availability of the subset of the equipment
which provides service to each incoming or outgoing
channel shall be at least .9995.
2) The MTBF of a failure which causes loss of service
to a single incoming or outgoing channel shall
be at least 3 months with an MTTR not to exceed
40 mins.
Refer to model 4B.
4.2.1.6 G̲r̲o̲u̲p̲s̲ ̲o̲f̲ ̲C̲h̲a̲n̲n̲e̲l̲s̲
It shall be possible to divide outgoing channels and
incoming channels each into at least two groups such
that no single failure shall cause loss of service
to more than one such group.
a) The availability of the subset of the equipment
which provides service to any such group of external
connections of the maximum configuration shall
be at least 0.9999. The requirement shall be met
separately for incoming and outgoing channels.
b) The MTBF of a failure which causes loss of service
to any such group of external connections shall
be at least 1 year with an MTTR not to exceed 1
hour.
Refer to model 5B.
4.2.1.7 A̲l̲l̲ ̲C̲i̲r̲c̲u̲i̲t̲s̲,̲ ̲C̲h̲a̲n̲n̲e̲l̲s̲ ̲a̲n̲d̲ ̲U̲s̲e̲r̲ ̲C̲o̲n̲n̲e̲c̲t̲i̲n̲g̲ ̲P̲o̲i̲n̲t̲s̲
No single failure shall cause a total system failure.
a) The availability of the subset of the equipment
which provides service to all user connecting
points and external circuits shall be at least
0.99995.
b) The MTBF of a failure which causes loss of service
to all external circuits and user connecting points
shall be at least 2 years with an MTTR not to exceed
1 hour.
Refer to model 7.
4.2.2 R̲&̲M̲ ̲M̲o̲d̲e̲l̲s̲
This section contains the specification of the models
and a calculation of the availability for each of the
models. The total system model is shown in figure 4.2.-1.
The individual models and calculations are:
Figure 4.2-1
4.2.2.1 M̲o̲d̲e̲l̲ ̲1̲:̲ ̲C̲e̲n̲t̲r̲a̲l̲ ̲P̲r̲o̲c̲e̲s̲s̲o̲r̲ ̲A̲s̲s̲y̲
Figure 4.2.2-1 HW Configuration
Figure 4.2.2-2 Reliability Model
Table 4.2.2-1 Availability Calculations
4.2.2.2 M̲o̲d̲e̲l̲ ̲2̲:̲ ̲C̲h̲a̲n̲n̲e̲l̲ ̲U̲n̲i̲t̲ ̲A̲s̲s̲y̲ ̲a̲n̲d̲ ̲M̲a̲s̲s̲ ̲S̲t̲o̲r̲a̲g̲e̲ ̲
Figure 4.2.2-3 HW Configuration
Figure 4.2.2-4 Reliability Model
Table 4.2.2-2 Availability Calculations
4.2.2.3 M̲o̲d̲e̲l̲ ̲3̲A̲:̲ ̲T̲D̲X̲ ̲B̲u̲s̲ ̲S̲y̲s̲t̲e̲m̲
Figure 4.2.2-5 HW Configuration
Figure 4.2.2-6 Reliability Model
Table 4.2.2-3 Availability Calculations
4.2.2.4 M̲o̲d̲e̲l̲ ̲3̲B̲:̲ ̲"̲L̲T̲U̲X̲-̲C̲h̲a̲i̲n̲"̲
Figure 4.2.2-7 HW Configuration
Figure 4.2.2-8 Reliability Model
Table 4.2.2-4 Availability Calculation
4.2.2.5 M̲o̲d̲e̲l̲ ̲4̲A̲:̲ ̲I̲n̲d̲i̲v̲i̲d̲u̲a̲l̲ ̲U̲s̲e̲r̲ ̲C̲o̲n̲n̲e̲c̲t̲i̲n̲g̲ ̲P̲o̲i̲n̲t̲s̲
Figure 4.2.2-9   HW Configuration
Figure 4.2.2-10  Reliability Model
Table 4.2.2-5 Availability Calculations
4.2.2.6 M̲o̲d̲e̲l̲ ̲4̲B̲:̲ ̲I̲n̲d̲i̲v̲i̲d̲u̲a̲l̲ ̲E̲x̲t̲e̲r̲n̲a̲l̲ ̲C̲h̲a̲n̲n̲e̲l̲s̲
Figure 4.2.2-11  HW Configuration
Figure 4.2.2-12  Reliability Model
Table 4.2.2-6 Availability Calculations
4.2.2.7 M̲o̲d̲e̲l̲ ̲5̲A̲:̲ ̲S̲e̲r̲v̲i̲c̲e̲ ̲C̲o̲m̲m̲o̲n̲ ̲t̲o̲ ̲G̲r̲o̲u̲p̲s̲ ̲o̲f̲ ̲U̲s̲e̲r̲ ̲C̲o̲n̲n̲e̲c̲t̲i̲n̲g̲
̲P̲o̲i̲n̲t̲s̲
Figure 4.2.2-13 HW Configuration
Figure 4.2.2-14 Reliability Model
Table 4.2.2-7 Availability Calculations
4.2.2.8 M̲o̲d̲e̲l̲ ̲5̲B̲:̲ ̲S̲e̲r̲v̲i̲c̲e̲ ̲C̲o̲m̲m̲o̲n̲ ̲t̲o̲ ̲G̲r̲o̲u̲p̲s̲ ̲o̲f̲ ̲E̲x̲t̲e̲r̲n̲a̲l̲ ̲C̲h̲a̲n̲n̲e̲l̲s̲
Figure 4.2.2-15 HW Configuration
Figure 4.2.2-16 Reliability Model
Table 4.2.2-8 Availability Calculation
4.2.2.9 M̲o̲d̲e̲l̲ ̲6̲A̲:̲ ̲S̲e̲r̲v̲i̲c̲e̲ ̲C̲o̲m̲m̲o̲n̲ ̲t̲o̲ ̲7̲5̲%̲ ̲o̲f̲ ̲U̲s̲e̲r̲ ̲C̲o̲n̲n̲e̲c̲t̲i̲n̲g̲
̲P̲o̲i̲n̲t̲s̲
Figure 4.2.2-17 HW Configuration
Figure 4.2.2-18 Reliability Model
Table 4.2.2-9 Availability Calculation
4.2.2.10 M̲o̲d̲e̲l̲ ̲7̲:̲ ̲S̲e̲r̲v̲i̲c̲e̲ ̲t̲o̲ ̲A̲l̲l̲ ̲C̲i̲r̲c̲u̲i̲t̲s̲,̲ ̲C̲h̲a̲n̲n̲e̲l̲s̲,̲ ̲a̲n̲d̲ ̲U̲s̲e̲r̲
C̲o̲n̲n̲e̲c̲t̲i̲n̲g̲ ̲P̲o̲i̲n̲t̲s̲
Figure 4.2.2-19 HW Configuration
Figure 4.2.2-20 Reliability Model
Table 4.2.2-10 Availability Calculation
4.2.2.11 M̲o̲d̲e̲l̲ ̲8̲:̲ ̲S̲S̲&̲C̲ ̲S̲y̲s̲t̲e̲m̲
Figure 4.2.2-21 Reliability Model
4.2.2.12 G̲e̲n̲e̲r̲a̲l̲ ̲C̲o̲m̲m̲e̲n̲t̲s̲ ̲t̲o̲ ̲T̲a̲b̲l̲e̲s̲ ̲4̲.̲2̲.̲2̲-̲1̲ ̲t̲o̲ ̲4̲.̲2̲.̲2̲-̲1̲2̲:̲
The contents of the table headings are described in
section 4.1.11 of this plan. A brief explanation is
given here:
M of N Req'd        Means the required number M out of N items to implement
                    a function.

MTBF (hours) each   Meantime between failures in hours of the single module.

lambda (fpm) each   Lambda designates the failure rate per million hours of
                    the individual module.

MTTR                Meantime to repair a single failure in hours.

lambda equiv (fpm)  The equivalent failure rate per million hours of the
                    dualized system or module.

A (M of N)          lambda equiv x 10^-6 is the unavailability period of time
                    of M required out of N dualized systems or modules (not
                    used in the present plan).

U_s M of N (hours)  The up time of the dualized systems or modules in hours,
                    like MTBF for non-dualized.

D_s M of N (hours)  The down time of the dualized system or module, like MTTR
                    for non-dualized.

AVAILABILITY (A)    A = U_s / (U_s + D_s)
Figure 4.2.2-1
Fig. 4.2.2-2
Model 1.
Refer to table 4.2.2-1
Calculation of MTBF for dualized PU assy.
Using the formula in section 4.1.11:
    MTBF = ((MTBF)² + 2 x MTBF x MTTR) / (2 x MTTR)  hours

where MTTR = 1 hour (pessimistically estimated), hence

    MTBF = ((1175)² + 2 x 1175 x 1) / (2 x 1) = 691,159 hours

    lambda (failure rate) = 1/MTBF = 1.45 x 10^-6 h^-1

Calculation of availability for dualized PU assy:

Using the formula in section 4.1.11:

    A = 691,159 / (691,159 + 1) = 0.9999886
Table 4.2.2-1
Figure 4.2.2-3
Figure 4.2.2-4
Comments to model 2.
As seen in fig. 4.2.2-4, only FAN and cables are not
dualized.
The CIA, MBT, PS, and MF&D are all fully dualized.
MF&D means Mains Filter and Distribution. As an example,
the MTBF and the availability for a dualized CIA module
are calculated here.
Using the formula in section 4.1.11 gives:
    MTBF_d (CIA) = ((50,000)² + 2 x 50,000 x 1) / (2 x 1) = 1.25 x 10^9 hours

    lambda equiv (CIA) = 800 x 10^-12 h^-1
Obviously, for the other items involved here, an even better
lambda equiv. is obtained, and in the calculations these
values are set to zero.
The availability of the dual CIA is:
A = 0.999 999 9999.
Table 4.2.2-2 Model 2  IO-SYSTEM AVAILABILITY CALCULATION
Figure 4.2.2-5
Figure 4.2.2-6
Comments to model 3A.
Refer to figure 4.2.2-5 and table 4.2.2-3.
The TDX Bus System is fully dualized and only the availability
of this configuration is significant.
The formula in section 4.1.11 for 1 of 2 redundant items
is used to calculate U_s (the system MTBF), and this value
is then used to calculate the availability with:

    A = MTBF / (MTBF + MTTR)
Table 4.2.2-3
Fig. 4.2.2-7
Fig. 4.2.2-8
Comments to model 3B.
The availability of the "LTUX-chain" is calculated
straight forward using the availability formula.
Notice that V24 interfaces are contained in the LTUX.
Table 4.2.2-4 Model 3B
Figure 4.2.2-9  Model 4A
Figure 4.2.2-10 Model 4A
Comments to model 4A:
Refer to fig. 4.2.2-9 and fig. 4.2.2-10.
This model shows the "reliability path" seen from an
individual user. Passing through the line terminal
unit and TDX system, the user gets in touch with processor
system and memory.
The major parts of the system is dualized and due to
this ability the U…0f…s…0e… (system up time) is used when availability
is calculated.
Requirements:
A = 0.9995
MTBF = 3 months
MTTR = 40 minutes
The availability is calculated as:

    A = U_s / (U_s + D_s) = 8504 / (8504 + 1) = 0.999882

where D_s (MTTR) is pessimistically estimated at 1 hour.
Table 4.2.2-5 Model 4A
Figure 4.2.2-11 Model 4B
Figure 4.2.2-12 model 4B
Comments to model 4B:
Refer to fig. 4.2.2-11 and fig. 4.2.2-12.
This model shows the "reliability path" seen from an
individual channel connecting point (c.c.p).
The route passes a single c.c.p, i.e V24 interface
and via adaptor and line unit acces is created to processor
and mass storage
Because of duality of the main parts of the system,
it is realistic to consider U…0f…s…0e… (up-time of the system)
when availability is calculated.
Requirements:
A = 0.9995
MTBF = 3 months
MTTR = 40 minutes
The availability is calculated as:

    A = U_s / (U_s + D_s) = 10042 / (10042 + 1) = 0.999900

where D_s (MTTR) is pessimistically estimated at 1 hour.
Table 4.2.2-6 Model 4B
Figure 4.2.2-13 Model 5A
SERVICE COMMON TO GROUPS OF USER CONNECTING POINTS
Figure 4.2.2-14 Model 5A
Comments to model 5A.
This model is used to calculate the availability for
those parts of the system which back up the groups
of individual user connection points i.e. reliability
paths via LTUXs into the system.
The individual user connecting points are subdivided
into groups of eight, and each group has its own BTM-X
module and power supply. All equipment which gives
service to such a group is dualized and therefore a
single failure cannot cause loss of service to more
than one group.
The supervisor and supervisor assistant are treated
like other users except that they must not be connected
to the same group.
Refer to figures 4.2.2-13, 14.
Because most of the elements are dualized, lambda equivalent
and U_s (system up-time, MTBF) are used.
Requirements:
A = 0.9999
MTBF = 1 year
MTTR = 1 hour
No single failure shall cause loss of service to more
than one group.
Table 4.2.2-7 Model 5A
Figure 4.2.2-15 model 5B
Figure 4.2.2-16 model 5B
Comments to model 5B.
This model is used to calculate the availability figure
for those parts of the system which back-up the groups
of external channel connection points. The reliability
path passes through the LIA and LTU into the channel unit
and gains access to the PU and Mass Storage.
In CAMPS equipment the external channel connecting
points are sub-divided into up to 8 groups (LTUs) with
a maximum of 2 lines connected to each group. The groups
are serviced by a dualized channel bus and a dualized
power supply. It is therefore possible to divide incoming
channels and outgoing channels into at least two groups,
and no single failure can cause loss of service to
more than one group.
Refer to figures 4.2.2-15, 16.
Most of the elements in this model are dualized and
therefore lambda equiv. and U_s are calculated to estimate
the availability:

    A = U_s / (U_s + D_s)

where U_s = system uptime (MTBF) and D_s = system downtime (MTTR).
Requirements:
A = 0.9999
MTBF = 1 year
MTTR = 1 hour
No single failure shall cause loss of service to more
than one group.
Table 4.2.2-8 model 5B
Figure 4.2.2-17 model 6A
Figure 4.2.2-18 model 6A
Comments to model 6A.
The present model is used to calculate the availability
of those parts of the CAMPS system which support service
to 75% of the user connecting points in maximum configuration.
Refer to figure 4.2.2-17, 18.
Requirements:
A = 0.9999
MTBF = 1 year
MTTR = 1 hour
Table 4.2.2-9 model 6A
Figure 4.2.2-19 model 7
Figure 4.2.2-20 model 7
Comments to model 7.
Refer to figures 4.2.2-19 and 4.2.2-20.
This model is the basis for the availability calculation
of those parts of the CAMPS system which give service
common to all circuits, channels, and user connecting
points.
Refer to section 4.1.3.
All items, including software, the supervisor VDU, printer,
etc., which are necessary to bring CAMPS into operative
mode are taken into account here.
Requirements:
A = 0.99995
MTBF = 2 years
MTTR = 1 hour
Table 4.2.2-10 model 7
Figure 4.2.2-21 model 8
4.3 D̲E̲S̲I̲G̲N̲ ̲C̲H̲A̲N̲G̲E̲S̲
Any design change of a baselined CAMPS configuration
item is initiated by issue of an Engineering Change
Proposal (ECP). The ECP will, among other things, contain a
description of possible consequences regarding R&M
aspects.
Any ECP will be reviewed by CR System Engineering,
and the described R&M aspects evaluated. If the described
consequences give rise to any change of the calculated
availability figures in section 4.2, this will be reported
to SHAPE by issue of an R&M Report.
4.4 R̲&̲M̲ ̲T̲E̲S̲T̲I̲N̲G̲
Test verification of MTBF values is, according to
the SRS para 4.2.1.7, only required for modules with
predicted MTBF values equal to or below 3500 hours.
Test verification of MTTR values is, also according
to the SRS para 4.2.1.7, only required for modules
with predicted MTTR values equal to or less than 45 minutes.
From para 4.1.9 and 4.1.10 it is seen that all predicted
module values are:

    MTBF equal to or above 5000 hours
and
    MTTR = 1 hour

and accordingly the requirements regarding R&M testing
are not applicable for the chosen hardware equipment.
However, if any future design change requires implementation
of hardware modules for which R&M testing will apply,
this testing will be conducted in compliance with the
requirements stated in para. 4.1.9-10.
5̲ ̲ ̲R̲M̲A̲ ̲R̲E̲P̲O̲R̲T̲S̲
For each hardware module contained in the CAMPS configuration
a RMA Report will be supplied.
The RMA Report will contain all information used for
calculation of the MTBF figures for each individual
hardware module.
Common to the RMA Reports will be a description of
all Reliability Prediction Algorithms used.
For each individual module the related RMA report will
contain reliability data sheets comprising all components
used.
6̲ ̲ ̲F̲A̲I̲L̲U̲R̲E̲ ̲R̲E̲P̲O̲R̲T̲I̲N̲G̲ ̲A̲N̲D̲ ̲C̲O̲N̲T̲R̲O̲L̲
A Failure Reporting System will be established in order
to monitor hardware and software failure occurrences.
The Failure Reporting System will also provide means
for an early detection of possible systematic failures.
Hereby corrective action can be initiated at an early
stage to avoid unnecessary user inconvenience.
A complete description of the planned Failure Reporting
System is given in CAMPS Maintenance Plan CPS/PLN/006.
7̲ ̲ ̲F̲A̲I̲L̲U̲R̲E̲ ̲A̲N̲A̲L̲Y̲S̲I̲S̲
As part of the Failure Reporting System each detected
failure will be analysed.
Software failure analysis is conducted in order to
identify the cause of the failure. In addition to failure
cause identification the analysis shall include a description
of the remedial action to be taken in order to prevent
repeated occurrences of the failure.
The hardware failure analysis will be conducted after
each repair of a faulty module.
If the failure has been due to excessive stress of any
specified component parameter or if the observed component
lifetime departs considerably from the specified value,
further analysis must be conducted. The analysis must
result in a suggestion for some remedial action to
prevent future failures due to identical causes.
If the failure analysis concludes that the failure
was due to component wear-out no further action is
taken.
To cater for potential failures an extensive analysis
of the CAMPS hardware equipment has been carried out.
This analysis has resulted in the implementation of a number
of hardware- and software-controlled tools for on-line
error detection. A detailed description of these features
is given in the SDS para. 4.11.
8̲ ̲ ̲M̲O̲N̲I̲T̲O̲R̲I̲N̲G̲ ̲O̲F̲ ̲S̲O̲F̲T̲W̲A̲R̲E̲ ̲D̲E̲F̲E̲C̲T̲S̲
During the software integration period and the In-Plant
Software Verification the occurrence of Software Defects
will be monitored.
For each detected Software Defect the Category (ref.
SRS para. 4.2.2.2) is determined and logged together
with the accumulated run-time.
For each category of Software Defects the accumulated
number of detected defects vs. the accumulated run-time
is plotted.
It is expected that the plotted curve will appear as
indicated on fig. 8.1.
Figure 8-1  Accumulated number of detected defects vs. accumulated run-time