CHAPTER 4
DOCUMENT III TECHNICAL PROPOSAL                    Apr. 29, 1982
4.10 N̲o̲d̲a̲l̲ ̲S̲i̲t̲e̲ ̲O̲p̲e̲r̲a̲t̲i̲o̲n̲s̲ ̲a̲n̲d̲ ̲I̲n̲t̲e̲r̲f̲a̲c̲e̲s̲
4.10.1 N̲o̲d̲a̲l̲ ̲C̲o̲n̲t̲r̲o̲l̲ ̲O̲p̲e̲r̲a̲t̲i̲o̲n̲s̲
In the ACDN network the NCC supervisor staff is provided
with total control of the entire network. The ACDN
network nodal computers are fully capable of unattended
operation.
However, to provide for local control of a nodal subsystem
and its local external user network, a subset of the
network control and monitoring functions may be exercised
locally at a nodal site operator work position. This
command subset is specified in the command summary,
section 4.6.
The functions that will be available from a nodal site
operator position include the following:
o Status awareness functions
(all network status)
o External resource control
(line, concentrator, device and host participants)
o Remote node control
(dumping and diagnosing)
o Local H/W and S/W fault isolation and verification
4.10.2 N̲o̲d̲e̲ ̲O̲p̲e̲r̲a̲t̲o̲r̲ ̲P̲o̲s̲i̲t̲i̲o̲n̲
The Nodal Control Processor is equipped with an engineering
position consisting of a VDU terminal/keyboard and
a hardcopy printer. The work position is described
in section 4.4.4.
4.10.3 O̲p̲e̲r̲a̲t̲o̲r̲ ̲C̲a̲t̲e̲g̲o̲r̲i̲e̲s̲
The Node may be accessed locally by network supervisor
or field technician staff, subject to the standard ACDN
security rules.
Supervisor staff may exercise overall network control
in coordination with the NCC.
Technicians may perform fault diagnosis functions on
the Nodal Control and switching processors, and exercise
restricted subsets of external network control functions.
4.11 S̲t̲a̲t̲i̲s̲t̲i̲c̲s̲ ̲a̲n̲d̲ ̲R̲e̲p̲o̲r̲t̲i̲n̲g̲
This section describes the facilities for collection
and processing of statistics, report generation, and
cost and billing information. This includes the following
aspects:
o Statistics Collection
o Statistics Processing and Reports
o Billing Data
4.11.1 S̲t̲a̲t̲i̲s̲t̲i̲c̲s̲ ̲C̲o̲l̲l̲e̲c̲t̲i̲o̲n̲
During operation of the ACDN, statistical information
is collected as a result of:
o a permanent function
o a threshold exceeded on one of the parameters
collected permanently
o a request from a network operator
The statistical data are constantly collected by the
nodes for all resources which have been designated
in a START-STATISTICS command or which are part of
the permanent statistics recording. At regular intervals
or when a threshold is exceeded the statistics records
are time stamped and sent to the NCC where they are
written to the designated files.
The latest statistics record values for each resource
are averaged with the already stored values according
to the formula:

    stored-val := ((n-1) * stored-val + new-val) / n

where n is specified at system generation.
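As an illustration, the smoothing rule above can be expressed
directly (a minimal sketch in Python; the function name and
the example values are ours, only the formula and the
system-generated n are from the text):

    # Statistics smoothing: the stored value is a running average
    # that weights history by (n-1)/n, with n fixed at system
    # generation.
    def smooth(stored_val: float, new_val: float, n: int) -> float:
        return ((n - 1) * stored_val + new_val) / n

    # Example: with n = 4, a stored value of 100 and a new sample
    # of 120 yield ((3 * 100) + 120) / 4 = 105.
    assert smooth(100.0, 120.0, 4) == 105.0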
ACDN provides an application which is able to print
in readable form the entries of selected resources
from any statistics file.
NMH applications may be written which process the
statistics information either on-line or off-line to
generate graphics displays at the operator consoles.
Statistics are permanently collected at the NCC, the
nodes, links and the EMH.
An overview of the collected statistics is given in
table 4.11.1-1.
Table 4.11.1-1  Statistics Collection Overview
Table 4.11.1-1  Statistics Collection Overview (Cont'd)
4.11.2 S̲t̲a̲t̲i̲s̲t̲i̲c̲s̲ ̲P̲r̲o̲c̲e̲s̲s̲i̲n̲g̲ ̲a̲n̲d̲ ̲R̲e̲p̲o̲r̲t̲s̲
The raw statistics are assembled by the NCC according
to table 4.11.1-1. The raw statistics are transferred
to the NMH for data reduction by application once per
day.
The statistics also provide the input to the various
network reports which are available at the NCC, either
daily, hourly, and/or on request:
P̲e̲r̲i̲o̲d̲i̲c̲ ̲R̲e̲p̲o̲r̲t̲s̲
o Traffic Statistics Report Daily
o Service Availability Report Hourly
o Host Status Report Hourly, on request
o ARINC/SITA Report Hourly
o Terminal Routing Status Report Hourly
o System Queue Status Report Hourly, on request
o Node Status Report Hourly, on request
R̲e̲p̲o̲r̲t̲s̲ ̲o̲n̲ ̲R̲e̲q̲u̲e̲s̲t̲
o Total Message Statistics Report
o Terminal Response Time Report
o Terminal/Line Status Report
4.11.2.1 T̲r̲a̲f̲f̲i̲c̲ ̲S̲t̲a̲t̲i̲s̲t̲i̲c̲s̲ ̲R̲e̲p̲o̲r̲t̲
This report is generated daily for a 24 hour period
and includes the traffic statistics for all terminals
connected to the ACDN network and the host to host
traffic. Generally the total number of messages/transactions
and the total number of bytes transferred per device
are given. The following information is presented:
1) Per terminal the following information is printed:
o Terminal identification
o Terminal type/location
o Statistics are grouped according to the following
  hierarchy (figures per category):

  Traffic      Traffic
  d̲i̲r̲e̲c̲t̲i̲o̲n̲    t̲y̲p̲e̲        S̲t̲a̲t̲i̲s̲t̲i̲c̲s̲

  Input/       o Type A    - Total number of
  Output                     msgs./transactions
               o Type B    - Total number of bytes
               o TDP       - Peak hour value
               o PMS       - Peak hour
2) Per Host the following information is printed:
   Traffic        Destination   Statistical
   d̲i̲r̲e̲c̲t̲i̲o̲n̲      H̲o̲s̲t̲          F̲i̲g̲u̲r̲e̲

   Input/Output   Host Id       - Total number of transfers
                                - Total number of bytes
                                - Peak hour values
                                - Peak hour
4.11.2.2 S̲e̲r̲v̲i̲c̲e̲ ̲A̲v̲a̲i̲l̲a̲b̲i̲l̲i̲t̲y̲ ̲R̲e̲p̲o̲r̲t̲
This report is generated hourly and gives a total overview
of the availability of the services provided by the
network. A network service is an application in one
of the hosts connected to the network to which users
may sign in.
The following information is listed per host and per
application in the report:
o Host id
o Application id
o Statistics Figures
- No. of sign-ins during the last hour
- Total daily no. of sign-ins
- No. of "application not available" responses
  during the last hour
- Total daily no. of "application not available"
  responses
4.11.2.3 H̲o̲s̲t̲ ̲S̲t̲a̲t̲u̲s̲ ̲R̲e̲p̲o̲r̲t̲
This report is generated hourly and on request to give
an overview of the status of the hosts and their applications.
The following information is listed per host in the
report:
o Host id
o Host status (active, testmode, backup, offline)
o List of application status (service status)
- Service-id
- Service status (available, not available)
4.11.2.4 A̲R̲I̲N̲C̲/̲S̲I̲T̲A̲ ̲R̲e̲p̲o̲r̲t̲
This report is generated hourly and provides a report
of the message traffic to/from the ARINC/SITA network.
The traffic is broken down by airline code.
The following information is listed per airline code
in the report:
o Airline id
o No. of messages to ARINC/SITA for the last hour
o No. of messages from ARINC/SITA for the last hour.
4.11.2.5 T̲e̲r̲m̲i̲n̲a̲l̲ ̲R̲o̲u̲t̲i̲n̲g̲ ̲S̲t̲a̲t̲u̲s̲ ̲R̲e̲p̲o̲r̲t̲
This report is generated hourly and provides information
about all printer terminals that are currently using
alternate routing or duplicate delivery.
The following information is listed per terminal in
the report:
o Terminal id
o Terminal id of alternate or duplicate
o Routing type (alternate, duplicate)
o Applicable priority levels.
4.11.2.6 S̲y̲s̲t̲e̲m̲ ̲Q̲u̲e̲u̲e̲ ̲S̲t̲a̲t̲u̲s̲ ̲R̲e̲p̲o̲r̲t̲
This report is generated hourly and on request and
gives a summary of all undelivered messages (PMS and
Type B) queued for the dedicated printers in the ACDN.
When the report is printed on request, it may be printed
for specified terminals only or for all printers.
Per terminal the following information is listed:
o Terminal id
o Queue lengths for
- priority 0
- priority 1
- priority 2
- priority 3
- priority 4
- priority 5
- total queue length
4.11.2.7 N̲o̲d̲e̲ ̲S̲t̲a̲t̲u̲s̲ ̲R̲e̲p̲o̲r̲t̲
This report is generated hourly and on request. It
gives the Node status for each node in the ACDN.
The following information is listed per node in the
report:
o Node Id
o Node Equipment status
- PU#0 ... PU#3 status (online, standby, offline)
- Link status, per LTU a status (online, standby,
offline)
- Host I/F status (online, standby, offline)
- Disc Status (Dual, Single)
o Traffic statistics
- No. of conversations
- No. of conver. establishments/terminations
- No. of packets total
- node entry
- node exit
- switched
- Buffer and disc occupancy
Per link the following information is listed:
o Link-Id
o No. of virtual connections
o No. of packets per sec.
- discarded
- received
- sent
- data
- control
o Packet size distribution
o Average packet queue time
o No. of link protocol failures
o Frame retransmissions
o Physical link failures.
4.11.2.8 T̲o̲t̲a̲l̲ ̲M̲e̲s̲s̲a̲g̲e̲ ̲S̲t̲a̲t̲i̲s̲t̲i̲c̲s̲ ̲R̲e̲p̲o̲r̲t̲
This report is generated on request and summarizes
the total message traffic figures for the entire network.
The figures are given for the previous hour and the
previous 24 hour period. The following information
is given in the report:
o Type B traffic statistics
- Total no. of messages in (24 hour period)
- Total no. of messages out (24 hour period)
- Total no. of messages in (previous hour)
- Total no. of messages out (previous hour)
- Total no. of intercepted messages (previous
hour)
- Average no. of characters (24 hour period)
o Type A traffic statistics
- Total no. of transactions in (24 hour period)
- Total no. of transactions out (24 hour period)
- Total no. of transactions in (previous hour)
- Total no. of transactions out (previous hour)
- Average no. of characters (24 hour period)
4.11.2.9 T̲e̲r̲m̲i̲n̲a̲l̲ ̲R̲e̲s̲p̲o̲n̲s̲e̲ ̲T̲i̲m̲e̲ ̲R̲e̲p̲o̲r̲t̲
This report is generated on request, and it may include
response times for one or more specified terminals.
The following information is given in the report:
o Terminal id
o Percentage of messages having response time:
- 0 - 2.5 sec.
- 2.5 - 5 sec.
- 5 - 10 sec.
- 10 - 20 sec.
- more than 20 sec.
o Total no. of messages
4.11.2.10 T̲e̲r̲m̲i̲n̲a̲l̲/̲L̲i̲n̲e̲ ̲S̲t̲a̲t̲u̲s̲ ̲R̲e̲p̲o̲r̲t̲
This report is generated on request. It lists the status
of all the terminal access lines and all the terminals
in the network.
The following information is listed:
Per terminal access line:
o Line id
o Line status (up, down, test mode)
o Terminal status list (per terminal on the line)
- Terminal id
- Prim./sec. line indicator
- Device status (up, down, test)
4.11.3 B̲i̲l̲l̲i̲n̲g̲ ̲D̲a̲t̲a̲
Cost and billing is performed on basis of the following
data compiled per session established:
I̲T̲E̲M̲   D̲A̲T̲A̲                             F̲U̲N̲C̲T̲I̲O̲N̲

A      Subscriber Id (department)       Selection Key
B      User Id                          Selection Key
C      Terminal Id                      Charge Key
D      Service Id                       Charge Key
E      Priority                         Charge Key
F      No. of bytes transm. - Type A    Charge Key
G      No. of bytes transm. - Type B    Charge Key
H      Time of day (start of session)   Charge Key
I      Duration of session              Charge Key
J      Service availability             Charge Key
The cost and billing information is ordered according
to the "selection keys": Department/User. Thus it is
possible to provide a total cost figure per subscriber
(department/user) and to distribute the costs per user.
If applicable, the costs may be distributed according
to the "charge keys".
The cost per session is calculated as a function of
the "charge keys" (e.g. as a weighted sum). Note that
the "charge key" factors may depend on the time of
day (higher costs during busy hours).
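As a sketch of the weighted-sum calculation (the charge keys
come from the table above; the weights, the busy-hour factor
and the hour boundaries are invented for the example):

    # Session cost as a weighted sum over charge keys, with a
    # time-of-day dependent factor (higher during busy hours).
    BUSY_HOURS = range(8, 18)        # assumed busy period

    def session_cost(charges: dict, weights: dict, start_hour: int) -> float:
        factor = 1.5 if start_hour in BUSY_HOURS else 1.0
        return factor * sum(weights[k] * charges[k] for k in weights)

    # Example: bytes transmitted (Type A/B) and session duration
    # as charge keys; a session starting at 10:00 gets the
    # busy-hour factor.
    cost = session_cost(
        charges={"bytes_type_a": 12000, "bytes_type_b": 3000, "duration_min": 25},
        weights={"bytes_type_a": 0.001, "bytes_type_b": 0.002, "duration_min": 0.05},
        start_hour=10,
    )   # 1.5 * (12.0 + 6.0 + 1.25) = 28.875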
The EMH statistics (section 4.11.1) provide the
necessary billing information concerning the interfaces
to other networks. Also included in the EMH statistics
is the message distribution by airline code.
4.12 U̲s̲e̲r̲ ̲C̲o̲n̲t̲r̲o̲l̲ ̲F̲u̲n̲c̲t̲i̲o̲n̲s̲
Users of the ACDN network services have at their disposal
the following categories of direct user control/command
functions:
o Sign-in/sign-off
o Status awareness commands
o TDP printer control commands
o Type B traffic commands
o Operator communication functions
These command groups are described in the following
sections.
4.12.1 S̲i̲g̲n̲-̲i̲n̲/̲s̲i̲g̲n̲-̲o̲f̲f̲
When signing in, the user may specify one or multiple
hosts; in the case of multiple hosts, the presently
used host selection prefix may be employed.
However, the user optionally can sign-in in "automatic"
mode, where he has softkey based access to an entire
command tree structure.
The commands are the "leaves" of the tree. It typically
takes only one soft-key stroke to move up or down one
"branch level" in the tree. Similarly, selecting a
certain command requires only a single keystroke.
In this mode the CRT displays, at any given moment,
the current function name for each soft-key (in other
words, the function names on the current level).
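A minimal sketch of such a command tree follows (in Python;
the subtree and command names are invented, only the tree and
soft-key mechanics are from the text):

    # Each node is a "branch level"; leaves carry a command.
    # One soft-key stroke moves up/down one level or selects a
    # leaf command.
    class Node:
        def __init__(self, name, children=(), command=None):
            self.name, self.children, self.command = name, list(children), command

    tree = Node("root", [
        Node("reservations", [Node("book", command="BOOK"),
                              Node("cancel", command="CANCEL")]),
        Node("cargo", [Node("trace", command="TRACE-SHIPMENT")]),
    ])

    def softkey_labels(level: Node) -> list:
        # What the CRT displays: the function names on the
        # current level.
        return [child.name for child in level.children]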
The present Air Canada terminals may not be able to
support soft-keys, but as the terminal equipment is
eventually replaced, this will become possible.
This structure may be defined independently of the
actual distribution of the corresponding applications
in hosts, although experience shows that the tree structure
would most often have an entire subtree taken care
of by a single host computer.
The user can hence concentrate on selecting the right
commands with the right parameters. These commands
will then be directed to the corresponding host and
within each host, to the corresponding application.
The establishment of this total command tree also
provides an overview of all commands and shows whether
the present structure and the chosen subtree and command
names are adequate.
4.12.2 U̲s̲e̲r̲ ̲S̲t̲a̲t̲u̲s̲ ̲A̲w̲a̲r̲e̲n̲e̲s̲s̲
Terminal users will be able to request their actual
connection status concerning the terminal and printer
condition and the active user sessions.
Further, for the purpose of fault isolation and verification
a number of report display functions are available
to users/technicians at an ordinary user terminal.
The status request functions will display the status
of any specified ACDN terminal at the originating terminal
position.
Commands:
CONNECTION-STATUS term-id
This command is answered by a response indicating the
following status information for the specified terminal
(if term-id is omitted, the originating terminal is
assumed):
- sign-in status (number of parallel sessions, session-ids,
names of connected applications, sign-in elapse
times)
- terminal condition status
- printer status condition
- status of physical and logical paths to destination
(multiplexer, remote lines, concentrator including
concentrator components, concentrator lines, nodes
and node components, inter nodal trunks)
- routing status - primary or alternate routing on
access network, routing through backbone network
to host/application.
DISPLAY-NETWORK-ERRORS (parameters)
Several response reports may result as controlled through
parameterized options:
The first type of report that can be generated displays
transactions and errors for each of the multiplexors
attached to a specified remote controller on the concentrator
(ICC).
A further interrogation can be made by adding the multiplexor
number in the command. The report will then list errors
for each terminal device controlled by the specified
multiplexor.
A finer breakdown can be displayed by identifying the
terminal in the request.
The following status indications are included in the
display per ICC/MUX and terminal:
- input errors
- output errors
- total errors
- error percentage
- total traffic
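The DISPLAY-NETWORK-ERRORS drill-down can be pictured as the
same counter record aggregated at ICC, multiplexor, or terminal
level (a sketch; the record layout and the percentage
definition, total errors over total traffic, are assumptions):

    from dataclasses import dataclass

    @dataclass
    class ErrorCounters:
        input_errors: int
        output_errors: int
        traffic: int            # total transactions seen

        @property
        def total_errors(self) -> int:
            return self.input_errors + self.output_errors

        @property
        def error_percentage(self) -> float:
            return 100.0 * self.total_errors / self.traffic if self.traffic else 0.0

    # One such record per multiplexor on an ICC, per terminal on
    # a multiplexor, or per single terminal, depending on how far
    # the request drills down.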
DISPLAY-TDP-STATUS device-id
The current status of the specified TDP printer is
displayed, including:
- device status
- configuration information
- error statistics
- message queue length
4.12.3 T̲D̲P̲ ̲P̲r̲i̲n̲t̲e̲r̲ ̲a̲n̲d̲ ̲T̲y̲p̲e̲ ̲B̲ ̲T̲r̲a̲f̲f̲i̲c̲ ̲c̲o̲n̲t̲r̲o̲l̲s̲
TDP printers and the Type B traffic for these printers
may be controlled from a user terminal. The commands
are described in section 4.8.10.
4.12.4 P̲M̲S̲ ̲R̲e̲t̲r̲i̲e̲v̲a̲l̲ ̲C̲o̲m̲m̲a̲n̲d̲s̲
Users may request Type B messages to be repeated
(redelivered). With the exception of TDP traffic, it
is possible to retrieve messages previously delivered
within the past 18-24 hours. These messages may be
delivered to the original destination or to another
specified terminal. A maximum number of messages (16)
may be retrieved at one time.
These functions are further described in section 4.9
(EMH).
Commands:
RETRIEVE term-id nnnn nnnn ddhhmm (term-id)
Retrieves a sequence of messages between two output
message sequence numbers, which were originally delivered
to the first terminal indicated on the specified day,
and delivers them to the second terminal specified
or to the original recipient if a terminal is not
specified. A maximum of 16 messages may be retrieved
at one time. The date/time should correspond to the
time-of-delivery for the first message ± 30 minutes.
RETRIEVE term-id ddhhmm ddhhmm (term-id)
Retrieves the messages that were delivered between
the listed delivery times to the first terminal indicated,
and delivers them to the second terminal specified
or to the original recipient if a terminal is not designated.
A maximum of 16 messages may be retrieved at a time.
RETRIEVE term-id nnnn ddhhmm (term-id)
Retrieves by output message sequence number (OMSN)
and time-of-delivery the message originally delivered
to the first terminal indicated, and delivers it to
the second terminal specified or to the original recipient
if a terminal is not specified.
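The selection rule of the first RETRIEVE form above can be
sketched as follows (record layout invented; the 16-message cap
and the ±30 minute check are from the text):

    from dataclasses import dataclass
    from datetime import datetime, timedelta

    @dataclass
    class Delivered:
        omsn: int               # output message sequence number
        delivered_at: datetime
        text: str

    def retrieve(log, first_omsn: int, last_omsn: int, given_time: datetime):
        # log is assumed ordered by OMSN for one terminal.
        hits = [m for m in log if first_omsn <= m.omsn <= last_omsn]
        # The given date/time must match the first message's
        # time-of-delivery within +/- 30 minutes.
        if hits and abs(hits[0].delivered_at - given_time) > timedelta(minutes=30):
            return []
        return hits[:16]        # at most 16 messages per request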
RET-USER-LOG term-id1 ddhhmm ddhhmm (term-id 2)
Retrieves input logs between specified times-of-receipt
for the first terminal indicated, and delivers them
to the second terminal indicated or to a supervisor
position if no terminal is specified.
RET-OUT-LOG term-id1 ddhhmm ddhhmm (term-id2).
Retrieves output logs between specified time-of-delivery
for the first terminal indicated, and delivers them
to the second terminal indicated or to a supervisor
position if no terminal is specified.
HALT-RETRIEVE nnn
where nnn is the retrieval request to be halted.
Halts the specified retrieval command.
Tape function commands control the journalling of statistics
and the drain and re-entry of messages to and from
magnetic tape storage.
Commands:
DRAIN line-id
Drains to magnetic tape all messages in queue for the
specified line or for all lines if no line is specified.
HALT-DRAIN
Halts the traffic drain to tape currently in progress.
REENTRY (line) mmddhhmm
Initiates the re-entry from tape of all messages previously
drained from a line, or from all lines in the system,
where mmddhhmm is the date/time the tape was created
(month, day, hour, minute); it is used to locate a
particular file on the tape.
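A sketch of the drain/re-entry bookkeeping (the queue and tape
structures are invented; the per-line drain, the creation
date/time stamp, and the timestamp-based file lookup are from
the text):

    def drain(queues: dict, tape: list, created: str, line_id=None) -> None:
        # Move queued messages for one line (or all lines) into a
        # tape file stamped with its creation date/time (mmddhhmm).
        lines = [line_id] if line_id else list(queues)
        tape.append({"created": created,
                     "messages": {ln: queues.pop(ln, []) for ln in lines}})

    def reentry(tape: list, created: str) -> dict:
        # REENTRY locates the drained file by its creation
        # date/time and returns its messages for re-entry.
        for tape_file in tape:
            if tape_file["created"] == created:
                return tape_file["messages"]
        return {}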
4.12.5 T̲e̲r̲m̲i̲n̲a̲l̲ ̲t̲o̲ ̲T̲e̲r̲m̲i̲n̲a̲l̲ ̲C̲o̲m̲m̲u̲n̲i̲c̲a̲t̲i̲o̲n̲
ACDN terminal users may communicate through terminal
to terminal info-messages.
Command
INFO term-id text
The specified term-id will receive the message as if
it were originated by an operator (broadcast).
Instead of specifying a user terminal, the INFO may
also be destined for the network supervisor, in which
case the incoming INFO will be handled as a notice
in the NCC event and alarm system.
4.13 T̲e̲s̲t̲i̲n̲g̲
The testing facilities incorporated in the proposed
network fall into two categories:
1. Fault detection and fault isolation on an existing,
   possibly live system (H/W and S/W). The online
   test and trace facilities primarily support the
   NCC personnel in maintaining the live system
   configuration, in detecting and isolating errors
   when they occur, and in checkout of a restored
   configuration.
These facilities are described in the following
subsections.
2. Test and integration of new facilities for later
inclusion in the live system, hardware or software.
4.13.1 F̲a̲u̲l̲t̲ ̲D̲e̲t̲e̲c̲t̲i̲o̲n̲ ̲a̲n̲d̲ ̲I̲s̲o̲l̲a̲t̲i̲o̲n̲ ̲A̲i̲d̲s̲
During network initialization and upon node restart
the NCC initiates test connections to verify proper
functioning of the software. The test data sent back
to the NCC run on the permanent virtual circuits
which are established during initialization between
the nodes and the NCC.
The tests can also be performed with the help of NCC
application programs. The NCC provides software tools
to test both hardware and software modules (i.e. local
and remote software modules).
To facilitate hardware/software trouble-shooting,
monitoring facilities will be provided. These facilities
are not the same as those used for monitoring the ACDN
in an operational sense.
The test monitoring facilities give the possibility
to display:
o all input/output from/to hosts
o all input/output from/to nodes
o all traffic on internodal trunks
where this does not create an overload situation. All
buffers can be displayed on request, and specified
memory areas can be displayed or stored on a disc
file for later interpretation.
A logging facility described in section 4.8.6 is provided
to aid the field technician in keeping track of events.
A bounce facility is provided. This facility bounces
messages from a connected terminal back to the originator.
Traces are activated either during network initialization
or as a result of an operator command, and deactivated
as a result of an operator command or as a result of
system close down.
It is possible at the NCC to test the routing strategy
by sending a packet to an end user and marking this
packet with a node identifier when it passes a node.
Software located in the nodes is able to recognize
trace-type packets.
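The route test can be pictured as follows (a sketch; the packet
representation is invented, the node marking and the trace-type
recognition are from the text):

    # A trace-type packet is marked with the identifier of each
    # node it passes, so the NCC can compare the path actually
    # taken with the intended routing strategy.
    def node_forward(packet: dict, node_id: str) -> None:
        if packet.get("trace"):        # nodes recognize trace packets
            packet.setdefault("path", []).append(node_id)

    # After delivery to the end user, packet["path"] might read
    # ["NODE-1", "NODE-3", "NODE-2"] (hypothetical node names).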
The traceable entities are: virtual calls, trunks,
terminals, and conversations.
4.13.1.1 T̲e̲s̲t̲ ̲F̲u̲n̲c̲t̲i̲o̲n̲s̲
The following test functions are directly available
as NCC application programs:
o Buffer read-out from NCC, EMH and nodes
o Remote dump of EMH and nodes
o Loop back at link level
o Sessions with message generators
Commands:
READ-MEMORY {NCC-id | EMH-id | Node-id} start-addr length file
This command will transfer the designated memory area
to the NCC issuing the command and store it in file
(or display it directly if file is not specified).
This is done without corrupting the operation of the
target computer.
DUMP {NCC-id | EMH-id | Node-id} start-addr length file
This command is similar to READ-MEMORY except that
the target computer must be in its basic state.
The bounce facility is provided in two versions:
i) The user may switch his terminal into control mode
and issue the command:
BOUNCE text
This command is reflected by the user services
in the NCC and reappears on the user's terminal.
ii) The user may logon to the bounce application in
the NCC which also reflects all text received.
LOOP node-id link-id {LTU | LOCAL | REMOTE}
The designated node is instructed to perform a loop
test on the designated link. The loop is either performed
internally in the LTU, at the local modem, or at the
remote modem. The positive or negative result of the
loop is returned to the NCC.
The loop test can only be performed when the link is
not included in the routing strategy.
INITIATE-SESSION message-generator message-receiver
This command causes the pseudo terminal (message-generator)
to start sending messages to the NCC application
(message-receiver), based on traffic requests sent
in the other direction.
The messages carry a time stamp to allow the message
receiver to determine the delay. Other effects may
be observed via the normal status displays.
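The delay measurement reduces to a time-stamp difference (a
sketch; the message layout is invented):

    import time

    def generate(seq: int) -> dict:
        # The message generator stamps each message at send time.
        return {"seq": seq, "sent_at": time.time()}

    def receive(msg: dict) -> float:
        # The message receiver computes the delay on arrival
        # (in seconds).
        return time.time() - msg["sent_at"]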
4.13.1.2 T̲r̲a̲c̲e̲ ̲F̲a̲c̲i̲l̲i̲t̲i̲e̲s̲
Traces of protocol data units can be performed throughout
the network, but may cause a severe performance degradation.
A copy of each data unit being traced is sent to the
NCC on separate conversations with low priority. Thus
response time during normal load will only be slightly
affected, but throughput will be decreased due to the
accumulation of trace information being sent to the
NCC from the Node or EMH performing the trace.
Commands:
TRACE-ON id trace-object trace-output-file

where trace-object may be: host, terminal, link, VC,
conversation, or session.
All trace data are sent to the trace-output file for
later analysis.
TRACE-OFF id
This command turns the designated trace off.
Note: Several traces may be active simultaneously.
4.13.2 T̲e̲s̲t̲ ̲a̲n̲d̲ ̲I̲n̲t̲e̲g̲r̲a̲t̲i̲o̲n̲ ̲o̲f̲ ̲n̲e̲w̲ ̲f̲a̲c̲i̲l̲i̲t̲i̲e̲s̲
The software development facilities offered are described
in section 4.7.
They include a Volume Test Generator (ATES) and software
development tools, which can be used on the EMH and
NMH computer equipment.
A number of mutually exclusive test environments may
be specified at the NMH and kept on disk.
These test environments may be specified as isolated
subsets of the live system, or as separated equipment
subsets using back-up equipment. For this latter case,
however, it should be noted that there is only one
redundant PU for each N PUs of the same function,
which sets the testing limits in each site configuration
for these "type II" test setups.
When desired, a test manager can have a test scheduled
at a given time.
The requested configuration of a Closed (Test) User
Group will take place and any automatic test drive
functions (e.g. Message Volume Generation) will commence.
Test operators at operator positions will be notified
about the test configuration completion and may hence
commence their test procedures.
Correspondingly, test reports can to a large extent
be generated automatically. The sum of event logging
facilities available in the operational system and
in the test driving facilities (ATES and the online
test logging utilities) allows the test conductor very
flexible means of reducing the resulting data,
since most events can be classified for logging or
not by a number of parameters, such as source, cause,
destination, type, time stamp, number of occurrences,
or trigger events.
This "selective test looking-glass" allows the test
conductor to successively walk through the test procedure
and scrutinize for each test run and test step only
the relevant test data.
Outputs from the various test runs are identified and
filed individually for each run, hence giving quick
backward reference to earlier test runs.
4.14 R̲e̲c̲o̲v̲e̲r̲y̲ ̲a̲n̲d̲ ̲R̲e̲d̲u̲n̲d̲a̲n̲c̲y̲
To provide a high level of service to the users of
the ACDN, redundant equipment is used at vital points
of the network.
This section describes the following aspects:
- Redundancy Operation
- Recovery after Failures
- Degraded Mode Operation
4.14.1 R̲e̲d̲u̲n̲d̲a̲n̲c̲y̲
This subsection describes the redundancy aspects. This
can be summarized as being:
- H/W Redundancy
- Redundant Operation, Checkpointing
- Switching Redundant elements
Finally the geographical NCC backup concept is described.
4.14.1.1 H̲/̲W̲ ̲R̲e̲d̲u̲n̲d̲a̲n̲c̲y̲
The following subsections describe specific hardware
redundancy at the various AIR CANADA NETWORK components.
Generally, supra buses are dualized. Trunks to nodes
and between nodes are multiple, to allow software-wise
redundancy.
4.14.1.1.1 N̲o̲d̲e̲ ̲R̲e̲d̲u̲n̲d̲a̲n̲c̲y̲
In each node one standby PU exists. It can substitute
for any of the remaining PUs.
The node disks are handled as mirrored, i.e. updated
concurrently.
4.14.1.1.2 E̲M̲H̲ ̲R̲e̲d̲u̲n̲d̲a̲n̲c̲y̲
The EMH contains an active and a standby PU.
The EMH disks are handled as mirrored, i.e. they are
updated simultaneously.
4.14.1.1.3 N̲M̲H̲ ̲R̲e̲d̲u̲n̲d̲a̲n̲c̲y̲
The NMH contains an active and a standby PU.
The operation is cold standby.
4.14.1.1.4 N̲C̲C̲ ̲R̲e̲d̲u̲n̲d̲a̲n̲c̲y̲
The NCC is geographically backed up, and locally dualized.
4.14.1.2 C̲h̲e̲c̲k̲p̲o̲i̲n̲t̲i̲n̲g̲
A checkpoint records the state of an activity, e.g.
a session or a message.
At specific events, e.g. sign-on, the active PUs transmit
checkpoints to the standby PU memory via the supra
bus, or to a disk, in order to provide an acceptable
level of recovery at the time of restart.
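A sketch of the event-driven checkpointing (the record layout
is invented; the media choice, standby PU memory and/or disk,
and the halt-until-stored rule come from the surrounding text):

    def take_checkpoint(activity_id: str, state: dict, media: list) -> None:
        # The activity is halted until the record is safely stored
        # on every selected medium (standby PU memory via the supra
        # bus, disk, or both); plain lists stand in for the real
        # media here.
        record = {"activity": activity_id, "state": state}
        for medium in media:
            medium.append(record)

    standby_pu, disk = [], []
    take_checkpoint("session-42", {"event": "sign-on"}, [standby_pu, disk])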
4.14.1.2.1 R̲e̲c̲o̲v̲e̲r̲y̲/̲R̲e̲s̲t̲a̲r̲t̲
Restart refers to the actions taken to bring up a former
active or a standby PU as active, i.e. to reestablish
the dynamic behaviour of the system.
Recovery refers to the reestablishment of continuity
in memory and backing storage contents during a restart.
4.14.1.2.2 R̲e̲c̲o̲v̲e̲r̲y̲ ̲L̲e̲v̲e̲l̲
The contents of the checkpoints, to disk or to a standby
PU, define the recovery level.
The checkpointing to disk is used to handle the total
system failure case.
The proposed checkpoint method is managed by a dedicated
checkpoint process, and the checkpoint media (disc,
PU or other) may be selected optionally. Both disc and
PU may be used simultaneously. Apart from the real-time
aspect (applications will have to be halted until the
checkpoint is safely stored in the checkpoint media),
the checkpoint media are fully transparent to the
applications.
Using a standby PU as checkpoint media provides for:
- fast checkpointing, and thus a finer recovery level
  may be possible
- fast switchover, as no disc access is involved in
  recovery (less than 1 minute)
Using disc as checkpoint media gives the following
advantages in relation to PU checkpointing:
- checkpoints are stored on a non-volatile storage
(dual disc storage)
- almost unlimited space for checkpoints is provided
(saves processing in the recovery phase).
Checkpoint method and recovery level are given at
system generation time and are not subject to online
switching in the proposed solution. However, this
feature too could easily be accomplished with the
existing design.
The proposed checkpoint method and recovery level are
the same for all the generic elements: Node, EMH, NCC,
FEPs.
Transaction          Checkpoint     Recovery
t̲y̲p̲e̲                 m̲e̲d̲i̲a̲          l̲e̲v̲e̲l̲

sessions             disk           full recovery

line/equipment       disk           full recovery
status

network control      disk           full recovery
information
- statistics
- commands
- updates

Type A traffic       none           to session level;
                                    inquiry/response
                                    to be repeated

Printer traffic,     disk           to message level;
Type B traffic,                     printing/delivery
protected message                   to start from
traffic                             beginning of
                                    interrupted message

host to host bulk    none           none
traffic
4.14.1.3 S̲w̲i̲t̲c̲h̲i̲n̲g̲ ̲R̲e̲d̲u̲n̲d̲a̲n̲t̲ ̲E̲l̲e̲m̲e̲n̲t̲s̲
The process of switching between redundant elements
involves several aspects depending on the redundancy
method. The important operations are the following:
- Initialization of the components to be active or
standby
- Switching a standby element to become active. -
This process is defined as recovery.
- Inclusion (on-line) of a repaired element, to become
standby or active.
These aspects will be discussed in the following for:
- The 1 out of N PU redundancy i.e. 1 standby to
N active elements.
- The mirrored discs
- The hot redundant supra bus
4.14.1.3.1 P̲U̲ ̲S̲w̲i̲t̲c̲h̲o̲v̲e̲r̲
A PU switchover implies that:
- the current active PU is electrically disconnected
from its peripherals.
- the standby PU is activated and via its separate
databus it can access the peripherals of the former
active PU.
The decision for a PU switchover may be triggered by:
- the watchdog having detected a PU error via the
  configuration control bus, or the non-arrival of
  a "keep alive" message which the PUs periodically
  (on a second basis) send to the watchdog
- the PU having detected an internal irrecoverable
  hardware or software error.
The decision may also be forced by the system operator
command:
SWITCHOVER PU#
When the standby PU has become active, it starts the
recovery process which, based on the checkpoints, will
reestablish the PU operation and provide garble correction.
The recovery time depends mostly on the checkpoint
media (disc), but normal operation will resume no later
than 60 seconds after switchover.
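The keep-alive surveillance described above can be sketched as
follows (the timeout value is an assumption; the second-basis
keep-alive and the watchdog role are from the text):

    KEEP_ALIVE_TIMEOUT = 3.0   # seconds; assumed, keep-alives arrive each second

    def failed_pus(last_seen: dict, now: float) -> list:
        # The watchdog declares a PU failed, triggering switchover
        # to the standby PU, when its keep-alive message stops
        # arriving within the timeout.
        return [pu for pu, t in last_seen.items()
                if now - t > KEEP_ALIVE_TIMEOUT]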
Online inclusion of a repaired PU can be accomplished
by the system operator command:
INCLUDE PU#
This command will load and initialize the PU to standby
state and include the PU in the configuration as an
active element.
4.14.1.3.2 M̲i̲r̲r̲o̲r̲e̲d̲ ̲D̲i̲s̲c̲ ̲H̲a̲n̲d̲l̲i̲n̲g̲
Mirrored discs are hardwarewise 2 separate devices
with separate controllers.
A mirrored dual disc system is a system where the discs
are exact copies of each other. Thus writing takes
place on both disc devices simultaneously at the same
sector locations. Reading takes normally place on one
disc, but in case of errors the backup disc is used.
Read- or write errors on a disc device will result
in an automatic discard for subsequent access to the
failed device. This discard is handled by the file
management system
Apart from logging the "discard event" on the operator's
console, no fixup action is required by the application
S/W, which means that there will be no impact on messages
or other transactions in progress; the "discard" actions,
being handled by the file management system alone,
take place immediately.
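A sketch of the mirrored access rules (the device objects are
invented; the simultaneous same-sector writes, single-disc reads
with backup fallback, and automatic discard are from the text):

    class MirroredDisc:
        def __init__(self, primary, backup):
            self.devices = [primary, backup]

        def write(self, sector: int, data: bytes) -> None:
            for dev in self.devices:    # both discs, same sector locations
                dev.write(sector, data)

        def read(self, sector: int) -> bytes:
            try:
                return self.devices[0].read(sector)
            except IOError:
                # Automatic discard of the failed disc; subsequent
                # accesses use the surviving device only.
                self.devices.pop(0)
                return self.devices[0].read(sector)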
After "discard" (which may also be forced by the operator
commands: DISCARD) the disc device may be removed for
repair.
Reinstalling a repaired or spare disc device may take
place online.
The reinstalled disc will have to go through a sector-to-sector
copy from the active disc before a dual disc
reassignment may take place. This is accomplished by
the operator command INCLUDE DISC#. The dual disc
reassignment affects only the file management system;
thus there is no impact on application processes.
The copy function runs as background processing in
relation to application transactions. Thus the time
needed will depend on the actual disc load and, of
course, the disc capacity (size). Performing the action
without application load will for an 80 MB device take
approx. 5 minutes.
4.14.1.3.3 S̲u̲p̲r̲a̲ ̲B̲u̲s̲ ̲S̲w̲i̲t̲c̲h̲i̲n̲g̲
The supra bus is hot redundant and switched automatically
by the bus handler upon a serious error condition.
Recovery actions are handled by the supra bus protocols.
Inclusion of a repaired bus requires a system operator
command: INCLUDE SUPRABUS #.
4.14.1.4 G̲e̲o̲g̲r̲a̲p̲h̲i̲c̲a̲l̲ ̲N̲C̲C̲ ̲B̲a̲c̲k̲u̲p̲
Dualized NCC equipment is provided at each node. This
means a very high availability of the NCC. Furthermore,
the NCC's geographically provide backup for each other.
This means that only one NCC is allowed to be active
at a time (only one master in the network), while one
NCC is standby, ready to go active.
NCC states:
The NCC may be in one of the following states:
- offline
- Standby
- Active
Offline state means that the NCC is not able to perform
NCC functions.
Standby state means that the NCC is fully operational,
ready to take control. All reports and monitoring functions
are active, and all network reporting is sent to the
active as well as the standby NCC's.
Thus the standby NCC "listens" to the network and keeps
an up-to-date picture of the network status, but is
unable to do any kind of network control.
The active state means that the NCC is able to perform
both monitoring and control functions on the network.
In order to ensure the network control integrity, the
NCC's communicate mutually by an "inter NCC handshake"
(INH) protocol, which is an inherent part of the NCC.
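The control integrity rule can be sketched as a small state
machine (the state names are from the list below; the handshake
exchange itself is simplified to a single peer flag):

    from enum import Enum

    class NccState(Enum):
        OFFLINE = 0
        STANDBY = 1
        ACTIVE = 2

    class Ncc:
        def __init__(self):
            self.state = NccState.OFFLINE

        def initialize(self):                       # transition A below
            self.state = NccState.STANDBY

        def go_active(self, peer_is_active: bool):  # transition B below
            # INH guard: only one master in the network.
            if peer_is_active:
                raise RuntimeError("another NCC is active; request denied")
            self.state = NccState.ACTIVE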
The state transitions are controlled by operator commands
and automatically. The state transitions are given
in fig. 4.14.1.4-1 and explained in the following:
- Transition A: This is an automatic transition when
  initializing the NCC.

- Transition B: When the "Inter NCC Handshake" (INH)
  protocol has found that no other active NCC is in
  the network, the operator receives a "GO ACTIVE
  REQUEST". Having received this request, the
  "GO ACTIVE COMMAND" can be executed.

- Transition C: If the operator wants to go standby,
  this may be accomplished by the "GO STANDBY COMMAND".
  This command will be executed under the following
  conditions, checked by the INH protocol:
  1) The remote NCC operator is requested to go active.
  2) The remote NCC operator enters a "go active"
     request within 2 minutes.
  3) Until the remote NCC has entered the active mode,
     the local NCC stays in the active mode.
  4) If the remote NCC does not enter the active mode,
     the request is denied by the system.

- Transition D: This transition corresponds to a
  "SYSTEM CLOSE DOWN".
The dual geographical backup operation is given in
the state transition diagram, fig. 4.14.1.4-2, for
dual NCC's. The operation is explained by 7 selected
transition cases:
- Transition 1: Corresponds to fig. 4.14.1.4-1, transition A.
- Transition 2: Corresponds to fig. 4.14.1.4-1, transition D.
- Transition 3: Corresponds to fig. 4.14.1.4-1, transition B.
- Transition 4: Corresponds to fig. 4.14.1.4-1, transition A.
- Transition 5: This is the NCC switchover process.
  This process is also explained in fig. 4.14.1.4-1,
  transition C.
- Transition 6: If both NCC's are in standby state,
  a "GO ACTIVE REQUEST" is issued to both system
  operators. The INH protocol will handle this
  collision such that the quickest operator wins.
- Transition 7: This transition is only applicable
  to the very special situation where the network
  has been separated into 2 parts, each with one
  active NCC, and these two networks are reconnected.
  The INH protocol will immediately transfer both
  NCC's to standby and issue a "GO ACTIVE REQUEST"
  to the operators.
The inclusion of a standby NCC in the network (transitions
4 and 6) involves an update of the data bases from
the active NCC to the standby NCC. This update process
is performed in two steps:
- Network status polling
- Bulk transfer of configuration data and statistics.
Fig. III 4.14.1.4-2  STATE TRANSITION DUAL NCC
4.14.2 R̲e̲c̲o̲v̲e̲r̲y̲
This section describes the network level recovery in
case of major network failures.
Recovery in ACDN has two aspects, the first is to bring
ACDN itself back to a consistent state, the second
is to synchronize with the external resources.
Synchronization of attachments is relatively simple
since it consists of either:
o Session termination, if the node to which the attachment
is connected has not failed.
o Reinitialization of the local network if the connecting
node failed
Several functions can be used to recover ACDN itself,
depending on the type of failure:
o Host Failure
o Link failure
o Multiple link failure with loss of connectivity
o Node failure
o EMH failure
o NCC failure
4.14.2.1 N̲o̲d̲e̲ ̲F̲a̲i̲l̲u̲r̲e̲
If a node fails, its local network is excluded and
the node is forced to its basic state by a CLEAR command.
In this state, it can be dumped and reloaded remotely.
The restart and following initialization follow the
scheme from network initialization, except that the
state of the local network can be synchronized with
the state stored in the NCC at the time of node failure.
A node failure causes all sessions passing through
the node to be terminated. The node must be restarted
from the NCC. Once restarted, the node will respond
to all connected hosts with a message which causes
the hosts to perform a reinitialization of their network
status. Following this, the necessary network configuration
coordination commands may be sent to the hosts to synchronize
their status with the ACDN status.
If the node cannot be forced into its basic state by
the CLEAR command, no remote recovery facilities exist.
4.14.2.2 H̲o̲s̲t̲ ̲A̲c̲c̲e̲s̲s̲ ̲S̲y̲n̲c̲h̲r̲o̲n̲i̲z̲a̲t̲i̲o̲n̲ ̲u̲p̲o̲n̲ ̲N̲o̲d̲e̲ ̲F̲a̲i̲l̲u̲r̲e̲
- Node Failure:
All sessions using the node as an intermediate
node will be routed through an alternative node,
if it exists. All sessions locally attached to
the failing node are lost. The failure is reported
to all hosts via a lost subarea SNA request.
This causes all sessions with logical units attached
to the node to be terminated.
- Trunk Failure:
All sessions using this trunk will be rerouted
through alternative trunks. If a node becomes unreachable
due to multiple trunk failures, the line defined
in VTAM to connect that node to the Host is forced
deactivated. This is provided with the link-inoperative
SNA request.
A failing node or a failing trunk is always reported
back to the network control center. The hosts are not
informed if the logical network configuration has not
changed.
4.14.2.3 H̲o̲s̲t̲ ̲A̲c̲c̲e̲s̲s̲ ̲S̲y̲n̲c̲h̲r̲o̲n̲i̲z̲a̲t̲i̲o̲n̲ ̲u̲p̲o̲n̲ ̲H̲o̲s̲t̲ ̲F̲a̲i̲l̲u̲r̲e̲
A host subarea SNA request is sent to the other hosts.
Sessions running between logical units and application
programs in the failing host are lost. Both the VTAM
tables and NCC tables are updated according to this
event.
The list of host application programs located in the
NCC has to be updated. Those application programs belonging
to the failing host are marked "unavailable".
When the host has been brought back to normal operation
and the host system operator has issued a START VTAM
command, a set of initialization commands is sent
to the ACDN. The NCC recognizes these commands and
takes the required actions.
A restart after host failure is very similar to the
network initialization situation.
4.14.2.4 E̲M̲H̲ ̲F̲a̲i̲l̲u̲r̲e̲
A catastrophic failure of the EMH results in loss of
the protected message service. The EMH must be restarted
from the NCC. The EMH will be responsible for re-establishing
the content of the protected message data base, while
the NCC, in cooperation with the nodes, reestablishes
the pending connections between the EMH and the PMS
destinations. This reestablishment is like that for
host connections. The EMH is responsible for re-transmitting
those PMS messages for which it has responsibility
and which have not been flagged as delivered to the
network. The nodes automatically request a re-transmission
of PMS messages which a node has been unable to deliver
(note that the EMH only allows re-transmission of
certain types of PMS traffic; e.g. tickets are not
retrievable).
4.14.2.5 N̲C̲C̲ ̲F̲a̲i̲l̲u̲r̲e̲
A total NCC failure means that no NCC is active in
the network.
The nodes are resilient enough to accept interrupted
NCC service, although no network control and monitoring
can be established in this period. Thus the network
cannot function for a longer period without NCC service.
NCC re-initialization will be necessary in this case.
The NCC will obtain the actual network status by polling
the network after initialization.
Network events in the NCC down period may be lost to
the NCC. This includes session establishment.
4.14.2.6 E̲x̲t̲e̲r̲n̲a̲l̲ ̲R̲e̲s̲o̲u̲r̲c̲e̲ ̲F̲a̲i̲l̲u̲r̲e̲
All sessions with the failed resource are terminated.
If the error is judged to be recoverable, the resource
is kept included and regularly tested to see if the
error conditions have cleared. Otherwise, the resource
is excluded from ACDN. This saves resources and allows
a deeper investigation of the problem through use of
tests.
4.14.2.7 L̲i̲n̲k̲ ̲F̲a̲i̲l̲u̲r̲e̲
A link failure will unconditionally cause all virtual
connections on that link to be cleared. The conversations
carried by the cleared VC's are suspended. If sufficient
resources are available, the conversations will automatically
be resumed via VC's using other links. If resources
are short, the low priority conversations will be kept
suspended. It is then the responsibility of the network
operator to resume those conversations when he judges
that sufficient resources are available. This is done
via the RESUME command.
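The resumption policy can be sketched as follows (the
conversation records are invented; the priority-first automatic
resume and the operator-driven RESUME for the remainder are
from the text):

    def resume_after_link_failure(suspended: list, vc_capacity: int):
        # Resume as many conversations as surviving-link VC
        # resources allow, highest priority (lowest number) first;
        # the rest stay suspended until the operator issues RESUME.
        ordered = sorted(suspended, key=lambda conv: conv["priority"])
        return ordered[:vc_capacity], ordered[vc_capacity:]
        # returns (resumed, kept suspended)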
4.14.2.8 M̲u̲l̲t̲i̲p̲l̲e̲ ̲L̲i̲n̲k̲ ̲F̲a̲i̲l̲u̲r̲e̲ ̲w̲i̲t̲h̲ ̲L̲o̲s̲s̲ ̲o̲f̲ ̲C̲o̲n̲n̲e̲c̲t̲i̲v̲i̲t̲y̲
In this case also, the conversations are suspended.
Back-up links may be supplied before a RESUME command
is issued; if this is not possible, the isolated
part of the network must be excluded.
When a node detects the loss of connectivity, the connected
users are informed that sessions are suspended.
Whenever sessions are ready to resume, the connected
users are again informed.