DataMuseum.dk

Presents historical artifacts from the history of:

DKUUG/EUUG Conference tapes

This is an automatic "excavation" of a thematic subset of
artifacts from Datamuseum.dk's BitArchive.

See our Wiki for more about DKUUG/EUUG Conference tapes

Excavated with: AutoArchaeologist - Free & Open Source Software.



⟦c146917a3⟧ TextFile

    Length: 33719 (0x83b7)
    Types: TextFile
    Names: »big.txt«

Derivation

└─⟦db229ac7e⟧ Bits:30007240 EUUGD20: SSBA 1.2 / AFW Benchmarks
    └─⟦this⟧ »EUUGD20/AFUU-ssba1.21/ssba1.21E/musbus/Workload/big.txt« 
    └─⟦this⟧ »EUUGD20/AFUU-ssba1.21/ssba1.21F/musbus/Workload/big.txt« 

TextFile

.\"##########
.\"#  Big troff text file for workload scripts
.\"##########
.nr Pw 6.75i
.nr Po 0i
.ll \n(Pwu
.ev 2
.ll \n(Pwu
.lt \n(Pwu
.ev
.po -\n(Po
.hy 14
.nr II 0
.de SZ
.ps \\$1
.vs \\$1u*1.25p
..
.de NH
.in 0
.if t .sp 0.5v
.if n .sp
.ne 6v
.SZ 12
.ft B
\\$1 \\$2
.br
.ft P
.SZ 10
..
.de LP
.in \\n(IIu
.if t .sp 0.35v
.if n .sp
.ne 3v
..
.de IP
.LP
.in +4n
.ta 4n
.ti -4n
\\$1	\\c
..
.de PR
\\fI\\$1\\fP\\$2
..
.de FL
\\fB\\$1\\fP\\$2
..
.de SC
.ti +6n
.if t .HS
$ \\$1
.br
.if t .HE
..
.de SV
\\fI$\\$1\\fP\\$2
..
.de RS
.in +6n
..
.de RE
.in -6n
..
.de HS
.ft H
.ps -1
..
.de HE
.ps +1
.ft P
..
.de VA
.IP "\fIShell Variable:\fP \\fB\\$1\\fP  (default: \\$2)"
.br
..
.de TN
.IP "\fITest Name:\fP \\fB\\$1\\fP"
.br
..
\&
.sp 1i
.de Fo
'Hd
..
.de Hd
.ev 2
.bp
.in 0
.sp
.tl 'MUSBUS Introduction''%.'
.sp
.ev
..
.wh -0.1i Fo
.ad c
.SZ 14
.ft B
An Introduction to the Monash Benchmark Suite (MUSBUS)
.if t .sp 0.5v
.if n .sp
.SZ 12
.ft I
Ken J. McDonell
.if t .sp 0.5v
.if n .sp
.SZ 10
.ft R
Department of Computer Science
.br
Monash University
.br
Clayton, AUSTRALIA 3168
.sp
ACSnet: kenj@moncskermit.oz
.br
USENET: seismo!munnari!moncskermit.oz!kenj
.br
ARPA: kenj%moncskermit.oz@seismo.arpa
.sp
Revised: 24 June, 1987
.br
.ad b
.NH 1 Introduction
.LP
The Monash University Suite for Benchmarking
UNIX\v'-.5n'\(dg\v'+.5n'
.de DF
'FN
.rm DF
..
.wh -0.7i DF
.de FN
.ev 2
.in 0
.sp
\\l'1.5i\(ul'
.sp 0.5v
\v'-.5n'\(dg\v'+.5n' UNIX is a trademark of AT&T
.sp 2
.ev
.rm FN
.ch DF -0.01i
..
Systems (MUSBUS), has been developed
to assist in
.IP (a)
identifying bottlenecks and performance problems in new
UNIX ports, and
.IP (b)
providing a robust test environment in which
the performance of competing
UNIX systems may be compared.
.LP
This document provides an overview for \fBVersion\fP \fB5.0 (Beta)\fP of MUSBUS
and is intended for knowledgeable programmers trying to run
the software on their own hardware.
.NH 2 Preliminaries
.NH 2.1 "Software Environment"
.LP
You will require a system that supports Version 7, System V
or BSD
compatibility, along with the following programs.
.RS
.PR sh
(the Bourne shell)
.br
.PR awk
.PR cat
.PR cc
.PR chmod
.PR comm
.PR cp
.PR date
.PR dc
.PR df
.PR echo
.PR ed
.PR expr
.PR kill
.PR ls
.PR make
.PR mkdir
.PR rm
.PR sed
.PR test
.PR time
.PR touch
.PR tty
.PR who
.RE
.NH 2.2 "Getting Started"
.LP
All the files are distributed in a single directory.
Once these have been retrieved from the distribution some initial housekeeping
and system specifics have to be sorted out.
.LP
When fully installed, MUSBUS contains files in several subdirectories
as follows,
.IP \(bu
.FL Results ,
log files created by the command procedure
.PR run .
.IP \(bu
.FL Tmp ,
temporary files created by
.PR run
and friends.
.IP \(bu
.FL Tools ,
post processors to produce
.PR tbl
input from the log files.
.IP \(bu
.FL Workload ,
descriptions of the workload profile, all associated data files and some
work script manipulation tools.
.LP
To explicitly
create these directories and distribute the required files into the
appropriate places you may
.SC "make install"
however, this will be done automatically by
.PR run
as required.
.LP
The file 
.FL time.awk
is used by the command procedure
.PR run
to average the
results from several attempts to time a particular test and so
depends upon the format of output from
.PR /bin/time .
The results from multiple timing attempts are held temporarily
in the file 
.FL Tmp/tmp.$$
(where $$ is the pid of the
.PR run
shell).
Try
.SC "/bin/time date"
and check the output from
.PR /bin/time .
If it has a format like
.ti +4n
0.4 real         0.0 user         0.1 sys  
.br
then
.SC "rm -f time.awk"
.SC "ln BSDtime.awk time.awk"
.LP
If the
.PR /bin/time
output looks like
.RS
.ta 8n
.nf
real	0:00.4
user	0:00.0
sys	0:00.1
.fi
.RE
then
.SC "rm -f time.awk"
.SC "ln SysVtime.awk time.awk"
.LP
Otherwise create your own version of
.FL time.awk
using
.FL *time.awk
as examples.
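.LP
As an illustration only (the distributed
.FL BSDtime.awk
is the authoritative version), a minimal averaging filter for the
BSD-style output format might look like the following sketch:

```shell
# Sketch of what a time.awk must do for BSD-format /bin/time output:
# average the "real" field over several timing attempts.
# Illustrative only; the distributed BSDtime.awk may differ.
printf '0.4 real 0.0 user 0.1 sys\n0.6 real 0.0 user 0.1 sys\n' |
awk '{ real += $1; n++ } END { printf "mean real: %.2f\n", real / n }'
# prints "mean real: 0.50"
```

The same idea applies to the System V format, with the fields
rearranged.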
.LP
Some of the tests require system calls from the C code to measure
small elapsed times.
This is a real problem since there appears to be no 
universally correct way of doing this in the Unix
world.
The particular source files are
.FL clock.c ,
.FL fstime.c
and
.FL mem.c .
In the
.FL Makefile ,
ensure that you have \fBone\fP of the following
definitions included in the CFLAGS (in addition to the \(miO).
.RS
.ta 12n
.nf
.if t \fH\s-1\(miDSysV\s+1\fP	you are using a System V brand of Unix
.if n \(miDSysV	you are using a System V brand of Unix
.if t \fH\s-1\(miDBSD4v2\s+1\fP	you are using a Berkeley 4.2 or 4.3 system
.if n \(miDBSD4v2	you are using a Berkeley 4.2 or 4.3 system
.if t \fH\s-1\(miDBSD4v1\s+1\fP	you are using a Berkeley 4.1 system
.if n \(miDBSD4v1	you are using a Berkeley 4.1 system
.fi
.RE
For example,
.ti +6n
.if t \fH\s-1CFLAGS = \(miO \(miDBSD4v2\s+1\fP
.if n CFLAGS = -O -DBSD4v2
.LP
If \fBnone\fP of these systems
is appropriate, the source files \fIwill not compile\fP
and you will have to decide on appropriate alternative calls
and coding to suit local conditions.
.LP
Check the
.FL HISTORY
file (if it exists) for notification of any changes, additions
or problems that may have been made or fixed subsequent to the version
of MUSBUS described in this document.
.LP
Try
.SC "make programs"
to confirm that every necessary program can be compiled and
loaded correctly.
.LP
Now attempt to run all the tests once (this takes roughly 20 minutes).
Using the Bourne shell (\c
.PR /bin/sh ),
.SC "iterations=1"
.SC "nusers=1"
.SC "export iterations nusers"
.SC "./run"
.LP
This should demonstrate
that all the 
.PR sh ,
.PR awk ,
.PR sed
and 
.PR ed
scripts
can be made to work.
Verification of the health of things to this point depends upon
checking the output from
.PR run
to ensure that no nasty errors
are reported, and in particular scrolling through the file
.FL Results/log .
Every time
.PR run
is used information is \f3appended\fP to
.FL Results/log
and
.FL Results/log.work ,
so make sure that these files are removed or renamed before you start
to do anything serious.
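.LP
One way to set the old logs aside before a serious run (the ``.old''
names are merely a convention assumed here, not part of MUSBUS) is:

```shell
# Rename earlier logs so new results start from empty files.
# The .old suffix is just a convention assumed for illustration.
mkdir -p Results                       # normally created by run
touch Results/log Results/log.work     # stand-ins for earlier results
mv Results/log Results/log.old
mv Results/log.work Results/log.work.old
```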
The contents of
.FL Results/log
should look something like
.LP
.RS
.nf
.HS
Start Benchmark Run (MUSBUS Version X.Y)
  Tue Jun 23 17:18:21 EDT 1987 (long iterations 6 times)
  2 interactive users.

Arithmetic Test (type = arithoh): 1000 Iterations
Elapsed Time: 0.44 seconds (variance 0.003)
CPU Time: 0.30 seconds [ 0.30u + 0.00s ] (variance 0.000)

Arithmetic Test (type = register): 1000 Iterations
Elapsed Time: 3.36 seconds (variance 0.008)
CPU Time: 3.18 seconds [ 3.13u + 0.05s ] (variance 0.008)

[ ... and lots more similar goodies ]

Output sent to ... /dev/ttyp0
Directories for temporary files ... Tmp

.nf
.ta 14n,+8n,+8n,+8n,+8n
Filesystem	kbytes	used	avail capacity	Mounted on
/dev/hp0a	\07419	\06202	\0\0475	93%	/
/dev/hp0g	38383	33296	\01248	96%	/usr
/dev/hp1a	\07419	\04169	\02508	62%	/jnk
/dev/hp1b	15583	\03181	10843	23%	/usr/spool
/dev/hp1g	38383	32579	\01965	94%	/mnt
/dev/up1a	\07471	\0\0\010	6713	\00%	/tmp

SIGALRM check:  12 x 5 sec delays takes 60.05 wallclock secs (error -0.08%)
Simulated Multi-user Work Load Test:

1 Concurrent Users, each with Input Keyboard Rate 2 chars / sec
Elapsed Time: 425.83 seconds (variance 0.125)
CPU Time: 27.20 seconds [ 17.30u + 9.90s ] (variance 0.013)

  1 interactive users.
End Benchmark Run (Wed Jun 24 09:33:55 EDT 1987) ....
.HE
.fi
.RE
.LP
Beware of lines with the following formats; they indicate something
is \fBwrong\fP.
.if t .IP "\fH\s-1** Iteration x Failed: text\fP\s+1"
.if n .IP "** Iteration x Failed: text"
.br
Something (\fItext\fP) other than the normally anticipated output from 
.PR /bin/time
was found in the file
.FL Tmp/tmp.$$ .
.if t .IP "\fH\s-1Elapsed Time: -- no measured results!!\fP\s+1"
.if n .IP "Elapsed Time: -- no measured results!!"
.br
Not a single valid timing result was found in
.FL Tmp/tmp.$$ .
.if t .IP "\fH\s-1Terminated during iteration n\fP\s+1"
.if n .IP "Terminated during iteration n"
.br
Premature termination of a test, usually as the result of
a shell trap taken from
.PR run .
Most often this is symptomatic of an earlier
error reported in
.FL Results/log .
.if t .IP "\fH\s-1* Apparent errors from makework ... *\fP\s+1"
.if n .IP "* Apparent errors from makework ... *"
.br
After cleaning the log files from the multi-user test (using 
.PR sed
and the script
.FL check.sed )
some lines remained that probably indicate
real errors which forced the multi-user test to
terminate prematurely.
Depending upon the formats of messages from programs (especially
in the multi-user workload),
.FL check.sed
may need some local fine tuning to remove lines
that do not reflect genuine error conditions from
the log files.
If this is not done,
the tests will be aborted prematurely based upon the classification
of a spurious message as a real error condition.
.if t .IP "\fH\s-1Reason?: text\fP\s+1"
.if n .IP "Reason?: text"
.br
.PR Makework
(the controlling program for the multi-user test)
has detected an inconsistency and taken a fatal dive,
\fItext\fP comes from
.PR perror ()
and the previous line in
.FL Results/log
will contain 
.PR makework "'s"
idea of what is wrong.
.if t .IP "\fH\s-1* Benchmark Aborted .... *\fP\s+1"
.if n .IP "* Benchmark Aborted .... *"
.br
Just what it says!
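.LP
The distributed
.FL check.sed
is not reproduced here, but a locally tuned filter in the same spirit
simply deletes lines known to be harmless (for example prompt echoes
and blank lines) and passes everything else through as a genuine error:

```shell
# Illustrative check.sed-style filtering (these patterns are
# assumptions, not the distributed script): drop root prompt
# echoes and blank lines; whatever survives is a real error.
printf '# \n\ncc: fatal error\n# \n' |
sed -e '/^# *$/d' -e '/^$/d'
# prints "cc: fatal error"
```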
.LP
Other possible error reports in
.FL Results/log
relate to specific tests
and are either self explanatory (e.g. missing or illegal program
options) or described in the Sections below.
.LP
The file
.FL Results/log.work
contains detailed logging of the multi-user test, and may contain
useful information in the event that this test fails or terminates
prematurely.
Besides logging process ids and file descriptor assignments for each
simulated user's job stream, standard error output is trapped and
reported in
.FL Results/log.work .
.NH 3 "The Tests"
.LP
If you are serious about the results produced, these tests should
be run on a dedicated system without concurrent activity.
When possible, an idle system in multi-user mode is
preferable to a single user system.
.LP
All the tests are controlled by shell variables used within the command
procedure
.PR run .
By setting environment variables of the same name, the default values
of the shell variables
may be overridden; however, if the defaults are consistently wrong for
particular variables it is safer (i.e. less error prone) to modify
the defaults in
.PR run .
.LP
.PR Run
does its work for the most part silently, logging information to
certain files, and providing a terse summary of the particular test(s)
being run on the tty from which
.PR run
was invoked.
.LP
A designated test may be run using the command
.SC "./run thing"
where \fIthing\fP is one of the test names described in the following
Sections.
The commands
.SC "./run"
or
.SC "./run all"
will run everything.
.LP
.PR Run
may be interrupted from the keyboard (SIGINT)
if it is started in foreground
and after some fooling about it manages to shut things down and clean up files.
.PR run
creates a 
.PR sh
command procedure
.FL Tmp/kill_run
that may be used to shut down a background
.PR run
via
.SC "Tmp/kill_run"
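.LP
The contents of
.FL Tmp/kill_run
are generated by
.PR run ;
the self-contained sketch below merely demonstrates the technique of
shutting down a background job via a generated kill script
(the actual kill commands in
.FL Tmp/kill_run
are assumed, not quoted):

```shell
# Demonstration only: Tmp/kill_run is assumed to contain kill
# commands of this general form.
sleep 60 &                               # stand-in for a background run
bgpid=$!
echo "kill -TERM $bgpid" > kill_run.demo # the generated kill script
sh kill_run.demo                         # shut the background job down
wait $bgpid 2>/dev/null                  # reap it
kill -0 $bgpid 2>/dev/null && echo "still running" || echo "stopped"
rm -f kill_run.demo
# prints "stopped"
```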
.VA iterations 6
Unless otherwise stated,
this variable controls the number of times each test is repeated for timing.
At the beginning of each iteration, the program
.FL iamalive
writes the iteration
number (without newline or carriage return)
on standard output.
.NH 3.1 "Raw Speed Measures"
.NH 3.1.1 "Specific Arithmetic"
.LP
This family of tests computes the sum of a series of terms
such that the arithmetic is unbiased towards operator type
(i.e. equal numbers of additions, subtractions, multiplications
and divisions).
Each major loop in the computation involves summing 100 terms
of the series.
.VA arithloop 1000
Number of major loops in the computation.
.TN arithoh
The series is not computed, so this test measures the overhead in
the computation.
.TN register
Arithmetic uses registers.
.TN short
Arithmetic uses shorts.
.TN int
Arithmetic uses ints.
.TN long
Arithmetic uses longs.
.TN float
Arithmetic uses floats.
.TN double
Arithmetic uses doubles.
.LP
After all the arithmetic tests have been performed, the 
.PR sh
script
.FL Tools/Adjust
should be used with
.FL Results/log ,
i.e.
.SC "./Tools/Adjust Results/log"
to compute the \fBactual\fP
CPU and elapsed times when the overhead measured by the test
.PR arithoh
is subtracted.
It is these times (i.e. \fIminus the startup and loop overhead\fP)
that have been published and circulated amongst MUSBUS users.
Failure to run the 
.PR Tools/Adjust
script will make the machine
you are testing look comparatively worse
than it really is!
Note that
.PR Tools/Adjust
will be run automatically by the log file postprocessors (\c
.PR Tools/mktbl
and
.PR Tools/mkcomp )
if the times have not already been adjusted.
Once the adjustment has been made, the relevant portion of
.FL Results/log
should look something like (note \fBActual\fP times in parentheses),
.LP
.RS
.nf
.HS
Start Benchmark Run (MUSBUS Version X.Y)
  Tue Jun 23 17:18:21 EDT 1987 (long iterations 6 times)
  2 interactive users.

Arithmetic Test (type = arithoh): 1000 Iterations
Elapsed Time: 0.44 seconds (variance 0.003)
CPU Time: 0.30 seconds [ 0.30u + 0.00s ] (variance 0.000)

Arithmetic Test (type = register): 1000 Iterations
Elapsed Time: 3.36 seconds (variance 0.008) (Actual: 2.92 )
CPU Time: 3.18 seconds [ 3.13u + 0.05s ] (variance 0.008) (Actual: 2.88 )

[ ... and lots more similar goodies ]

.HE
.fi
.RE
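.LP
The adjustment itself is just subtraction of the
.PR arithoh
times; using the figures from the log fragment above, a one-line check
of the arithmetic (not the actual
.PR Tools/Adjust
script) is:

```shell
# Actual = measured minus arithoh overhead, for the register test:
# elapsed 3.36 - 0.44, CPU 3.18 - 0.30 (figures from the sample log).
awk 'BEGIN { printf "Actual: %.2f elapsed, %.2f CPU\n", 3.36 - 0.44, 3.18 - 0.30 }'
# prints "Actual: 2.92 elapsed, 2.88 CPU"
```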
.NH 3.1.2 "General Purpose Arithmetic"
.TN dc
Compute the square root of 2 to 99 decimal places using 
.PR dc .
The 
.PR dc
input is in
.FL dc.dat .
This test is due to John Lions (University of New South Wales) who
has suggested it as a good first order measure of raw system speed.
.NH 3.1.3 Recursion
.VA ndisk 17
.TN hanoi
A recursive solution to the classical Tower of Hanoi problem.
Work increases as 2**(number of disks).
.SV ndisk
provides a \fIlist\fP of the numbers of disks for a
\fBset\fP of problems; however, the default setting specifies a
single problem.
.NH 3.1.4 "System Calls, Pipes, Forks, Execs and Context Switches"
.VA ncall 4000
.TN syscall
Sit in a hard loop of
.SV ncall
iterations, making 5 system calls
per iteration.
The system calls (\c
.PR dup (0),
.PR close (i),
.PR getpid (),
.PR getuid ()
and
.PR umask (i))
involve little work on the part of the UNIX kernel, so
the test predominantly measures the overhead associated with
the system call mechanism.
.VA io 2048
.TN pipe
One process (therefore no context switching) that writes and reads
a 512 byte block along a pipe
.SV io
times.
.VA children 100
.TN spawn
Simply repeat
.SV children
times; fork a copy of yourself and wait for
the child process to exit (which it should do immediately).
.VA nexecl 100
.TN execl
Perform
.SV nexecl
execs using 
.PR execl ().
The program to be exec'd has been artificially expanded to a reasonable
size (on a VAX, 11264 text + 2048 data + 24388 bss).
.VA switch1 500
.TN context1
Perform 2 x
.SV switch1
context switches, using pipes for synchronization.
The test involves 2 processes connected via 2 pipes.
One process writes then reads a 4-byte (descending) sequence number, while
the other process reads then writes a sequence number.
Synchronization is validated at each swap by checking the values of the
sequence numbers read and written.
.NH 3.1.5 "C Compilation and Loading"
.TN C
Measure the time for each of
.SC "cc -c cctest.c"
.SC "cc cctest.o"
where 
.FL cctest.c
contains 124 lines of uninteresting C code (108
lines of real code after 
.PR cpp ).
.NH 3.1.6 "Memory Access Speed"
.LP
These tests try to measure read accesses per real second into an array
of integers.
Because of inaccuracies in measuring small real times, the
results of this test are subject to large variances and
cannot be interpreted with great confidence (e.g. negative
and infinite speeds have been observed).
Consequently, these tests are best considered
historical curiosities from the days when MMUs were bottlenecks on
microprocessor-based systems, and \fBno\fP real significance should
be attached to the observed times.
.VA poke 100000
Number of array accesses.
.VA arrays "8 64 512"
List of array sizes in units of 1024 ints.
.TN seqmem
Cyclic sequential access pattern, hitting each element of the array
in turn.
.TN randmem
Random access patterns -- to give VM systems a chance to do something
better!
.NH 3.1.7 "Filesystem Throughput"
.VA blocks "62 125 250 500"
A list of file sizes in Kbytes.
.VA where .
The directory in which the files will be created.
This test requires at least
2 x max(\c
.SV blocks )
Kbytes of free space in the filesystem
containing
.SV where .
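.LP
The free space requirement can be checked ahead of time; with the
default
.SV blocks
list the arithmetic is:

```shell
# Need at least 2 x max(blocks) Kbytes free in the filesystem
# holding $where; "62 125 250 500" is the default blocks list.
echo "62 125 250 500" |
awk '{ max = 0
       for (i = 1; i <= NF; i++) if ($i + 0 > max) max = $i + 0
       printf "need %d Kbytes free\n", 2 * max }'
# prints "need 1000 Kbytes free"
```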
.TN fstime
This program attempts to measure file write time, file read time and
file copy time.
It is assumed that BUFSIZ as defined in <stdio.h> is a good size for
physical i/o, and all i/o is done via direct calls to 
.PR read ()
and 
.PR write ().
This test is performed (\c
.SV iterations /2)
times.
.LP
Beware of the \fIwrite\fP time, since this can be influenced by the size
of the disk block cache in the kernel.
Before the reads are commenced there are a couple of 
.PR sync ()s
and a 5 second sleep to try and flush the cache.
The times for small files are most sensitive to disk block
caching.
.LP
Really the \fIcopy\fP time for the largest file
is the best indicator of throughput and reflects
the type of disk activity most commonly generated by compilers, editors,
assemblers, etc.
Also the rates are measured against elapsed time, so there is some
scope for variance; however, the absolute times are usually long enough
to make this effect insignificant \fBprovided\fP
there is no concurrent disk activity on the same spindle!
.NH 3.2 "Emulated Multi-user Test"
.VA nusers "1 4 8 16 24 32"
A list of the numbers of users to be emulated.
.VA ttys /dev/tty
A \fBlist\fP of tty devices
where the simulated tty output is sent -- there is a lot of this, and
you should ensure that these tty lines are operating at the normal
baud rate (e.g. 9600) for the test system.
If your CPU console does not use a standard serial multiplexer
(e.g. a VAX, Pyramid, Gould, DG, etc.), then the tty output
should be directed to \fIsome other\fP tty line(s) that \fBdo\fP use the
ordinary serial port hardware.
.VA dirs Tmp
A \fBlist\fP of directories that will be used to create subdirectories
and temporary files to run the
user job streams from.
.VA rate 2
Users are assumed to type at a rate of
.SV rate
characters per second.
.TN work
Of all the tests in MUSBUS,
this is the by far the most complicated, most realistic and most likely
to fail.
This test is performed (\c
.SV iterations /2)
times.
.LP
The synthetic workload is created from a number of job streams, each
of which is described by a line in the file
.FL Tmp/workload .
Each line consists of
.IP \(bu
the home directory for the job stream,
.IP \(bu
the full pathname of the program to run,
.IP \(bu
optional arguments to that program,
.IP \(bu
an optional source
of standard input to that program (a filename prefixed by ``<''), and
.IP \(bu
an optional destination
for standard output from that program (a filename prefixed by ``>'').
.LP
.FL Tmp/workload
is created automatically by the command script
.PR run
based upon
.IP (a)
the variables
.SV dirs
and
.SV ttys , and
.IP (b)
the workload profile
.FL Workload/script.master
from which the script interpreter program name is extracted
and the individual input script files (\c
.FL Tmp/script.? ).
.LP
When
.FL Tmp/workload
is constructed, a cyclic scheme
is used to share user work amongst the
available directories and tty lines (as per
.SV dirs
and
.SV ttys ).
In this way, serial i/o bottlenecks for large numbers of simulated
users and unbalanced disk i/o across spindles may be avoided.
As a dynamic check, the program
.PR ttychk
is used within
.PR run
to check for potential bandwidth limitations on the serial i/o
lines, given the number of lines and the maximum number of job streams.
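.LP
The underlying arithmetic is simple: at 9600 baud a line carries
roughly 960 characters per second, shared cyclically amongst the job
streams directed at it.
The figures below are illustrative, not MUSBUS defaults:

```shell
# Per-user output bandwidth: 9600 baud ~ 960 chars/sec per line,
# shared amongst the users assigned to each line (32 users and
# 3 tty lines are assumed for illustration).
awk 'BEGIN { cps = 9600 / 10
             nusers = 32; lines = 3
             printf "%.0f chars/sec per user\n", cps * lines / nusers }'
# prints "90 chars/sec per user"
```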
.LP
The workload profile (\c
.FL Workload/script.master )
has the following format.
.IP 1.
The first line must begin ``%W%'' followed by the full pathname of the
relevant interpreter and any required options.
For example, if the script should be run by the Bourne shell, an
appropriate specification would be
.RS
.HS
%W% /bin/sh -ie
.HE
.RE
.IP 2.
All subsequent lines up to the first line beginning with ``%%''
are preamble commands that must appear at the \f3beginning\fP of \f3every\fP
script.
.IP 3.
Sequences of commands terminated by a line beginning with ``%%''
constitute a job step.
Each job step is an autonomous piece of work such that once the
preamble has been executed, job steps may be executed in \f3any\fP order.
.IP 4.
Any lines following the last ``%%'' line form a postscript
that must appear at the \f3end\fP of \f3every\fP
script.
.LP
The command procedure
.PR mkscript
and the program
.PR mkperm
are used (by
.PR run )
to create several (usually 4) scripts from
.FL Workload/script.master
with random permutations
of the job steps.
These scripts reside in
.FL Tmp/script.?
and are assigned in a cyclic manner to create the job streams.
The work for \fBeach\fP
simulated user is generated from
\fBone\fP job stream.
.LP
For example the distributed 
.FL Workload/script.master
is
.LP
.RS
.nf
.if t .HS
.CK Workload/script.master
%W% /bin/sh -ie
mkdir /tmp/$$ tmp
%% 1 edit
\&./keyb edscr1.dat | ed edit.dat
: .......................................................
: .    This is some filler of about the same            .
: .    size as the file edscr1.dat, since the           .
: .    emulated input proceeds in parallel, and         .
: .    we want the real-time delay to be about right    .
: .......................................................
chmod u+w temporary
rm temporary
%% 2 ls
ls -l
%% 3 cat
cat cat.dat
%% 4 compile
cc -c cctest.c 1>&2
rm *.o
%% 5 edit, compile and link
chmod 444 dummy.c
\&./keyb edscr2.dat | ed dummy.c
: .  more textual and time filler for the second edscript file, edscr2.dat .
cc dummy.c 1>&2
rm a.* grunt.c
%% 6 grep
grep '[ 	]*nwork' grep.dat
%% 7 file copying
cp *.c edit.dat /tmp/$$
cp /tmp/$$/* tmp
%%
rm -rf tmp /tmp/$$
.if t .HE
.fi
.RE
.LP
This generates several job streams, one of which (\c
.FL Tmp/script.1 )
contains,
.LP
.RS
.nf
.if t .HS
.CK Tmp/script.1
mkdir /tmp/$$ tmp
cc -c cctest.c 1>&2
rm *.o
\&./keyb edscr1.dat | ed edit.dat
: .......................................................
: .    This is some filler of about the same            .
: .    size as the file edscr1.dat, since the           .
: .    emulated input proceeds in parallel, and         .
: .    we want the real-time delay to be about right    .
: .......................................................
chmod u+w temporary
rm temporary
cat cat.dat
grep '[ 	]*nwork' grep.dat
chmod 444 dummy.c
\&./keyb edscr2.dat | ed dummy.c
: .  more textual and time filler for the second edscript file, edscr2.dat .
cc dummy.c 1>&2
rm a.* grunt.c
cp *.c edit.dat /tmp/$$
cp /tmp/$$/* tmp
ls -l
rm -rf tmp /tmp/$$
.if t .HE
.fi
.RE
.LP
Given the following environment variable assignments,
.RS
.nf
nusers=8
ttys="/dev/ttyh0 /dev/ttyh8 /dev/ttyha"
dirs="Tmp /usr/tmp"
.fi
.RE
the created workload description file (\c
.FL Tmp/workload )
contains
.LP
.RS
.nf
.if t .HS
Tmp/user1 /bin/sh -ie <Tmp/script.1 >/dev/ttyh0
/usr/tmp/user2 /bin/sh -ie <Tmp/script.2 >/dev/ttyh8
Tmp/user3 /bin/sh -ie <Tmp/script.3 >/dev/ttyha
/usr/tmp/user4 /bin/sh -ie <Tmp/script.4 >/dev/ttyh0
Tmp/user5 /bin/sh -ie <Tmp/script.1 >/dev/ttyh8
/usr/tmp/user6 /bin/sh -ie <Tmp/script.2 >/dev/ttyha
Tmp/user7 /bin/sh -ie <Tmp/script.3 >/dev/ttyh0
/usr/tmp/user8 /bin/sh -ie <Tmp/script.4 >/dev/ttyh8
.if t .HE
.fi
.RE
.LP
It is strongly recommended that you create your own workload
profile for the multi-user test to reflect the anticipated
system usage.
To do this,
.IP 1.
Use the distributed files in the
.FL Workload
directory as a guide.
.IP 2.
Create a new
.FL Workload/script.master
describing the required job steps.
.IP 3.
Ensure all required data files are in the
.FL Workload
directory, because every job stream executes with the current
directory containing its own private copies of \f3all\fP the
files from
.FL Workload .
.IP 4.
Ensure the makefile (\c
.FL Workload/Makefile )
has the following targets defined (they are assumed to exist by
.PR run ).
.RS
.nr II 6n
.IP (a)
context : ensure all files needed to run a script are present.
.IP (b)
clean : remove any unnecessary temporary files, e.g. those created from
somewhere else during a ``make context''.
.IP (c)
script.out : run a script and trap all the output; the file
.FL script.out
should contain the concatenation of the script input and the
script output.
This file is used by
.PR ttychk
to compute the output tty bandwidth requirements.
.RE
.nr II 0
.LP
The program
.PR makework
reads the
.FL Tmp/workload
file and builds data structures
for each job stream (i.e. each simulated user) describing the home
directory, command interpreter and its options and standard input and
standard output assignments.
Thereafter
.PR makework
starts the user program(s) (\c
.PR /bin/sh
above) and pumps
random chunks of input to them down pipes
so that the aggregate rate across all simulated users does not exceed
.SV rate
\(mu
.SV nusers
characters per second.
.LP
Because of process creation limits, this test
\fBmust be run as root\fP.
.LP
Because of open file limits,
.PR makework
will create \fIclones\fP of itself if
there are too many users for it to simulate alone.
.LP
If the standard input files to the job streams invoke interactive
programs (e.g. \c
.PR ed ),
then substantial care must be taken that the data
pumped down the pipe by
.PR makework
ends up at the correct destination.
This has been the cause of some catastrophic problems in which only
parts of the job streams have been run by 
.PR /bin/sh
and the rest has
been sucked up and thrown away by 
.PR /bin/ed .
.LP
To try and avoid these problems, the program
.PR keyb
has been
created to emulate one user typing at a terminal.
.PR keyb
(like
.PR makework )
uses the environment variables
.SV rate
and
.SV tty
(as set up by
.PR run )
to know how fast to generate output and where the input
should be echoed to.
.LP
Once
.PR run
work has finished, it is \fBessential\fP
that the following checks be performed.
.IP (1)
Look in the file
.FL Results/log .
Check for wild variances in the execution times (a sign that not all job streams
are being run to completion), and any obscure error messages that would
have been generated on stderr from
.PR makework .
Make sure that the execution times look reasonable.
One easy check is that the CPU time for ``n'' users
\f2must be at least\fP
\&``n'' times the CPU time for one simulated user, since the CPU times
should be nearly the same for all users in a given run (although
the CPU time per user is expected to rise as the number of concurrent
users increases).
.IP (2)
Check the file
.FL Results/log.work
that contains echoed comments, status information
and shell error output from the simulated user work.
.nr II 4n
.LP
The lines preceded by a line of the form ``Tmp/userlog.nnn:'' are copies of
the shell error output for simulated user ``nnn''.
This should consist of a row of ``# ''s (assuming root's 
.PR /bin/sh
prompt is ``# '').
.LP
The lines preceded by a line of the form ``Tmp/masterlog.nnn:'' are
the standard error output from master number ``nnn''.
Master 0 is the real master
.PR makework
process, the others are clones.
Check that there are no messages of the forms,
.RS
.nf
.if t .HS
makework: cannot open %s for std output
makework: chdir to "%s" failed!
user %d job %d pid %d done exit code %d
user %d job %d pid %d done status 0x%x
user %d job %d pid %d done exit code %d status 0x%x
clone %d done, pid %d exit code %d
clone %d done, pid %d status 0x%x
clone %d done, pid %d exit code %d status 0x%x
user %d job %d pid %d killed off
\&... reason ... pid %d killed off
.if t .HE
.fi
.RE
\fBAny\fP of these messages indicates something has gone terribly wrong.
.LP
On the other hand, messages of the form
.RS
.nf
.if t .HS
master pid %d
clone %d pid %d
user %d job %d pid %d pipe fd %d
user %d job %d pid %d done
clone %d done, pid %d
.if t .HE
.fi
.RE
are just warm reassurance that everything is going well.
.nr II 0
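.LP
The CPU time sanity check in point (1) can be expressed as a small
script; the one-user figure below comes from the sample log, while the
four-user figure is invented for illustration:

```shell
# Check (1): CPU time for n users must be at least n times the CPU
# time for one user (27.20s from the sample log; 115.0s is invented).
awk 'BEGIN { n = 4; cpu1 = 27.20; cpun = 115.0
             if (cpun >= n * cpu1) print "plausible"; else print "suspect" }'
# prints "plausible"
```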
.NH 3.3 Miscellaneous
.TN x
Like ``run work'', except the initial filesystem status reporting,
tty bandwidth check
and clock checks are omitted.
Useful when using the multi-user test for diagnostic purposes, and the
initial housekeeping is not needed.
.NH 4 "The Complete Test"
.LP
When everything is apparently installed and operating correctly, 
login as root, choose another inactive terminal running at 9600 baud
(/dev/ttyx below) and start the whole charade as follows.
.SC tty=/dev/ttyx
.SC "export tty"
.SC "rm Results/log Results/log.work"
.SC "./run &"
.LP
On a 4 Mbyte VAX 11/780, simulating 1, 4, 8, 16, 24 and 32
users in the multi-user test, this takes about 5 hours to run!
.NH 5 "What Does It All Mean?"
.LP
Finally one should be in a position to contemplate the summaries in
the
.FL Results/log
file.
Look in
.FL Tools
for the scripts
.PR mktbl
and
.PR mkcomp
to create
.PR tbl
input directly from the log files
for a single system or to compare two systems.
This is most useful in comparison to the same tests run on another
system, or on another version of the same system.
.LP
Lots of things can influence the results, and people interpreting
the results should be aware of the following (probably incomplete)
list.
.IP (1)
Available real memory for the disk block cache and user processes.
.IP (2)
The physical disk hardware; number and type of spindles, controller
type and paths to devices.
.IP (3)
The logical disk arrangement; allocation of critical directories such
as 
.PR /tmp , 
.PR /usr
and user filesystem across physical devices, the number
and distribution of swap partitions.
.IP (4)
Standard of C compiler and optimizer; everything tested is written
in C, any improvements here will help everything (even the kernel!).
.IP (5)
Physical block size for swapping and paging; some of the test programs
are very small and so may incur large physical i/o costs.
.IP (6)
Flavour of UNIX you are using.
.IP (7)
The accuracy of real time measurements and flow control in
.PR makework .
Check the output from
.PR clock
in
.FL Results/log
at the start of the multi-user
test to determine the extent to which a controlling SIGALRM loop
measures wallclock time -- this can influence elapsed time in the
multi-user test particularly.
.NH 6 "Caveat Emptor"
.LP
The MUSBUS tests have been widely distributed, and in some cases
their owners have not treated them kindly.
The following list details known pitfalls in running the tests
and the subsequent application of the results.
.IP (1)
There are several versions of the test suite, Version 3.3 (and later
versions) in particular
are very different to earlier versions and results obtained
with different versions of the suite cannot be meaningfully
compared.
I make no claim for the long-term
stability of MUSBUS, and so this evolutionary
process is likely to continue with future releases of the test suite.
.IP (2)
There is \fBno\fP reason to suspect that the distributed workload
for the multi-user test (i.e. the files
.FL Tmp/script.?
and
.FL Tmp/workload )
are representative of the user work profile at \fByour\fP installation(s).
Be prepared to alter or rebuild the workload to reflect your
expected system usage.
.IP (3)
Results have been known to vary
dramatically between releases of the UNIX you
are testing.
This reflects vendor tuning (sometimes breaking) of the UNIX port
and MUSBUS is a useful diagnostic tool in this area, provided the
MUSBUS version and workload profile remain fixed.
.IP (4)
Points (1) to (3) suggest that uncontrolled and uninformed
comparisons of
MUSBUS results are dangerous in the extreme.
This is the main reason that I have not published the large
collection of results accumulated to date.
.IP (5)
Remember that the tests described in Section 3.1 are intended
for \fBdiagnostic\fP use.
If you are interested in \fBperformance\fP, you should focus
upon the multi-user test described in Section 3.2.
.IP (6)
Beware of simulating \fBtoo few\fP users in the multi-user
test.
Useful information about system throughput and performance
under heavy load conditions can usually be obtained by
extrapolation of various measures computed from the
CPU and elapsed times for the multi-user tests with
various numbers of users.
However this assumes the machine has been sufficiently loaded to
move out of the \fIlinear\fP part of the performance curves.
For very fast machines, this may require emulation of
a \fIlarge\fP number of users in the multi-user test.
.IP (7)
Beware of simulating \fBtoo many\fP users in the multi-user
test.
Using the default value for
.SV ttys ,
\fBall\fP simulated
tty output is directed to a \fIsingle\fP serial port.
As you increase the number of simulated users in the multi-user
test (in response to Point (6) above) the serial port
bandwidth may become the limiting resource!
This is easy to fix by adding a list of more tty devices to the 
value of
.SV ttys .
.IP (8)
Be warned that the multi-user test has ``broken'' several UNIX
ports.
Causes have been identified as implementation (configuration)
limits in the system being tested (e.g. proc slots), real
bugs in the port or MUSBUS errors.
This list is basically in order of decreasing probability.
.LP
Communication on MUSBUS experiences is welcomed at the
electronic addresses on the first page of this document.
If you have found a problem, or can suggest a better testing
technique, please let me know, so that future versions might
offer real (as opposed to cosmetic) enhancements.