DataMuseum.dk

Presents historical artifacts from the history of:

DKUUG/EUUG Conference tapes

This is an automatic "excavation" of a thematic subset of
artifacts from Datamuseum.dk's BitArchive.

Excavated with: AutoArchaeologist - Free & Open Source Software.



⟦6e1af44cb⟧ TextFile

    Length: 7261 (0x1c5d)
    Types: TextFile
    Names: »tutorial«

Derivation

└─⟦87ddcff64⟧ Bits:30001253 CPHDIST85 Tape, 1985 Autumn Conference Copenhagen
    └─ ⟦this⟧ »cph85dist/stat/doc/tutorial« 

TextFile

.ls 1
.de EX
.ce
.ft B
\\$1
.ft
..
.LH "Plotting a Function
.P
Suppose you want to make a plot of the function
.EX "Y = X**2 - 30*X + 10
First, use SERIES to create a set of numbers to work with.
.EX "series -100 100
Then, transform this data using DM.
.EX "series -100 100 | dm x1 "x1*x1-30*x1+10"
Then, plot this data using the ``p'' option of PAIR.
.EX "series -100 100 | dm x1 "x1*x1-30*x1+10" | pair -p
The result is shown below.
.nf
|--------------------------------------------------|13010
|3                                                 |
|21                                                |
| 3                                                |
|  4                                               |
|   3                                              |
|   12                                             |
|    22                                            |
|     21                                           |
|      31                                          |
|       31                                        4|
|        31                                      4 |
|         32                                   14  |
|          22                                 13   |
|           24                               33    |
|             41                            41     |
|              33                         24       |
|               142                     142        |
|                 242                 143          |
|                   2441            443            |
|                      3444444444444               |
|--------------------------------------------------|-215
-100.000                                     100.000
.fi
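.P
As a cross-check on the axis labels of the plot,
the same function can be tabulated with awk alone.
This is only a sketch for systems where SERIES and DM are not installed;
the function name curve is made up for this illustration.
.nf
```shell
# Hypothetical stand-in for: series -100 100 | dm x1 "x1*x1-30*x1+10"
# Prints "x y" pairs, one per line, using only awk.
curve() {
    awk 'BEGIN {
        for (x = -100; x <= 100; x++)
            print x, x*x - 30*x + 10
    }'
}

# Report the smallest and largest y over the range.
curve | awk '
    NR == 1  { min = max = $2 }
    $2 < min { min = $2 }
    $2 > max { max = $2 }
    END      { print "min", min, "max", max }'
```
.fi
The last command prints min -215 max 13010,
the same extremes that label the vertical axis of the plot.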
.bp
.LH "Correlations
.P
Suppose you want to see the correlations between
X, the square of X, its logarithm, and its square root.
First, you create a series of numbers to work with.
.EX "series 1 100
Then you use DM to create your transformed columns.
.EX "series 1 100 | dm x1 "x1*x1" "log(x1)" "x1^.5"
Then you can pipe this output to CORR to get correlations.
.EX "series 1 100 | dm x1 "x1*x1" "log(x1)" "x1^.5" |
.EX "corr x "x*x" "log(x)" "sqrt(x)"
The result is shown below.
.nf
Analysis for 100 points of 4 variables:
VARIABLE  :          x        x*x     log(x)    sqrt(x) 
MIN       :     1.0000     1.0000     0.0000     1.0000 
MAX       :   100.0000 10000.0000     4.6052    10.0000 
MEAN      :    50.5000  3383.5000     3.6374     6.7146 
SD        :    29.0115  3024.3558     0.9281     2.3385 
CORRELATION MATRIX:
x         :     1.0000
x*x       :     0.9689     1.0000
log(x)    :     0.8959     0.7786     1.0000
sqrt(x)   :     0.9815     0.9076     0.9621     1.0000
VARIABLE  :          x        x*x     log(x)    sqrt(x) 
.fi
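.P
Each entry in the correlation matrix is an ordinary Pearson correlation
between two columns.
The sketch below computes one such entry with awk,
assuming two-column input;
the function name pearson is hypothetical and is not one of the programs described here.
.nf
```shell
# Hypothetical helper: Pearson correlation of columns 1 and 2 of standard input.
pearson() {
    awk '{
        n++
        sx += $1;     sy += $2
        sxx += $1*$1; syy += $2*$2; sxy += $1*$2
    }
    END {
        cov = sxy - sx*sy/n      # n times the covariance
        vx  = sxx - sx*sx/n      # n times the variance of column 1
        vy  = syy - sy*sy/n      # n times the variance of column 2
        printf "%.4f", cov / sqrt(vx*vy)
        print ""
    }'
}

# x against x*x for x = 1..100, the first off-diagonal entry of the matrix.
awk 'BEGIN { for (x = 1; x <= 100; x++) print x, x*x }' | pearson
```
.fi
The factors of n cancel in the ratio,
so the result should agree with the x*x entry in the first column of the matrix.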
.bp
.LH "Analysis of Variance
.P
Suppose you want to use ANOVA on some multifactor data.
First, you may have to set up a file of labels for the variables.
Suppose you have three variables:
subject name (12 in all),
dosage (low, medium, and high),
and hours without sleep (10, 20, 30, 40, 50).
You have 12*3*5 or 180 data points for each of your measures.
Suppose your N measures are in a file of N columns,
and that each subject's data is reported in successive lines.
Within each subject,
low dosages are reported for each fatigue level,
then high, then medium.
You would set up files called fatigue and dosage
with the lines:
.nf
.ta 1i 2i 3i 4i 5i
	\fIfatigue\fR	\fIdosage\fR
	10	low
	20	low
	30	low
	40	low
	50	low
		high
		high
		high
		high
		high
		medium
		medium
		medium
		medium
		medium
.fi
.bp
Then you would create a label file using DM and ABUT.
.EX "series 0 179 | dm "floor(x1/15)+1" | abut -c - fatigue dosage
Suppose the output is saved in a file called ``label.''
The result for the first subject is shown below.
.nf
1	10	low	
1	20	low	
1	30	low	
1	40	low	
1	50	low	
1	10	high	
1	20	high	
1	30	high	
1	40	high	
1	50	high	
1	10	medium	
1	20	medium	
1	30	medium	
1	40	medium	
1	50	medium	
.fi
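.P
The layout of the label file can also be written out as three nested loops,
which makes the ordering explicit:
subjects vary slowest, then dosage, then fatigue.
The sketch below assumes 12 subjects and the dosage order low, high, medium
described above; the function name make_labels is hypothetical,
and the output is separated by spaces rather than tabs.
.nf
```shell
# Hypothetical stand-in for the series | dm | abut -c pipeline.
# Assumes 12 subjects, dosage order low/high/medium, fatigue 10..50.
make_labels() {
    awk 'BEGIN {
        split("low high medium", dosage, " ")
        for (subject = 1; subject <= 12; subject++)
            for (d = 1; d <= 3; d++)
                for (fatigue = 10; fatigue <= 50; fatigue += 10)
                    print subject, fatigue, dosage[d]
    }'
}

# Show the 15 label lines for the first subject.
make_labels | head -n 15
```
.fi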
.bp
.P
Now you can use ANOVA on the different measures.
Suppose you want to work on the third measure.
.EX "dm s3 < data | abut label - | anova S fatigue dose Var3
You could do all the variables using a shell script loop.
.nf
n=7
i=1
while [ $i -le $n ]
do
   dm s$i < data | abut label - | anova S fatigue dose Var$i > $i.out
   i=`expr $i + 1`
done
pr [1-$n].out
.fi
.LH "Transformations and Paired T-Tests
.P
Suppose you want to compare the numbers in the first
half of a file with those in the second half.
The PAIR program assumes X and Y numbers alternate,
so some reformatting is needed.
Assuming not too many numbers are involved,
the MAKETRIX and TRANSPOSE commands can be used.
Suppose you have 50 numbers
on the first two lines of the file,
.nf
1 2 3 4 5 6 7 8 9 8 7 6 5 4 3 2 1 2 3 4 5 6 7 8 7
3 2 4 3 5 4 6 5 7 6 8 7 9 8 7 6 5 6 7 6 5 4 3 2 2
.fi
and 50 on the second two.
For this example, the second two will be generated using
DM and MAKETRIX.
.EX "maketrix 1 | dm "floor(10*log(x1))" | maketrix 25
.nf
0 6 10 13 16 17 19 20 21 20 19 17 16 13 10 6 0 6 10 13 16 17 19 20 19
10 6 13 10 16 13 17 16 19 17 20 19 21 20 19 17 16 17 19 17 16 13 10 6 6
.fi
The following makes a matrix with 50 columns
and transposes it to produce two-column input for PAIR.
.EX "maketrix 50 | transpose | pair -ps
.bp
.nf
                         Column 1         Column 2       Difference
Minimums                   1.0000           0.0000         -12.0000
Maximums                   9.0000          21.0000           1.0000
Sums                     253.0000         716.0000        -463.0000
SumSquares              1513.0000       11704.0000        4853.0000
Means                      5.0600          14.3200          -9.2600
SDs                        2.1798           5.4415           3.3975
t(49)                     16.4143          18.6085         -19.2722
p                         -0.0000          -0.0000          -0.0000

     Correlation        r-squared            t(48)                p
          0.9619           0.9252          24.3656           0.0000
       Intercept            Slope
          2.1701           2.4012
|--------------------------------------------------|21
|                                           5     2|
|                                     8            |
|                                                  |
|                               8                  |
|                        7                         |
|                                                  |
|                                                  |
|                  6                               |
|                                                  |
|                                                  |
|            6                                     |
|                                                  |
|                                                  |
|                                                  |
|      6                                           |
|                                                  |
|                                                  |
|                                                  |
|                                                  |
|2                                                 |
|--------------------------------------------------|0
1.000                                          9.000
.fi
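.P
The reshaping step pairs the i-th number of the first half of the input
with the i-th number of the second half.
The sketch below does the same pairing with awk,
assuming an even count of whitespace-separated numbers;
the function name halves is made up for this illustration.
.nf
```shell
# Hypothetical stand-in for: maketrix 50 | transpose
# Reads whitespace-separated numbers and pairs the i-th number of the
# first half with the i-th number of the second half.
halves() {
    awk '{ for (i = 1; i <= NF; i++) v[++n] = $i }
    END {
        half = n / 2
        for (i = 1; i <= half; i++) print v[i], v[i + half]
    }'
}

# Mean difference (first half minus second half) for a tiny example.
echo "1 2 3 10 20 30" | halves |
    awk '{ d += $1 - $2 } END { print d / NR }'
```
.fi
The tiny example prints -18, the mean of the differences -9, -18, and -27.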