|
|
DataMuseum.dkPresents historical artifacts from the history of: DKUUG/EUUG Conference tapes |
This is an automatic "excavation" of a thematic subset of
See our Wiki for more about DKUUG/EUUG Conference tapes Excavated with: AutoArchaeologist - Free & Open Source Software. |
top - metrics - downloadIndex: T e
Length: 14813 (0x39dd)
Types: TextFile
Names: »example.out«
└─⟦a0efdde77⟧ Bits:30001252 EUUGD11 Tape, 1987 Spring Conference Helsinki
└─⟦this⟧ »EUUGD11/stat-5.3/eu/stat/example/example.out«
$ ff -dc -w 79 example.txt
Annotated Example from Chapter 2 of the |STAT Handbook
Copyright 1986 Gary Perlman
A concrete example with several |STAT programs is worked in detail.
The example shows the style of analysis in |STAT. New Users of
|STAT should not try to understand all the details in the examples.
Details about all the programs can be found in the online manual entries
and more examples of program use appear in other chapters of the Handbook.
The example is based on a familiar problem: grades in a course based on
two midterm exams and a final exam. Scores on exams are broken down by
student gender and by the lab section taught by one of two teaching
assistants: John or Jane. The data are in the file exam.dat. Each line
in exam.dat contains a student ID number, the student's teaching assistant,
the student's gender, and scores (out of 100) on the midterms and final.
We will compute final grades based on the exam scores, compare male and
female students, and compare the two teaching assistants. The annotations
in Chapter 2 of the Handbook will provide more details.
-------------------- Section 2.1 Data in exam.dat
$ cat exam.dat
S-1 john male 56 42 58
S-2 john male 96 90 91
S-3 john male 70 59 65
S-4 john male 82 75 78
S-5 john male 85 90 92
S-6 john male 69 60 65
S-7 john female 82 78 60
S-8 john female 84 81 82
S-9 john female 89 80 68
S-10 john female 90 93 91
S-11 jane male 42 46 65
S-12 jane male 28 15 34
S-13 jane male 49 68 75
S-14 jane male 36 30 48
S-15 jane male 58 58 62
S-16 jane male 72 70 84
S-17 jane female 65 61 70
S-18 jane female 68 75 71
S-19 jane female 62 50 55
S-20 jane female 71 72 87
\f
-------------------- Section 2.2 Computing Final Scores
$ dm INPUT ".2*x4 + .3*x5 + .5*x6" < exam.dat > scores.dat
-------------------- Examine Scores File: scores.dat
$ cat scores.dat
S-1 john male 56 42 58 52.8
S-2 john male 96 90 91 91.7
S-3 john male 70 59 65 64.2
S-4 john male 82 75 78 77.9
S-5 john male 85 90 92 90
S-6 john male 69 60 65 64.3
S-7 john female 82 78 60 69.8
S-8 john female 84 81 82 82.1
S-9 john female 89 80 68 75.8
S-10 john female 90 93 91 91.4
S-11 jane male 42 46 65 54.7
S-12 jane male 28 15 34 27.1
S-13 jane male 49 68 75 67.7
S-14 jane male 36 30 48 40.2
S-15 jane male 58 58 62 60
S-16 jane male 72 70 84 77.4
S-17 jane female 65 61 70 66.3
S-18 jane female 68 75 71 71.6
S-19 jane female 62 50 55 54.9
S-20 jane female 71 72 87 79.3
\f
-------------------- Sort Records by Final Scores
$ reverse -f < scores.dat | sort
27.1 34 15 28 male jane S-12
40.2 48 30 36 male jane S-14
52.8 58 42 56 male john S-1
54.7 65 46 42 male jane S-11
54.9 55 50 62 female jane S-19
60 62 58 58 male jane S-15
64.2 65 59 70 male john S-3
64.3 65 60 69 male john S-6
66.3 70 61 65 female jane S-17
67.7 75 68 49 male jane S-13
69.8 60 78 82 female john S-7
71.6 71 75 68 female jane S-18
75.8 68 80 89 female john S-9
77.4 84 70 72 male jane S-16
77.9 78 75 82 male john S-4
79.3 87 72 71 female jane S-20
82.1 82 81 84 female john S-8
90 92 90 85 male john S-5
91.4 91 93 90 female john S-10
91.7 91 90 96 male john S-2
-------------------- Another Way Using dsort
$ dsort n7 < scores.dat
S-12 jane male 28 15 34 27.1
S-14 jane male 36 30 48 40.2
S-1 john male 56 42 58 52.8
S-11 jane male 42 46 65 54.7
S-19 jane female 62 50 55 54.9
S-15 jane male 58 58 62 60
S-3 john male 70 59 65 64.2
S-6 john male 69 60 65 64.3
S-17 jane female 65 61 70 66.3
S-13 jane male 49 68 75 67.7
S-7 john female 82 78 60 69.8
S-18 jane female 68 75 71 71.6
S-9 john female 89 80 68 75.8
S-16 jane male 72 70 84 77.4
S-4 john male 82 75 78 77.9
S-20 jane female 71 72 87 79.3
S-8 john female 84 81 82 82.1
S-5 john male 85 90 92 90
S-10 john female 90 93 91 91.4
S-2 john male 96 90 91 91.7
\f
-------------------- Section 2.3 Summary of Final Scores
$ dm s7 < scores.dat | desc -o -t 75 -h -i 10 -m 0
------------------------------------------------------------
Under Range In Range Over Range Missing Sum
0 20 0 0 1359.200
------------------------------------------------------------
Mean Median Midpoint Geometric Harmonic
67.960 68.750 59.400 65.564 62.529
------------------------------------------------------------
SD Quart Dev Range SE mean
16.707 10.575 64.600 3.736
------------------------------------------------------------
Minimum Quartile 1 Quartile 2 Quartile 3 Maximum
27.100 57.450 68.750 78.600 91.700
------------------------------------------------------------
Skew SD Skew Kurtosis SD Kurt
-0.586 0.548 2.844 1.095
------------------------------------------------------------
Null Mean t prob (t) F prob (F)
75.000 -1.884 0.075 3.551 0.075
------------------------------------------------------------
Midpt Freq
5.000 0
15.000 0
25.000 1 *
35.000 0
45.000 1 *
55.000 4 ****
65.000 5 *****
75.000 5 *****
85.000 2 **
95.000 2 **
\f
-------------------- Section 2.4 Predicting Final Exam Scores
$ dm x6 x4 x5 < scores.dat | regress -e final midterm1 midterm2
Analysis for 20 cases of 3 variables:
Variable final midterm1 midterm2
Min 34.0000 28.0000 15.0000
Max 92.0000 96.0000 93.0000
Sum 1401.0000 1354.0000 1293.0000
Mean 70.0500 67.7000 64.6500
SD 15.3502 18.6720 20.4303
Correlation Matrix:
final 1.0000
midterm1 0.7586 1.0000
midterm2 0.8838 0.9190 1.0000
Variable final midterm1 midterm2
Regression Equation for final:
final = -0.2835 midterm1 + 0.9022 midterm2 + 30.9177
Significance test for prediction of final
Mult-R R-Squared SEest F(2,17) prob (F)
0.8942 0.7996 7.2640 33.9228 0.0000
\f
-------------------- Predicted Plot From Regression Equation in regress.eqn
$ dm x6 x4 x5 < scores.dat | dm Eregress.eqn |
pair -p -h 10 -w 30 -x final -y predicted
|------------------------------|89.3045
| 3|
| 1 1 |
| 1 1 11 1 1 |
| |
| 1 2 1 |predicted
| 1 1 |
| 1 |
| 1 |
| |
|1 |
|------------------------------|36.5121
34.000 92.000
final r= 0.894
-------------------- Residual Plot
$ dm x6 x4 x5 < scores.dat | dm Eregress.eqn | dm x2 x1-x2 |
pair -p -h 10 -w 30 -x predicted -y residuals
|------------------------------|11.2546
| 11 |
| 1 |
| 1 1 1 1 1|
| 1 1 1 1|
|1 1 1 |residuals
| 1 1 |
| 1 |
| 1 |
| |
| 1 |
|------------------------------|-18.0399
36.512 89.304
predicted r= 0.000
\f
-------------------- Section 2.5 Failures by Assistant and Gender
$ dm s2 s3 "if x7 GE 75 then 'pass' else 'fail'" 1 < scores.dat |
contab assistant gender success count
FACTOR: assistant gender success count
LEVELS: 2 2 2 20
assistan count
john 10
jane 10
Total 20
NOTE: Yates' correction for continuity applied
chisq 0.000000 df 1 p 1.000000
gender count
male 12
female 8
Total 20
NOTE: Yates' correction for continuity applied
chisq 0.450000 df 1 p 0.502335
success count
fail 12
pass 8
Total 20
NOTE: Yates' correction for continuity applied
chisq 0.450000 df 1 p 0.502335
SOURCE: assistant gender
male female Totals
john 6 4 10
jane 6 4 10
Totals 12 8 20
Analysis for assistant x gender:
NOTE: Yates' correction for continuity applied
WARNING: 2 of 4 cells had expected frequencies < 5
chisq 0.000000 df 1 p 1.000000
Fisher Exact One-Tailed Probability 0.675042
Fisher Exact Other-Tail Probability 0.675042
Fisher Exact Two-Tailed Probability 1.000000
phi Coefficient == Cramer's V 0.000000
Contingency Coefficient 0.000000
SOURCE: assistant success
fail pass Totals
john 4 6 10
jane 8 2 10
Totals 12 8 20
Analysis for assistant x success:
NOTE: Yates' correction for continuity applied
WARNING: 2 of 4 cells had expected frequencies < 5
chisq 1.875000 df 1 p 0.170904
Fisher Exact One-Tailed Probability 0.084901
Fisher Exact Other-Tail Probability 0.084901
Fisher Exact Two-Tailed Probability 0.169802
phi Coefficient == Cramer's V 0.306186
Contingency Coefficient 0.292770
SOURCE: gender success
fail pass Totals
male 8 4 12
female 4 4 8
Totals 12 8 20
Analysis for gender x success:
NOTE: Yates' correction for continuity applied
WARNING: 3 of 4 cells had expected frequencies < 5
chisq 0.078125 df 1 p 0.779855
Fisher Exact One-Tailed Probability 0.886759
Fisher Exact Other-Tail Probability 0.259609
Fisher Exact Two-Tailed Probability 1.000000
phi Coefficient == Cramer's V 0.062500
Contingency Coefficient 0.062378
SOURCE: assistant gender success
assistan gender success
john male fail 3
john male pass 3
john female fail 1
john female pass 3
jane male fail 5
jane male pass 1
jane female fail 3
jane female pass 1
\f
-------------------- Section 2.6 Effects of Assistant and Gender
$ dm s1 s2 s3 "'m1'" s4 s1 s2 s3 "'m2'" s5 s1 s2 s3 "'final'" s6 < scores.dat |
maketrix 5 | anova student assistant gender exam score
SOURCE: grand mean
assista gender exam N MEAN SD SE
60 67.4667 18.0981 2.3365
SOURCE: assistant
assista gender exam N MEAN SD SE
john 30 76.7000 13.7869 2.5171
jane 30 58.2333 17.3179 3.1618
SOURCE: gender
assista gender exam N MEAN SD SE
male 36 62.8611 20.1085 3.3514
female 24 74.3750 11.9120 2.4315
SOURCE: assistant gender
assista gender exam N MEAN SD SE
john male 18 73.5000 15.4053 3.6311
john female 12 81.5000 9.6153 2.7757
jane male 18 52.2222 18.8541 4.4440
jane female 12 67.2500 9.6684 2.7910
SOURCE: exam
assista gender exam N MEAN SD SE
m1 20 67.7000 18.6720 4.1752
m2 20 64.6500 20.4303 4.5684
final 20 70.0500 15.3502 3.4324
SOURCE: assistant exam
assista gender exam N MEAN SD SE
john m1 10 80.3000 11.9355 3.7743
john m2 10 74.8000 16.3761 5.1786
john final 10 75.0000 13.4247 4.2453
jane m1 10 55.1000 15.5167 4.9068
jane m2 10 54.5000 19.5973 6.1972
jane final 10 65.1000 16.2101 5.1261
SOURCE: gender exam
assista gender exam N MEAN SD SE
male m1 12 61.9167 20.7822 5.9993
male m2 12 58.5833 22.5931 6.5221
male final 12 68.0833 17.1329 4.9459
female m1 8 76.3750 11.1475 3.9413
female m2 8 73.7500 13.1557 4.6512
female final 8 73.0000 12.7167 4.4960
SOURCE: assistant gender exam
assista gender exam N MEAN SD SE
john male m1 6 76.3333 14.1516 5.7774
john male m2 6 69.3333 19.1172 7.8046
john male final 6 74.8333 14.4418 5.8959
john female m1 4 86.2500 3.8622 1.9311
john female m2 4 83.0000 6.7823 3.3912
john female final 4 75.2500 13.8894 6.9447
jane male m1 6 47.5000 15.8461 6.4692
jane male m2 6 47.8333 21.9127 8.9458
jane male final 6 61.3333 18.1071 7.3922
jane female m1 4 66.5000 3.8730 1.9365
jane female m2 4 64.5000 11.3871 5.6936
jane female final 4 70.7500 13.0735 6.5368
FACTOR : student assistant gender exam score
LEVELS : 20 2 2 3 60
TYPE : RANDOM BETWEEN BETWEEN WITHIN DATA
SOURCE SS df MS F p
===============================================================
mean 273105.0667 1 273105.0667 443.734 0.000 ***
s/ag 9847.5278 16 615.4705
assista 5115.2667 1 5115.2667 8.311 0.011 *
s/ag 9847.5278 16 615.4705
gender 1909.0028 1 1909.0028 3.102 0.097
s/ag 9847.5278 16 615.4705
ag 177.8028 1 177.8028 0.289 0.598
s/ag 9847.5278 16 615.4705
exam 293.2333 2 146.6167 4.564 0.018 *
es/ag 1027.8889 32 32.1215
ae 610.4333 2 305.2167 9.502 0.001 ***
es/ag 1027.8889 32 32.1215
ge 314.5722 2 157.2861 4.897 0.014 *
es/ag 1027.8889 32 32.1215
age 29.2056 2 14.6028 0.455 0.639
es/ag 1027.8889 32 32.1215
-------------------- Scheffe 95% Confidence Interval:
$ echo "sqrt ($df1 * $critf * $MSerror * 2 / $N)" | calc
sqrt(((((2 * 3.294537) * 32.1215) * 2) / 10)) = 6.506165391