BIOS601 AGENDA: Thursday September 20, 2018
[updated Sept 18, 2018]
- Discussion of issues in JH's Notes and assignment on mean/quartile of a quantitative variable
Answers to be submitted by Friday 21st:
Exercise 0.14 parts i, iii, and iv (omit part ii),
Exercise 0.17
Exercise 0.18
See notes below.
But, first, a few remarks on the Notes themselves:
Section 4.
Student's problem was not about the n at which the CLT kicks in
[he was already assuming that the component r.v.'s are Normal, N(,)]
but about the n at which the sample standard deviation (s) becomes a good substitute (proxy) for
the 'true' but unknown 'population' standard deviation (sigma):
"no one has yet told
us very clearly where the limit between
'large' and 'small' samples is to be drawn."
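A small simulation can make Student's question concrete. The sketch below is not Student's own procedure; it assumes standard Normal data, a nominal cutoff of 1.96, and an arbitrary seed and number of replications. It estimates how often the ratio of the sample mean to s/sqrt(n) exceeds 1.96 in absolute value. If s were a perfect proxy for sigma, the rate would be close to 5% at every n.

```python
import math
import random

def exceed_rate(n, reps=5000, cutoff=1.96, seed=601):
    """Fraction of N(0, 1) samples of size n with |mean / (s / sqrt(n))| > cutoff."""
    rng = random.Random(seed)           # fixed seed (an arbitrary choice) for reproducibility
    hits = 0
    for _ in range(reps):
        y = [rng.gauss(0.0, 1.0) for _ in range(n)]
        mean = sum(y) / n
        ss = sum((v - mean) ** 2 for v in y)
        s = math.sqrt(ss / (n - 1))     # sample SD, divisor n - 1
        if abs(mean / (s / math.sqrt(n))) > cutoff:
            hits += 1
    return hits / reps

for n in (4, 10, 30, 100):
    print(n, exceed_rate(n))
```

The rate should sit well above 5% for the smallest n and drift towards 5% as n grows, which is the 'large' versus 'small' boundary Student was asking about.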
In connection with the 100th anniversary of Student's
ground-breaking 1908 paper, JH and colleagues went back to
that paper, and to the way he mathematically derived
what is now called the 't' distribution.
Full details, including why Gosset called himself 'Student',
and his simulations to check out his shaky algebra,
can be found at Article/Material in connection with 100th
Anniversary of Student's 1908 paper.
The two people in the photo
with JH at the reception at the Guinness Brewery in Dublin in 2008
are the grandson and granddaughter of William Gosset ('Student').
At the unveiling of the plaque, the grandson told us that he
was pretty sure he alone, of those assembled in 2008, had personally
met Gosset. He was 6 months old when he was brought in to
the hospital to see his grandfather, a few months before the grandfather died
in 1937 in London. (Gosset was English-born and educated, but worked
for Guinness in the Dublin HQ from 1899 onwards. In 1935,
he moved to London to take charge of the scientific side of production
at a new Guinness brewery at Park Royal in North West London,
but died just two years later, at the age of 61.)
Short bio.
Interestingly, that 1908 paper was of limited use, since it dealt only
with 1-sample problems. It took Fisher's insights in the 1920s
to generalize it to not just 2-sample problems, but also
correlation and regression, indeed to any context where one was dealing with
a ratio of a mean or correlation or slope to its standard error;
in turn, the SE involved the square root of an independent
plug-in estimate of the unit variance. Fisher called the number
of independent contributions to that estimate the "degrees of freedom".
In this context, JH usually defines the "d.f." as "the
number of independent estimates of error": think of the
number of independent residuals (which one "pools" to
get one overall estimate of sigma-squared) as a case in point. It
is no different in spirit from pooling the squared within-group
deviations from their own means
[they are also residuals, from each fitted (i.e., group) mean].
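The pooling idea can be sketched in a few lines. The function name and the three small groups below are hypothetical, made up only for illustration. Each group contributes its squared deviations from its own mean, and (group size minus 1) independent estimates of error.

```python
def pooled_variance(groups):
    """Pool squared within-group deviations; return (estimate of sigma^2, d.f.)."""
    ss = 0.0
    df = 0
    for g in groups:
        m = sum(g) / len(g)                  # fitted (group) mean
        ss += sum((y - m) ** 2 for y in g)   # squared residuals from that mean
        df += len(g) - 1                     # one d.f. is used up by each fitted mean
    return ss / df, df

# Hypothetical data: three groups of sizes 3, 2 and 4
groups = [[4.0, 6.0, 8.0], [1.0, 3.0], [10.0, 11.0, 12.0, 15.0]]
var_hat, df = pooled_variance(groups)
print(var_hat, df)   # 4.0 6
```

Here the 9 observations yield 9 - 3 = 6 degrees of freedom, since three means were fitted.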
In "Another worked Example, with graphic", JH is trying to get
statisticians and their collaborators to use a better way to display
paired data: the usual presentations involve separate SE's for the two
means, as though the one mean was from one sample of n, and the other
from an entirely separate (independent) sample of n.
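To see why the separate SE's mislead, compare the SE of the mean within-pair difference with the SE one would compute if the two columns were treated as independent samples. The numbers below are hypothetical, chosen only so that the pairs track each other closely; they are not from JH's worked example.

```python
import math
import statistics

# Hypothetical paired measurements on the same 5 units
before = [10.0, 12.0, 14.0, 16.0, 18.0]
after = [11.0, 14.0, 15.0, 18.0, 19.0]
n = len(before)

# Paired analysis: work with the within-pair differences
diffs = [a - b for a, b in zip(after, before)]
paired_se = statistics.stdev(diffs) / math.sqrt(n)

# Analysis that (wrongly) treats the two columns as independent samples of n
unpaired_se = math.sqrt(statistics.variance(before) / n
                        + statistics.variance(after) / n)

print(round(paired_se, 3), round(unpaired_se, 3))
```

With data like these, the SE of the mean difference is several times smaller than the 'independent samples' SE, which is why a display built from the two separate SE's understates the precision of a paired comparison.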
Section 4.3/4.4
In other years, we left sample size and precision issues
until later in the course, where we planned to deal with
them 'en masse.' But in many years we never had the time
at the end of the course. So this year, following on from
the calculations you did
with the step-counter data,
Q 0.11 will require you to visit this section, and Figure 4 in particular.
Remarks on assigned exercises.
q.0.14 parts i, iii, and iv (i.e., omit the repetitive part ii)
-- Student's t distribution:
Note that in part i, at issue is getting
the 'exact' p-value corresponding to the difference
of 33.7, and his 'SD' of 63.1, i.e., using the
pt( .., df=10) function.
(Student's table only
went up to n=10, or 9 df;
thus his 'home-grown' approximation when n=11.)
Before calculating the t-statistic, you
first need to scale up his 'SD' a little bit (multiply it by sqrt(11/10)),
since he used a divisor of 11 rather than 10.
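The arithmetic for part i can be sketched as follows, using only the numbers quoted above (difference 33.7, 'SD' 63.1, n = 11). This is an illustration in plain Python, not the required solution: the helper `t_upper_tail` is a hypothetical stand-in that integrates the t density numerically and plays the role of 1 - pt(t, df=10). Whether a one- or two-sided p-value is wanted is for you to decide from the exercise.

```python
import math

def t_density(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=20000):
    """P(T > t) for t >= 0: 0.5 minus Simpson's-rule integral of the density on [0, t]."""
    h = t / steps                        # steps must be even for Simpson's rule
    total = t_density(0.0, df) + t_density(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_density(i * h, df)
    return 0.5 - total * h / 3

n = 11
mean_diff = 33.7
sd_divisor_n = 63.1                           # Student's 'SD', computed with divisor n = 11

s = sd_divisor_n * math.sqrt(n / (n - 1))     # rescaled to divisor n - 1 = 10
t_stat = mean_diff / (s / math.sqrt(n))       # about 1.69
p_one_sided = t_upper_tail(t_stat, df=n - 1)

print(round(t_stat, 3), p_one_sided, 2 * p_one_sided)
```

Note that the rescaling makes the t-statistic equal to 33.7 / (63.1 / sqrt(10)), so the factor sqrt(11) cancels.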
q.0.17 and q.0.18
-- Sample Size Considerations: Precision / Power.
The diagram in the milk example on page 5 of
handout for EPIB607 should be of help for q.0.17.
The diagram in the milk example on page 6 of
handout for EPIB607 should be of help for q.0.18:
here we are interested in values to the LEFT of the null
(the mix of milk and water is to the RIGHT of the null).
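For orientation only, here is a minimal sketch of a power and sample-size calculation when the alternative lies to the LEFT of the null. It assumes a one-sided z-test with a known sigma, and the values of mu0, mu_alt, sigma, alpha and power are hypothetical placeholders. They are not the numbers in the exercises or in the milk example.

```python
import math
from statistics import NormalDist

def power_left(mu0, mu_alt, sigma, n, alpha=0.05):
    """Power of a one-sided z-test that rejects when the sample mean is far enough LEFT of mu0."""
    se = sigma / math.sqrt(n)
    critical = mu0 + NormalDist().inv_cdf(alpha) * se   # inv_cdf(alpha) is negative
    return NormalDist(mu_alt, se).cdf(critical)         # P(sample mean < critical | mu_alt)

def n_for_power(mu0, mu_alt, sigma, alpha=0.05, power=0.80):
    """Smallest n giving at least the requested power (sigma assumed known)."""
    z_alpha = NormalDist().inv_cdf(1 - alpha)
    z_beta = NormalDist().inv_cdf(power)
    return math.ceil(((z_alpha + z_beta) * sigma / abs(mu0 - mu_alt)) ** 2)

# Hypothetical inputs: null at 0, alternative one unit to the left, sigma = 2
n_needed = n_for_power(mu0=0.0, mu_alt=-1.0, sigma=2.0)
print(n_needed, power_left(0.0, -1.0, 2.0, n_needed))
```

The two overlapping sampling distributions in the handout diagrams correspond to `NormalDist(mu0, se)` and `NormalDist(mu_alt, se)` here, and the critical value is the point that separates them.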