BIOS601 AGENDA: Thursday September 20, 2018

[updated Sept 18, 2018]


Discussion of issues in JH's Notes and assignment on mean/quartile of a quantitative variable

Answers to be submitted by Friday 21st:

Exercise 0.14 parts i, iii, and iv (omit part ii),
Exercise 0.17
Exercise 0.18

See notes below. But, first, a few remarks on the Notes themselves:

Section 4.

Student's problem was not about the n at which the CLT kicks in [he was already assuming the component r.v.'s are Normal, N(mu, sigma)] but about when the sample standard deviation (s) is a good substitute (proxy) for the 'true' but unknown 'population' standard deviation (sigma): "no one has yet told us very clearly where the limit between 'large' and 'small' samples is to be drawn."

In connection with the 100th anniversary of Student's ground-breaking 1908 paper, JH and colleagues went back to that paper, and to the way he mathematically derived what is now called the 't' distribution. Full details, including why Gosset called himself 'Student', and his simulations to check out his shaky algebra, can be found at Article/Material in connection with 100th Anniversary of Student's 1908 paper.

The 2 persons in the photo with JH at the reception at the Guinness Brewery in Dublin in 2008 are the grandson and granddaughter of William Gosset ('Student'). At the unveiling of the plaque, the grandson told us that he was pretty sure he alone, of those assembled in 2008, had personally met Gosset. He was 6 months old when he was brought to the hospital to see his grandfather, a few months before the grandfather died in 1937 in London. (Gosset was English-born and educated, but worked for Guinness at its Dublin HQ from 1899 onwards. In 1935, he moved to London to take charge of the scientific side of production at a new Guinness brewery at Park Royal in North West London, but died just two years later, at the age of 61.) Short bio.

Interestingly, that 1908 paper was of limited use, since it dealt only with 1-sample problems. It took Fisher's insights in the 1920s to generalize it, not just to 2-sample problems but also to correlation and regression, and indeed to any context where one is dealing with the ratio of a mean, correlation, or slope to its standard error; in turn, the SE involves the square root of an independent plug-in estimate of the unit variance. Fisher called the number of independent contributions to that estimate the "degrees of freedom". In this context, JH usually defines the "d.f." as "the number of independent estimates of error": think of the number of independent residuals (which one "pools" to get one overall estimate of sigma-squared) as a case in point. It is no different in spirit from pooling the squared within-group deviations from their own means [they are also residuals, from each fitted (i.e., group) mean].
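As a small numerical illustration of the "number of independent estimates of error" idea, here is a minimal Python sketch. The three groups and their values are invented purely for illustration (they are not from the Notes); the point is only that each group of size n contributes n - 1 independent residuals, and that the pooled estimate of sigma-squared divides the total within-group sum of squares by the total d.f.

```python
# Hypothetical data: three groups, values invented purely for illustration.
groups = [
    [4.1, 5.0, 6.2, 5.5],
    [7.3, 6.8, 8.1],
    [5.9, 6.4, 6.1, 7.0, 6.6],
]

def residual_ss(values):
    """Sum of squared deviations of the values from their own mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)

# Each group of size n supplies n - 1 independent residuals (one d.f. is
# 'used up' by fitting that group's mean).
df_per_group = [len(g) - 1 for g in groups]
df_total = sum(df_per_group)

# Pool the squared within-group deviations, then divide by the total d.f.
pooled_var = sum(residual_ss(g) for g in groups) / df_total
pooled_sd = pooled_var ** 0.5

print("d.f. per group:", df_per_group)
print("total d.f.:", df_total)
print("pooled SD:", round(pooled_sd, 3))
```

With a single group this reduces to the usual s-squared with its n - 1 divisor, which is the 1-sample case Student dealt with.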

In "Another worked Example, with graphic", JH is trying to get statisticians and their collaborators to use a better way to display paired data: the usual presentations involve separate SE's for the two means, as though one mean were from one sample of n, and the other from an entirely separate (independent) sample of n.
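To see why separate SE's can mislead with paired data, consider the following minimal Python sketch. The before/after values are invented for illustration and are not the data in the Notes; only the contrast between the two standard errors matters.

```python
import math
import statistics

# Hypothetical paired measurements on the same 6 subjects (invented values).
before = [120, 135, 128, 142, 150, 131]
after = [115, 131, 121, 138, 143, 127]
n = len(before)

# Appropriate for paired data: work with the n within-pair differences.
diffs = [b - a for b, a in zip(before, after)]
se_paired = statistics.stdev(diffs) / math.sqrt(n)

# What the usual display implies: two independent samples of size n,
# each mean with its own SE, combined in quadrature.
se_before = statistics.stdev(before) / math.sqrt(n)
se_after = statistics.stdev(after) / math.sqrt(n)
se_as_if_independent = math.sqrt(se_before ** 2 + se_after ** 2)

print("mean difference:", round(statistics.mean(diffs), 2))
print("SE from paired differences:", round(se_paired, 2))
print("SE if treated as independent samples:", round(se_as_if_independent, 2))
```

In this made-up example the between-subject variation dominates, so the "independent samples" SE is far larger than the SE of the mean difference; a display built on the within-pair differences shows the precision that the pairing actually buys.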

Section 4.3/4.4

In other years, we left sample size and precision issues until later in the course, where we planned to deal with them 'en masse.' But in many years we never had the time at the end of the course. So this year, following on from the calculations you did with the step-counter data, Q 0.11 will ask you to visit this section, and Figure 4 in particular.

Remarks on assigned exercises.

q.0.14 parts i, iii, and iv (i.e., omit the repetitive part ii)

-- Student's t distribution:

Note that in part i, at issue is getting the 'exact' p-value corresponding to the difference of 33.7, and his 'SD' of 63.1, i.e., using the pt( .., df=10) function.

(Student's table only went up to n=10, or 9 df; thus his 'home-grown' approximation when n=11.)

Before calculating the t-statistic, you first need to scale up his 'SD' a little bit, since he used a divisor of 11 rather than 10.
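The rescaling and the tail-area calculation can be sketched as follows. This is only an illustrative Python version using the standard library: the helper names (`t_pdf`, `t_upper_tail`) are made up here, the tail area is obtained by simple numerical integration rather than by a library routine, and the one-sided tail is an assumption about what is wanted. In R, the corresponding call would be along the lines of `pt(t, df = 10, lower.tail = FALSE)`.

```python
import math

# Values quoted in the remarks above.
mean_diff = 33.7   # the difference
sd_div_n = 63.1    # Student's 'SD', computed with divisor n = 11
n = 11
df = n - 1

# Rescale to the usual SD (divisor n - 1), then form the t-statistic.
s = sd_div_n * math.sqrt(n / (n - 1))
se = s / math.sqrt(n)            # same as sd_div_n / sqrt(n - 1)
t_stat = mean_diff / se

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    c = math.gamma((df + 1) / 2) / (math.sqrt(df * math.pi) * math.gamma(df / 2))
    return c * (1 + x * x / df) ** (-(df + 1) / 2)

def t_upper_tail(t, df, steps=2000):
    """P(T > t) for t >= 0, via Simpson's rule on [0, t]; steps must be even."""
    h = t / steps
    total = t_pdf(0, df) + t_pdf(t, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    return 0.5 - total * h / 3

p_one_sided = t_upper_tail(t_stat, df)
print("s =", round(s, 2), " SE =", round(se, 2), " t =", round(t_stat, 3))
print("one-sided p-value:", round(p_one_sided, 4))
```

Note that s / sqrt(11) and 63.1 / sqrt(10) give the same SE, so the rescaling can be done in either order; double the tail area if a two-sided p-value is wanted.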

q.0.17 and q.0.18 -- Sample Size Considerations: Precision / Power.

The diagram in the milk example on page 5 of the handout for EPIB607 should be of help for q.0.17.

The diagram in the milk example on page 6 of the handout for EPIB607 should be of help for q.0.18 -- here we are interested in values to the LEFT of the null. (In the milk example, the mix of milk and water is to the RIGHT of the null.)
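For orientation, the two kinds of calculation (precision and power) can be sketched generically in Python. This is a sketch under stated assumptions, not the specific set-up of q.0.17 or q.0.18: it assumes a known sigma, Normal (z-based) calculations, and a one-sided test for the power case; the function names and the numerical inputs are hypothetical.

```python
import math
from statistics import NormalDist

z = NormalDist().inv_cdf  # standard Normal quantile function

def n_for_precision(sigma, margin, conf=0.95):
    """Smallest n so that the CI half-width z * sigma / sqrt(n) is <= margin."""
    z_conf = z(1 - (1 - conf) / 2)
    return math.ceil((z_conf * sigma / margin) ** 2)

def n_for_power(sigma, delta, alpha=0.05, power=0.80):
    """Smallest n for a one-sided z-test to detect a shift of size delta."""
    z_alpha = z(1 - alpha)
    z_beta = z(power)
    # Only the size of the shift matters for n; its sign (left or right of
    # the null) determines which tail the test looks in.
    return math.ceil(((z_alpha + z_beta) * sigma / abs(delta)) ** 2)

# Hypothetical inputs, chosen only to exercise the functions.
print(n_for_precision(sigma=10, margin=2))
print(n_for_power(sigma=10, delta=-5))
```

The power formula is the algebraic version of the two overlapping sampling distributions in the diagram: n must be large enough that the critical value sits z_alpha SE's from the null and z_beta SE's from the alternative.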