BIOS601 AGENDA: Week of September 10 to September 17, 2021
[updated September 08, 2022]
Agenda for Week of September 10 to September 16, 2022
- Discussion of issues in JH's Notes and assignment on C&H Ch01 [Prob. Models] and Ch02 [Conditional Prob. Models]
Answers to be uploaded to MyCourses individually by end of the 'business' week for:
Supplementary exercises 1.1 (p. 4), 1.2 (p. 5), 2.1, 2.2 [PhD], 2.3,
2.4 [PhD], 2.5 (p. 9), 2.15, 2.18
Answers to Supplementary exercise 2.11 to be uploaded by 'corresponding authors' of
2 teams of non-PhD students (teams and authors to be decided)
Answers to Supplementary exercise 2.13 to be uploaded by corresponding author of
1 team of PhD students
Remarks:
Chapter 1 of C&H introduces some ways of looking at statistical
entities and concepts that you may not have met, as well as some
terminology that is used in a more specific way in epidemiology. You might want to
look at section 1 of JH's notes, from earlier years, on
Concepts involved in Occurrence Measures in Epidemiology.
JH has also included the first page of this section (mostly definitions) in
the notes that annotate the C and H chapters: he has placed it under the heading
'Important: Concepts and Terms in Epidemiology'
after his notes on 1.2 Binary data, and before 1.3 The binary probability model.
Supplementary Exercise 1.1 is designed to familiarize you with the
'other' scales for measuring probabilities, and to show
when the odds and probability measures are close, and when they diverge.
Other scales you will need to become very familiar with are the logit and the probit scales.
We show all of these in one graph in our 'under construction'
online textbook for epidemiology students.
The online book has newer versions of some of the graphs in JH's notes, as well as additional commentaries.
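As a rough illustration of how these scales relate, here is a minimal Python sketch (Python is our choice for illustration, not something taken from JH's notes; the function names and the grid of probabilities are assumptions). It tabulates the same probabilities on the odds, logit and probit scales.

```python
import math
from statistics import NormalDist

def odds(p):
    """Odds corresponding to a probability p (0 < p < 1)."""
    return p / (1 - p)

def logit(p):
    """Log of the odds."""
    return math.log(odds(p))

def probit(p):
    """Standard Normal quantile (z-value) corresponding to p."""
    return NormalDist().inv_cdf(p)

# An arbitrary grid of probabilities, chosen only for illustration.
print("  prob     odds    logit   probit")
for p in (0.01, 0.05, 0.10, 0.25, 0.50, 0.75, 0.90):
    print(f"{p:6.2f}  {odds(p):7.3f}  {logit(p):7.3f}  {probit(p):7.3f}")
```

Running it for small p shows the odds and the probability nearly coinciding; as p grows, the two measures move apart.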
JH's notes on Section 1.4 of C&H (and Supp. exercise 1.2)
are intended to 'shake you up a bit' and force you to think
outside the box about how you used to estimate the parameters
of a simple linear regression. This model is usually
shown as a 2-parameter (slope, intercept) model, but JH has
deliberately reduced it to a 1-parameter version,
with the "line" going through the origin. [Another example
might be trying to estimate, from error-containing
measurements of the volumes of 2 spheres of different radii
(the radii themselves measured without error!),
the constant in the relation:
Volume of a sphere = "some constant" times the cube of its diameter.]
The fewer the elements involved, the more chance there is to really
master the fundamentals and 'join the dots.'
He has recently added a
shiny app that allows additional criteria for the 'fit'.
You can also try another 1-parameter (elevator) example at the bottom of that webpage.
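To make the 1-parameter idea concrete, here is a small Python sketch of the sphere example. The two diameters and the two 'measured' volumes are made-up numbers, and the three fitting criteria shown are our own selection; they are not necessarily the ones offered in JH's shiny app. Each criterion yields a different estimate of the single constant.

```python
import math

# Hypothetical data: diameters known exactly, volumes measured with error.
diameters = [2.0, 3.0]
measured_volumes = [4.0, 14.5]   # made-up measurements

# Model through the origin: volume = constant * diameter**3
x = [d ** 3 for d in diameters]
y = measured_volumes

# Criterion 1: ordinary least squares, minimizing sum of (y - b*x)^2.
least_squares = sum(xi * yi for xi, yi in zip(x, y)) / sum(xi ** 2 for xi in x)

# Criterion 2: ratio of sums (least squares with weights 1/x;
# equivalently, the residuals sum to zero).
ratio_of_sums = sum(y) / sum(x)

# Criterion 3: mean of the individual ratios (least squares with weights 1/x^2).
mean_of_ratios = sum(yi / xi for xi, yi in zip(x, y)) / len(x)

print("least squares :", round(least_squares, 4))
print("ratio of sums :", round(ratio_of_sums, 4))
print("mean of ratios:", round(mean_of_ratios, 4))
print("geometry says :", round(math.pi / 6, 4))
```

With only one parameter and two data points, it is easy to check each estimate by hand and to see how the choice of criterion changes the weight given to each sphere.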
Chapter 2 of C&H is -- to JH at least -- a very elegant and simple
and graphic way to introduce probabilities, and particularly
those that are linked to each other in time, or by
additional pieces of knowledge. And notice how many probabilities
of interest go from right to left, i.e., from after to before.
It is worthwhile to work through C&H's own exercises and then check
your answers against the solutions they provide at the end of Ch 2.
Fig 2 in JH's Notes on Ch 2 has several simple but educational
examples showing the different 'directionalities'. It also
emphasizes that products of probabilities are like 'fractions of fractions'
but that sometimes the probabilities depend on what has gone before,
and sometimes they do not. (The online book has newer diagrams.)
The 2 stories accompanying the Notes on section 2.2 should serve as a stark
and frightening reminder that P(theta|data) is a very different 'animal'
than P(data|theta) and that the consequences of mixing them can be enormous.
If you want a topical example, think of the difference between
P(A|B) and P(B|A), where A = the hypothesis that Higgs Boson particles exist,
and B = the bump in the curve. Btw, JH likes
to label the elements in what appears to be the best 'logical' or
'chronological' or 'causal' order, i.e., A -> B, but notices
that many textbooks teach the concepts using arbitrary letters.
JH's notes on Section 2.3 have a genetics (haemophilia) example that is
still very relevant. But, since he first encountered it 40 years ago,
medical science has advanced, and so one no longer needs to wait
until the woman has one or more offspring before learning about her carrier status.
JH would be grateful for a different example where one would still
need to wait.
At a debate a few years ago, JH came up with the challenge of
estimating/judging a person's age from various pieces of information.
You might like to take a quick look at the example & pieces of information provided.
Supplementary Exercise 2.1 ('Efron's twins story') can be tackled in many ways.
Efron uses the odds scale to go from 'pre-' to 'post'-test odds, and then switches back to the probability scale.
We do the same when teaching medical students about diagnostic tests. Fortunately, today, with
readily accessible apps, there is less emphasis on the calculation, and more on the probabilities themselves.
A few pages further in the notes, you will see what (paper) 'apps' were like in 1975! Fagan's
nomogram is still a clever tool, and JH has used it as a starting point for a shiny app
cited in the coloured box on the right-hand side of page 8 of his Notes. This box gives you links to
the 'terminology' for the errors/performance of medical diagnostic tests (If JH had his way, we would
never have invented the terms sensitivity and specificity) and the correspondences with
statistical tests.
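The odds-scale calculation described above (pre-test odds times a likelihood ratio gives post-test odds) can be sketched in a few lines of Python. This is only an illustration: the function name, and the prevalence, sensitivity and specificity values, are hypothetical and are not taken from Efron's story or from JH's app.

```python
def post_test_probability(pre_test_prob, sensitivity, specificity, positive=True):
    """Go from pre-test to post-test probability via the odds scale."""
    pre_odds = pre_test_prob / (1 - pre_test_prob)
    if positive:
        # Likelihood ratio for a positive result.
        likelihood_ratio = sensitivity / (1 - specificity)
    else:
        # Likelihood ratio for a negative result.
        likelihood_ratio = (1 - sensitivity) / specificity
    post_odds = pre_odds * likelihood_ratio
    # Switch back from the odds scale to the probability scale.
    return post_odds / (1 + post_odds)

# Hypothetical test: 1% pre-test probability, 90% sensitivity, 95% specificity.
after_positive = post_test_probability(0.01, 0.90, 0.95, positive=True)
after_negative = post_test_probability(0.01, 0.90, 0.95, positive=False)
print(f"after a positive result: {after_positive:.4f}")
print(f"after a negative result: {after_negative:.4f}")
```

With these made-up inputs the likelihood ratio for a positive result is 18, so pre-test odds of 1:99 become post-test odds of 18:99, a probability of 18/117 or about 0.15. This is the same arithmetic that Fagan's nomogram does graphically.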
Supplementary Exercise 2.2 ('The Monty Hall Problem') can be very frustrating
and is easily misunderstood. JH has had to break up
fights between people who are over-confident but under-listening.
The key is the fact that Monty Hall KNOWS
what is behind each door: sometimes (how often?)
he has a choice of 2 doors that he could open
to reveal nothing, and sometimes (how often?)
he has only 1 choice.
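If you want to check your reasoning empirically, a simulation along the following lines may help. It is a minimal Python sketch under the standard assumptions (three doors, one prize, Monty always opens an empty door other than the contestant's); the function names, the number of games and the seed are our own choices. It also tallies how often Monty had 2 doors to choose from.

```python
import random

def play(switch, rng):
    """Play one game; return (won, number of doors Monty could have opened)."""
    doors = [0, 1, 2]
    prize = rng.choice(doors)
    pick = rng.choice(doors)
    # Monty KNOWS where the prize is: he never opens the prize door
    # or the contestant's door.
    options = [d for d in doors if d != pick and d != prize]
    opened = rng.choice(options)
    if switch:
        # Move to the one door that is neither the first pick nor the opened one.
        pick = next(d for d in doors if d != pick and d != opened)
    return pick == prize, len(options)

def simulate(switch, n_games=100_000, seed=601):
    """Return (fraction of games won, fraction of games where Monty had 2 options)."""
    rng = random.Random(seed)
    results = [play(switch, rng) for _ in range(n_games)]
    win_rate = sum(won for won, _ in results) / n_games
    two_options_rate = sum(n == 2 for _, n in results) / n_games
    return win_rate, two_options_rate

for strategy in (False, True):
    wins, two_options = simulate(strategy)
    print(f"switch={strategy}: win rate {wins:.3f}, Monty had 2 options in {two_options:.3f} of games")
```

Try to predict the three printed fractions, and explain them with a tree diagram, before running the code.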
In Exercise 2.3, it is equally important to be very precise as to the information provided.
In Exercise 2.4, we have another good example of the difference between P(H|data)
and P(data|H). Notice here that we are not examining a range of possible
H's, just 2 specific H's. Notice further that in the Bayesian approach we do not consider
data values that have not been observed; in contrast, the p-value does consider data values
that have not been observed (we should not call such unobserved values 'data', but
rather, potential data values).
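The contrast can be made concrete with a small Python sketch. The setting here is hypothetical and is not the one in Exercise 2.4: we assume a Binomial experiment with 10 trials and 8 'successes' observed, just 2 specific H's (success probability 0.5 versus 0.8), and equal prior probabilities.

```python
from math import comb

def binom_pmf(y, n, p):
    """Binomial probability of exactly y successes in n trials."""
    return comb(n, y) * p ** y * (1 - p) ** (n - y)

# Hypothetical set-up (an assumption, not the data of Exercise 2.4).
n, y_obs = 10, 8
p_h0, p_h1 = 0.5, 0.8      # the 2 specific H's
prior_h0 = 0.5             # equal prior probabilities

# P(data | H): uses only the value that was actually observed.
lik_h0 = binom_pmf(y_obs, n, p_h0)
lik_h1 = binom_pmf(y_obs, n, p_h1)

# P(H | data), via Bayes' rule.
posterior_h0 = prior_h0 * lik_h0 / (prior_h0 * lik_h0 + (1 - prior_h0) * lik_h1)

# One-sided p-value under H0: sums over y = 8, 9, 10, including the
# potential data values 9 and 10 that were never observed.
p_value = sum(binom_pmf(y, n, p_h0) for y in range(y_obs, n + 1))

print(f"P(data | H0) = {lik_h0:.4f}")
print(f"P(data | H1) = {lik_h1:.4f}")
print(f"P(H0 | data) = {posterior_h0:.4f}")
print(f"p-value      = {p_value:.4f}")
```

Note that the posterior probability changes if the prior or the alternative H changes, whereas the p-value depends on neither; they answer different questions.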
JH finds that diagrams, especially 'tree' diagrams, can be very
helpful in these types of problems, and again when we revisit the Binomial.
Q2.5 was new in 2015,
so the wording hasn't had the same beta-testing as 2.1-2.4.
It's a pity that in the otherwise clever 'left brain' article,
the BMJ messed up on the 'teaser' introduction. JH
finds The Economist graphics clearer and simpler. What about you?
Q2.11 was new in 2019, having been prompted by a
(since withdrawn) tutorial article 'How to investigate an accused serial sexual harasser'
in Statistics in Medicine. If you Google it, you will see that
it generated considerable 'heat'. The tutorial referred indirectly to the data given in the exercise.
JH took a special interest
in the topic because of his involvement in reviewing the 2003 report.
Q2.12 was new in 2020, and was prompted by the coverage of the Santa
Clara study in Andrew Gelman's blog. The Santa Clara study was also the basis
for exercise 22 in the measurement material, and a question
in the Part A (bios700) PhD exam of August 4, 2020. We will come back to it again
when we address likelihood-only methods in C&H chapter 3.
Q2.13 was new in 2021, and was prompted by the increasing number of
statements about who the patients being hospitalized for COVID-19 are.
It is also a great chance to learn a very common epidemiologic design,
one that goes by a very poorly chosen name -- the so-called
'CASE-CONTROL' study.
We explain what it involves, and why it is a simple and otherwise standard comparison of two rates,
but one where the relative (or maybe even the absolute) sizes of the
person-time denominators are ESTIMATED (using a denominator series) rather than KNOWN.
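The following Python sketch illustrates the idea with entirely made-up counts (the variable names, the exposure labels and all of the numbers are hypothetical). It computes a rate ratio once with known person-time denominators, and once with the relative sizes of those denominators estimated from a denominator series.

```python
# Numerators: cases arising in each exposure group (made-up counts).
cases_exposed, cases_unexposed = 40, 60

# (a) Denominators KNOWN: person-years in each group (made-up values).
person_years_exposed, person_years_unexposed = 7000.0, 3000.0
rate_ratio_known = (cases_exposed / person_years_exposed) / (
    cases_unexposed / person_years_unexposed
)

# (b) Denominators ESTIMATED: a sample of the base (the 'denominator
# series', i.e. the so-called 'controls') tells us only the RELATIVE
# sizes of the two person-time denominators.
series_exposed, series_unexposed = 700, 300
estimated_denominator_ratio = series_exposed / series_unexposed
rate_ratio_estimated = (cases_exposed / cases_unexposed) / estimated_denominator_ratio

print(f"rate ratio, known denominators    : {rate_ratio_known:.3f}")
print(f"rate ratio, estimated denominators: {rate_ratio_estimated:.3f}")
```

Here the denominator series was constructed to mirror the person-time split exactly, so the two answers agree. In practice the series is a sample, so the estimated ratio carries extra sampling variability.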