BIOS601 AGENDA: Tuesday September 10, 2013
[updated September 03, 2013]
Agenda for Tuesday Sept 10, 2013
- Discussion of issues in JH's Notes and assignment on C&H Ch01 [Prob. Models] and Ch02 [Conditional Prob. Models]
Answers to be handed in for:
Supplementary exercises 1.1, 1.2, 2.1, 2.2, 2.3, 2.4
Remarks:
Chapter 1 of C&H introduces some ways of looking at statistical
entities and concepts that you may not have met, as well as some
terminology that is used in a more specific way in epidemiology. You might want to
look at section 1 of JH's notes, from earlier years, on
Concepts involved in Occurrence Measures in Epidemiology.
JH's notes on Section 1.4 of C&H (and Supp. exercise 1.2)
are intended to 'shake you up a bit' and force you to think
outside the box about how you used to estimate the parameters
of a simple linear regression. This model is usually
shown as a 2-parameter (slope, intercept) model, but JH has
deliberately reduced the model to a 1-parameter version,
with the "line" going through the origin. [Another example
might be trying to estimate, from error-containing
measurements of the volumes of 2 spheres of different radii
(the radii themselves measured without error!),
the constant in the relation:
Volume of a sphere = "some constant" times the cube of its diameter.]
The fewer the elements involved, the more chance there is to really
master the fundamentals and 'join the dots.'
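To make the 1-parameter idea concrete, here is a minimal sketch of the sphere example, assuming ordinary least squares is the estimation criterion (JH's notes may well ask you to consider others). The diameters and "measured" volumes are invented for illustration; the function name slope_through_origin is hypothetical.

```python
import math

def slope_through_origin(x, y):
    """Least-squares estimate of c in the no-intercept model y = c * x."""
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / sxx

# Hypothetical data: diameters known exactly, volumes measured with error.
diameters = [2.0, 4.0]
measured_volumes = [4.5, 33.0]  # invented values, not from the notes

# The single "x" variable is the cube of the diameter.
x = [d ** 3 for d in diameters]
c_hat = slope_through_origin(x, measured_volumes)

# Each sphere on its own gives a ratio estimate y/x of the constant;
# the least-squares estimate is their average weighted by x squared.
ratios = [yi / xi for xi, yi in zip(x, measured_volumes)]
weights = [xi ** 2 for xi in x]
c_weighted = sum(w * r for w, r in zip(weights, ratios)) / sum(weights)

print("per-sphere ratios:", ratios)
print("least-squares estimate:", c_hat)
print("true constant pi/6:", math.pi / 6)
```

Seeing the estimate as a weighted average of the two per-sphere ratios is one way to 'join the dots': the larger sphere gets far more weight, and a different criterion would weight the ratios differently.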
Chapter 2 of C&H is -- to JH at least -- a very elegant and simple
and graphic way to introduce probabilities, and particularly
those that are linked to each other in time, or by
additional pieces of knowledge. And notice how many probabilities
of interest go from right to left, i.e., from after to before.
It is worthwhile to work through C&H's own exercises and then check
your answers against the solutions they provide at the end of Ch 2.
Fig 2 in JH's Notes on Ch 2 has several simple but educational
examples showing the different 'directionalities'. It also
emphasizes that products of probabilities are like 'fractions of fractions'
but that sometimes, the probabilities depend on what has gone before,
and sometimes do not.
The 2 stories accompanying the Notes on section 2.2 should serve as a stark
and frightening reminder that P(theta|data) is a very different 'animal'
than P(data|theta) and that the consequences of mixing them can be enormous.
If you want a topical example, think of the difference between
P(A|B) and P(B|A), where A = the hypothesis that Higgs Boson particles exist,
and B = the bump in the curve. Btw, JH likes
to label the elements in what appears to be the best 'logical' or
'chronological' or 'causal' order, i.e., A -> B, but notices
that many textbooks teach the concepts using arbitrary letters.
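A small numerical sketch may help show how far apart P(B|A) and P(A|B) can be. All three input probabilities below are invented for illustration (they are not from the notes, and certainly not from particle physics); the function name prob_a_given_b is hypothetical.

```python
from fractions import Fraction

def prob_a_given_b(p_a, p_b_given_a, p_b_given_not_a):
    """Reverse the direction: from P(B|A) and P(B|not A) to P(A|B)."""
    joint_a_and_b = p_a * p_b_given_a                # fraction of a fraction
    joint_not_a_and_b = (1 - p_a) * p_b_given_not_a
    return joint_a_and_b / (joint_a_and_b + joint_not_a_and_b)

# Hypothetical inputs, in 'chronological' order A -> B.
p_a = Fraction(1, 100)              # P(A): prior probability of hypothesis A
p_b_given_a = Fraction(9, 10)       # P(B|A): chance of the 'bump' if A is true
p_b_given_not_a = Fraction(1, 10)   # P(B|not A): chance of the 'bump' otherwise

p_a_given_b = prob_a_given_b(p_a, p_b_given_a, p_b_given_not_a)
print("P(B|A) =", p_b_given_a)
print("P(A|B) =", p_a_given_b)
```

With these made-up inputs, the 'left to right' probability is 9/10 while the 'right to left' one is 1/12, which is why mixing them up can be so costly.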
JH's notes on Section 2.3 have a genetics (haemophilia) example that is
still very relevant. But, since he first encountered it 40 years ago,
medical science has advanced, and so one no longer needs to wait
until the woman has one or more offspring before learning about her carrier status.
JH would be grateful for a different example where one would still
need to wait.
At a debate a few years ago, JH came up with the challenge of
estimating/judging a person's age from various pieces of information.
You might like to take a quick look at the example & pieces of information provided.
Supplementary Exercise 2.2 ('The Monty Hall Problem') can be very frustrating
and is easily misunderstood. JH has had to break up
fights between people who are over-confident but under-listening.
The key is the fact that Monty Hall KNOWS
what is behind each door: sometimes (how often?)
he has a choice of 2 doors that he could open
to reveal nothing, and sometimes (how often?)
he has only 1 choice.
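One way to settle arguments is to enumerate the tree exactly rather than argue. The sketch below assumes the standard version of the game: the prize is equally likely to be behind each of 3 doors, the contestant has picked door 0, the host never opens the picked door or the prize door, and (an added assumption) when he has 2 doors to choose from he opens each with probability 1/2. The function name monty_hall_paths is hypothetical.

```python
from fractions import Fraction

DOORS = (0, 1, 2)

def monty_hall_paths(pick=0):
    """List every (prize door, opened door) path with its probability."""
    paths = []  # tuples: (prize, opened, probability, number of host options)
    for prize in DOORS:
        # The host KNOWS where the prize is: he avoids it and the pick.
        options = [d for d in DOORS if d != pick and d != prize]
        for opened in options:
            prob = Fraction(1, 3) * Fraction(1, len(options))
            paths.append((prize, opened, prob, len(options)))
    return paths

paths = monty_hall_paths(pick=0)

# How often does the host have a choice of 2 doors, and how often only 1?
p_two_options = sum(p for _, _, p, n in paths if n == 2)
p_one_option = sum(p for _, _, p, n in paths if n == 1)

# Probability that staying wins, and that switching wins.
p_stay_wins = sum(p for prize, _, p, _ in paths if prize == 0)
p_switch_wins = 1 - p_stay_wins

# 'Right to left': given the host opened door 2, where is the prize?
p_opened_2 = sum(p for _, opened, p, _ in paths if opened == 2)
p_prize_at_pick_given_opened_2 = (
    sum(p for prize, opened, p, _ in paths if opened == 2 and prize == 0)
    / p_opened_2
)

for name in ("p_two_options", "p_one_option", "p_stay_wins",
             "p_switch_wins", "p_prize_at_pick_given_opened_2"):
    print(name, "=", globals()[name])
```

Try to answer the two "how often?" questions by hand before running it, then change the host's behaviour (for example, a host who does not know where the prize is) and see which probabilities change.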
In Exercise 2.3, it is equally important to be precise as to the
information provided.
In Exercise 2.4, we have another good example of the difference between P(H|data)
and P(data|H). Notice here that we are not examining a range of possible
H's, just 2 specific H's. Notice further that in the Bayesian approach we do not consider
data values that have not been observed; in contrast, the p-value does consider data values
that have not been observed (we should not call such unobserved values 'data', but
rather potential data values).
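The contrast can be sketched with a toy binomial example. The setup below is entirely hypothetical and is not the setup of Exercise 2.4: two specific hypotheses about a proportion (H0: 1/2, H1: 4/5), equal prior probabilities, and 8 'positives' observed in 10 trials. The function name binom_pmf is also an assumption of this sketch.

```python
from fractions import Fraction
from math import comb

def binom_pmf(k, n, p):
    """Binomial probability of exactly k positives in n trials."""
    return comb(n, k) * p ** k * (1 - p) ** (n - k)

# Hypothetical setup: two specific H's, equal priors, one observed result.
n, k_observed = 10, 8
p_h0, p_h1 = Fraction(1, 2), Fraction(4, 5)
prior_h0 = prior_h1 = Fraction(1, 2)

# Bayesian route: only the probability of the OBSERVED data under each H.
lik_h0 = binom_pmf(k_observed, n, p_h0)   # P(data | H0)
lik_h1 = binom_pmf(k_observed, n, p_h1)   # P(data | H1)
post_h0 = prior_h0 * lik_h0 / (prior_h0 * lik_h0 + prior_h1 * lik_h1)

# p-value route: also sums over potential data values that were NOT
# observed (here 9 and 10), all computed under H0 alone.
p_value = sum(binom_pmf(k, n, p_h0) for k in range(k_observed, n + 1))

print("P(data | H0)       =", float(lik_h0))
print("P(H0 | data)       =", float(post_h0))
print("one-sided p-value  =", float(p_value))
```

Note that the three printed quantities answer three different questions; none of them can be read as a substitute for another.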
JH finds that diagrams, especially 'tree' diagrams, can be very
helpful in these types of problems, and again when we revisit the Binomial.