Department of Mathematics, Statistics
and Computer Science
Wim Ruitenburg's Fall 2012 MATH 1300-101
Last updated: October 2012
Comments and suggestions: Email wimr@mscs.mu.edu
Sampling (Chapter 13)
Sampling good data for statistical purposes may look boring and easy.
It certainly is not easy.
In this chapter we see examples of sampling data from large populations.
We then see some failed attempts at drawing good conclusions about the
population, caused by misreading what the sample means, by overrating the value
of the sample, or by starting from a badly designed sample.
The following concepts are relevant when we try to sample good data.
- The N-value is the size of the population from which we are to
sample data.
This value N need not be constant (the population may change over time),
which complicates our ability to draw conclusions from our sample(s).
- A parameter is a true numerical value describing the population that we
would like to know, or at least know approximately.
- A statistic is a value that we derive from our sample, and that
we hope is a good approximation of a parameter.
- The difference between a statistic and the parameter it estimates is called
the sampling error.
There are two main causes of sampling error.
- Chance error.
- Sampling bias.
- In a survey (like a poll) we pick a so-called sample frame (say of size
n) from the target population (say of size N).
- The sampling proportion is the fraction n / N (sample size n
over population size N).
In general, a larger sampling proportion tends to give a smaller chance error.
- Our sample frame may suffer from so-called selection bias.
Examples of selection bias are
- Nonresponse bias.
- Convenience bias.
- Quota bias.
- Ideally we perform random sampling.
Guaranteeing randomness is not easy.
Even defining precisely what counts as a random sample is not easy.
- The population can often be clearly partitioned into subpopulations
called strata.
When we build a sample, we may divide the elements of the sample into sample
strata that reflect from which stratum of the population each sample element
originates.
We sample the different strata more or less independently, although the
combined collection is still called a sample, or a stratified sample.
There is a close connection between strata and quota sampling.
- The capture-recapture method (page 508 of our book) is a common way to
get a good statistical approximation of the population size N.
Randomly capture n_1 members of the population, mark them, and release them.
A while later, randomly capture n_2 members of the population, and count the
number k of them that are captured twice (recaptured), which we can tell
because they are marked.
The fraction of marked members in the second capture should be close to the
fraction of marked members in the whole population, so k / n_2 approximately
equals n_1 / N.
So N is approximately (n_1 * n_2) / k.
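To make the definitions of parameter, statistic, and sampling error concrete, and to illustrate the claim that a larger sampling proportion tends to give a smaller chance error, here is a minimal Python simulation. It is only a sketch under assumptions not taken from the book: a hypothetical yes/no population of N = 10,000 people of whom 4,000 answer "yes", simple random sampling without replacement, and the average absolute difference between statistic and parameter over many repeated samples as our measure of chance error. The function name `sampling_errors` is our own.

```python
import random


def sampling_errors(population, n, trials, seed=0):
    """Absolute sampling errors |statistic - parameter| over repeated
    simple random samples of size n (drawn without replacement)."""
    rng = random.Random(seed)
    # Parameter: the true proportion of "yes" answers (1s) in the population.
    parameter = sum(population) / len(population)
    errors = []
    for _ in range(trials):
        sample = rng.sample(population, n)
        # Statistic: the proportion of "yes" answers in this sample.
        statistic = sum(sample) / n
        errors.append(abs(statistic - parameter))
    return errors


# Hypothetical population of N = 10,000 people; 4,000 answer "yes" (1).
N = 10_000
population = [1] * 4_000 + [0] * 6_000

for n in (100, 1_000, 5_000, N):
    errors = sampling_errors(population, n, trials=100)
    average = sum(errors) / len(errors)
    print(f"sampling proportion n/N = {n / N:.2f}: average error {average:.4f}")
```

The printed average error shrinks as n / N grows, and in the last line, where the whole population is "sampled", the statistic equals the parameter and the error is exactly 0. The simulation measures only chance error: a biased sample frame would not be fixed by making it larger.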
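The idea of a stratified sample can be sketched in a few lines of Python. This is a minimal sketch assuming proportional allocation (each stratum contributes roughly its share of the population to the sample) and simple random sampling inside each stratum; the student population, the class-year strata, and the function name `stratified_sample` are all hypothetical. Fixing how many sample elements come from each stratum is what links this to quota sampling; the difference is that here the members within each stratum are chosen at random rather than by an interviewer.

```python
import random


def stratified_sample(strata, n, seed=0):
    """Draw a proportionally allocated stratified sample of about n elements.

    strata maps a stratum name to the list of its members.
    """
    rng = random.Random(seed)
    N = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        # Number of sample elements from this stratum, rounded to a whole number.
        k = round(n * len(members) / N)
        # Sample each stratum on its own by simple random sampling.
        sample[name] = rng.sample(members, k)
    return sample


# Hypothetical population of 1,000 students split into class-year strata.
strata = {
    "freshman": list(range(0, 400)),
    "sophomore": list(range(400, 700)),
    "junior": list(range(700, 900)),
    "senior": list(range(900, 1000)),
}
sample = stratified_sample(strata, n=50)
print({name: len(members) for name, members in sample.items()})
# → {'freshman': 20, 'sophomore': 15, 'junior': 10, 'senior': 5}
```

Because of rounding, the stratum sizes need not add up to exactly n for other population splits; a fuller version would have to decide which strata receive the leftover elements.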
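The capture-recapture estimate N ≈ (n_1 * n_2) / k is easy to compute, and we can also check it by simulating both captures on a population whose size we already know. The Python sketch below does both. The function names, the fish example, and its numbers are hypothetical; the simulation assumes that both captures are random, that the population does not change between captures, and that being marked does not affect the chance of being caught again.

```python
import random


def capture_recapture_estimate(n1, n2, k):
    """Estimate the population size N from a capture-recapture experiment.

    n1: number captured, marked, and released the first time
    n2: number captured the second time
    k:  number of the n2 that were already marked (recaptured)
    Uses k / n2 ≈ n1 / N, so N ≈ (n1 * n2) / k.
    """
    if k == 0:
        raise ValueError("no marked members recaptured; the estimate is undefined")
    return n1 * n2 / k


def simulate_recaptures(N, n1, n2, seed=0):
    """Simulate both captures on a hypothetical population of size N and
    return k, the number of marked members in the second capture."""
    rng = random.Random(seed)
    population = range(N)
    marked = set(rng.sample(population, n1))  # first capture: mark and release
    second = rng.sample(population, n2)       # second, independent capture
    return sum(1 for member in second if member in marked)


# Worked example with hypothetical numbers: mark 200 fish, later catch 150,
# of which 30 carry a mark.
print(capture_recapture_estimate(200, 150, 30))  # → 1000.0

# Simulated check on a population whose true size we know (N = 1000).
k = simulate_recaptures(N=1000, n1=200, n2=150)
print(k, capture_recapture_estimate(200, 150, k))
```

In the worked example, 30 out of 150 (one fifth) of the second capture are marked, so the 200 marked fish should be about one fifth of the population, giving N ≈ 1000. The simulated estimate varies with the random captures, which is exactly the chance error discussed above.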
Example Problem(s)
- Recommended problems from Chapter 13 of the book:
1, 2, 3, 4
- Recommended problems from Chapter 13 of the book:
9, 10, 11, 12
- Recommended problems from Chapter 13 of the book:
33, 34, 35