Wim Ruitenburg's Fall 2012 MATH 1300-101

Marquette University

Department of Mathematics, Statistics and Computer Science

Last updated: October 2012
Comments and suggestions: Email wimr@mscs.mu.edu

Sampling from chapter 13

Sampling good data for statistical purposes may look boring and easy, but it certainly is not easy. In this chapter we see examples of sampling data from large populations, followed by some failed attempts at drawing good conclusions about the population. These attempts fail by misreading the meaning of the sample, by overrating the value of the sample, or by starting from a badly designed sample.
What follows are some concepts that are relevant when trying to sample good data.

The N-value is the size of the population from which we are to sample data. This value N need not be constant, which complicates our ability to draw conclusions from our sample(s).
A parameter is a true value about a population that we would like to know, or at least like to know approximately.
A statistic is a value that we derive from our sample, and that we hope is a good approximation of a parameter.
The difference between a statistic and the parameter it approximates is called the sampling error. There are two main causes of sampling error.
- Chance error.
- Sampling bias.
In a survey (like a poll) we pick a sample (say of size n) from a so-called sampling frame, which ideally is the whole target population (say of size N).
The sampling proportion is the fraction (n / N) (sample size n over population size N). In general, a larger sampling proportion implies a smaller chance error.
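The terms above can be made concrete with a small simulation. The following is only a sketch: the population of 10,000 voters, the 55% support level, the sample size of 500, and the function name sampling_error are all made up for illustration and do not come from the book.

```python
import random

def sampling_error(population, n, rng):
    """Return (parameter, statistic, error, sampling proportion) for one random sample."""
    N = len(population)
    parameter = sum(population) / N      # true proportion in the whole population
    sample = rng.sample(population, n)   # simple random sample, without replacement
    statistic = sum(sample) / n          # proportion observed in the sample
    return parameter, statistic, statistic - parameter, n / N

# Hypothetical population: 1 = supports a measure, 0 = does not.
rng = random.Random(2012)
population = [1] * 5500 + [0] * 4500
parameter, statistic, error, proportion = sampling_error(population, 500, rng)
print(f"parameter={parameter:.3f} statistic={statistic:.3f} "
      f"error={error:+.3f} n/N={proportion:.3f}")
```

Because the sample here is drawn at random from the whole population, the error printed is chance error only. Running the function again with a different seed gives a different statistic, while the parameter stays the same.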
Our sample frame may suffer from so-called selection bias. Examples of selection bias are
- Nonresponse bias.
- Convenience bias.
- Quota bias.
Ideally we perform random sampling. Guaranteeing randomness is not easy. Even defining what a random sample is turns out not to be easy.
The population can often be clearly partitioned into subpopulations called strata. When we build a sample, we may divide the elements of the sample into sample strata that reflect the stratum of the population from which each sample element originates. In some sense we sample the different strata somewhat independently, although the total collection is still called a sample or a stratified sample. There is a close connection between strata and quota sampling.
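One possible way to build a stratified sample is to draw from each stratum in proportion to its size. The sketch below assumes this proportional rule, which is one choice among several. The student strata, their sizes, and the function name stratified_sample are hypothetical.

```python
import random

def stratified_sample(strata, n, rng):
    """Draw about n elements, sampling each stratum in proportion to its size."""
    N = sum(len(members) for members in strata.values())
    sample = {}
    for name, members in strata.items():
        k = round(n * len(members) / N)   # this stratum's share of the sample
        sample[name] = rng.sample(members, k)
    return sample

# Hypothetical student population, partitioned into strata by class year.
strata = {
    "freshman": [f"F{i}" for i in range(400)],
    "sophomore": [f"S{i}" for i in range(300)],
    "junior": [f"J{i}" for i in range(200)],
    "senior": [f"R{i}" for i in range(100)],
}
sample = stratified_sample(strata, 50, random.Random(13))
for name, chosen in sample.items():
    print(name, len(chosen))
```

Within each stratum the elements are still picked at random. Only the number taken from each stratum is fixed in advance. Because of rounding, the total may differ slightly from n when the stratum sizes do not divide evenly.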
The capture-recapture method (page 508 of our book) is often a way to get a good statistical approximation of the population size N. Randomly capture n_1 members of the population, mark them, and release them. A while later, randomly capture n_2 members of the population, and count the number k of them that are captured twice (recaptured), as can be seen because they are marked. Then k / n_2 approximately equals n_1 / N. So N is approximately (n_1 * n_2) / k.
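The estimate can be tried out on a simulated population whose true size we know. This is a minimal sketch under made-up assumptions: a pond of 1,000 fish, captures of 200 and 150, and the function names capture_recapture_estimate and simulate are illustrative only.

```python
import random

def capture_recapture_estimate(n1, n2, k):
    """Estimate the population size N from n1 marked, n2 recaptured, k marked among them."""
    if k <= 0:
        raise ValueError("no marked members were recaptured; cannot estimate N")
    return n1 * n2 / k

def simulate(true_N, n1, n2, rng):
    """Run one capture-recapture experiment on a population of known size."""
    population = range(true_N)
    marked = set(rng.sample(population, n1))   # first capture: mark and release
    second = rng.sample(population, n2)        # second capture, a while later
    k = sum(1 for member in second if member in marked)
    return k, capture_recapture_estimate(n1, n2, k)

# Direct use of the formula with hypothetical counts.
print(capture_recapture_estimate(200, 150, 30))

# Simulated experiment on a hypothetical pond of 1,000 fish.
k, estimate = simulate(1000, 200, 150, random.Random(508))
print(f"recaptured k={k}, estimated N={estimate:.0f}")
```

With 200 marked, 150 in the second capture, and 30 recaptured, the formula gives 200 * 150 / 30 = 1000. The simulated estimate varies with the seed, since k is subject to chance error.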

Example Problem(s)

Recommended problems from Chapter 13 of the book:
1, 2, 3, 4
9, 10, 11, 12
33, 34, 35