# Protocol

Alexander Zwart

### Definitions

The standard deviation is a measure of the spread of the observations in a statistical sample or population. The standard error of a sample statistic (such as a sample mean) is a measure of the degree of uncertainty in that sample statistic.

### Terminology and equations

Physical sample vs Statistical sample

In what follows, it is important to know the meaning of the word ‘sample’ when used in a statistical (rather than a physical) sense. A ‘physical sample’ is a sample of material taken from an environment – a test tube of river water, a section of leaf, etc. A ‘statistical sample’ is a collection of observations (whether these are measurements, counts, scores or recorded categories) taken from a population of interest. In what follows, the word ‘sample’ can be assumed to mean ‘statistical sample’ unless explicitly indicated otherwise.

Population

Similarly, we can distinguish between a ‘physical population’ and a ‘statistical population’. A physical population is the total set of units to be observed, such as the trees in a forest, or all possible Janz wheat plots grown in a specified region in a specified year (as the latter example shows, populations can be hypothetical in nature). A statistical population is the set of all values of an observation of interest that could be obtained from the physical population, such as the circumferences (measured at breast height) of all of the trees in the aforementioned forest, or the set of all possible total grain yields from wheat plots grown in the specified region and year. In what follows, we may use the word ‘population’ in either sense; the context should make clear which meaning is intended.

The Standard Deviation (SD)

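The standard deviation (SD for short) is a measure of the spread of a set of numbers. The ‘true’ or ‘population’ SD is usually denoted $\sigma$, while its estimate, the sample SD, is denoted $s$. Given a set of observations $x_i$, $i = 1, \ldots, n$, with sample mean $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$, the formula for the sample standard deviation is:

$$
s = \sqrt{\frac{1}{n-1}\sum_{i=1}^{n}\left(x_i - \bar{x}\right)^2}
$$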

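As a minimal sketch, the sample SD formula $s = \sqrt{\sum_{i=1}^{n}(x_i - \bar{x})^2 / (n-1)}$ can be computed directly in Python. The function name `sample_sd` and the example data are illustrative only, not part of this protocol.

```python
import math


def sample_sd(xs):
    """Sample standard deviation s of the observations xs (divisor n - 1)."""
    n = len(xs)
    if n < 2:
        raise ValueError("need at least two observations")
    mean = sum(xs) / n                     # sample mean, x-bar
    ss = sum((x - mean) ** 2 for x in xs)  # sum of squared deviations from the mean
    return math.sqrt(ss / (n - 1))         # divide by n - 1, then take the square root


# Hypothetical example data (sample mean 5, sum of squared deviations 32)
data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
print(sample_sd(data))  # sqrt(32 / 7), approximately 2.138
```
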
A formula (not shown here) also exists for the population standard deviation, which is only applicable when you have observed every unit in the population of interest. As this is rarely the case in practice, the sample standard deviation formula above is by far the more commonly used. When using statistical software to calculate the standard deviation, check that you are using the appropriate form. Where the software has a default, it is usually the sample standard deviation, though not always: NumPy's `numpy.std`, for example, uses the population form unless `ddof=1` is given.
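
To show why the choice of form matters, the sketch below uses Python's standard-library `statistics` module, where `stdev` implements the sample form (divisor $n-1$) and `pstdev` the population form (divisor $n$). The data are the same hypothetical values as above, chosen only for illustration.

```python
import statistics

data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]

s = statistics.stdev(data)       # sample SD: divisor n - 1
sigma = statistics.pstdev(data)  # population SD: divisor n (only valid if data is the whole population)

print(f"sample SD     = {s:.4f}")      # 2.1381
print(f"population SD = {sigma:.4f}")  # 2.0000
```

The two differ only by the factor $\sqrt{n/(n-1)}$, which is noticeable for small samples and negligible for large ones.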

Origins and nature of the Standard Deviation

The standard deviation as a measure of spread has its origins in the mathematical theory of the Normal (or Gaussian) distribution – the classic ‘bell curve’-shaped distribution:

Provided a population follows the normal distribution, approximately 68.27% of observations are expected to lie within one SD of the population mean, indicated by $\mu$ in the above diagram. Approximately 95.45% of observations are expected to lie within two SDs of the mean, and approximately 99.73% within three SDs of the mean (these percentages are often abbreviated to ‘approximately 68%, 95% and 99.7%’ respectively). Thus, interpreting a particular SD value in terms of the actual spread of a set of numbers is not entirely trivial: these percentages hold only when the values are (at least approximately) normally distributed. However, the mathematical theory underlying the normal distribution makes it possible to calculate the probability of observing values within different ranges, which gives rise to the theory of hypothesis tests and confidence intervals for means or differences between means, via the Standard Error, discussed below.
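
The 68/95/99.7 percentages can be reproduced from the normal distribution itself. This sketch uses `statistics.NormalDist` from the Python standard library (Python 3.8 or later); because any normal variable can be standardised, the standard normal ($\mu = 0$, $\sigma = 1$) is sufficient.

```python
from statistics import NormalDist

z = NormalDist(mu=0.0, sigma=1.0)  # standard normal distribution

for k in (1, 2, 3):
    # Probability that an observation lies within k SDs of the mean
    p = z.cdf(k) - z.cdf(-k)
    print(f"within {k} SD: {100 * p:.2f}%")
# within 1 SD: 68.27%
# within 2 SD: 95.45%
# within 3 SD: 99.73%
```
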

The Standard Error of the sample mean (SEM, or SE)

The Standard Error is a measure of the uncertainty in a sample statistic (such as a mean or correlation) that has been obtained from a statistical sample. For a simple sample mean, the estimated standard error is

$$
\mathrm{SE} = \frac{s}{\sqrt{n}}
$$

…where $s$ is the sample standard deviation and $n$ is again the size of the statistical sample from which the sample mean was calculated. This formula for the standard error relies on the assumption that the observations in the statistical sample are independent, i.e., the value of any observation does not influence the value of any other. Standard errors can also be calculated for other kinds of sample statistic, but examples are not given here.
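
A minimal sketch of the standard error of the mean, $\mathrm{SE} = s/\sqrt{n}$, assuming independent observations as the formula requires. The function name `standard_error_of_mean` and the data are hypothetical.

```python
import math
import statistics


def standard_error_of_mean(xs):
    """Estimated standard error of the sample mean, s / sqrt(n).

    Assumes the observations in xs are independent.
    """
    n = len(xs)
    s = statistics.stdev(xs)  # sample SD (divisor n - 1)
    return s / math.sqrt(n)


data = [2.0, 4.0, 4.0, 4.0, 5.0, 5.0, 7.0, 9.0]
mean = statistics.mean(data)
se = standard_error_of_mean(data)
print(f"mean = {mean:.3f}, SE = {se:.3f}")  # mean = 5.000, SE = 0.756
```
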

Distinctions in usage

In practice, a standard deviation is quoted when one wishes to indicate the spread of individual measurements in the population from which one sampled. The standard error is quoted as an indication of the level of uncertainty in a sample statistic such as an observed sample mean. Since the value of the standard error depends on the sample size, it should not be interpreted as indicating the spread in a population: a larger statistical sample naturally leads to a smaller SE (reflecting the improved precision obtained via a larger sample size), while the SD of the population remains unchanged. Standard errors of means in turn give rise to the Standard Error of the Difference (SED), which is the statistic used to assess the statistical significance of differences between observed sample statistics (most commonly, sample means).
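
To illustrate the distinction, the sketch below holds the SD fixed at a hypothetical value of 2 and shows the SE of the mean halving each time the sample size is quadrupled. It also computes an SED for two independent sample means as $\sqrt{\mathrm{SE}_1^2 + \mathrm{SE}_2^2}$; this unpooled form is an assumption made for illustration, and designs with a pooled variance or paired observations use different expressions. The function names are hypothetical.

```python
import math


def se_mean(s, n):
    """Standard error of a sample mean, given sample SD s and sample size n."""
    return s / math.sqrt(n)


def sed_independent(se1, se2):
    """SED of two independent sample means (unpooled): sqrt(SE1^2 + SE2^2)."""
    return math.sqrt(se1 ** 2 + se2 ** 2)


s = 2.0  # hypothetical SD, assumed unchanged as the sample grows
for n in (4, 16, 64):
    print(f"n = {n:2d}: SD = {s:.2f}, SE = {se_mean(s, n):.2f}")
# n =  4: SD = 2.00, SE = 1.00
# n = 16: SD = 2.00, SE = 0.50
# n = 64: SD = 2.00, SE = 0.25

# Hypothetical example: two independent groups of 16, each with SD 2
sed = sed_independent(se_mean(2.0, 16), se_mean(2.0, 16))
print(f"SED = {sed:.3f}")  # SED = 0.707
```
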