The Student t distribution

by admin

In all kinds of analyses, tests and calibrations it is common to assume that repeated events, with no external influences changing their probabilities, follow a normal (Gaussian) distribution defined by the mean and standard deviation calculated from the sample. Strictly speaking, this holds only when the number of repetitions is large, consistent with the central limit theorem. When the sample is too small to describe the properties of that Gaussian distribution and we assume it anyway, we will almost certainly underestimate the uncertainty of our measurement, as indicated in JCGM 100, Guide to the expression of uncertainty in measurement.

William Gosset faced this same problem. He signed his work as “Student” because of the business confidentiality rules of the company where he worked. Gosset needed to estimate, from experimental data, a distribution that represented small samples of unknown variance. The distribution function he proposed is known as the Student t distribution, and has the following general equation:
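In its standard form (zero location, unit scale), the probability density of the t distribution is usually written as below, where $\nu$ denotes the degrees of freedom and $\Gamma$ the gamma function. The symbols are a notational assumption here, since the text refers to the degrees of freedom as gl.

```latex
f(t) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}
            {\sqrt{\nu\pi}\,\Gamma\!\left(\frac{\nu}{2}\right)}
       \left(1 + \frac{t^{2}}{\nu}\right)^{-\frac{\nu+1}{2}}
```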

For any normally distributed population, the Student t distribution is wider than the normal distribution built from the sample, which increases the uncertainty associated with the measurand to reflect how little information a small sample provides about the whole lot. As the sample grows, the t distribution approaches the normal distribution obtained from the sample standard deviation, and becomes identical to it in the limit of infinite repetitions.
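The widening and the convergence described above can be checked numerically. The following is a minimal sketch using only the Python standard library. The function names and the choice of the point t = 3 are assumptions made for illustration. It compares the standard t density in the tail with the standard normal density for increasing degrees of freedom.

```python
import math
from statistics import NormalDist


def student_t_pdf(t, dof):
    """Density of the standard Student t distribution with dof degrees of freedom."""
    # Work with logarithms of the gamma function to avoid overflow for large dof.
    log_norm = (math.lgamma((dof + 1) / 2) - math.lgamma(dof / 2)
                - 0.5 * math.log(dof * math.pi))
    return math.exp(log_norm - (dof + 1) / 2 * math.log1p(t * t / dof))


def tail_ratio(dof, t=3.0):
    """Ratio of the t density to the standard normal density at the point t."""
    return student_t_pdf(t, dof) / NormalDist().pdf(t)


# The ratio is well above 1 for few degrees of freedom (heavier tails)
# and moves towards 1 as the degrees of freedom grow.
for dof in (2, 5, 30, 1000):
    print(dof, round(tail_ratio(dof), 3))
```

A ratio above 1 at t = 3 means that extreme values are more probable under the t distribution than under the normal one, which is what produces the larger uncertainty for small samples.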

The correct approach in any analysis of this kind is to assign to repeated events a t distribution with a parameter gl, the degrees of freedom, equal to the number of repetitions minus 1. MCM Alchimia can simulate a random sample from the Student t distribution using not only this shape parameter (degrees of freedom) but also scale and location parameters, given by the standard deviation and the mean respectively, so it can be used wherever appropriate with no additional operations.
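As an illustration of what sampling with location, scale and shape parameters involves, here is a minimal sketch in standard-library Python. It is not MCM Alchimia's implementation. The function name `sample_scaled_t` and the numeric values are hypothetical. It uses the textbook construction of a t variate as a standard normal divided by the square root of a chi-square variate over its degrees of freedom.

```python
import math
import random


def sample_scaled_t(mean, scale, dof, size, rng=None):
    """Draw `size` values from a Student t distribution shifted by `mean`
    and stretched by `scale`, with `dof` degrees of freedom."""
    rng = rng or random.Random()
    draws = []
    for _ in range(size):
        z = rng.gauss(0.0, 1.0)
        # A chi-square variate with dof degrees of freedom is a gamma
        # variate with shape dof/2 and scale 2.
        chi2 = rng.gammavariate(dof / 2.0, 2.0)
        draws.append(mean + scale * z / math.sqrt(chi2 / dof))
    return draws


# Hypothetical inputs: 5 repetitions (4 degrees of freedom), mean 10.0, scale 0.2.
sample = sample_scaled_t(mean=10.0, scale=0.2, dof=4, size=20000,
                         rng=random.Random(1))
outside = sum(abs(x - 10.0) > 3 * 0.2 for x in sample) / len(sample)
print(f"fraction beyond 3 scale units: {outside:.4f}")
```

With few degrees of freedom, the fraction of draws more than three scale units from the mean is much larger than the roughly 0.3 % expected from a normal distribution.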

Input parameters:

  • Mean value. This parameter sets the position of the function along the abscissa axis. It corresponds to the mean, or average, of the random variable; the collected data will therefore be distributed on both sides of this value. For this distribution, as for any symmetric unimodal distribution, the mean coincides with the statistical mode.
  • Degrees of freedom. Corresponds to the number of repetitions minus 1. It represents the number of values that can vary freely once the value of the sample mean is fixed.
  • Standard deviation. A measure of the dispersion of the values about the sample mean. If this distribution is used for Type A (statistical) uncertainty components, this value can be calculated according to the equation:

    $$ s = \sqrt{\frac{1}{n-1} \sum_{i=1}^{n} \left(x_i - \bar{x}\right)^2} $$

    where n is the number of values or repetitions, $x_i$ are the individual values and $\bar{x}$ is their mean. If instead the standard deviation of the sample mean is needed, it is obtained as s / √n.
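The three input parameters can be obtained from a set of repeated readings as in the following minimal sketch. The readings and the function name `type_a_summary` are hypothetical. `statistics.stdev` computes the sample standard deviation with the n − 1 divisor.

```python
import math
import statistics


def type_a_summary(readings):
    """Return mean, degrees of freedom, sample standard deviation and
    standard deviation of the mean for a list of repeated readings."""
    n = len(readings)
    s = statistics.stdev(readings)  # uses the n - 1 divisor
    return {
        "mean": statistics.mean(readings),
        "dof": n - 1,                   # number of repetitions minus 1
        "s": s,
        "s_mean": s / math.sqrt(n),     # standard deviation of the sample mean
    }


# Hypothetical repeated readings of the same quantity.
readings = [10.02, 9.98, 10.05, 9.97, 10.01]
summary = type_a_summary(readings)
print(summary)
```

Whether `s` or `s_mean` is the right scale parameter depends on whether the quantity of interest is a single reading or the mean of the readings.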