Statistical hypothesis testing

A statistical hypothesis test is a method of making statistical decisions using experimental data. In statistics, a result is called statistically significant if it is unlikely to have occurred by chance. The phrase "test of significance" was coined by Ronald Fisher: "Critical tests of this kind may be called tests of significance, and when such tests are available we may discover whether a second sample is or is not significantly different from the first."[1]

Hypothesis testing is sometimes called confirmatory data analysis, in contrast to exploratory data analysis. In frequency probability, these decisions are almost always made using null-hypothesis tests; that is, ones that answer the question "Assuming that the null hypothesis is true, what is the probability of observing a value for the test statistic that is at least as extreme as the value that was actually observed?"[2] One use of hypothesis testing is deciding whether experimental results contain enough information to cast doubt on conventional wisdom.
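The following is a minimal sketch of that question in code. The scenario (a coin flipped 20 times that lands heads 15 times), the one-sided alternative, and the use of scipy are illustrative assumptions, not part of the original text; the point is only to show how "the probability of a value at least as extreme as the one observed" is computed under the null hypothesis.

```python
from scipy.stats import binom

# Hypothetical example: a coin is flipped n = 20 times and comes up heads k = 15 times.
# Under the null hypothesis the coin is fair, so the number of heads X ~ Binomial(n, 0.5).
n, k = 20, 15

# One-sided p-value: the probability, assuming the null hypothesis is true, of observing
# a test statistic at least as extreme as the value actually observed, P(X >= k | p = 0.5).
p_value = binom.sf(k - 1, n, 0.5)   # sf(k - 1) = P(X > k - 1) = P(X >= k)
print(f"P(X >= {k} | fair coin) = {p_value:.4f}")   # roughly 0.021

# If this probability falls below a pre-chosen significance level (e.g. 0.05),
# the result is called statistically significant and the null hypothesis is rejected.
```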

Statistical hypothesis testing is a key technique of frequentist statistical inference, and is widely used, but also much criticized. The main direct alternative to statistical hypothesis testing is Bayesian inference. However, other approaches to reaching a decision based on data are available via decision theory and optimal decisions.

The critical region of a hypothesis test is the set of all outcomes which, if they occur, lead us to decide that there is a difference, that is, to reject the null hypothesis in favor of the alternative hypothesis. The critical region is usually denoted by C.

A statistical test procedure is comparable to a criminal trial. A defendant stands trial and is presumed innocent as long as guilt has not been proven. The prosecutor tries to prove the guilt of the defendant. Only when there is enough incriminating evidence is the defendant convicted.

At the start of the procedure there are two hypotheses: H0, "the defendant is innocent", and H1, "the defendant is guilty". The first is called the null hypothesis and is, for the time being, accepted. The second is called the alternative hypothesis; it is the hypothesis one tries to prove.

In good legal practice one does not want to convict an innocent defendant. That is why the hypothesis of innocence is rejected only when such an error is very unlikely. This error, convicting an innocent person, is called an error of the first kind (type I error), and its probability is controlled so that it occurs only rarely. As a consequence of this asymmetric treatment, the error of the second kind (type II error), setting a guilty person free, often has a rather large probability.

As an example, consider a person who is tested for clairvoyance. He is shown the back of a randomly chosen playing card 25 times and asked to which of the four suits it belongs. The number of hits, or correct answers, is called X.

As we try to prove his clairvoyance, the null hypothesis is, for the time being, that the person is not clairvoyant. The alternative is, of course, that the person is (more or less) clairvoyant.

If the null hypothesis is valid, the only thing the test person can do is guess. For every card, the probability of guessing the suit correctly is 1/4. If the alternative is valid, the test person will predict the suit correctly with probability greater than 1/4. We will call the probability of guessing correctly p. The hypotheses then are:

H0: p = 1/4 (the subject is merely guessing)

H1: p > 1/4 (the subject is, to some degree, clairvoyant)
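The following is a minimal sketch of how a critical region and the two error probabilities can be computed for this example, assuming a critical region of the form "reject H0 when X is at least some critical value c". The bound of 1% on the type I error and the alternative value p = 0.5 used to evaluate the type II error are illustrative assumptions, not part of the original text.

```python
from scipy.stats import binom

n = 25          # number of cards shown
p0 = 0.25       # probability of a correct answer under H0 (pure guessing)
alpha = 0.01    # chosen bound on the type I error (illustrative assumption)

# Critical region {X >= c}: find the smallest c such that the probability of a
# type I error, P(X >= c | p = 1/4), does not exceed alpha.
c = next(k for k in range(n + 1) if binom.sf(k - 1, n, p0) <= alpha)
type_I = binom.sf(c - 1, n, p0)
print(f"critical region: X >= {c}, type I error probability = {type_I:.4f}")

# Type II error: probability of NOT rejecting H0 although the alternative holds,
# evaluated here at the (assumed) alternative value p = 0.5.
p1 = 0.5
type_II = binom.cdf(c - 1, n, p1)
print(f"type II error probability at p = {p1}: {type_II:.4f}")
```

Lowering alpha shrinks the critical region and makes a type I error rarer, but at the same time it increases the type II error; this is the asymmetry between the two kinds of error described above.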