
(Thanks to my fellow students Veli Mahlangu and Tamim Al-Mutawa for the great ideas in their posts, which I have used to improve my initial post.)

Misinterpretation of Statistical Information

A p-value (probability value) in statistics measures how compatible the observed data are with a statistical model, usually one in which the tested effect is absent (the null hypothesis). It is the probability of observing results as extreme as, or more extreme than, the actual results, assuming that the null hypothesis and every other assumption of the model are true. The term "statistically significant" means the p-value is below a chosen threshold, usually 0.05 (5%). The p-value does not measure the size, direction or contextual importance of an effect, and on its own it does not explain why the value is small or large.
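To make the definition concrete, here is a minimal Python sketch based on a hypothetical coin-tossing experiment (the numbers are illustrative and do not come from the sources). The null hypothesis is that the coin is fair, and `binomial_two_sided_p` (a name chosen for this example) adds up the probability of every outcome at least as extreme as the one observed. It also shows that a p-value does not measure effect size: a large effect in a small sample and a tiny effect in a huge sample can give similar p-values.

```python
from math import comb

def binomial_two_sided_p(heads: int, flips: int) -> float:
    """Exact two-sided p-value for `heads` in `flips` tosses, assuming a fair coin.

    Under the null hypothesis (probability of heads = 0.5) the distribution is
    symmetric, so the two-sided p-value is twice one tail, capped at 1.
    """
    extreme = max(heads, flips - heads)
    term = comb(flips, extreme)  # number of ways to get exactly `extreme` heads
    tail_count = 0
    for k in range(extreme, flips + 1):
        tail_count += term
        term = term * (flips - k) // (k + 1)  # C(n, k+1) from C(n, k), exact
    # Python divides these large integers exactly and returns a float
    return min(1.0, 2 * tail_count / 2**flips)

# Large effect, small sample: 15 heads in 20 tosses (75% heads)
p_small = binomial_two_sided_p(15, 20)
# Tiny effect, huge sample: 5,100 heads in 10,000 tosses (51% heads)
p_large = binomial_two_sided_p(5100, 10_000)

print(f"15/20 heads:        p = {p_small:.4f}")
print(f"5,100/10,000 heads: p = {p_large:.4f}")
```

Both results give a p-value between 0.04 and 0.05, even though one effect is 25 percentage points away from a fair coin and the other only 1 point.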

When findings based on statistics are reported, p-values are often misused (SpringerNature, 2016). The real meaning of the p-value is often misunderstood (or deliberately misrepresented), and some researchers treat a small p-value as proof that a hypothesis is true. In some cases, tests are chosen or designed specifically because they produce small p-values. This misuse has led to an arbitrary, oversimplified classification of results into "significant" and "non-significant".
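One way a test can be "designed" to produce small p-values is optional stopping: checking the p-value repeatedly as data arrive and stopping as soon as it dips below 0.05. The following hypothetical Python simulation illustrates this. The batch size, the number of looks and the known standard deviation of 1 are all assumptions chosen for illustration. Every simulated study uses data for which the null hypothesis is true, so an honest 5% test should flag only about 5% of them.

```python
import random
from math import erfc, sqrt

def peeking_study(rng: random.Random, batch: int = 10, max_batches: int = 20,
                  alpha: float = 0.05) -> bool:
    """Hypothetical 'test until significant' design; H0 (mean 0) is true.

    Adds `batch` observations at a time, runs a z-test (sigma = 1 assumed known)
    after each batch, and stops as soon as p < alpha.
    Returns True if the study ever declares a 'significant' effect.
    """
    total, n = 0.0, 0
    for _ in range(max_batches):
        for _ in range(batch):
            total += rng.gauss(0.0, 1.0)
            n += 1
        z = total / sqrt(n)  # equals sample mean * sqrt(n) / sigma with sigma = 1
        if erfc(abs(z) / sqrt(2)) < alpha:  # two-sided p-value
            return True
    return False

rng = random.Random(3)
repeats = 2_000
rate = sum(peeking_study(rng) for _ in range(repeats)) / repeats
print(f"Share of null studies declared 'significant': {rate:.3f}")
```

With 20 looks at the data, the share of "significant" studies comes out well above 5%, even though no effect exists in any of them.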

Many scientific journals are encouraging better use of p-values and more use of "confidence intervals". A confidence interval is a range of values, computed from sample data by a procedure that would capture the true value of an unknown population parameter (such as the new mean after a treatment) in a stated proportion of repeated samples, usually 95%. The 95% describes the procedure, not any single interval: a particular interval either contains the true value or it does not. One journal has actually banned the use of p-values as well as confidence intervals! Certainly, confidence intervals, like p-values, should not be reported as if they were reliable in situations where the data quality is weak or the assumptions behind the model are not met.
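The "95%" refers to the long-run behaviour of the procedure. The minimal Python sketch below assumes normally distributed data with a known standard deviation, so the simple interval mean ± 1.96·σ/√n applies. The true mean of 10, σ = 2 and n = 25 are made-up values. The sketch draws many samples and counts how often the interval captures the true mean.

```python
import random
from math import sqrt
from statistics import fmean

# Hypothetical set-up: we know the true mean (10.0) so that each interval can
# be checked; sigma (2.0) is assumed known, which allows the simple z-interval.
TRUE_MEAN, SIGMA, N, Z95 = 10.0, 2.0, 25, 1.96

def z_interval(sample: list[float], sigma: float, z: float = Z95) -> tuple[float, float]:
    """95% confidence interval for the mean when sigma is known."""
    centre = fmean(sample)
    half_width = z * sigma / sqrt(len(sample))
    return centre - half_width, centre + half_width

def coverage(repeats: int = 10_000, seed: int = 1) -> float:
    """Fraction of intervals, over repeated samples, that contain TRUE_MEAN."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(repeats):
        sample = [rng.gauss(TRUE_MEAN, SIGMA) for _ in range(N)]
        low, high = z_interval(sample, SIGMA)
        hits += low <= TRUE_MEAN <= high  # True counts as 1
    return hits / repeats

c = coverage()
print(f"Coverage over 10,000 samples: {c:.3f}")  # close to 0.95
```

Each individual interval either contains 10.0 or it does not. Only the proportion across many repetitions is close to 95%.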

Although p-values are often used as evidence of "statistical significance", the 5% cut-off (or any other small threshold) is arbitrary, and a p-value below it does not prove contextual importance or meaningfulness. There is no fixed level of significance that applies in all circumstances.

P above or below 0.05 is not a universal arbiter of discovery.
(SpringerNature, 2016)
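To see why a fixed cut-off is arbitrary, this small Python sketch converts two almost identical test statistics into two-sided p-values. It assumes the test statistic follows a standard normal distribution under the null hypothesis. The helper name `two_sided_p_from_z` is hypothetical and was chosen for this example.

```python
from math import erfc, sqrt

def two_sided_p_from_z(z: float) -> float:
    """Two-sided p-value for a standard normal test statistic z.

    erfc(|z| / sqrt(2)) equals 2 * (1 - Phi(|z|)), where Phi is the normal CDF.
    """
    return erfc(abs(z) / sqrt(2))

p_just_above = two_sided_p_from_z(1.95)
p_just_below = two_sided_p_from_z(1.97)
print(f"z = 1.95 -> p = {p_just_above:.4f}")  # "non-significant"
print(f"z = 1.97 -> p = {p_just_below:.4f}")  # "significant"
```

The two results (p of roughly 0.051 and 0.049) carry practically the same evidence, yet a strict 0.05 rule would label one "non-significant" and the other "significant".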

Statistical models rely on assumptions that can be very difficult to satisfy in practice. A p-value is only valid if all the assumptions (the entire model) hold: a small p-value could arise because the tested hypothesis is false, but it could also arise because one or more of the model assumptions were violated. A small p-value simply indicates that the data would be unusual if all the assumptions used to compute it were correct.
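A hypothetical Python simulation can show a violated assumption producing small p-values even when the tested hypothesis is true. In this example, the z-test assumes the data have a standard deviation of 1. The simulation is an illustrative set-up, not taken from the sources, and compares data that meet this assumption with data whose true standard deviation is 2. The mean is 0 in both cases, so the null hypothesis is always true.

```python
import random
from math import erfc, sqrt
from statistics import fmean

def z_test_p(sample: list[float], assumed_sigma: float) -> float:
    """Two-sided z-test p-value for H0: mean = 0, *assuming* sigma is known."""
    z = fmean(sample) * sqrt(len(sample)) / assumed_sigma
    return erfc(abs(z) / sqrt(2))

def false_positive_rate(true_sigma: float, assumed_sigma: float = 1.0,
                        n: int = 30, repeats: int = 5_000, seed: int = 2) -> float:
    """Share of p < 0.05 results when H0 (mean = 0) is actually true."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(repeats):
        sample = [rng.gauss(0.0, true_sigma) for _ in range(n)]
        rejections += z_test_p(sample, assumed_sigma) < 0.05
    return rejections / repeats

fp_ok = false_positive_rate(true_sigma=1.0)   # assumption holds
fp_bad = false_positive_rate(true_sigma=2.0)  # assumption violated
print(f"Assumption met   (sigma really 1): {fp_ok:.3f}")
print(f"Assumption wrong (sigma really 2): {fp_bad:.3f}")
```

When the assumption holds, about 5% of results fall below 0.05, as intended. When it is violated, about a third do, purely because the model is wrong and not because any effect exists.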

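One easily overlooked assumption concerns how these results were chosen for presentation, rather than some other results. In practice, results are often chosen for presentation precisely because their p-value is below 5%. When that happens, the reported p-values no longer mean what they appear to mean, because the non-significant results that were also produced remain hidden. Unfortunately, results can also be influenced when testing is funded by those with a stake in the outcome!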
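Selective reporting can be illustrated in the same way. In this hypothetical Python sketch, each simulated study measures 20 unrelated outcomes on which there is no true effect, and only the smallest p-value is "reported". If the tests are independent, the chance that at least one falls below 0.05 is 1 - 0.95^20, or about 0.64, and the simulation should land near that figure. The sample size of 30 and the known standard deviation of 1 are assumptions made for the example.

```python
import random
from math import erfc, sqrt
from statistics import fmean

def smallest_p_of_many(rng: random.Random, outcomes: int = 20, n: int = 30) -> float:
    """Test `outcomes` unrelated measures for which H0 (mean 0, sigma 1) is true,
    and return only the smallest p-value, as a selective report would."""
    p_values = []
    for _ in range(outcomes):
        sample = [rng.gauss(0.0, 1.0) for _ in range(n)]
        z = fmean(sample) * sqrt(n)  # z-test with sigma = 1 assumed known
        p_values.append(erfc(abs(z) / sqrt(2)))
    return min(p_values)

rng = random.Random(4)
studies = 2_000
rate = sum(smallest_p_of_many(rng) < 0.05 for _ in range(studies)) / studies
print(f"Simulated: {rate:.3f}  vs  theory 1 - 0.95**20 = {1 - 0.95**20:.3f}")
```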

The Journal of the American College of Cardiology (2021) states that many scientific journals now promote the use of confidence intervals, supplemented with p-values where needed. However, no conclusion about the probability that a hypothesis is true can be derived from statistical methods alone. Such a conclusion must also draw on information about the hypothesis beyond that contained in the analysed data and in conventional statistical models, and whoever offers the conclusion should explicitly acknowledge and describe that information.
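Bayes' theorem is one standard way to make that outside information explicit. The Python sketch below is a simplified illustration, not a method from the cited sources. It treats the result only as "p < 0.05", and the power (80%) and the prior probabilities that the hypothesis is true are assumptions chosen for the example. The function name is hypothetical.

```python
def prob_hypothesis_true_given_significant(prior: float, power: float,
                                           alpha: float = 0.05) -> float:
    """Bayes' theorem: P(effect is real | p < alpha).

    prior: plausibility of the hypothesis *before* seeing the data (outside information)
    power: chance the test gives p < alpha if the effect is real
    alpha: chance the test gives p < alpha if there is no effect
    """
    true_positive = prior * power
    false_positive = (1 - prior) * alpha
    return true_positive / (true_positive + false_positive)

# Same test, same significant result; only the prior (outside information) changes.
for prior in (0.01, 0.1, 0.5):
    print(f"prior {prior:>4}: P(real | p < 0.05) = "
          f"{prob_hypothesis_true_given_significant(prior, power=0.8):.2f}")
```

The same significant result supports very different conclusions: about 0.14, 0.64 and 0.94 for prior probabilities of 1%, 10% and 50%.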

Statistical results are seldom certain. Researchers can embrace this uncertainty and report confidence intervals and p-values together with an interpretation of what these numbers actually mean in the context of what is being studied.

References

SpringerNature (2016) Statistical tests, P values, confidence intervals, and power: a guide to misinterpretations. Available at: https://link-springer-com.uniessexlib.idm.oclc.org/article/10.1007/s10654-016-0149-3 (Accessed: 24 March 2026).

Journal of the American College of Cardiology (2021) More Confidence Intervals and Fewer p Values: A Positive Trend? Available at: https://www.jacc.org/doi/10.1016/j.jacc.2021.02.004 (Accessed: 24 March 2026).