What is Statistical Significance?
Evidence-based practice is supposed to enhance practical decision-making, but interpreting research is often difficult for practitioners. As such, clinical research is only of value if it is properly interpreted. Underpinning many scientific conclusions is the concept of ‘statistical significance’, which is often loosely described as a measure of whether research findings are real. More precisely, statistical significance concerns how likely it is that a difference at least as large as the one observed between two groups would arise by chance alone if there were truly no effect [9, 10]. When a finding is significant, it means the result would be unusual if only chance (for example, the luck of the sample) were at work; it does not by itself prove that the effect is real.
The most common method of statistical testing uses a specified statistical model to test the null hypothesis (Null Hypothesis Significance Testing; NHST) against a predetermined level of significance. This method is built on four components: the null hypothesis, the alternative hypothesis, the statistical model, and the level of significance.
The null hypothesis postulates the absence of an effect (e.g. no relationship between variables, no difference between groups, or no effect of treatment) [9,12]. For example, the null hypothesis might state that, in reality, there is no association between caffeine consumption and reaction times. This is the formal basis for testing statistical significance. By starting with the proposition that there is no association, statistical tests can estimate how probable the observed variation would be if chance alone, rather than some factor of interest, were responsible.
The alternative hypothesis is the proposition that there is an association between the predictor and outcome variables. For example, there is an association between caffeine consumption and reaction times.
The statistical model is the statistical test chosen to analyse the data. It is constructed under a set of assumptions that must be met for valid conclusions about the null hypothesis to be made. Examples of statistical tests include the independent t-test, ANOVA, and Pearson’s correlation.
Level of Significance
A predetermined level of significance allows the null hypothesis to be either rejected or not rejected. The significance level most widely used in academic research is 0.05, usually written as ‘α = 0.05’, with results reported as significant when ‘p < 0.05’. The null hypothesis is rejected in favour of the alternative hypothesis if the calculated p-value is less than the predetermined level of significance. For instance, if you were to analyse a set of data on reaction times following caffeine consumption and the resulting significance value were p = 0.03, you could reject the null hypothesis in favour of the alternative hypothesis, provided that all assumptions of the statistical model were met. This is because the smaller the p-value, the greater the statistical incompatibility of the data with the null hypothesis. In other words, the smaller the p-value, the more unusual the data would be if every single assumption (including the null hypothesis) were correct.
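The decision rule above can be sketched in a few lines of Python. This is only an illustration: the reaction times are made-up numbers, the function name `permutation_p_value` is hypothetical, and a permutation test is swapped in for the t-test named earlier because it needs nothing beyond the standard library. It estimates the p-value as the proportion of random regroupings of the data whose difference in means is at least as large as the one observed.

```python
import random
from statistics import mean

# Hypothetical reaction times in milliseconds (made-up illustration data).
caffeine = [251, 243, 262, 238, 247, 255, 240, 249, 244, 236]
placebo = [265, 258, 271, 249, 262, 275, 254, 268, 259, 263]

ALPHA = 0.05  # predetermined level of significance


def permutation_p_value(group_a, group_b, n_permutations=10_000, seed=0):
    """Two-sided permutation test for a difference in group means."""
    rng = random.Random(seed)  # fixed seed so the sketch is reproducible
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        # Under the null hypothesis the group labels are interchangeable,
        # so shuffle the pooled data and split it into two new groups.
        rng.shuffle(pooled)
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add 1 to numerator and denominator so the estimate is never exactly 0.
    return (extreme + 1) / (n_permutations + 1)


p_value = permutation_p_value(caffeine, placebo)
decision = "reject H0" if p_value < ALPHA else "fail to reject H0"
print(f"p = {p_value:.4f} -> {decision} at alpha = {ALPHA}")
```

Note that the second outcome is worded "fail to reject" rather than "accept": a p-value above the threshold does not show that the null hypothesis is true.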
There is a common misconception that lower p-values indicate a stronger treatment effect than higher p-values. For example, an outcome of 0.01 is often interpreted as showing a stronger treatment effect than an outcome of 0.05. Whilst a smaller p-value does indicate greater incompatibility between the data and the null hypothesis if we can be certain that every assumption was met, it does not measure the size of the effect, and it does not tell us which assumption, if any, is incorrect. For example, the p-value may be very small because, indeed, the targeted hypothesis is false; however, it may instead be very small because the study protocols were violated. As a result, the p-value tells us nothing specifically related to the hypothesis unless we are absolutely positive that every other assumption used for its computation is correct. In other words, a lower p-value is not synonymous with importance. Therefore, we must take caution when rejecting or failing to reject the null hypothesis, and a significant result should not be taken as proof that the alternative hypothesis is indeed valid.
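One way to see why a lower p-value is not the same as a stronger effect is to hold the effect fixed and vary only the sample size. The sketch below is a simplified illustration, not a recommended analysis: it assumes a two-sample z-test with a known, common standard deviation, and the 5 ms difference, 20 ms standard deviation, and function name `two_sided_p` are all hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_sided_p(mean_diff, sd, n_per_group):
    """Two-sided p-value for a two-sample z-test, assuming a known common SD."""
    standard_error = sd * sqrt(2 / n_per_group)
    z = mean_diff / standard_error
    return 2 * (1 - NormalDist().cdf(abs(z)))


MEAN_DIFF_MS = 5.0  # hypothetical difference in mean reaction time
SD_MS = 20.0        # hypothetical known standard deviation

# Same effect every time; only the number of participants changes.
for n in (20, 100, 500):
    p = two_sided_p(MEAN_DIFF_MS, SD_MS, n)
    print(f"n per group = {n:>3}: p = {p:.4f}")
```

The difference between groups is identical in every row, yet the p-value shrinks as the groups grow, moving from well above 0.05 at 20 per group to far below it at 500 per group. The p-value therefore reflects sample size as well as effect size.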
Although the p-value is widely used as a statistical measure, relying on it alone and misinterpreting statistical significance have led to considerable misuse of the statistic, and as a result some scientific journals now discourage the use of p-values. For instance, NHST and p-values should not lead us to think that conclusions can be reduced to a simple, dichotomous decision (i.e. reject vs not reject). A conclusion does not simply become “true” on one side of the divide and “false” on the other. In fact, many contextual factors (e.g. study design, data collection, the validity of assumptions, and research judgement) should all contribute to scientific inference, rather than statistical significance alone [9,12]. Despite these criticisms, the recommendation is not that clinical researchers discard significance testing, but rather that they incorporate additional information to supplement their findings. That said, it is important that statistical significance is correctly interpreted to avoid further misuse.