Why is Reliability important?
Understanding and testing reliability is relevant for both the practitioner and the researcher when selecting a measure, since it provides insight into the biological (e.g. circadian rhythm), environmental (e.g. wind speed), and/or technical (e.g. timing gate height) factors influencing score variance.
Upon starting with a new team, practitioners need to determine the purpose of the testing (e.g. classifying athletes as needing an intervention or tracking progress). After this, the focus should shift to reviewing the literature to find the most reliable and valid test for the envisioned purpose(s).
In this whole process, close attention should be given to the specific factors of each study (e.g. sample characteristics: regional vs. Olympic athletes; or testing procedures: timing gate height). If those factors differ between practice and the study, the reliability reported in the study cannot be expected to transfer. For example, a 40-m sprint test performed by professional footballers using timing gates may render an ICC of 0.75, whereas the ICC for the same 40-m sprint performed by high school athletes and timed with a stopwatch will likely be much lower. This demonstrates how reliability for the same test varies with different athletes and different equipment.
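To make the idea of an ICC concrete, below is a minimal sketch of how a test-retest ICC could be computed from two trials per athlete. The sprint times are hypothetical, and the two-way mixed, consistency form (often labelled ICC(3,1)) is chosen only as one common option, not a method prescribed here.

```python
# Minimal sketch: test-retest ICC(3,1) from hypothetical sprint times.
import numpy as np

def icc_3_1(scores: np.ndarray) -> float:
    """Two-way mixed, consistency, single-measure ICC (often labelled ICC(3,1)).

    scores: array of shape (n_subjects, k_trials).
    """
    n, k = scores.shape
    grand_mean = scores.mean()
    subject_means = scores.mean(axis=1)
    trial_means = scores.mean(axis=0)

    # Two-way ANOVA sums of squares (subjects x trials, no replication)
    ss_subjects = k * np.sum((subject_means - grand_mean) ** 2)
    ss_trials = n * np.sum((trial_means - grand_mean) ** 2)
    ss_total = np.sum((scores - grand_mean) ** 2)
    ss_error = ss_total - ss_subjects - ss_trials

    ms_subjects = ss_subjects / (n - 1)
    ms_error = ss_error / ((n - 1) * (k - 1))

    return (ms_subjects - ms_error) / (ms_subjects + (k - 1) * ms_error)

# Hypothetical 40-m sprint times (s) for 8 athletes, trial 1 and trial 2
times = np.array([
    [5.10, 5.15],
    [5.32, 5.28],
    [5.05, 5.12],
    [5.44, 5.40],
    [5.21, 5.30],
    [5.50, 5.41],
    [5.18, 5.22],
    [5.36, 5.33],
])
print(f"Test-retest ICC(3,1): {icc_3_1(times):.2f}")
```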
Finally, strict adherence to the procedures described in the supporting literature (e.g. equipment, test administrator, technical procedures, and familiarisation, amongst many others) is key to best practice and to obtaining results that are reliable and valid.
In research, an understanding of reliability is useful both for reviewing the literature and for designing studies.
Firstly, knowing about reliability gives insight into the relevance of results reported in the literature. For example, one can relate the change observed in an intervention study (e.g. +10%) to the reliability of the testing protocol used or cited. If the CV of the test is ±6%, a change of +10% at retest still falls within the noise of the measurement, so we cannot be confident that a real change has occurred. The fact that the change reached statistical significance only demonstrates sufficient statistical power, not clinical significance.
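The arithmetic behind this comparison can be sketched as follows. The decision rule used here (an observed change should exceed roughly 1.96 × √2 × CV, i.e. the limits of agreement for a single retest, to be regarded as real) is one common convention assumed for illustration, not a rule stated in this section.

```python
# Illustrative arithmetic only: compare an observed change with test noise.
import math

cv = 6.0          # typical error of the test, expressed as a CV (%)
observed = 10.0   # change observed in the intervention study (%)

# Noise expected in the difference between any two test occasions
noise_sd = math.sqrt(2) * cv           # ~8.5%
threshold = 1.96 * noise_sd            # ~16.6% (limits-of-agreement style rule)

print(f"Change needed to exceed measurement noise: ~{threshold:.1f}%")
print(f"Observed change of {observed:.0f}% exceeds it: {observed > threshold}")
```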
Secondly, when estimating sample size, a well-designed study should account for the precision of the measurement used [5, 9-11]. The less precise the measurement, the larger the sample will have to be in order to have enough statistical power to detect a significant effect.
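As a rough illustration, the sketch below uses the standard two-sample approximation for sample size, with the observed standard deviation inflated by measurement error. All numbers (the 0.10 s target difference, the between-athlete SD, and the two error SDs) are hypothetical.

```python
# Illustrative sketch: how measurement error inflates the required sample size.
import math
from scipy.stats import norm

def n_per_group(delta, sd_between, sd_error, alpha=0.05, power=0.80):
    """Approximate n per group to detect a mean difference `delta`,
    when the observed SD combines true between-athlete variation and
    measurement error."""
    sd_observed = math.sqrt(sd_between ** 2 + sd_error ** 2)
    z = norm.ppf(1 - alpha / 2) + norm.ppf(power)
    return math.ceil(2 * (z * sd_observed / delta) ** 2)

# Detecting a 0.10 s change in 40-m sprint time (hypothetical values)
print(n_per_group(delta=0.10, sd_between=0.15, sd_error=0.05))  # precise test -> ~40 per group
print(n_per_group(delta=0.10, sd_between=0.15, sd_error=0.15))  # noisy test   -> ~71 per group
```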
In order to obtain reliable results which can be used to implement coaching strategies or be published in the scientific literature, the following rules and procedures need to be implemented and documented:
Train your testers
- Document their training (e.g. duration and nature)
- Assess learning outcomes (e.g. check mastery of the testing procedures)
- Use reference protocols (e.g. standardised warm-up)
- Familiarise testers with your test (e.g. conduct a ‘dummy run’ of the test before the study officially starts)
Implement the actual test as a pilot study
- Analyse the results (i.e. inter-tester reliability, data reliability); a sketch of such an analysis follows this list
Address potential issues
- Modify tester training
- Address design issues (e.g. not enough rest time or unrealistic protocol)
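As a sketch of the pilot analysis referred to above, the example below checks inter-tester reliability for two testers timing the same sprints with stopwatches. The times are hypothetical, and the typical-error approach is just one common way to express inter-tester agreement.

```python
# Illustrative pilot analysis: agreement between two testers timing the same sprints.
import numpy as np

tester_a = np.array([5.12, 5.30, 5.08, 5.41, 5.25, 5.47])  # hypothetical times (s)
tester_b = np.array([5.18, 5.26, 5.15, 5.36, 5.33, 5.50])

diff = tester_a - tester_b
typical_error = diff.std(ddof=1) / np.sqrt(2)   # within-athlete error between testers
cv_percent = 100 * typical_error / np.concatenate([tester_a, tester_b]).mean()

print(f"Mean bias between testers: {diff.mean():+.3f} s")
print(f"Typical error: {typical_error:.3f} s ({cv_percent:.1f}% CV)")
```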