Contents of Article
- Why is fitness testing important in sports?
- What is the smallest worthwhile change?
- Why is the smallest worthwhile change important?
- How is the smallest worthwhile change calculated?
- Is further research into the smallest worthwhile change needed?
- About the Author
Laboratory and field-based performance/fitness tests are a fundamental element of athlete profiling. Conducting fitness testing not only allows the coach to measure performance improvements or decrements but also design the training programme accordingly. Therefore, identifying a meaningful change in performance is a critical factor in the athlete profiling process.
A meaningful change in performance can be determined by calculating the smallest worthwhile change. However, we must also be mindful of the variation that occurs between testing trials, as this is often higher than the calculated smallest worthwhile change. As a result, a combination of the smallest worthwhile change, the coefficient of variation, and two times the coefficient of variation can be used to set achievable, but also meaningful, targets for individual athletes.
Why is fitness testing important in sports?
A basic precondition of the long-term athlete development structure is the regular assessment of physical performance, which forms a major component of the sport science service (1). Fitness testing is typically conducted several times per year to evaluate changes in physical qualities (e.g. strength or speed) relevant to performance. For example, a 5m sprint test may be used to measure an athlete’s ability to accelerate. As a result, laboratory or field-based tests are seemingly fundamental elements of athlete profiling, with the ability to track performance change and programme efficacy (2).
With the aim of determining performance improvements or decrements in these individual physical performance markers, it is important to identify what change in performance is actually considered meaningful (3).
What is the smallest worthwhile change?
The ability to reliably detect meaningful changes in fitness tests over time is a critical factor when assessing athletes (4). However, because so many factors influence performance (such as arousal, caffeine consumption, sleep quality and quantity, fatigue levels, ambient temperature, ground surface, and many others), fitness test results will almost always vary each time an athlete is tested.
Due to this variation, simply identifying that an athlete performed better in a fitness test over a period of time does not provide sufficient evidence that this change is actually meaningful. An observed change in performance could be caused by variation in fitness testing (through any of the factors mentioned above). Therefore, a calculation that determines an actual meaningful change is necessary.
A meaningful change in performance can be determined by calculating the smallest worthwhile change (SWC) (5).
Why is the smallest worthwhile change important?
Any person involved in sport, whether that be the sports science or technical coaching staff or even the athletes themselves, should appreciate the importance of the SWC with regard to fitness testing (6). Knowledge of this change is important when conducting testing and when providing appropriate feedback to coaches and athletes. When relaying information, the coach must be confident that the documented change is of practical significance (5). Calculating the SWC, therefore, allows the coach to determine with confidence whether a real change in performance has occurred between testing sessions, rather than just typical variation in the test.
Winning is a primary goal in many sporting environments, and winning performances are often separated by marginal differences, which highlights the importance of the SWC calculation. For instance, it allows for the identification of the smallest change possible that will increase an athlete’s chance of success. Within an individual sport, this increase in the chance of success would be based on the minimal change necessary to improve competition placement. In team sports, by contrast, there is no clear relationship between fitness test performance and team performance, so the SWC is based on a standardised change in performance (7).
How is the smallest worthwhile change calculated?
Estimating the SWC relies on an analysis of the reliability of the physical performance test; provided the data are reliable and free from systematic bias, the SWC can be calculated (8). For instance, fitness tests that have good test-retest reliability are favourable when calculating the SWC, as they produce the lowest variation between tests. The same is true of testing equipment that provides consistent, reliable data. For example, when measuring speed, electronic timing gates have a higher degree of accuracy and consistency than a handheld stopwatch (9).
Reducing the variation between tests increases the likelihood of identifying a real change in performance. In team sports, it has been suggested that the SWC for elite athletes can be calculated as 0.2 multiplied by the between-subject standard deviation of the particular test (7). The calculation is based on Cohen’s effect size principle, with 0.2 representing a small, but not trivial, effect size (10).
Figure 1 provides an example of 5-10-5 shuttle test scores. If an athlete runs a 5-10-5 shuttle test in 5.07 seconds, and the between-subject standard deviation for that particular test in that particular population is 0.15 seconds, the athlete would have to run 0.03 seconds faster to demonstrate a meaningful difference (SWC = 0.2 × 0.15 = 0.03 seconds).
This would mean that a time of 5.04 seconds, or less, would be considered a real change. A change of 0.03 seconds is rather trivial, and it is often the case that performance tests have a lot more noise (variation/error) than the observed smallest worthwhile change (7). Therefore, when assessing an individual’s measured performance change, the observed change in performance (signal) needs to be higher than the random variation (noise) associated with the test (5).
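The calculation described above can be sketched in a few lines of Python. This is a minimal illustration rather than a definitive method: the function name `smallest_worthwhile_change` and the squad times are hypothetical, and the use of the sample standard deviation (n - 1) is an assumption, since the text does not specify which form is used.

```python
from statistics import stdev


def smallest_worthwhile_change(scores, effect_size=0.2):
    """Return the SWC: effect size multiplied by the between-subject SD.

    Assumption: the sample standard deviation (n - 1) is used.
    """
    return effect_size * stdev(scores)


# Hypothetical squad 5-10-5 shuttle times in seconds (illustrative only).
squad_times = [4.85, 4.92, 5.07, 5.10, 5.21, 4.98, 5.15, 5.30]
swc = smallest_worthwhile_change(squad_times)

# Worked example from the text: SD of 0.15 s and a baseline time of 5.07 s.
swc_from_text = 0.2 * 0.15
target_time = 5.07 - swc_from_text  # a lower time is a better performance

print(f"SWC for the hypothetical squad: {swc:.3f} s")
print(f"SWC from the text: {swc_from_text:.2f} s, target: {target_time:.2f} s")
```

For tests where a higher score is better (for example, jump height), the SWC would be added to the baseline rather than subtracted.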
It is also important to understand that the variation (noise) of a test score can be influenced by many factors, such as athlete preparation, nutrition, environmental conditions, testing equipment, and the testing environment. Therefore, it is extremely important that the coach controls or standardises as many of these variables as possible if they wish to obtain worthwhile data.
Coefficient of Variation
Coefficient of variation (CV) is a statistic used to measure the degree of variation between testing trials in an individual athlete’s repeated measurements (11). In other words, the CV is a measure of the spread of data relative to the mean. In the example above (Figure 1), we can see that the CV in seconds is more than the calculated SWC. Therefore, a firm conclusion about this trivial SWC cannot be made, and in this instance, it would be more beneficial to use the CV to identify a real change. This is because, despite the SWC calculation, we must be mindful of the variation that occurs between testing trials.
When the typical error is greater than the SWC, the observed changes are often due to error of measurement rather than real change (Video 1). The measure is, therefore, considered too noisy to be useful, and as such we can only be confident of a real change if the data falls outside of the calculated variation.
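As a rough sketch of this comparison, the CV can be computed from an athlete's repeated trials and then set against the SWC. The trial times below are hypothetical, and the CV is taken simply as the standard deviation divided by the mean, expressed as a percentage; this is a simplifying assumption, and other ways of estimating typical error exist.

```python
from statistics import mean, stdev


def coefficient_of_variation(trials):
    """Return the CV as a percentage: SD of repeated trials / their mean * 100."""
    return stdev(trials) / mean(trials) * 100


# Hypothetical repeated 5-10-5 shuttle trials for one athlete, in seconds.
athlete_trials = [5.07, 5.12, 5.01, 5.09]

cv_percent = coefficient_of_variation(athlete_trials)
# Convert the CV back into the units of the test (seconds) around the mean.
cv_seconds = cv_percent / 100 * mean(athlete_trials)

swc_seconds = 0.2 * 0.15  # SWC from the worked example in the text

# If the noise exceeds the SWC, the SWC alone cannot confirm a real change.
too_noisy = cv_seconds > swc_seconds
print(f"CV: {cv_percent:.2f}% ({cv_seconds:.3f} s), noisier than SWC: {too_noisy}")
```

With these illustrative numbers the noise is larger than the SWC, which is the situation the text describes.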
A further consideration to make is that if there is variation in the first set of testing, there will also be variation in the second set of testing, and so the CV needs to be doubled (2CV) to account for the chance of error in both testing periods. For instance, if an athlete’s first performance is lower than their ‘true’ score and their second performance above their ‘true’ score, one needs to account for this extension of the standard deviation. Thus, 2CV is necessary to ensure the threshold is large enough to account for a real change (12).
Using Figure 2 as an example, targets can be set, accounting for SWC, CV, and 2CV, allowing for the identification of a trivial change, a possibly meaningful change, or a certainly meaningful change, respectively. For instance, data that falls outside of the 2CV range provides a target that would be a certainly meaningful change in performance. However, this target may be unrealistic to reach as it requires the greatest change in performance.
In contrast, the SWC provides an achievable target for athletes as it requires the smallest change in performance, but the change in performance is likely to be trivial. Therefore, there needs to be an appreciation of all three statistics when identifying performance change in athletes. Coaches can use these data to set achievable, but also meaningful targets for their athletes, in which there is consideration for not only the magnitude of change but also the degree of certainty to which this change is meaningful.
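One way to put the three thresholds to work is a small classifier that labels an observed improvement. The sketch below is illustrative only: the function name `classify_change`, the labels, and the example numbers are hypothetical, and it assumes that the CV (converted to the units of the test) is larger than the SWC, as in the example discussed above.

```python
def classify_change(baseline, retest, swc, cv_percent, lower_is_better=True):
    """Label an observed improvement against SWC, CV, and 2CV thresholds.

    Assumes the CV in test units is larger than the SWC. Decrements in
    performance give a negative improvement and fall into the last label.
    """
    improvement = baseline - retest if lower_is_better else retest - baseline
    cv_units = cv_percent / 100 * baseline  # CV expressed in test units

    if improvement > 2 * cv_units:
        return "certainly meaningful"
    if improvement > cv_units:
        return "possibly meaningful"
    if improvement >= swc:
        return "trivial"
    return "no worthwhile change"


# Hypothetical example: baseline of 5.07 s, SWC of 0.03 s, CV of 1%.
for retest_time in (4.95, 5.00, 5.03, 5.06):
    label = classify_change(5.07, retest_time, swc=0.03, cv_percent=1.0)
    print(f"{retest_time:.2f} s -> {label}")
```

Keeping the three thresholds as separate branches mirrors the idea of weighing both the magnitude of a change and the certainty that it is real.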
Is further research into the smallest worthwhile change needed?
A lower CV for a test implies less random noise, and therefore a greater ability or likelihood of detecting a real change in performance (12). In other words, reducing the variation between tests increases the likelihood of identifying a real change in performance, due to the ability to attribute a lower percentage of the change to chance. As a result, future research needs to focus on identifying tests that produce the lowest variation between testing trials (4).
Tests that show good test-retest reliability (i.e. high repeatability and reproducibility) are paramount within testing procedures as they produce the lowest variation between tests. Future research should also produce strict testing protocols with the aim of reducing the random error that occurs between tests (4). For example, if conducting a 20m sprint test, the athlete must follow a standardised warm-up, perform the test in the same starting position, have the same period of rest between trials, be tested in the same location, use the same testing equipment, etc.
The SWC is a favourable procedure for coaches to implement as part of the fitness testing process, as it allows a meaningful change to be determined rather than assumed. However, we must be mindful of the typical error associated with performance tests, and as such, the CV is also a necessary statistic to take into consideration when identifying performance change in athletes between two testing sessions.
Athlete targets can be set based on a combination of SWC, CV and 2CV to provide the greatest certainty in performance change.
- Reilly, T., Morris, T., & Whyte, G. (2009). The specificity of training prescription and physiological assessment: A review. Journal Of Sports Sciences, 27(6), 575-589. http://dx.doi.org/10.1080/02640410902729741
- Fernandez-Fernandez, J., Ulbricht, A., & Ferrauti, A. (2014). Fitness testing of tennis players: How valuable is it? British Journal Of Sports Medicine, 48, i22-i31. http://dx.doi.org/10.1136/bjsports-2013-093152
- Moir, G., Button, C., Glaister, M., & Stone, M. (2004). Influence of Familiarization on the Reliability of Vertical Jump and Acceleration Sprinting Performance in Physically Active Men. The Journal Of Strength And Conditioning Research, 18(2), 276. http://dx.doi.org/10.1519/r-13093.1
- Duthie, G., Pyne, D., Ross, A., Livingstone, S., & Hooper, S. (2006). The Reliability of Ten-Meter Sprint Time Using Different Starting Techniques. The Journal Of Strength And Conditioning Research, 20(2), 246. http://dx.doi.org/10.1519/r-17084.1
- Spencer, M., Fitzsimons, M., Dawson, B., Bishop, D., & Goodman, C. (2006). Reliability of a repeated-sprint test for field-hockey. Journal Of Science And Medicine In Sport, 9, 181-184. http://dx.doi.org/10.1016/j.jsams.2005.05.001
- Paton, C., & Hopkins, W. (2005). Competitive Performance of Elite Olympic-Distance Triathletes: Reliability and Smallest Worthwhile Enhancement. Sportscience, 9, 1-5.
- Hopkins, W. G. (2004). How to interpret changes in an athletic performance test. Sportscience, 8, 1-7. http://www.sportsci.org/jour/04/wghtests.htm
- Turner, A., Brazier, J., Bishop, C., Chavda, S., Cree, J., & Read, P. (2015). Data Analysis for Strength and Conditioning Coaches. Strength And Conditioning Journal, 37, 76-83. http://dx.doi.org/10.1519/ssc.0000000000000113
- Hetzler, R. K., Stickley, C. D., Lundquist, K. M., & Kimura, I. F. (2008). Reliability and accuracy of handheld stopwatches compared with electronic timing in measuring sprint performance. Journal Of Strength And Conditioning Research, 22(6), 1969-1976. https://www.ncbi.nlm.nih.gov/pubmed/18978613
- Sullivan, G., & Feinn, R. (2012). Using Effect Size—or Why the P Value Is Not Enough. Journal Of Graduate Medical Education, 4(3), 279-282. http://dx.doi.org/10.4300/jgme-d-12-00156.1
- Panichkitkosolkul, W. (2013). Confidence Intervals for the Coefficient of Variation in a Normal Distribution with a Known Population Mean. Journal Of Probability And Statistics, 2013, 1-11. http://dx.doi.org/10.1155/2013/324940
- Hopkins, W. (2000). Measures of Reliability in Sports Medicine and Science. Sports Medicine, 30, 1-15. http://dx.doi.org/10.2165/00007256-200030010-00001