To the Editor,
I would like to add some reflections on Steele’s paper published in 2012.1 I noticed two main points in that paper. One concerns the scientific language used to communicate conclusions to policy makers, and the other concerns statistical decision theory. Both problems are a consequence of R. Fisher’s tradition of testing the null hypothesis against arbitrary cutoffs of α at 0.05, 0.01 or 0.001.
A good example of the problem of translating scientific language occurred during the O.J. Simpson trial. A scientist called to give evidence about blood samples declared that there was one chance in 10 million (0.5% of population) that O.J. was not the murderer. This specialist should have said instead that the odds were 2.5 million to one that O.J. was the real murderer. As it was, Simpson’s defense attorney convinced the jury members that there was a chance that O.J. was not guilty, and the jury decided on acquittal.2
As for the statistical decision, failing to reject the null hypothesis when it is not true is an important error (type II), whose probability is β. Moreover, in some sciences, such as climate change, it should be morally more important to test the alternative hypothesis than to base the statistical decision on null hypothesis tests. This is an intuitive conclusion, supported by the probably worse consequences of concluding that the world temperature will not rise when that conclusion is false (type II error). The power of a statistical test is estimated by 1-β, and it depends on the effect size (ES). Apart from this problem, several scientific journal reviewers have confused the significance of the individual statistical tests with the experimental significance (sig = 1 - 0.95^n, where n is the number of comparisons). As a result, they have been approving for publication papers with low experimental power. For instance, Lancha Jr et al.3 concluded that it is possible to increase the speed of the rats’ aspartate-malate shuttle through aspartate, asparagine and carnitine supplementation. But they used t-tests to compare 48 averages and, because of this, the study’s α got worse, from 0.05 to 0.708. Moreover, the ES carries an intrinsic value judgment for testing the alternative hypothesis. Like Pearson’s correlation coefficient, which we can describe as perfect, excellent, good or reasonable, the ES can be described as low (≤0.2), moderate (≈0.5) or high (≥0.8).4 That could help the scientist use language that is more comprehensible to policy makers and make a small value judgment without exposing himself by using non-statistical or non-scientific language. Of course this strategy cannot solve the problem, but it can reduce the bad consequences. As Cohen advised, the right statistical decision is supported by the α significance criterion, the sample size, the population effect size, and the power of the test.4
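The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the method used in any of the cited papers. It assumes that the 48 averages were compared in 24 independent pairs, which is the count that reproduces the 0.708 quoted above. It also assumes that the ES is a standardized mean difference and that a normal approximation to the two-sample t-test is acceptable. The function names and the group size of 64 are hypothetical and chosen only for the example.

```python
from statistics import NormalDist


def familywise_alpha(alpha: float, n_comparisons: int) -> float:
    """Probability of at least one false positive across n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_comparisons


def approx_power(effect_size: float, n_per_group: int, alpha: float = 0.05) -> float:
    """Approximate power (1 - beta) of a two-sided, two-sample comparison.

    Uses a normal approximation to the t-test (an assumption of this sketch);
    effect_size is a standardized mean difference.
    """
    z = NormalDist()
    z_crit = z.inv_cdf(1.0 - alpha / 2.0)  # two-sided critical value
    shift = effect_size * (n_per_group / 2.0) ** 0.5  # expected test statistic
    # Probability of landing in either rejection region under the alternative.
    return z.cdf(shift - z_crit) + z.cdf(-shift - z_crit)


if __name__ == "__main__":
    # Assumed: 48 averages compared as 24 independent pairs.
    print(round(familywise_alpha(0.05, 24), 3))  # 0.708
    # Low, moderate and high ES with a hypothetical 64 subjects per group.
    for es in (0.2, 0.5, 0.8):
        print(es, round(approx_power(es, n_per_group=64), 2))
```

The second function links the four quantities Cohen lists: fixing α, the sample size and the ES determines the power, so a low ES with a modest sample leaves a large β even when α is held at 0.05.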