Since I started delving into medical and epidemiological studies, armed only with my background in communication analysis and a basic toolkit of argumentation theory, statistics, logic, philosophy of science, and so on, I have often been surprised by the ways of scientific thinking I encountered in studies. I initially thought I could learn something from them, but far too often I ran into internal inconsistencies, implicit assumptions, and incorrect conclusions; mostly matters of argumentation theory, really. Once you have developed an antenna for this, such things feel counter-intuitive, and therefore worth checking.
I saw scientific jargon used as a shortcut to circumvent logical reasoning. Especially in domains involving many billions of dollars, legitimate critical questions are papered over with mantras whose origin and purpose everyone has forgotten. There is sloppy thinking, people lean on each other's confirmation when it suits them (without even addressing conflicting data), people hide behind complicated multifactorial corrections while the raw data seem to point to completely opposite conclusions, etc. etc. It soon turned out that it was not just my ignorance.1 This also became clear to me when I expressed my doubts, sometimes even dismay, to specialists (including academics) in the relevant fields.
Then it is encouraging when a recognized scientific leader turns out to have seen it all much earlier. So those doubts are not so idiotic after all.
Kenneth J. Rothman already spelled it out in 2014. If only the science-worshiping dogmatists and science popularizers took that to heart... but I fear Max Planck will be proven right: it will take at least another generation. And even he was being optimistic.
At the same time, I also see why this approach causes problems: only people who can think sharply would still be able to practice science this way. Our scientific world, especially the large institutes, is partly populated by docile mediocrities, and they will not simply go away. They even crowd out researchers whose priorities lie elsewhere than being a team player, maintaining a reputation, or raking in funds with policy-affirming proposals. Just ask what an institute gains more from: undermining the consensus it has built up, or substantiating it? Advancing science can easily be seen as a threat to the institution; and the more fundamental the advance, the more anti-institutional it is.
With such a stricter quality standard, the persistent hangers-on who lean on jargon, consensus, friendly experts and Legacy Science™ also fall by the wayside. You must be able to form your own judgment, be able (and dare) to make informed estimates, question your own certainties, abstract, and so on, and that is not for everyone. It is certainly not 'inclusive', because discriminating by brainpower is of course not allowed. (s)Simpletons should be able to participate too! Especially if they are underrepresented in science, positive discrimination is only fair! Only together can we close the gap!(/s)
This certainly does not mean that everyone who shouts something should always be taken seriously. The codes of conduct for scientific integrity are a good yardstick here and can be used as a touchstone. Unfortunately, those guidelines have fallen into disuse; our leading institutes ignore them (take the core principle of 'transparency', for example, which is what makes the other core principles verifiable)2 and still retain their scientific status. No doubt the guidelines will be adjusted in due course so that everything is 'correct' again...
In this way, they can continue to make the six fundamental mistakes that Rothman identified in 2014, complete with stunning examples. Concise and clear, hence the full translation below, in the hope that his argument will be dusted off once more.
About Kenneth J. Rothman
Kenneth J. Rothman is an American epidemiologist known worldwide as one of the founders of modern epidemiological methodology. He was born in 1945 and is best known for his influential work on epidemiological causality, bias, and research methods. Rothman received his doctorate (DrPH) in epidemiology from the Harvard School of Public Health.
Attitude towards mainstream medical dogmas:
Rothman repeatedly emphasizes that science should not degenerate into a belief structure. He advocates transparency, methodological purity, and a continued critical attitude towards institutional interests – including pharmaceutical influence in clinical research.
This is clearly evident in his work: statistical significance does not equal scientific truth – a message that is more urgent today than ever.
Important work:
He wrote the standard work “Modern Epidemiology” — originally written by him alone, later in collaboration with Sander Greenland and Timothy Lash. This book is considered the 'Bible' of empirical epidemiology.
Scientific contributions:
- He introduced and systematized the concept of confounding and interaction (effect modification).
- He emphasized the importance of causal diagrams (predecessors of DAGs – Directed Acyclic Graphs).
- He pointed out the limitations of p-values and the danger of statistical ritualization in medicine.
- He was also critical of over-reliance on “significance limits” in research, which can lead to misleading conclusions.
Institutes and journals:
He is the founder of the journal Epidemiology (launched in 1990), which aimed to provide an alternative to overly conservative, institutionally controlled medical publications.
Below is the translation of his article.3
Summary
Scientific knowledge changes rapidly, but the concepts and methods for conducting research change more slowly. To encourage discussion about outdated ways of thinking about conducting research, I list six misconceptions about research that persist long after their shortcomings have become apparent.
The misconceptions are:
- There is a hierarchy of research designs; randomized trials offer the greatest validity, followed by cohort studies, while case-control studies are the least reliable.
- An essential element for a valid generalization is that the subjects constitute a representative sample of a target population.
- If a term indicating the product of two factors in a regression model is not statistically significant, there is no biological interaction between those factors.
- When categorizing a continuous variable, a reasonable scheme for choosing categorical cutoffs is to use percentile-defined cutoffs, such as quartiles or quintiles of the distribution.
- One should always report P values or confidence intervals corrected for multiple comparisons.
- Significance testing is useful and important for data interpretation.
These misconceptions have persisted in journals, classrooms, and textbooks. They persist because they are intellectual shortcuts that avoid a more thoughtful approach to research problems. I hope that highlighting these misconceptions will spark the necessary discussions to bury these outdated ideas for good.
Kenneth J. Rothman, DrPH
Research Triangle Institute, Research Triangle Park, NC, USA; Boston University School of Public Health, Boston, MA, USA.
KEYWORDS: research design; data interpretation; epidemiological methods; representativeness; evaluation of interaction; multiple comparisons; percentile limits; statistical significance tests.
PMID:24452418 | PMCID:PMC4061362 | DOI:10.1007/s11606-013-2755-z
© The Author(s) 2014. This article is published open access on Springerlink.com
There are still a surprising number of misconceptions about conducting research with human subjects. Some misconceptions persist despite teaching to the contrary, and others precisely because of what is taught. To stimulate discussion of these issues, I list here six persistent misconceptions about research, with a brief summary of the problems each poses.
Misconception 1. There is a hierarchy of study designs: randomized trials offer the greatest validity, followed by cohort studies, while case-control studies are the least reliable.
Randomized trials are often considered the "gold standard" among study types, but even in theory they are not perfect. Moreover, the assumption that the comparative validity of research results can be inferred from the type of study is incorrect.
Although some believe that evidence from a randomized trial is as compelling as a logical proof, no empirical study can provide absolute certainty. If randomized trials were perfect, how could they produce divergent results? In fact, they are subject to several kinds of error.1 Of course there is random error, as one would expect in a study based on random assignment. But there are also systematic errors, or biases. For example, randomized trials are usually analyzed according to the intention-to-treat principle, comparing the groups as initially randomized regardless of any subsequent non-compliance. Non-compliance leads to an underestimate of the effect of treatment. This bias is usually considered acceptable because it is offset by the benefits of random assignment. However, underestimation of effects is not acceptable in a safety study aimed at uncovering side effects of treatment. Another important source of bias in a randomized trial is error in outcome assessment, such as undercounting outcome events. Even if randomization ensures a balance of risk factors between the groups at the start of the study, the groups may become increasingly unbalanced over long-term follow-up due to differential dropout or changes in the distribution of risk factors. In long-term studies, the benefits of random assignment may therefore diminish over time.
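To make the compliance bias concrete, here is a minimal simulation sketch (my illustration, not from Rothman's article; all risks and the non-compliance rate are invented): a drug that truly halves risk appears considerably weaker under intention-to-treat when 30% of the treatment arm never takes it.

```python
# Minimal simulation of intention-to-treat (ITT) bias under non-compliance.
# All numbers are hypothetical illustrations, not taken from the article.
import numpy as np

rng = np.random.default_rng(0)
n = 100_000                  # subjects per arm
p_untreated = 0.10           # event risk without treatment
p_treated = 0.05             # event risk with treatment: the drug truly halves risk
noncompliance = 0.30         # 30% of the treatment arm never takes the drug

# Control arm: nobody is treated.
control_events = rng.random(n) < p_untreated

# Treatment arm: compliers get the treated risk, non-compliers the untreated risk,
# but ITT compares the arms as randomized, ignoring compliance.
complies = rng.random(n) >= noncompliance
treatment_events = rng.random(n) < np.where(complies, p_treated, p_untreated)

print(f"true risk ratio:          {p_treated / p_untreated:.2f}")  # 0.50
print(f"ITT-estimated risk ratio: {treatment_events.mean() / control_events.mean():.2f}")  # ~0.65
```

As the article notes, such dilution toward the null may be tolerable for an efficacy trial but not for a safety study.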
In short, randomized trials are far from perfect. Furthermore, both cohort and case-control studies yield valid results when properly designed and conducted. It is therefore wrong to unthinkingly assign more validity to a study on the basis of a hierarchy of research designs.2,3 For example, the association between cigarette smoking and lung cancer is well established from the findings of cohort and case-control studies; it has never been clearly demonstrated in a randomized study. It is not easy to randomly assign people to a smoking or a non-smoking group, but when smoking cessation was examined as part of a multifaceted intervention in the randomized Multiple Risk Factor Intervention Trial,4 those who were encouraged to quit smoking actually developed more lung cancer than those who were not. The results of that trial did not overturn the findings of the many cohort and case-control studies conducted without randomization. Rather, the discrepancy was attributed to problems with the trial.
In another striking example, the results of large cohort studies5,6 indicated that the risk of coronary heart disease was reduced in postmenopausal hormone users, but later results from two randomized trials indicated no association or an increased risk.7,8 The reaction in the scientific community and the popular press9 was to discredit the results of the cohort studies, on the assumption that they had been refuted by the randomized trials. Many continue to adhere to that interpretation, but in an elegant reanalysis, Hernán et al.10 showed that the study groups in the cohort studies and the randomized trials were different, and that the effects of hormone use after menopause varied strongly with age and time since menopause. When the analyses were limited to new hormone users, Hernán et al. showed that differences in the distribution of age and time since menopause could explain all the apparent discrepancies. Although it is common to attribute such discrepancies to inherent weaknesses of the non-experimental studies, it is simplistic to assign validity based on an assumed hierarchy of study types.11
Likewise, discrepancies between cohort studies and case-control studies should not be superficially explained away by an assumed validity advantage of cohort studies over case-control studies. Well-designed case-control studies will yield the same results as well-designed cohort studies. When conflicts arise, they may stem from problems in either or both types of study. Although case-control studies have long been taught as backward versions of cohort studies, which start from the disease and look back at possible causes, epidemiologists now view case-control studies as conceptually identical to cohort studies, apart from an efficiency gain that comes from sampling the denominators rather than conducting a full count. Indeed, these efficiency gains allow more resources to be devoted to exposure assessment or case validation in case-control studies, which can result in less bias than in corresponding cohort studies of the same relationship.
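The "sampled denominators" point can be illustrated with a small sketch (my addition, with hypothetical exposure prevalence and risks): drawing controls from the source population reproduces the cohort risk ratio with a fraction of the subjects.

```python
# Sketch: a case-control study as a sampled-down cohort study.
# Controls are drawn from the source population (the denominators), so the
# exposure odds ratio estimates the cohort risk ratio. Hypothetical numbers.
import numpy as np

rng = np.random.default_rng(1)
n = 1_000_000
exposed = rng.random(n) < 0.5
case = rng.random(n) < np.where(exposed, 0.02, 0.01)   # true risk ratio = 2.0

# Full-cohort analysis: risk ratio from complete denominators.
rr = case[exposed].mean() / case[~exposed].mean()

# Case-control analysis: all cases, plus only 5,000 controls sampled
# from the source population.
controls = rng.choice(n, size=5_000, replace=False)
odds = lambda p: p / (1 - p)
or_est = odds(exposed[case].mean()) / odds(exposed[controls].mean())

print(f"cohort risk ratio:       {rr:.2f}")
print(f"case-control odds ratio: {or_est:.2f}")   # ~2.0, from far fewer subjects
```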
Those who view case-control studies as backward cohort studies sometimes make the false analogy that the controls must be very similar to the cases, except that they do not have the disease that defines a case. In fact, the control group in a case-control study is intended to be a sample of the population denominator that gives rise to the cases, a substitute for the full denominators obtained in a cohort study. The control group must therefore resemble the source population, not the cases.12,13 When well designed, case-control studies can achieve the same excellent validity as well-designed cohort studies, while a poorly designed study of either type can be unreliable. The type of study should not be used as a measure of a study's validity.
Misconception 2. An essential element for making valid generalizations from a study is that the subjects are a representative sample of a target population.
This misconception is related to the view that scientific generalization is a mechanical extrapolation of results from a sample to the source population. But that describes statistical generalization; scientific generalization is something different: it is the process of formulating a correct statement about the way nature works.
Scientific generalization is the ultimate goal of scientific research, but a prerequisite for it is a study with internal validity, which is enhanced by holding confounding variables constant. When have we ever heard of animal researchers looking for a statistically representative sample of animals? Instead, their approach is almost the opposite of the pursuit of representativeness. Biologists studying mice, for example, prefer mice that are homogeneous in genes and environment and differ only in the experimentally manipulated variable. Unlike the statistical generalization of polls or surveys, which requires only an extrapolation from the sample to the source population, scientific generalization proceeds through informed conjecture, but only from the safe platform of a valid study. Consequently, studies are stronger if they limit the variability of confounding factors rather than aiming for representativeness. Doll and Hill14 studied mortality among male British doctors in relation to their smoking habits. Their findings were considered broadly generalizable, despite the fact that their study group was not representative of the general population of tobacco users in terms of gender, race, ethnicity, social class, nationality, and many other variables.
When there is a legitimate question about whether an overall association varies by subgroup of a third variable, such as age or ethnic group, it may be necessary to include people from a wide range of values of that third variable, but even then it is counterproductive if the study group is representative of the source population for that variable. The goal in that case would be to include subjects evenly distributed across the range, or in a distribution that increases the overall efficiency of the study. A sample that is representative of the source population will be suboptimal.15,16
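As an illustration of the restriction argument above (my sketch, not Rothman's; the confounder and all numbers are invented): a "representative" sample analyzed crudely is biased by the confounder, while restriction to one stratum recovers the true effect.

```python
# Sketch: validity through restriction rather than representativeness.
# An invented confounder ("old age") raises both exposure prevalence and risk.
import numpy as np

rng = np.random.default_rng(2)
n = 500_000
old = rng.random(n) < 0.5
exposed = rng.random(n) < np.where(old, 0.7, 0.3)   # the old are exposed more often
base = np.where(old, 0.10, 0.02)                    # ...and are at higher baseline risk
case = rng.random(n) < base * np.where(exposed, 2.0, 1.0)   # true risk ratio = 2.0

def crude_rr(mask):
    return case[mask & exposed].mean() / case[mask & ~exposed].mean()

everyone = np.ones(n, dtype=bool)
print(f"'representative' sample, crude RR: {crude_rr(everyone):.2f}")  # ~3.5, confounded
print(f"restricted to the young, RR:       {crude_rr(~old):.2f}")      # ~2.0, valid
```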
Misconception 3. If a term indicating the product of two factors in a regression model is not statistically significant, there is no biological interaction between those factors.
The term “biological” here should be understood in a broad sense to include biochemical, psychological, behavioral, and physical interactions. The problem is that interaction is usually evaluated using regression models, in which the product term refers to statistical interaction rather than biological interaction.
Biological interaction refers to two or more causes acting in the same mechanism, with effects that are interdependent. It describes a state of nature. When effects are measured as changes in disease risk, a synergistic (i.e., positive) biological interaction occurs when the joint effect of two causal factors is greater than the sum of their separate effects.17 Statistical interaction, on the other hand, does not describe nature but a mathematical model. It is usually assessed with a product term for two variables in a regression model. Its magnitude depends on the choice of measures and the measurement scale. A statistical interaction only implies that the basic functional form of a specific mathematical model is not an adequate description of the relationship between the variables. Two factors that interact biologically may or may not interact statistically, depending on the model used.
Product terms in regression models have units that are difficult to interpret. If one variable is fat consumption, measured in grams per day, and another is the number of pack-years of cigarettes smoked, how should one interpret a variable whose unit is grams/day multiplied by pack-years? The difficulty of interpreting such product-term coefficients has led to a focus on the p-value associated with the coefficient rather than on the size of the coefficient itself. Focusing on the p-value, or on whether the coefficient of a product term is statistically significant, only worsens the confusion of statistical interaction with biological interaction (see Misconception 6). A more meaningful assessment of interaction would focus on the percentage of cases of a disease that can be attributed to biological interaction.17,18
Take a simple example from the TREAT (Trial to Reduce Cardiovascular Events with Aranesp Therapy) study,19 which evaluated the risk of stroke in 4,038 patients with diabetes mellitus, chronic kidney disease, and anemia who were randomly assigned to receive darbepoetin alfa or placebo. In patients with no history of stroke, the risk of stroke during the study period was 2% on placebo and 4% on darbepoetin alfa. In patients with a history of stroke, the corresponding risks were 4% and 12%. The authors noted that the risk increase from darbepoetin alfa was greater in patients with a history of stroke, but they dismissed this interaction because the product term was not statistically significant in a logistic regression model. Yet the increased risk attributable to darbepoetin alfa was 2% in patients without a history of stroke and 8% in patients with a history of stroke, indicating a strong biological interaction between darbepoetin alfa and a history of stroke. If the risks were purely additive, the risk in patients with both risk factors would be 6% instead of the actual 12%. Half of the risk in patients with both risk factors appears to be attributable to biological interaction, despite the authors' claim that there was no interaction.
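The additive-scale arithmetic of this example can be checked directly (my small worked calculation, using only the risks quoted in the text above):

```python
# Checking the additive-scale arithmetic with the risks quoted above.
r00 = 0.02   # placebo, no history of stroke
r10 = 0.04   # darbepoetin alfa, no history
r01 = 0.04   # placebo, history of stroke
r11 = 0.12   # darbepoetin alfa, history of stroke

print(f"excess risk from the drug, no history: {r10 - r00:.0%}")   # 2%
print(f"excess risk from the drug, history:    {r11 - r01:.0%}")   # 8%

additive = r00 + (r10 - r00) + (r01 - r00)   # expected joint risk if purely additive
print(f"expected if additive: {additive:.0%}, observed: {r11:.0%}")
print(f"share of joint risk attributable to interaction: {(r11 - additive) / r11:.0%}")  # 50%
```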
Misconception 4. When categorizing a continuous variable, a reasonable scheme for choosing category boundaries is to use percentile-defined boundaries, such as quartiles or quintiles of the distribution.
There are two reasons why percentiles are a poor basis for choosing category boundaries. First, those boundaries may not correspond to the parts of the distribution where biologically significant changes occur. Suppose you are conducting a study of vitamin C intake and the risk of scurvy in the United States. If you divided vitamin C intake into five groups, you would find the entire association between vitamin C consumption and scurvy confined to the lowest group, and within that group to only the small proportion of people with exceptionally low vitamin C intake. Ten mg of vitamin C per day is enough to prevent scurvy, but those who take less than that represent only a fraction of 1% of the population of the United States.20 Using percentile-based categories would make it impossible to detect the effect of inadequate vitamin C intake on scurvy risk, because all intake above 10 mg/day is essentially equivalent. If we routinely use percentile cut points, we may never know whether we face the same problem as in the vitamin C and scurvy example. A more effective alternative is to start with many narrow categories and merge adjacent categories until meaningful differences in risk become apparent.
The second problem with percentile-based categories is that they make it difficult to compare results across studies, because the percentile-defined boundaries of different studies are unlikely to match. This problem can be avoided by expressing cut-off points in the natural units of the variable (such as mg/day for vitamin C intake). It is also useful to report means or medians within categories.
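A small simulation sketch (my addition; the intake distribution and risks are invented, loosely modeled on the vitamin C example above) shows how quintile boundaries dilute a threshold effect that a cutoff in natural units isolates:

```python
# Sketch: quintile boundaries dilute a threshold effect in a tiny tail.
# Invented intake distribution and risks, loosely modeled on the vitamin C example.
import numpy as np

rng = np.random.default_rng(3)
n = 1_000_000
intake = rng.lognormal(mean=4.3, sigma=0.6, size=n)   # median intake ~74 mg/day
deficient = intake < 10                                # well under 1% of the population
scurvy = rng.random(n) < np.where(deficient, 0.50, 0.001)

# Quintile categories: the lowest fifth mixes the rare deficient intakes
# with a mass of adequate ones, so its risk barely stands out.
groups = np.digitize(intake, np.quantile(intake, [0.2, 0.4, 0.6, 0.8]))
for g in range(5):
    print(f"quintile {g + 1}: risk = {scurvy[groups == g].mean():.4f}")

# A cutoff in the natural units of the variable isolates the effect.
print(f"intake < 10 mg/day: risk = {scurvy[deficient].mean():.2f} "
      f"(prevalence {deficient.mean():.2%})")
```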
Misconception 5. One should always report P values or confidence intervals corrected for multiple comparisons.
Traditional adjustments for multiple comparisons involve inflating the P-value, or the width of a confidence interval, based on the number of comparisons performed. When analyzing biological data, which is full of real relationships, the premise behind these traditional adjustments is shaky and the adjustments are difficult to defend. The concern about multiple comparisons stems from the fear of falsely significant findings (type I errors, in statistical jargon). Under Misconception 6 we discuss the problems of using statistical significance tests for data analysis, but before turning to those issues, let us consider the rationale for adjusting reported results for multiple comparisons.
A single significance test is intended to have a 5% chance (at the conventional level) of being significant when the null hypothesis is true, and multiple tests, if performed correctly, should each retain this property; nevertheless there is concern that performing multiple tests increases the chance of a false positive result. Of course, as the number of tests increases, the chance that one or more of them is falsely positive also increases, but that is only because many tests are being performed. Adjustments for multiple comparisons reduce such type I errors, but at the cost of increasing type II errors, which are nonsignificant test results in the presence of a real relationship. When the observed relationships are all the result of chance, type I errors can occur but type II errors cannot. Conversely, when the observed relationships all reflect real relationships, type II errors can occur but type I errors cannot. The context of an analysis thus has fundamental implications for the interpretation of the data. In particular, it is absurd to make adjustments that reduce type I errors at the expense of increasing type II errors without evaluating the estimated relative costs and frequency of each type of error.
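The trade-off is easy to demonstrate (my sketch, not from the article; I use the Bonferroni correction as a concrete adjustment, with a hypothetical mixture of null and real effects): the adjustment removes the false positives, but at the cost of missing most of the real effects.

```python
# Sketch: the type I / type II trade-off behind multiplicity adjustments.
# 100 hypothetical tests; z ~ N(0,1) under the null, N(3,1) for real effects.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(4)
m = 100
real = rng.random(m) < 0.30                   # 30% of tested relationships are real
z = rng.normal(loc=np.where(real, 3.0, 0.0))  # test statistics
p = 2 * norm.sf(np.abs(z))                    # two-sided P-values

for label, alpha in [("unadjusted", 0.05), ("Bonferroni", 0.05 / m)]:
    sig = p < alpha
    print(f"{label:>10}: false positives = {(sig & ~real).sum():2d}, "
          f"missed real effects = {(~sig & real).sum():2d} of {real.sum()}")
```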
If scientists studied random numbers instead of biological data, any significant results they reported would be type I errors, and adjustments for multiple comparisons would make sense; some skeptics believe that genome-wide association scans approach this situation.21 But when scientists study biological relationships rather than random numbers, the assumption that type I errors are the biggest problem may be wrong.22 A more rigorous evaluation of the need for multiplicity adjustments would begin by assessing the tenability of the claim that the data are essentially random numbers. For experiments on paranormal phenomena, skepticism about the results could be an argument for multiplicity adjustments. For studies of the physiological effects of pharmaceutical agents, real associations are to be expected, and the adjustments are harder to defend. Studies of single nucleotide polymorphisms in relation to a particular disease may lie somewhere in between. A theoretically more defensible approach to this issue is a Bayesian one, in which prior credibility is assigned to relationships of various strengths and Bayes' theorem is used to calculate posterior credibility.23,24
Misconception 6. Significance testing is useful and important for data interpretation.
Significance testing has produced far more misunderstanding and misinterpretation than clarity in the interpretation of research results.25–28 A significance test is a degraded version of the P-value, a statistic that conflates precision and effect size, thereby confusing two essential aspects of data interpretation. Measuring effect size and precision as separate tasks is a more direct and clearer approach to data interpretation.
For studies that aim to measure relationships and to judge whether these reflect causal effects, the focus should be on the magnitude of those relationships: estimating effects is clearly preferable to statistical testing. Ideally, a study estimates the size of the effect and analyzes the possible errors that could have distorted it. Some systematic errors, such as confounding by measured factors, can be addressed with analytical methods; others, such as the effects of measurement error or selection bias, can be addressed with sensitivity analyses (also called bias analysis). Random error is typically expressed in confidence intervals, which give a range of parameter values that are compatible with the data at a given level of confidence.
It is unfortunate that a confidence interval, from which both an estimate of effect size and the precision of measurement can be read, is typically used only to check whether or not it contains the null value, thus turning it into a significance test. Significance testing is a poor classification scheme for research results; strong effects may be wrongly dismissed as null findings because authors misread the lack of statistical significance as a lack of effect, and weak effects may be wrongly promoted as important because they are statistically significant. Rather than serving as surrogate significance tests, confidence intervals should be interpreted as quantitative measures of effect size and precision, with little attention paid to the precise location of the interval's boundaries. This advice is endorsed by the Uniform Requirements for Manuscripts Submitted to Biomedical Journals, but is nevertheless often ignored, even by reviewers and editors of journals that subscribe to those requirements.29
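As an illustration (my sketch, with invented numbers, in the spirit of Gelman & Stern, reference 28): two studies with nearly identical effect estimates receive opposite "verdicts" when the confidence interval is read only as a significance test.

```python
# Sketch: two hypothetical studies with nearly identical effect estimates
# get opposite "verdicts" when the confidence interval is read as a test.
import numpy as np

studies = {"study A": (0.40, 0.15),   # (log risk ratio estimate, standard error)
           "study B": (0.38, 0.22)}

for name, (b, se) in studies.items():
    lo, hi = b - 1.96 * se, b + 1.96 * se
    verdict = "significant" if lo > 0 else "not significant"
    print(f"{name}: RR = {np.exp(b):.2f}, "
          f"95% CI {np.exp(lo):.2f}-{np.exp(hi):.2f} -> {verdict}")
```

Read as estimates with precision, the two studies agree almost perfectly; read as tests, they appear to contradict each other.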
Many misconceptions arise from reliance on statistical significance tests. The focus on the statistical significance of interaction terms rather than on measuring interaction, discussed above, is one example. Evaluating dose-response trends by merely stating whether or not a trend is significant, rather than reporting its magnitude and ideally its shape, is another. Yet another is the advice sometimes given to calculate the power of a study when reporting its results, especially if those results are not statistically significant. Reporting the power of a study as part of its results is called 'post-hoc' power calculation.30 Power calculations rest on a hypothesis about the size of association to be distinguished from the null, but once the research results are in hand, there is no longer any need to hypothesize about the magnitude of the association, because you now have an estimate of it. A confidence interval for the estimated association conveys all the relevant information; a power calculation adds nothing more.
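This can be made visible with a small sketch (my addition, hypothetical numbers): the "post-hoc power" at the observed estimate is merely a transformation of the same z-statistic that already determines the P-value and confidence interval, so it carries no new information.

```python
# Sketch: "post-hoc power" is a mere transformation of the observed result.
# Hypothetical estimate and standard error (log risk ratio scale).
from scipy.stats import norm

estimate, se = 0.25, 0.20
z = estimate / se

print(f"two-sided P-value: {2 * norm.sf(abs(z)):.2f}")
print(f"95% CI: ({estimate - 1.96 * se:.2f}, {estimate + 1.96 * se:.2f})")
# Power to detect an effect the size of the observed estimate, at alpha = 0.05:
print(f"'post-hoc power' at the observed estimate: {norm.sf(1.96 - z):.2f}")
```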
The unfortunate consequence of the focus on statistical significance testing is that it has fostered a dichotomous view of relationships that are better assessed in quantitative terms. This distinction is more than a subtlety. Every day there are serious, regrettable, and avoidable misinterpretations of data that result from the fog of statistical significance testing. Most of these errors could be avoided if the focus shifted from statistical testing to estimation.
Conclusion
Why do such important misconceptions about research persist? These misconceptions largely substitute for more thoughtful and more difficult tasks. It is easier to resolve a discrepancy between a trial and a non-experimental study in favor of the trial than to perform the laborious analysis that Hernán et al.10 undertook. It is easier to declare that a result is not statistically significant, wrongly suggesting that there is no evidence of a relationship, than to look quantitatively at the range of relationships that the data actually support. These misconceptions are an easy road, and when that road is full of others traveling the same path, there may be little reason to question the route. Indeed, these misconceptions are perpetuated in journals, classrooms, and textbooks. I believe that the best chance for improvement lies in raising awareness of these issues through reasonable debate. Max Planck once said: "A new scientific truth does not triumph by convincing its opponents and making them see the light, but rather because its opponents eventually die, and a new generation grows up that is familiar with it."31 To the extent that this cynical view is correct, we can expect outdated concepts to disappear slowly at best. I hope that highlighting these misconceptions will spark the needed discussions and act as a catalyst for change.
Acknowledgments: I have received useful criticism from Susana Perez, Andrea Margulis, Manel Pladevall, and Jordi Castellsagué.
Conflict of interest: The author declares that he has no conflict of interest.
Corresponding author: Kenneth J. Rothman, DrPH; Research Triangle Institute, Research Triangle Park, NC, USA (e-mail: KRothman@rti.org).
Open Access: This article is distributed under the terms of the Creative Commons Attribution License, which permits any use, distribution, and reproduction in any medium, provided the original author(s) and the source are credited.
REFERENCES
- Hernán MA, Hernández-Díaz S, Robins JM. Randomized trials analyzed as observational studies. Ann Intern Med. 2013;159:560–2. doi:10.7326/0003-4819-159-8-201310150-00709
- Ioannidis JPA. Why most published research findings are false. PLoS Med. 2005;2(8):e124.
- Hiatt WR. Observational studies of drug safety – aprotinin and the absence of transparency. N Engl J Med. 2006;355:2171–3.
- Shaten BJ, Kuller LH, Kjelsberg MO, Stamler J, Ockene JK, Cutler JA, Cohen JD. Lung cancer mortality at 16 years among MRFIT participants in intervention and usual care groups. Multiple Risk Factor Intervention Trial. Ann Epidemiol 1997;7:125–36.
- Grodstein F, Manson JE, Colditz GA, et al. A prospective, observational study of postmenopausal hormone therapy and primary prevention of cardiovascular disease. Ann Intern Med. 2000;133:933–41.
- Varas-Lorenzo C, García-Rodríguez LA, Pérez Gutthann S, et al. Hormone replacement therapy and incidence of acute myocardial infarction. Circulation. 2000;101:2572–8.
- Hulley S, Grady D, Bush T, Furberg C, Herrington D, Riggs B, Vittinghoff E. Randomized trial of estrogen plus progestin for secondary prevention of coronary heart disease in postmenopausal women. Heart and Estrogen/progestin Replacement Study (HERS) Research Group. JAMA. 1998;280:605–13. doi:10.1001/jama.280.7.605.
- Manson JE, Hsia J, Johnson KC, et al. Estrogen plus progestin and the risk of coronary heart disease. N Engl J Med. 2003;349:523–34.
- Taubes G. Do we really know what makes us healthy? New York Times, September 16, 2007.
- Hernán MA, Alonso A, Logan R, Grodstein F, Michels K, Willett WC, Manson JE, Robins JM. Observational studies analyzed as randomized experiments: an application to postmenopausal hormone therapy and coronary heart disease. Epidemiology. 2008;19:766–79. doi:10.1097/EDE.0b013e3181875e61
- Concato J. Observational versus experimental studies: what is the epidemiological evidence for a hierarchy? NeuroRx. 2004;1:341–7.
- Vandenbroucke JP, Pearce N. Case-control studies: basic concepts. Int J Epidemiol. 2012;41:1480–9. doi:10.1093/ije/dys147.
- Rothman KJ. Chapter 5, Types of epidemiological studies. In: Epidemiology: An Introduction, 2nd edition. Oxford University Press, New York, 2012.
- Doll R, Hill AB. Mortality among physicians in relation to their smoking habits: a preliminary report. Br Med J 1954;ii:1451–5.
- Rothman KJ, Gallacher J, Hatch EE. Why representativeness should be avoided. Int J Epidemiol. 2013;42:1012–4. doi:10.1093/ije/dys223
- Rothman KJ, Gallacher J, Hatch EE. When it comes to scientific inference, sometimes a cigar is just a cigar. Int J Epidemiol. 2013;42:1026–8. doi:10.1093/ije/dyt124
- Rothman KJ. Chapter 11, Measuring interaction. In: Epidemiology: An Introduction, 2nd edition. Oxford University Press, New York, 2012.
- Knol MJ, van der Tweel I, Grobbee DE, Numans ME, Geerlings MI. Estimation of interaction on an additive scale between continuous determinants in a logistic regression model. Int J Epidemiol. 2007;36:1111–8.
- Skali H, Parving HH, Parfrey PS, Burdmann EA, Lewis EF, Ivanovich P, Keithi-Reddy SR, McGill JB, McMurray JJ, Singh AK, Solomon SD, Uno H, Pfeffer MA; TREAT Investigators. Stroke in patients with type 2 diabetes mellitus, chronic kidney disease, and anemia treated with darbepoetin alfa: the Trial to Reduce Cardiovascular Events with Aranesp Therapy (TREAT) experience. Circulation. 2011;124:2903–8.
- Dietary Reference Intakes for Vitamin C, Vitamin E, Selenium, and Carotenoids. Institute of Medicine. The National Academies Press, Washington, D.C., 2000.
- Dudbridge F, Gusnanto A. Estimation of significance thresholds for genomic association scans. Genet Epidemiol. 2008;32:227–34.
- Rothman KJ. No adjustments are necessary for multiple comparisons. Epidemiology. 1990;1:43–6.
- Greenland S, Robins J. Empirical-Bayesian adjustments for multiple statistical comparisons are sometimes useful. Epidemiology. 1991;2:244–51.
- Greenland S, Poole C. Empirical-Bayesian and semi-Bayesian approaches to occupational and environmental risk monitoring. Arch Environ Health. 1994;48:9–16.
- Rothman KJ. A show of confidence (editorial). N Engl J Med. 1978;299:1362–3.
- Poole C. Beyond the confidence interval. Am J Public Health. 1987;77:195–9.
- Rothman KJ. Significance questing (editorial). Ann Intern Med. 1986;105:445–7.
- Gelman A, Stern H. The difference between 'significant' and 'not significant' is not itself statistically significant. Am Stat. 2006;60:328–31.
- Uniform Requirements for Manuscripts Submitted to Biomedical Journals, http://www.icmje.org/manuscript_1prepare.html (accessed May 2, 2013)
- Smith AH, Bates MN. Confidence limit analyses should replace power calculations in the interpretation of epidemiologic studies. Epidemiology. 1992;3:449–52.
- Planck M. Scientific Autobiography and Other Papers. Philosophical Library, New York, 1968; trans. F. Gaynor (New York, 1949), pp. 33–34.
Footnotes
- 1 It's all over the place at times; see 'Academic bankruptcy in four acts'
- 2 See also the article from June 2021 about scientific integrity
- 3 The original piece is at https://doi.org/10.1007/s11606-013-2755-z

Nice argument about the epidemiological principles and details. T.S. Kuhn said as early as 1962 that the "bastion of science" keeps defending the prevailing paradigms until, after a long struggle against mainstream scientists, they are no longer tenable... https://www.lri.fr/~mbl/Stanford/CS477/papers/Kuhn-SSR-2ndEd.pdf
210 pages, JGM! With Niels Bohr and Bertrand Russell on the advisory committee... And I already had such a long reading list.
Then at least read the Wiki Lemma...
But the examples in his book are very juicy and worthwhile... though I read that book back in 1975 or so, when I still had more time, as an Electrical Engineering student...
Added to the reading list. For me it's a race to catch up.
Everywhere I look I see interesting things. As a child I looked forward to the new Donald Duck, later the PEP, now Virusvaria, and many other things. Still some intellectual growth in all those years :-).
Thanks for sharing this article Anton.
Rothman lectured at Erasmus MC for many years and was easily approachable, if a bit of a grouch (students and colleagues were a little afraid of him). But he gave very good lectures and has written a genuinely nice book about epidemiology, 'Epidemiology: An Introduction', which I can recommend to everyone reading below the line. It is easy to read and uses little to no jargon.
An article that is also fun to read is a satirical piece by Rothman about one of the epidemiological heroes of a bygone era: John Snow. Snow discovered that cholera spreads through water and not through the air (miasma). This was a major and important discovery at the time (we are talking about the mid-19th century). What Rothman does in the piece below is imagine that Snow lived in the present day and had to apply for a grant to investigate whether cholera spreads through water or through the air. Spoiler alert: Snow does not get the grant, and Rothman explains (satirically) why not; see:
https://pubmed.ncbi.nlm.nih.gov/26829161/
Rothman is a bit of an odd one out among epidemiologists. Originally a dentist who became interested in science. He was/is important because his great predecessor and mentor (Miettinen) wrote completely impenetrably about epidemiology. Rothman managed to translate Miettinen insightfully. Mind you, there were no epidemiologists before 1970.
Within the COVID saga, Rothman comes off quite well. He refrained from saying anything about the new disease until about mid-2021. Unfortunately (if I remember correctly) he did speak out in favor of vaccination, but that was only well into 2021, and as we all know, the pressure on prominent figures to conform to the delusions of the day was high. Let me just say: he did not make himself famous by posing in the media as a COVID expert. He could have, but he didn't, and that speaks in his favor.
Some well-known names at WUR whom I held in high esteem, among whom Huub Savelkoul, who briefly advocated vitamin D at a talk-show table during the first COVID wave (which I thought was positive), also quickly had their good names dragged through the corona mill. The WUR has become a WEF hub, and being critical was effectively banned in 2020. Some years earlier, a child of mine had to bring a case against a professor from another university. My child won the case, but was told "don't talk about it, because it will do your diploma no good; you could give the university a bad name." At the time you think it was that person's fault and an isolated incident, but we now know better (worse...). And also that we are not alone in "the battle", especially after reading the above article. Thanks again!
It may also be interesting to mention that Sander Greenland (co-author of Modern Epidemiology) was co-author of the Fraiman et al. article on the side effects of the vaccines based on the Clinical Trial data: https://research.bond.edu.au/en/publications/serious-adverse-events-of-special-interest-following-mrna-covid-1/
Not completely off-topic: I think many people can also appreciate this 2026 outlook toward 2050.
========================
We live in turbulent times.
We could use a hopeful, positive Master Plan 2050, with a "controlled explosion" of the parasitic bubbles and excesses, so that we can then build on an honest human foundation of reliable money and real labor.
General human principles of this Master Plan 2050:
1. I think the 10 commandments of Moses are more than enough. And above all: Thou shalt not steal (and therefore not take countries from others). Then all wars will immediately end. Sovereignty and self-determination are “sacred” principles. Russia withdraws from Ukraine. Ukraine gives the Russians in the Donbas living and cultural space. Israel gives Palestinians living space according to the 1948 UN borders; and Hamas and the Palestinians recognize Israel. The US leaves Venezuela and Greenland alone. China (and the US!) leaves Taiwan and Hong Kong alone.
2. I also like to fall back on other universal principles: the general laws of nature and the laws of logic.
EU principles
3. There is no one route for Europe, because all EU countries are incredibly different in cultures and therefore in norms and values. They therefore all have their own specific problems. Fine, leave it that way; freedom, happiness.
4. That is why the EU must return to true subsidiarity. So only do collectively what is necessary or is clearly useful for everyone and leave everything else to the individual countries.
5. This means: reducing EU rules, bureaucracy and personnel by approximately 80 to 90%. Roughly back to the ECSC. Do not try to unite what is incompatible.
6. The euro may continue to exist, but then the 3% and 60% rules must be strictly enforced. The ECB must aim for 0% inflation so that citizens and savers/pension pots are not steadily impoverished while large capital owners and (indebted) governments are unjustifiably enriched. So: the ECB should raise deposit rates, leave speculators and Southern Europe to their bizarre debts that live off the money of future generations, stop the growth of (government) debts, etc.
7. Given the cultural difference, I suspect that this implies that there should be a Northern Euro and a Southern Euro. Unless of course they also (want to) become more sensible in Southern Europe...; then they can stay in the Euro. But we do not help them become wiser; that is their own responsibility.
Netherlands
8. For the Netherlands, a fairly simple, but unfortunately not yet universally accepted, set of logical priorities applies, in the areas where the government has a task and where the most money is spent.
VWS
9. Ensure that sensible care is provided only on the basis of added-value priority, and ensure that people live and eat more healthily. Then the VWS budget can be reduced by 50%, and the waiting lists for seriously ill mental health patients will immediately disappear.
OCW
10. Ensure that education goes back to 50 years ago, when we were still number 1 in the international rankings for language, arithmetic and mathematics. Use again the methods from about 50 years ago that the pioneer of "Error-free arithmetic in 12 weeks" also originally used. As a result, the roughly 30% staff overhead (supervisors, backpack coordinators, remedial teachers, above-school behemoths that add no value, etc. etc.) can and must in any case be removed, and classes can become somewhat larger if more discipline is demanded of students. Special education and technical schools are restored. This saves approximately 40% on the education budget.
11. Ensure that debate is opened up again in the media. They play an unprecedentedly harmful role (particularly visible since corona) by prioritizing their own (left-wing) positions and framing them positively, and by cancelling or extremely negatively reframing all other views (particularly right-wing views and scientifically based criticism of government policy).
MinFin (this can only be arranged internationally if we keep the Euro)
12. Ensure that DNB again does what it should have been doing all along: inflation = 0% = price stability, so that property rights to savings and pension pots are guaranteed again and protected against state intervention or monetary dilution.
13. And have DNB provide a very efficient current-account service with legal tender that can also be used completely anonymously, just like banknotes (i.e. an unlimited digital euro, not the stripped-down CBDC with which you can hold at most €3,000, interest-free(!); that is theft from the citizens and a kowtow to the banking lobby).
14. Ban banks from creating money themselves. They only lend what is saved.
15. Ensure that labor is taxed less/not at all, but that consumption and profits are taxed. This is difficult, because capital competes with foreign countries. But it has to be done.
16. The fourth industrial revolution (digitalization and AI) must lead to deflation, so that everyone benefits from this innovation and not just the upper class, the rich and the capital owners. This instead of the deliberately targeted minimum of 2% inflation maintained since 1971 (the abandonment of the gold-dollar link) [or even since 1920 and the abandonment of the gold standard], even in times of innovation.
SZW
17. The rules of Social Assistance (PW), WW and WIA are applied 100% according to their spirit, especially by the UWV, municipalities and doctors, but also by companies and citizens. Then there will be at least approximately 0.5 million fewer benefit recipients and the same number more people in work, compared to the current 8 million workers. Only those who really cannot do anything at all due to illness or disability receive a benefit without obligations. There are stiff fines and penalties for fraud or abuse. Benefits, allowances and minimum wages must be completely transformed so that working (more) always pays off for everyone. This halves SZW's budget.
18. To promote social cohesion, immigrants are only entitled to social benefits after 10 years. So only those who can provide for themselves can stay here. There is no need for a ban on immigration, but you simply don't get anything here, so a de facto immigration stop. Those who cannot support themselves are immediately deported. Just like in Australia, the US and Canada and all “normal economic thinking” countries.
EZK & Climate (stop with the latest indication!)
19. Stop the ideological zero-CO2 ambition: it is economic hara-kiri compared to China, India and the US, which continue to use fossil fuels, and it is also virtually pointless (in the absence of proven and provable causality that it solves anything at all; Rob Jetten: "€36 billion against 0.000037 °C of warming", if that is even true...).
20. So stop deliberately making gas much more expensive and electricity relatively cheaper, and also stop all other e-subsidies, unless a positive business case can be made for an e-transition (which is only possible if full seasonal storage is profitable...). Use fossil fuels where necessary and wait for nuclear fusion, or use "normal" nuclear energy if that is economically feasible.
21. Stop the idiotic nitrogen discussion: nitrogen is good fertilizer; NL has no nature, only culture. And the water quality after purification appears to be excellent. More NOx and NH4 simply leads to different species. So what?
22. Stop the industry-bashing. It is excellent that there are environmental regulations and that they are strictly enforced, but they should go no further than the same QALY standards (approx. €30,000/QALY) that now also apply in healthcare. Then Tata Steel, for example, will not have to close for a long time yet, is my expectation. It is already immeasurably cleaner than in the 1970s.
THE
23. Make sure democracy works as intended again. So no exclusion of the (extreme) left or right from coalitions, but real representativeness. For years now, approximately 75% of the population has believed that "something" should be done about immigration, but due to the exclusion and sabotage of the (extreme) right by centre and left-wing parties, nothing happens at all. That erodes confidence in government.
24. Stop the (largely secret) lobbying of large companies and government-subsidized NGOs in politics. Ensure that decisions are 100% transparent. The WOO must be replaced by an obligation on ministers to state ALL relevant arguments, especially counter-arguments, explicitly in every decision. A sensitive point, because politics thrives on shadiness, trickery and deceit... but that has to stop. It too erodes confidence in government and fuels conspiracy theories.
25. To put a definitive stop to the current corruption and inefficiency, the three areas of society must (gradually) be strictly separated from each other in accordance with the principles of social threefolding, so that they can no longer blackmail or pollute one another:
a. Legal Life (The State): The government limits itself exclusively to democratic law and security (the 10 commandments). Here everyone is 100% equal. The state stops subsidizing opinions, science or companies.
b. The Spiritual Life (Culture, Education & Science): This area must be 100% free. No state curriculum in education, no subsidized media bias and no “political science” (such as climate, nitrogen and Corona dogma). Science must again be based on free invention and natural laws, not on political desirability.
c. Economic Life: This is the domain of free cooperation and craftsmanship (engineers, farmers, entrepreneurs). Here the business case and the logic of the value chain govern. Politics stops disrupting the market through ideological taxes (CO2/nitrogen) or monetary theft (inflation). Monopolies that abuse their position are of course countered by the government.
VROM
26. The housing bubble must be punctured by abolishing the mortgage-interest deduction (which sounds hypocritical, because mine has already been paid off). As things stand, the next generation can only buy houses if their own parents, who benefited from that bubble, help their children with the purchase, and that is not fair. This will be set in motion anyway, because it will also yield more for the state (haha...). Moreover, the taboos and idiotic rules against new construction must be drastically relaxed. The problem is not "building" but mainly the permits to build. So here, again, a problem created by the government.
Defense & Sovereignty/BuZa
27. There must be an army for internal emergencies and to assist or, in an emergency, even take over from the police. As a small country, our army will always be too small to defend us completely, so we must treat all foreign countries in a friendly and tolerant manner. We are an active member of the UN in order to give maximum support to the international legal order; that is in the interest of all small countries. It could be argued that the group of small countries, within the UN context, should also have nuclear weapons to deter the large countries.
28. We have no essential raw materials in the Netherlands, so we are by definition dependent on foreign countries for a certain level of prosperity. The only internationally relevant assets are water, agricultural and horticultural knowledge, some natural gas, and also our general knowledge, language skills and trading instincts. That is why it is crucial for such a country to be wise, sensible and therefore highly educated, and to use our (relatively limited) assets to trade internationally, exchanging them for relevant raw materials and (semi-)finished products.
The other ministries
29. Furthermore, (small) improvements are needed in many areas: crime (especially less recidivism; no longer tolerating revolving-door criminals but locking them up permanently or "cleaning them up"), agriculture (mega-stables and factory farming can go), public transport only where it is really cheaper than private transport, etc. What is especially needed is a reduction of the many complex and often contradictory regulations and of the size of national, provincial and municipal government.
Simple, right? A bit like Plato. A bit a la Rudolf Steiner. And a bit like Edin Mujagić
Unfortunately, this will not be realized soon. Because there are far too many vested interests. And confusing ideological positions.
But here and there people emerge with these kinds of insights.
Hopefully this will happen quickly enough, before there is a war and/or a mega financial crisis (which is almost certain, given the continued impoverishment of citizens and the growing debt bubbles...). Because then the new world of 2050 might actually be realized.
If this is not done in time, things will unfortunately go badly.
And if the Netherlands is not repaired in time, more and more successful and creative people from the Netherlands will flee to Scandinavia, Africa or Asia...
Am I missing something? Or do you have even better ideas?
Then report them.
Wishing you a prosperous 2026 in the run-up to 2050!
I've created a completely new version here, which is more complete and consistent. Unfortunately I cannot remove or improve the old one...
Link current version:
https://docs.google.com/document/d/1JkQfUGTG3nG3hy2eJ-YZAiZCYIu-R8BK/edit?usp=sharing&ouid=113155463720193786240&rtpof=true&sd=true