## Key Ideas
> [!abstract] Core Concepts
>
> - **Study Type Hierarchy**: Randomised controlled trials provide strongest evidence, while theoretical work offers limited practical value
> - **Replication Crisis**: Many influential studies cannot be reproduced, undermining trustworthiness of published research
> - **Measurement Matters**: Student outcomes trump engagement measures and survey responses for determining effectiveness
## Definition
**Research Trustworthiness**: The reliability and validity of educational research findings, determined by methodology, replication success, sample relevance, and measurement quality.
## Connected To
[[Replication Crisis]] | [[Logical Fallacies]] | [[Explicit Teaching]] | [[Non-Explicit Teaching]]
---
## Research quality hierarchy
Not all research provides equally trustworthy evidence (Slavin, 2008). Understanding this hierarchy prevents educators from giving equal weight to a case study and a large-scale randomised controlled trial.
Randomised controlled trials (RCTs) provide the strongest evidence through the scientific method, controlled variables, and reduced bias. However, findings from clinical settings may not transfer directly to classroom environments. Meta-analyses can provide similarly strong evidence when well-designed, but their quality depends on inclusion criteria and how variables are defined across studies. As one researcher noted, "A meta-analysis is like trying to determine the age of a dog by weighing it": combining disparate studies can produce meaningless results if inclusion criteria are too broad, if dissimilar studies are inappropriately combined, or if the authors' conclusions do not match the data synthesis.
Studies comparing alternative interventions provide moderate evidence by testing new approaches against established methods. These comparisons can be biased if researchers are not equally trained in both methods. Studies comparing new interventions to no intervention at all provide weaker evidence, as any intervention typically outperforms doing nothing, leading to inflated positive results.
Ethnographic case studies offer in-depth contextual understanding but remain subjective and do not generalise broadly. Theoretical work, whilst providing philosophical analysis, offers limited practical value without outcome data.
## Sample relevance
A study with impeccable methodology tells you nothing if the participants bear no resemblance to your students. When assessing research, examine whether participants come from advantaged or disadvantaged backgrounds, what age group was studied (primary, secondary, or university), and whether the sample matches your teaching context. Cultural and socioeconomic factors affect how findings transfer from one setting to another.
Studies involving students with severely limited working memory or impulse control often yield insights that apply universally: if an approach works for the most challenging learners, it likely works for everyone (Sweller et al., 2019). By contrast, findings from different educational systems, languages, and cultural contexts may not transfer directly, so examine whether socioeconomic parallels exist between the study and your application context.
## Measurement quality
What researchers choose to measure reveals what they value and whether their findings matter for student learning. Student learning outcomes provide the most reliable evidence when measured through well-designed assessments. Long-term impact studies show whether effects persist over time, though these require longitudinal research designs. Special education outcomes often transfer to mainstream contexts yet are frequently overlooked.
Survey responses provide subjective insights but do not indicate learning. [[Engagement]] measures have limited value as primary outcomes because engagement is not a reliable proxy for learning: students can be highly engaged in activities that produce minimal learning, whilst effective learning sometimes occurs during less engaging but more rigorous instruction. Entertainment and education, though not mutually exclusive, are not synonymous.
## Replication crisis
The replication crisis challenges the assumption that published research, even from prestigious journals, represents reliable truth (Ioannidis, 2005; Open Science Collaboration, 2015). Begley and Ellis (2012) attempted to reproduce 53 cancer studies from top journals and successfully replicated only 6 (11% success rate). One scientist admitted conducting experiments six times and reporting only the favourable result. The Open Science Collaboration (2015) found that only 36% of psychological studies could be successfully replicated, showing the widespread nature of this problem across social sciences.
Several systemic issues mean that even well-intentioned researchers operate within incentive structures that undermine research quality. Publish-or-perish culture rewards new discoveries over verification. Methodological problems such as poor design, insufficient statistical power, and p-hacking produce unreliable results, with reported p-values often hovering suspiciously close to the 0.05 threshold. Lack of transparency prevents independent verification when original data and methods are inaccessible. Career advancement pressures discourage replication studies and leave negative results unpublished. These problems are especially prominent in the social sciences, making educational research particularly vulnerable.
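The distortion produced by selective reporting can be quantified with a short calculation. The sketch below (illustrative Python, not drawn from any of the cited studies) shows why "run it six times, report the best run" is so corrosive: even when an intervention has no effect at all, repeated attempts make a spuriously significant result likely.

```python
# Probability of at least one "significant" (p < 0.05) result when an
# ineffective intervention is tested repeatedly and only favourable
# runs are reported. Illustrative arithmetic, not from the source studies.

def false_positive_rate(alpha: float, attempts: int) -> float:
    """Chance that at least one of `attempts` independent null tests
    crosses the significance threshold `alpha` by luck alone."""
    return 1 - (1 - alpha) ** attempts

# One honest test keeps the error rate at the nominal level.
print(f"1 attempt:  {false_positive_rate(0.05, 1):.1%}")   # 5.0%
# Running the experiment six times and keeping only the favourable
# result inflates the effective false-positive rate substantially.
print(f"6 attempts: {false_positive_rate(0.05, 6):.1%}")   # 26.5%
```

This is also why independent replication matters: a genuine effect survives a fresh test, whereas a lucky run does not.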
Before implementing any educational intervention based on research, verify that the findings have survived replication attempts. Ask whether results have been independently replicated, whether multiple research teams reach similar conclusions, what the track record of replication is in this research area, and whether contradictory findings remain unresolved.
## Author and citation analysis
Who researchers cite and how they position their work reveals much about the reliability of their conclusions. When authors primarily cite their own studies, findings may lack independent verification. Heavy reliance on theoretical papers rather than empirical research suggests ideological bias rather than evidence-based conclusions. Failure to acknowledge or address conflicting research findings raises concerns about selective reporting.
Trustworthy research acknowledges the broader evidence base honestly, including findings that complicate the author's preferred narrative. Look for citations from multiple independent research teams, references to studies with actual student outcome data, acknowledgment of limitations and contradictory findings, and clear description of research methods and limitations.
## Common research misconceptions
These misconceptions lead educators to implement ineffective practices based on flawed interpretation of research evidence. Assuming prestigious journals guarantee quality ignores the reality that even top journals publish studies that fail replication. Confusing correlation with causation leads to incorrect inferences; look for RCTs that can demonstrate causal relationships. Making major changes based on one study is premature; require multiple independent studies showing consistent results. Prioritising student enjoyment over learning outcomes misjudges what matters; focus on studies measuring actual academic achievement.
## The principle of converging evidence
Science progresses through converging evidence rather than single crucial experiments (Stanovich & Stanovich, 2003). Issues are most often decided when the community of scientists gradually agrees that the preponderance of evidence supports one alternative theory over another. Scientists evaluate data from dozens of experiments, each containing some flaws but providing part of the answer.
Imagine five different theoretical summaries of a given phenomenon exist at one time and are investigated in a series of experiments. If one set of experiments strongly tests theories A, B, and C, with data largely refuting A and B whilst supporting C, and another set of experiments strongly tests theories C, D, and E, with data largely refuting D and E whilst supporting C, we have strong converging evidence for theory C. Not only do we have data supporting theory C, but we have data contradicting its major competitors. Although no one experiment tests all theories, taken together, the entire set allows strong inference.
The pattern of flaws running through research literature matters. If findings from different experiments are largely consistent in supporting a particular conclusion, examine the extent and nature of the flaws in these studies. If all experiments are flawed in a similar way, this circumstance undermines confidence in the conclusions because consistency in outcomes may simply result from a particular, consistent flaw. If all experiments are flawed in different ways, confidence in the conclusions increases because it is less likely that consistency in results was due to a contaminating factor that confounded all experiments.
### Meta-analysis
The combining of evidence from disparate studies to form a conclusion is increasingly done formally through meta-analysis (Cooper & Hedges, 1994; Swanson, 1999). This statistical technique has been used extensively to establish whether various medical practices are research-based. In a medical context, meta-analysis involves adding together data from many clinical trials to create a single pool of data large enough to eliminate much of the statistical uncertainty that plagues individual trials. Clear conclusions can emerge from a group of studies whose individual findings are scattered all over the map (Plotkin, 1996).
Meta-analysis validates educational practices in the same way it validates medical ones. Effects obtained when one practice is compared against another are expressed in a common statistical metric that allows comparison across studies. The findings are then statistically amalgamated, and a conclusion about differential efficacy is reached if the amalgamation passes certain statistical criteria. In some cases no conclusion can be drawn with confidence, and the result is inconclusive.
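The amalgamation step can be sketched with a minimal fixed-effect model, one common approach among several; the effect sizes and standard errors below are invented for illustration, not taken from any study cited here. Each study's effect is weighted by the inverse of its variance, so larger, more precise studies dominate the pooled estimate.

```python
# Minimal sketch of fixed-effect meta-analysis with inverse-variance
# weighting. All numbers below are hypothetical.
import math

def pooled_effect(effects, std_errors):
    """Combine per-study effect sizes into one weighted estimate.
    Each study is weighted by 1/variance, so more precise studies
    count for more; also returns the standard error of the pool."""
    weights = [1 / se**2 for se in std_errors]
    pooled = sum(w * d for w, d in zip(weights, effects)) / sum(weights)
    pooled_se = math.sqrt(1 / sum(weights))
    return pooled, pooled_se

# Three hypothetical studies reporting standardised mean differences.
effects = [0.45, 0.30, 0.60]
std_errors = [0.10, 0.15, 0.20]
d, se = pooled_effect(effects, std_errors)
print(f"pooled d = {d:.2f}, SE = {se:.2f}")  # pooled d = 0.43, SE = 0.08
```

Note how the pooled standard error is smaller than any single study's: this is the "single pool of data" effect described above. The dog-weighing caveat still applies, though; the arithmetic is only as meaningful as the decision about which studies belong in the pool.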
Meta-analysis has emerged as a way of dampening contentious disputes about conflicting studies that plague education and other behavioural sciences (Kavale & Forness, 1995; Stanovich, 2001). The method is useful for ending disputes that seem to be nothing more than a "he-said, she-said" debate. An emphasis on meta-analysis has often revealed that we actually have more stable and useful findings than is apparent from a perusal of conflicts in journals.
The National Reading Panel (2000) found this in their meta-analysis of evidence surrounding several issues in reading education. A meta-analysis of 66 comparisons from 38 different studies indicated "solid support for the conclusion that systematic phonics instruction makes a bigger contribution to children's growth in reading than alternative programmes providing unsystematic or no phonics instruction". In another section of their report, a meta-analysis of 52 studies of phonemic awareness training indicated that "teaching children to manipulate the sounds in language helps them learn to read".
## Case studies and qualitative investigations
The usefulness of case studies and qualitative investigations is strongly determined by how far scientific investigation has advanced in a particular area (Stanovich & Stanovich, 2003). The insights gained from case studies or qualitative investigations may be useful in early stages of investigation of a problem. They can help determine which variables deserve more intense study by drawing attention to previously unrecognised aspects of behaviour and by suggesting how understanding might be sharpened by incorporating the participant's perspective.
However, when investigation moves from early stages where case studies may be useful to more mature stages of theory testing where adjudicating between causal explanations is the main task, the situation changes drastically. Case studies and qualitative description are not useful at later stages of scientific investigation because they cannot be used to confirm or disconfirm a particular causal theory. They lack the comparative information necessary to rule out alternative explanations.
### Context of discovery versus context of justification
Qualitative research, case studies, and clinical observations support a context of discovery where such research must be regarded as "preliminary/exploratory, observational, hypothesis generating" (Levin & O'Donnell, 2000). They provide essential orientation in early stages of inquiry into a research topic because "one has to look before one can leap into designing interventions, making predictions, or testing hypotheses".
However, in the context of justification, variables must be measured precisely, large groups must be tested to ensure conclusions generalise and, most importantly, many variables must be controlled because alternative causal explanations must be ruled out. Despite rich insights they often provide, descriptive studies cannot be used as evidence for an intervention's efficacy. Descriptive research can only suggest innovative strategies to teach students and lay groundwork for development of such strategies (Gersten, 2001).
Researchers pursuing qualitative description sometimes slide from purely descriptive work (Objective B) into making comparative statements (Objective A) without carrying out the proper types of investigation to justify them. They want to say that a certain educational programme is better than another (that is, that it causes better school outcomes), and they want to issue educational prescriptions assumed to hold for a population of students, not just the single or few individuals who were the objects of the qualitative study. But instead of pursuing Objective A through proper experimental methods, they carry out their investigation in the manner of Objective B.
### Progression to more powerful methods
Research on a particular problem often proceeds from weaker methods to ones that allow stronger causal inferences (Stanovich & Stanovich, 2003). Interest in a particular hypothesis may originally emerge from a particular case study of unusual interest. This is the proper role for case studies: to suggest hypotheses for further study with more powerful techniques and to motivate scientists to apply more rigorous methods to a research problem.
Following case studies, researchers often undertake correlational investigations to verify whether the link between variables is real rather than the result of peculiarities of a few case studies. If correlational studies support the relationship between relevant variables, then researchers attempt experiments in which variables are manipulated in order to isolate a causal relationship between variables. Research typically progresses: case studies suggest hypotheses, correlational studies verify relationships, experiments isolate causal relationships.
## Internal and external validity
Internal validity concerns whether we can infer a causal effect for a particular variable (Shadish et al., 2002). The more a study employs the logic of a true experiment (manipulation, control, and randomisation), the more we can make a strong causal inference. External validity concerns the generalisability of the conclusion to the population and setting of interest.
Internal and external validity are often traded off across different methodologies (Stanovich & Stanovich, 2003). Experimental laboratory investigations are high in internal validity but may not fully address concerns about external validity. Field classroom investigations are often high in external validity but, because of logistical difficulties involved in carrying them out, they are often quite low in internal validity. This is why we need to look for convergence of results, not just consistency from one method.
Convergence increases confidence in the external and internal validity of conclusions. This underscores why correlational studies can contribute to knowledge. First, some variables simply cannot be manipulated for ethical reasons (such as human malnutrition or physical disabilities). Other variables, such as birth order, sex, and age, are inherently correlational because they cannot be manipulated. Finally, logistical difficulties in classroom and curriculum research often make it impossible to achieve the logic of the true experiment.
Complex correlational techniques such as multiple regression, path analysis, and structural equation modelling allow for partial control of third variables when those variables can be measured. These statistics allow us to recalculate the correlation between two variables after the influence of other variables is removed. If a potential third variable can be measured, complex correlational statistics can help determine whether that third variable is determining the relationship. These correlational statistics and designs help to rule out certain causal hypotheses, even if they cannot demonstrate the true causal relation definitively.
## References
Begley, C. G., & Ellis, L. M. (2012). Raise standards for preclinical cancer research. *Nature*, 483(7391), 531-533. https://doi.org/10.1038/483531a
Cooper, H., & Hedges, L. V. (Eds.). (1994). *The handbook of research synthesis*. Russell Sage Foundation.
Gersten, R. (2001). Sorting out the roles of research in the improvement of practice. *Learning Disabilities: Research & Practice*, 16(1), 45-50.
Ioannidis, J. P. A. (2005). Why most published research findings are false. *PLoS Medicine*, 2(8), e124. https://doi.org/10.1371/journal.pmed.0020124
Kavale, K. A., & Forness, S. R. (1995). *The nature of learning disabilities: Critical elements of diagnosis and classification*. Lawrence Erlbaum Associates.
Levin, J. R., & O'Donnell, A. M. (2000). What to do about educational research's credibility gaps? *Issues in Education: Contributions from Educational Psychology*, 5, 1-87.
National Reading Panel. (2000). *Teaching children to read: An evidence-based assessment of the scientific research literature on reading and its implications for reading instruction*. National Institute of Child Health and Human Development.
Open Science Collaboration. (2015). Estimating the reproducibility of psychological science. *Science*, 349(6251), aac4716. https://doi.org/10.1126/science.aac4716
Plotkin, D. (1996, June). Good news and bad news about breast cancer. *Atlantic Monthly*, 53-82.
Shadish, W. R., Cook, T. D., & Campbell, D. T. (2002). *Experimental and quasi-experimental designs for generalized causal inference*. Houghton Mifflin.
Slavin, R. E. (2008). What works? Issues in synthesizing educational program evaluations. *Educational Researcher*, 37(1), 5-14. https://doi.org/10.3102/0013189X08314117
Stanovich, K. E. (2001). *How to think straight about psychology* (6th ed.). Allyn & Bacon.
Stanovich, P. J., & Stanovich, K. E. (2003). *Using research and reason in education: How teachers can use scientifically based research to make curricular and instructional decisions*. National Institute for Literacy.
Swanson, H. L. (1999). *Interventions for students with learning disabilities: A meta-analysis of treatment outcomes*. Guilford Press.
Sweller, J., van Merriënboer, J. J. G., & Paas, F. (2019). Cognitive architecture and instructional design: 20 years later. *Educational Psychology Review*, 31(2), 261-292. https://doi.org/10.1007/s10648-019-09465-5