Technological innovations can seem relentless. In computing, some have proclaimed that “a year in machine learning is a century in any other field.” But how do you know whether those advancements are hype or reality?
Failures quickly multiply when there’s a deluge of new technology, especially when these developments haven’t been properly tested or fully understood. Even technological innovations from trusted labs and organizations sometimes result in spectacular failures. Think of IBM Watson, an AI program the company hailed in 2011 as a revolutionary tool for cancer treatment. However, rather than evaluating the tool based on patient outcomes, IBM used less relevant measures, possibly even irrelevant ones, such as expert ratings. As a result, IBM Watson not only failed to offer doctors reliable and innovative treatment recommendations, it also suggested harmful ones.
When ChatGPT was released in November 2022, interest in AI expanded rapidly across industry and science, alongside ballooning claims of its efficacy. But as the vast majority of companies see their attempts at incorporating generative AI fail, questions about whether the technology does what developers promised are coming to the fore.
In a world of rapid technological change, a pressing question arises: How can people determine whether a new technological marvel genuinely works and is safe to use?
Borrowing from the language of science, this question is really about validity – that is, the soundness, trustworthiness and dependability of a claim. Validity is the ultimate verdict of whether a scientific claim accurately reflects reality. Think of it as quality control for science: It helps researchers know whether a medication really cures a disease, a health-tracking app truly improves fitness, or a model of a black hole genuinely describes how it behaves in space.
How to evaluate validity for new technologies and innovations has been unclear, in part because science has mostly focused on validating claims about the natural world.
In our work as researchers who study how to evaluate science across disciplines, we developed a framework to assess the validity of any design, be it a new technology or policy. We believe setting clear and consistent standards for validity and learning how to assess it can empower people to make informed decisions about technology – and determine whether a new technology will truly deliver on its promise.
Validity is the bedrock of knowledge
Historically, validity was primarily concerned with ensuring the precision of scientific measurements, such as whether a thermometer correctly measures temperature or a psychological test accurately assesses anxiety. Over time, it became clear that there is more than just one kind of validity.
Different scientific fields have their own…



