Artificial intelligence has the ability to revolutionize human health. It is used to detect potentially cancerous lesions in medical images, to screen for eye disease, and to predict whether a patient in the intensive care unit could have a brain-damaging seizure. Even your smartwatch has AI built into it; it can estimate your heart rate and detect whether you have atrial fibrillation. But how good are these algorithms generally? The truth is, we just don’t know.
Answering this question is a nightmare. The only way to evaluate AI models, or to create them in the first place, is to have a large, diverse medical dataset. The dataset must include enough patients of all kinds to ensure the AI model behaves well across different groups of people. It must be representative of all the situations in which the model might be used, whether in regional hospitals or major medical centers. The dataset also has to include medical outcomes, so that an AI model trying to predict these outcomes can be evaluated against the truth.
The FDA requires this kind of large-scale testing (and checks on the quality of AI model training), which means the companies that develop these technologies have access to these types of datasets. These datasets, which can come from health care providers, may include data that you produced as part of your own medical care or in clinical trials. However, these data are not accessible to other companies for building even better models, and they are certainly not available to researchers who want to evaluate these models. Third parties have had to create their own datasets to do so. And, of course, consumers are not able to make informed decisions in choosing products they may one day depend on.