Study: EHR data for research often incomplete, inaccurate, unreliable

Current methodologies for using electronic health records for research are inadequate and result in "significant bias" when used "naively," according to an article in the Journal of the American Medical Informatics Association (JAMIA).

At present, EHR research begins with phenotyping, or feature extraction, which transforms raw EHR data into clinically relevant features. Those features are then used for research tasks. According to the article's authors, however, EHR data currently is incomplete, inaccurate, variable, and highly complex, rendering such research unreliable.
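As a rough illustration of the feature-extraction step described above, the following minimal Python sketch collapses raw lab rows into one feature record per patient. It is not the authors' method. The record layout, the `extract_features` helper, and the cutoff value are all hypothetical, chosen only to show how a missing value can leave a phenotype flag unknown rather than false.

```python
from statistics import mean

# Hypothetical raw EHR rows: (patient_id, test_name, value), where None
# stands for a result that was never recorded.
RAW_RECORDS = [
    ("p1", "glucose", 98.0),
    ("p1", "glucose", 105.0),
    ("p2", "glucose", 180.0),
    ("p2", "glucose", None),
    ("p3", "glucose", None),
]

# Illustrative cutoff only; an assumption for this sketch, not clinical guidance.
HIGH_GLUCOSE_CUTOFF = 126.0


def extract_features(records, test_name="glucose"):
    """Collapse raw rows into one feature dict per patient (hypothetical helper)."""
    values = {}
    missing = {}
    for patient_id, name, value in records:
        if name != test_name:
            continue
        values.setdefault(patient_id, [])
        missing.setdefault(patient_id, 0)
        if value is None:
            missing[patient_id] += 1
        else:
            values[patient_id].append(value)

    features = {}
    for patient_id, vals in values.items():
        avg = mean(vals) if vals else None
        features[patient_id] = {
            "mean_value": avg,
            "n_missing": missing[patient_id],
            # None means "unknown"; a naive pipeline might silently treat it as False.
            "high_flag": None if avg is None else avg >= HIGH_GLUCOSE_CUTOFF,
        }
    return features


FEATURES = extract_features(RAW_RECORDS)
```

In this sketch, patient `p3` has no recorded values, so the flag stays `None` instead of defaulting to a negative result. Tracking the missing count alongside each feature is one simple way to keep incomplete data visible to later analysis.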

The authors, from Columbia University, suggest that the research process be improved, particularly by making the current phenotyping process more accurate and data driven. They recommend that researchers make a "radical shift in approach" and study EHRs themselves, not just the data, to see how EHRs are used and how data is recorded.

"We must mine the EHR data to learn the idiosyncrasies of the healthcare process and their effects on the recording process," the authors state.

The authors recommend not only that EHRs be studied to better understand how to use the data in research, but also that the improved understanding be "fed back" to improve EHRs.

"[B]etter understanding of missing data, inaccuracies, and biases could lead to improved user interfaces, data definitions, and even workflows," the authors say.

One of the most promising uses for EHRs is the technology's potential to advance research for public health and quality of care. There has been significant interest in developing programs to better extract, de-identify and fine-tune the data found in EHRs to maximize their use in research.

To learn more:
- here's the JAMIA article's abstract