BACKGROUND: Electronic medical records (EMR) have become a standard data source for epidemiological, outcomes, and health services research. However, the size and complexity of EMR data pose challenges. Data are collected continuously across multiple systems and stored in a variety of structures, which can include free text, long or wide forms, and complex temporal information. These complexities make EMR data more like an evolving ecosystem than the static data sources found in most studies. In a natural ecosystem, data sources are assessed to ensure that information is consistent with expectations. Institutions should approach EMR data in a similar manner, both to provide insight and to build confidence among team members with diverse backgrounds. New tools and processes are needed that support the assessment of analytic decisions and are available to all members of the team.
METHODS: This paper proposes visual tools for use in exploratory analyses before variable derivation. These tools are designed to promote discussion and build consensus among team members using EMR data. They allow examination of individual patient records and of trends across time, so that common operational considerations (e.g., defining variables via multiple features, selecting time windows) are addressed using both the data and therapeutic expertise. This paper presents SAS® graphic language templates for patient profiles, cumulative heat maps, and Sankey diagrams, along with examples of the discussions and decisions that each visual is designed to support.
RESULTS/CONCLUSIONS: Studies seeking to maximize use of EMR data involve multiple stakeholders who need to understand nuances in the data. Visualizations can facilitate team discussions and improve the process of feature extraction, variable construction, and project planning. These visuals can be particularly useful for sequential analysis, for studying treatment patterns, and for defining episodes of care, but all studies using EMR data can benefit from them.