BACKGROUND: Real-world evidence (RWE) is increasingly used for medical regulatory decisions, yet concerns persist regarding its reproducibility and hence validity. Variation in results across RWE studies may stem from differences in foundational characteristics between real-world data sources (RWDS). Such diversity of RWDS creates both challenges and opportunities for generating valid, reproducible RWE. However, a framework to address diversity of RWDS is lacking. This study, led by the Database Special Interest Group (SIG), addresses the reproducibility challenges associated with diversity across RWDS in pharmacoepidemiologic studies.
OBJECTIVE: To identify and characterize practices, recommendations, and tools for reporting diversity across RWDS, and to explore how leveraging this diversity could improve the quality of RWE.
METHODS: A scoping review was conducted. Keywords for the literature search and a selection tool were designed using a set of reference documents identified by a panel of experts from the Database SIG. A systematic search was conducted up to December 2021. Documents were screened, first based on titles and abstracts, then on full texts, using the selection tool. Information on topics related to collecting and reporting RWDS diversity, as well as challenges and opportunities stemming from RWDS diversity, was extracted from the included literature. A content analysis was conducted using the extracted text to identify common themes for the topics.
RESULTS: From 91 selected documents, 9 dimensions to describe RWDS were identified (namely, organization accessing the data source, data originator, prompt for record creation, inclusion of population, content, data dictionary, time span, healthcare system and culture, and data quality), and 3 related topics were examined: tools to summarize these dimensions, challenges arising from RWDS diversity, and opportunities arising from RWDS diversity. Content analysis further identified 36 themes within the 9 dimensions. For example, within the dimension ‘inclusion of population’, three themes were identified: i) a list of qualitative reasons for persons entering and exiting the data source; ii) cases where the dates of entry into or exit from the RWDS were not available in the data, and the consequences of this; and iii) cases where such dates were available, and the consequences of this. Opportunities arising from data diversity included multiple imputation across data sources and standardization.
CONCLUSIONS: This study identified dimensions that characterize the diverse data sources used to generate RWE, which can facilitate understanding and interpretation of RWE study results. These findings provide a framework for formal guidance, complementing concurrent initiatives from ISPE, such as HARPER and STaRT-RWE.