STUDY OBJECTIVE: Develop and evaluate a model that uses health administrative data to categorize cardiovascular (CV) cause of death (COD).
DESIGN: Population-based cohort.
SETTING: Ontario, Canada.
PARTICIPANTS: Decedents ≥ 40 years with known COD between 2008 and 2015 in the CANHEART cohort, split into derivation (2008 to 2012; n = 363,778) and validation (2013 to 2015; n = 239,672) cohorts.
MAIN OUTCOME MEASURES: Model performance. COD was categorized as CV or non-CV, with ICD-10 codes as the gold standard. We developed a logistic regression model that uses routinely collected healthcare administrative data to categorize CV versus non-CV COD. We assessed model discrimination and calibration in the validation cohort.
RESULTS: The strongest predictors for CV COD were history of stroke, history of myocardial infarction, history of heart failure, and CV hospitalization one month before death. In the validation cohort, the c-statistic was 0.80, the sensitivity was 0.75 (95 % CI 0.74 to 0.75), and the specificity was 0.71 (95 % CI 0.70 to 0.71). In the primary prevention validation sub-cohort, the c-statistic was 0.81, the sensitivity was 0.71 (95 % CI 0.70 to 0.71), and the specificity was 0.75 (95 % CI 0.75 to 0.75). In the secondary prevention sub-cohort, the c-statistic was 0.74, the sensitivity was 0.81 (95 % CI 0.81 to 0.82), and the specificity was 0.54 (95 % CI 0.53 to 0.54).
CONCLUSION: Modelling approaches using health administrative data show potential in categorizing CV COD, though further work is necessary before this approach is employed in clinical studies.
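
The performance measures reported above (c-statistic, sensitivity, specificity) can be illustrated with a minimal Python sketch. It assumes each decedent has a gold-standard label (1 = CV COD, 0 = non-CV COD) and a model-predicted probability of CV COD. The function names, the toy data, and the 0.5 classification threshold are hypothetical and chosen only for illustration; the abstract does not state the threshold or the software the study used. The pairwise c-statistic loop is written for clarity and would be too slow for cohorts of the size described here.

```python
from typing import Sequence, Tuple


def c_statistic(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """Concordance (c) statistic for a binary outcome.

    Proportion of (CV death, non-CV death) pairs in which the CV death
    received the higher predicted probability; ties count as one half.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("both outcome classes must be present")

    concordant = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                concordant += 1.0
            elif p == n:
                concordant += 0.5
    return concordant / (len(pos) * len(neg))


def sensitivity_specificity(
    y_true: Sequence[int], y_score: Sequence[float], threshold: float
) -> Tuple[float, float]:
    """Sensitivity and specificity when scores >= threshold are called CV."""
    tp = fn = tn = fp = 0
    for y, s in zip(y_true, y_score):
        predicted_cv = s >= threshold
        if y == 1 and predicted_cv:
            tp += 1
        elif y == 1:
            fn += 1
        elif predicted_cv:
            fp += 1
        else:
            tn += 1
    if tp + fn == 0 or tn + fp == 0:
        raise ValueError("both outcome classes must be present")
    return tp / (tp + fn), tn / (tn + fp)


# Toy data, not study data: 1 = CV COD, 0 = non-CV COD.
labels = [1, 1, 1, 0, 0, 0, 0, 1]
scores = [0.9, 0.8, 0.4, 0.3, 0.6, 0.2, 0.1, 0.7]

c_stat = c_statistic(labels, scores)
sens, spec = sensitivity_specificity(labels, scores, threshold=0.5)
print(f"c-statistic={c_stat:.4f} sensitivity={sens:.2f} specificity={spec:.2f}")
```

Sensitivity and specificity depend on the chosen threshold, whereas the c-statistic summarizes discrimination across all thresholds. This is why a sub-cohort can show higher sensitivity alongside a lower c-statistic.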