Predicting Medical Diagnoses from Pharmaceutical Claims Data
Horst, Cody J
Prescription pharmaceuticals are a vital component of personal healthcare and contribute significantly to total health expenditure in the United States. Currently, data on pharmaceutical use and expenditure generally preclude attributing specific pharmaceuticals to medical conditions, either because the pharmaceutical data and diagnostic data are unlinked or because they are linked in a way that does not permit specific attribution. The ability to make this attribution would open new datasets to analyses that require specific pharmaceutical-condition associations and would strengthen analyses currently performed with imprecisely associated data. Cost-of-illness studies in particular would benefit from better accounting of pharmaceutical claims. This work seeks to address the problem by constructing two classifiers, a multilabel logistic classifier and an LSTM-based recurrent neural network, training them on the MarketScan© commercial claims data, and assessing their feasibility. Judged by peak F1 score and by inspection of individual outputs, both models pick up trends in the data. However, while the logistic model recreates logical associations between medical conditions, their comorbidities, and the pharmaceuticals used to treat them, the recurrent neural network learns to predict the most common medical conditions. We conclude that, in their current form, the models are not refined enough to be useful. Nonetheless, this work illuminated promising directions for improving the model architecture and the data, both to better cope with noisy classification labels and to aid causal inference.
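The multilabel logistic classifier and the peak-F1 evaluation mentioned above can be illustrated with a small pure-Python sketch. This is not the author's implementation. It assumes each patient is represented by the set of drugs on their claims (multi-hot features) together with a set of diagnosis labels, fits one independent logistic unit per diagnosis by full-batch gradient descent on binary cross-entropy, and reports the best micro-averaged F1 over a grid of decision thresholds. The class `MultilabelLogistic`, the helpers `micro_f1` and `peak_f1`, the threshold grid, and the toy drug and diagnosis names are all hypothetical; the abstract does not say whether F1 was micro- or macro-averaged, and the LSTM model is not shown.

```python
import math
from typing import Dict, List, Sequence, Set, Tuple

# A record pairs the drugs on one patient's claims with that patient's
# diagnosis labels. Names used below are hypothetical placeholders, not real codes.
Record = Tuple[Sequence[str], Set[str]]


def sigmoid(z: float) -> float:
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


class MultilabelLogistic:
    """One independent logistic unit per diagnosis label (binary relevance)."""

    def __init__(self, drugs: Sequence[str], labels: Sequence[str]) -> None:
        self.drug_index = {d: i for i, d in enumerate(drugs)}
        self.labels = list(labels)
        self.w = [[0.0] * len(drugs) for _ in self.labels]  # w[label][drug]
        self.b = [0.0] * len(self.labels)

    def _active(self, drugs: Sequence[str]) -> List[int]:
        # Multi-hot encoding: indices of known drugs present; unknown drugs are ignored.
        return sorted({self.drug_index[d] for d in drugs if d in self.drug_index})

    def _score(self, k: int, idx: List[int]) -> float:
        return sigmoid(self.b[k] + sum(self.w[k][j] for j in idx))

    def predict_proba(self, drugs: Sequence[str]) -> Dict[str, float]:
        idx = self._active(drugs)
        return {lab: self._score(k, idx) for k, lab in enumerate(self.labels)}

    def fit(self, data: List[Record], lr: float = 0.5, epochs: int = 2000) -> None:
        # Full-batch gradient descent on mean binary cross-entropy, per label.
        n = len(data)
        encoded = [(self._active(drugs), y) for drugs, y in data]
        for _ in range(epochs):
            gw = [[0.0] * len(self.drug_index) for _ in self.labels]
            gb = [0.0] * len(self.labels)
            for idx, y in encoded:
                for k, lab in enumerate(self.labels):
                    err = self._score(k, idx) - (1.0 if lab in y else 0.0)
                    gb[k] += err / n
                    for j in idx:
                        gw[k][j] += err / n
            for k in range(len(self.labels)):
                self.b[k] -= lr * gb[k]
                for j in range(len(self.drug_index)):
                    self.w[k][j] -= lr * gw[k][j]


def micro_f1(model: MultilabelLogistic, data: List[Record], threshold: float) -> float:
    # Pool true positives, false positives and false negatives over every (record, label) pair.
    tp = fp = fn = 0
    for drugs, y in data:
        for lab, p in model.predict_proba(drugs).items():
            pred, truth = p >= threshold, lab in y
            tp += int(pred and truth)
            fp += int(pred and not truth)
            fn += int(truth and not pred)
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0


def peak_f1(model: MultilabelLogistic, data: List[Record]) -> Tuple[float, float]:
    # Sweep thresholds 0.05, 0.10, ..., 0.95 and return (best F1, its threshold).
    return max((micro_f1(model, data, i / 20), i / 20) for i in range(1, 20))


if __name__ == "__main__":
    toy: List[Record] = [
        (["metformin"], {"diabetes"}),
        (["lisinopril"], {"hypertension"}),
        (["metformin", "lisinopril"], {"diabetes", "hypertension"}),
        (["insulin"], {"diabetes"}),
        (["atorvastatin"], set()),
    ]
    model = MultilabelLogistic(
        ["metformin", "insulin", "lisinopril", "atorvastatin"],
        ["diabetes", "hypertension"],
    )
    model.fit(toy)
    print(model.predict_proba(["metformin"]))
    print(peak_f1(model, toy))
```

Treating each diagnosis as its own binary problem is what makes this a multilabel rather than a multiclass classifier: one patient can carry several diagnoses at once, as in the diabetes-plus-hypertension record in the toy data. Sweeping the threshold matters because, when many labels are rare, the default cutoff of 0.5 can suppress nearly all positive predictions.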
- Global health