Performance evaluation of a natural language processing tool to extract infectious disease problems
Abstract
A complete problem list can benefit patient care, quality improvement initiatives, and research activities. However, entering correctly encoded problems from a standardized terminology is time-consuming for physicians. I evaluated Discern nCode, the natural language processing (NLP) system embedded in Cerner PowerChart at Harborview Medical Center (HMC), for its utility in adding Infectious Diseases (ID) problems to the electronic medical record problem list, compared with the usual practice of physicians adding problems unaided by NLP. Seventy-four ID consultation notes were annotated by human experts to create gold-standard problem lists, and the NLP-extracted problems and problem list entries were recorded for each note. Recall, precision, and F-measure were calculated for nCode and for the problem list, and an error analysis was performed to characterize false positives and missed concepts. Discern nCode's recall was 0.65 and its precision 0.14; problem list recall was 0.10 and precision 0.43. Many false negatives resulted from partial matches between NLP-extracted and reference-standard problems. The majority of false positives were due to inclusion of past medical problems and non-ID problems; nearly 20% were concepts that should not have been extracted at all. Discern nCode had significantly higher recall for ID problems than the problem list, and recommendations are provided for increasing its recall further. Overall, nCode could be a useful facilitator of problem entry and yield a more complete problem list, provided its recall is improved.
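As a point of reference for the metrics above, the following is a minimal sketch of a standard set-based computation of recall, precision, and F-measure per note. It assumes problems are compared as exact string matches against the gold standard; the function and the example data are hypothetical and are not the study's actual evaluation code, which also handled partial matches.

```python
def precision_recall_f1(extracted: set[str], gold: set[str]) -> tuple[float, float, float]:
    """Compute precision, recall, and F-measure for one note.

    extracted: problems produced by the system (e.g., nCode output or problem list entries)
    gold: reference-standard problems from expert annotation
    """
    tp = len(extracted & gold)   # true positives: correctly extracted problems
    fp = len(extracted - gold)   # false positives: spurious extractions
    fn = len(gold - extracted)   # false negatives: missed problems
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return precision, recall, f1

# Example with made-up problems:
gold = {"osteomyelitis", "bacteremia", "cellulitis"}
extracted = {"osteomyelitis", "hypertension"}
print(precision_recall_f1(extracted, gold))  # (0.5, 0.333..., 0.4)
```

Under these definitions, a partial match (e.g., "osteomyelitis" extracted where the reference standard lists "vertebral osteomyelitis") counts as both a false positive and a false negative, which is consistent with the partial-match false negatives noted in the error analysis.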