Methods for Etiology Extraction and Evaluation: Syntactic Patterns, ChatGPT, and Reference Sources
Etiologies, or the causes of symptoms, play a crucial role in medical diagnosis and treatment. Identifying accurate etiologies can help healthcare professionals provide better care to their patients. In recent years, there has been an increasing interest in using natural language processing (NLP) techniques to automatically extract etiologies from medical text. In this blog post, we briefly describe two suggested methods for etiology extraction and evaluate their results.
The first method we explore involves using syntactic patterns and iterative bootstrapping. This approach includes three main stages: bootstrapping, extraction, and mention unification. In the bootstrapping stage, a dedicated user interface component allows pattern developers to identify syntactic extraction templates based on a few result examples. The patterns are then applied to extract etiology mentions, which are unified into groups of synonymous mentions. In our evaluation, we found that this method had a high recall but lower precision compared to the second method.
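The extraction and mention-unification stages can be sketched in a few lines. This is a minimal illustration and not the system described above. It assumes surface-level regular expressions as a stand-in for real syntactic patterns, it hard-codes the templates instead of bootstrapping them through a user interface, and it treats mentions as synonymous only when they differ in case, hyphenation, or spacing. The function names and example sentences are hypothetical.

```python
import re
from collections import defaultdict


def extract_mentions(sentences, symptom):
    """Apply hand-written surface templates and return raw cause mentions.

    Assumption: regexes stand in for syntactic (parse-based) patterns.
    """
    s = re.escape(symptom)
    templates = [
        # "<cause> can cause <symptom>"
        re.compile(rf"^(?P<cause>.+?) (?:can cause|may cause|causes) {s}\b", re.I),
        # "<symptom> may be caused by <cause>"
        re.compile(
            rf"\b{s} (?:is|are|can be|may be) (?:caused by|due to) (?P<cause>.+?)[.,;]?$",
            re.I,
        ),
    ]
    mentions = []
    for sentence in sentences:
        for template in templates:
            match = template.search(sentence.strip())
            if match:
                mentions.append(match.group("cause").strip())
                break  # one mention per sentence is enough for this sketch
    return mentions


def normalize(mention):
    """Crude unification key: lowercase, drop hyphens, collapse whitespace."""
    text = mention.lower().replace("-", "")
    return re.sub(r"\s+", " ", text).strip()


def unify_mentions(mentions):
    """Group mentions that share the same normalized key."""
    groups = defaultdict(set)
    for mention in mentions:
        groups[normalize(mention)].add(mention)
    return dict(groups)


# Made-up sentences, for illustration only.
SENTENCES = [
    "Gastroesophageal reflux can cause hiccups.",
    "Hiccups may be caused by gastro-esophageal reflux.",
    "Stroke causes hiccups.",
]
groups = unify_mentions(extract_mentions(SENTENCES, "hiccups"))
```

Here the two spellings of the reflux mention end up in one group, while "Stroke" forms its own. A real unification step would also need to handle true synonyms and abbreviations, which simple string normalization cannot.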
The second method we propose uses generative models, specifically ChatGPT. Generative models like ChatGPT can generate a list of etiologies for a symptom but may suffer from hallucinations. To address this, we developed a fact verification pipeline with an evidence ranking component that verifies the generated etiologies and provides provenance information. Our evaluation showed that this method had higher precision but lower recall than the syntactic patterns approach.
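To make the verify-then-cite idea concrete, here is a rough sketch of a verification step with evidence ranking. The post does not say how the actual pipeline scores evidence, so everything here is an assumption: evidence is ranked by simple token overlap with the claim, a candidate is accepted only if its best passage clears a threshold, and the source of that passage is kept as provenance. The candidate list is assumed to come from a generative model; no model is called, and all names and passages are hypothetical.

```python
import re


def tokens(text):
    """Lowercased alphabetic tokens of a string, as a set."""
    return set(re.findall(r"[a-z]+", text.lower()))


def rank_evidence(symptom, etiology, passages):
    """Return (score, passage) pairs, best first.

    Assumption: score = fraction of claim tokens found in the passage.
    """
    claim = tokens(symptom) | tokens(etiology)
    if not claim:
        return []
    scored = [(len(claim & tokens(p["text"])) / len(claim), p) for p in passages]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


def verify(symptom, candidates, passages, threshold=1.0):
    """Keep candidates whose top-ranked passage meets the threshold.

    Returns a mapping from verified etiology to the supporting source id.
    """
    verified = {}
    for etiology in candidates:
        ranked = rank_evidence(symptom, etiology, passages)
        if ranked and ranked[0][0] >= threshold:
            verified[etiology] = ranked[0][1]["source"]
    return verified


# Made-up passages and candidates, for illustration only.
PASSAGES = [
    {"source": "doc-1", "text": "Hiccups are a recognised complication of stroke."},
    {"source": "doc-2", "text": "Jaundice often results from hepatitis."},
]
CANDIDATES = ["stroke", "moon phases"]  # pretend these came from the model
verified = verify("hiccups", CANDIDATES, PASSAGES)
```

In this toy run only "stroke" survives, with "doc-1" as its provenance, and the unsupported candidate is dropped. Token overlap is a weak proxy for entailment, so a practical pipeline would likely swap in a stronger ranking and verification model.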
To evaluate the two methods, we compared their output against reference sources for three symptoms: hiccups, jaundice, and chest pain. Comparing the etiologies identified by the patterns, GPT, and the reference sources, we found that the patterns had better overall coverage of etiologies. However, combining the pattern and GPT extractions yielded the highest recall and F-score. We also ran a random sampling evaluation to assess the precision of the two approaches across a larger number of symptoms, and found high precision for both methods.
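The comparison against reference sources comes down to set-based precision, recall, and F-score. The sketch below shows one way to compute them for each method and for their union. It assumes the F-score is the balanced F1 and that an extracted etiology counts as correct only if it exactly matches a reference entry after unification. The sets are invented placeholders, not the results of our evaluation.

```python
def precision_recall_f1(predicted, reference):
    """Set-based precision, recall, and F1 against a reference set."""
    true_positives = len(predicted & reference)
    precision = true_positives / len(predicted) if predicted else 0.0
    recall = true_positives / len(reference) if reference else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Placeholder labels, not real etiologies or real results.
reference = {"a", "b", "c", "d"}
patterns = {"a", "b", "c", "x"}  # broad coverage, one wrong extraction
gpt = {"a", "d"}                 # fewer extractions, all correct
combined = patterns | gpt        # union of both methods

scores = {
    "patterns": precision_recall_f1(patterns, reference),
    "gpt": precision_recall_f1(gpt, reference),
    "combined": precision_recall_f1(combined, reference),
}
```

Taking the union can only keep or raise recall, because nothing either method found is lost. Precision may drop, since the errors of both methods are pooled as well.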
In our qualitative analysis, we examined the etiologies that each method missed and those it identified incorrectly. The patterns approach missed some etiologies because of limitations in publicly accessible information and incomplete pattern coverage. GPT, on the other hand, produced some incorrect etiologies that were related to a correct cause but did not pinpoint it accurately. These findings provide insight into the strengths and limitations of both methods for etiology extraction.
In conclusion, the combination of syntactic patterns and generative models shows promise for extracting etiologies from medical text. In our evaluation, combining the two approaches yielded higher recall and a higher F-score than either approach alone. Continued research and development in NLP techniques for etiology extraction can further support medical diagnosis and treatment.