Natural Language Processing for Risk Assessment: Identifying SI/SS in Psychiatric Notes


Too Long; Didn't Read

This study presents rule- and LLM-based NLP systems to identify fine-grained categories of SS and SI in clinical notes of psychiatric patients


Abstract and 1. Introduction

2 Data

2.1 Data Sources

2.2 SS and SI Categories

3 Methods

3.1 Lexicon Creation and Expansion

3.2 Annotations

3.3 System Description

4 Results

4.1 Demographics and 4.2 System Performance

5 Discussion

5.1 Limitations

6 Conclusion, Reproducibility, Funding, Acknowledgments, Author Contributions, and References


SUPPLEMENTARY

Guidelines for Annotating Social Support and Social Isolation in Clinical Notes

Other Supervised Models

5 DISCUSSION

This study presents rule- and LLM-based NLP systems to identify fine-grained categories of SS and SI in clinical notes of psychiatric patients. A primary goal of the study was to develop and validate two portable and open-source NLP systems. Given that none of the selected clinical notes were associated with ICD codes indicating SI, the development of both NLP systems enabled the identification and subcategorization of this risk factor.


The comparable accuracy of the two systems was initially unexpected, given that LLMs typically outperform rule-based systems (RBS) on related tasks [33]. Manual review of the results revealed that the rule- and LLM-based approaches solved the task in different ways, both of which could be considered valid; these differences, however, are not reflected in the performance metrics. The rule-based lexicon approach appears to have outperformed the LLM largely because it most closely mirrors the manual annotation rule book and, thus, the gold-standard annotations used as ground truth when evaluating system performance. The rule book and the lexicons were developed together, with the explicit goal that the lexicon-based approach approximate the rule book as closely as possible.
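To make the structural coupling between the lexicons and the rule book concrete, here is a minimal sketch of lexicon-based matching. The category names and phrases are hypothetical placeholders; the study's actual lexicons are far larger and were curated against the annotation guidelines.

```python
import re

# Hypothetical miniature lexicon; illustrative only, not the study's lexicons.
LEXICON = {
    "no_social_network": ["no friends", "no boyfriend", "estranged from family"],
    "loneliness": ["feels lonely", "reports loneliness"],
}

def rule_based_label(sentence):
    """Return the first matching SI/SS category, mirroring the single-label,
    explicit-mention-only policy described for the rule-based system."""
    text = sentence.lower()
    for category, phrases in LEXICON.items():
        for phrase in phrases:
            if re.search(r"\b" + re.escape(phrase) + r"\b", text):
                return category  # one label per occurrence, no inference
    return None

print(rule_based_label(
    "She feels depressed and suicidal because she has no friends and no boyfriend"
))
# -> no_social_network
```

Because the matcher fires only on explicit lexicon phrases, its outputs track the gold-standard annotations almost by construction, which helps explain the evaluation gap discussed above.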


Furthermore, the gold-standard annotations and the RBS assign a single label to each SI/SS occurrence, whereas the LLM system can assign multiple labels, a consequence of having a separate fine-tuned LLM for each SI and SS subcategory. Future work is warranted to improve model accuracy when adapting chain-of-thought (CoT) question answering to multilabel classification tasks [32]. Another difference is that the rule book and lexicons took a conservative approach, assigning a label only when the concept was explicit, whereas the LLM was more flexible. For example, ‘She feels depressed and suicidal because she has no friends and no boyfriend’ was labeled no social network by both the RBS and the gold-standard annotation, because having no friends is in the lexicon and rule book. In contrast, the LLM inferred both no social network and loneliness.
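The multilabel behavior described above follows directly from running one binary model per subcategory. The sketch below illustrates the structure only: the lambdas stand in for the separate fine-tuned FLAN-T5 models and are hypothetical, not the actual classifiers.

```python
# Per-category binary "models" (hypothetical stand-ins for the separate
# fine-tuned LLMs); each one answers independently, so any subset may fire.
category_models = {
    "no_social_network": lambda text: "no friends" in text.lower(),
    "loneliness": lambda text: "no friends" in text.lower()
                               or "lonely" in text.lower(),
}

def llm_style_labels(sentence):
    """Query every per-category model and collect all positive answers,
    so a single span can receive multiple labels."""
    return sorted(cat for cat, model in category_models.items()
                  if model(sentence))

print(llm_style_labels(
    "She feels depressed and suicidal because she has no friends and no boyfriend"
))
# -> ['loneliness', 'no_social_network']
```

A single-label system scored against multilabel outputs (or vice versa) will show disagreements even when both readings are clinically defensible, which is why the metrics understate the LLM's validity.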


Comparing development effort for the RBS and the LLMs, we assumed the LLMs would require less manual input, given that FLAN-T5 is a “few-shot” model requiring no labeled training data. Without fine-tuning, however, the model performed poorly and required synthetic examples. The iterative validation process revealed that strategic tuning examples were needed to coerce the LLM to override its colloquial understanding of the categories in favor of the task-specific definitions. Still, some concepts the LLM would not unlearn during fine-tuning, even with a higher learning rate. For example, in the no instrumental support model, the synthetic example ‘doesn’t have a lot of spending money on hand to engage in the activities she would like’ continued to be labeled ‘yes’ rather than the correct ‘not relevant’ label. Conversely, in many cases the LLM correctly identified the presence of SI/SS that was neither flagged by the RBS nor present in the gold-standard annotations as dictated by the rule book, for example, ‘Lived in an Assisted Living facility for a year’ and ‘Pt hasn’t been in touch with her family’ (see Supplemental Table S10 for more examples).
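The corrective synthetic examples described above can be thought of as (input, target) pairs in a question-answering format. The template below is a hypothetical illustration of that shape; the paper's actual prompts and fine-tuning configuration are not reproduced here.

```python
# Hypothetical prompt template for one per-category yes/no QA model;
# illustrative of the (input, target) pairs used in seq2seq fine-tuning.
def build_tuning_example(snippet, category_question, answer):
    """Pair a clinical-note snippet with a category-specific question
    and the corrective target answer."""
    prompt = (
        f"Context: {snippet}\n"
        f"Question: {category_question} "
        "Answer yes, no, or not relevant.\n"
        "Answer:"
    )
    return {"input": prompt, "target": answer}

ex = build_tuning_example(
    "doesn't have a lot of spending money on hand to engage in the "
    "activities she would like",
    "Does the patient lack instrumental support?",
    "not relevant",  # the corrective label the model failed to learn
)
print(ex["target"])
# -> not relevant
```

Even with such targeted pairs in the tuning set, the model's prior (colloquial) reading of a category can persist, as the instrumental-support example shows.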


Both the RBS and the LLMs performed relatively poorly at identifying instrumental support. This is likely because the keywords often contain site-specific names and entities, such as ‘HASA’ and ‘Lenox Hill Neighborhood House,’ to name two of many examples. Another area where both approaches fell short was the plan section of the psychiatric clinical note: part of the clinical plan might be to increase social connectedness, and while the manual annotators easily understood that context, both systems generated false-positive labels. Further erroneous examples from the rule- and LLM-based systems are provided in Supplementary Table S10.


The RBS performed comparably at MSHS and WCM, whereas the LLM performed better at WCM than at MSHS. This is likely related to two key differences. The first is the higher frequency of SS/SI mentions at MSHS (e.g., 75.3% of notes at MSHS had a manually annotated mention of SS vs. 52.2% at WCM). The second, related difference is that, beyond the SS/SI mentions meeting the inclusion criteria for manual annotation, MSHS also had more mentions of SS/SI concepts that fit neither the strict rule book of the manual annotations nor the lexicons of the RBS but were identified by the LLM. This reflects the MSHS underlying corpus drawing on clinical care sites (such as inpatient psychiatry) whose comprehensive psychiatric evaluations systematically include SDOH information.


This work expands the body of literature by focusing specifically on fine-grained classification of SS and SI, an approach not undertaken by earlier studies. More broadly, Guevara et al. [32] applied LLM-based classification to SS and adverse SS (SI), reporting best f-scores of 0.60 (FLAN-T5-XXL) for SS and 0.56 (FLAN-T5-XL) for adverse SS across 154 test documents. Zhu et al. [26] deployed 24 lexicons with the Linguamatics I2E NLP tool to identify SI, achieving an f-score of 0.93 on 194 clinical notes. Our study presents superior outcomes across two sites and with more specific (fine-grained) categories.


This paper is available on arxiv under CC BY 4.0 DEED license.

Authors:

(1) Braja Gopal Patra, Weill Cornell Medicine, New York, NY, USA and co-first authors;

(2) Lauren A. Lepow, Icahn School of Medicine at Mount Sinai, New York, NY, USA and co-first authors;

(3) Praneet Kasi Reddy Jagadeesh Kumar, Weill Cornell Medicine, New York, NY, USA;

(4) Veer Vekaria, Weill Cornell Medicine, New York, NY, USA;

(5) Mohit Manoj Sharma, Weill Cornell Medicine, New York, NY, USA;

(6) Prakash Adekkanattu, Weill Cornell Medicine, New York, NY, USA;

(7) Brian Fennessy, Icahn School of Medicine at Mount Sinai, New York, NY, USA;

(8) Gavin Hynes, Icahn School of Medicine at Mount Sinai, New York, NY, USA;

(9) Isotta Landi, Icahn School of Medicine at Mount Sinai, New York, NY, USA;

(10) Jorge A. Sanchez-Ruiz, Mayo Clinic, Rochester, MN, USA;

(11) Euijung Ryu, Mayo Clinic, Rochester, MN, USA;

(12) Joanna M. Biernacka, Mayo Clinic, Rochester, MN, USA;

(13) Girish N. Nadkarni, Icahn School of Medicine at Mount Sinai, New York, NY, USA;

(14) Ardesheer Talati, Columbia University Vagelos College of Physicians and Surgeons, New York, NY, USA and New York State Psychiatric Institute, New York, NY, USA;

(15) Myrna Weissman, Columbia University Vagelos College of Physicians and Surgeons, New York, NY, USA and New York State Psychiatric Institute, New York, NY, USA;

(16) Mark Olfson, Columbia University Vagelos College of Physicians and Surgeons, New York, NY, USA, New York State Psychiatric Institute, New York, NY, USA, and Columbia University Irving Medical Center, New York, NY, USA;

(17) J. John Mann, Columbia University Irving Medical Center, New York, NY, USA;

(18) Alexander W. Charney, Icahn School of Medicine at Mount Sinai, New York, NY, USA;

(19) Jyotishman Pathak, Weill Cornell Medicine, New York, NY, USA.

