How We Collected Lexicons for Fine-Grained Categories of SS and SI Using an Iterative Method

Written by nlp | Published 2025/04/01

TL;DR: We collected lexicons for fine-grained categories of SS and SI using an iterative method that included manual chart reviews and semi-automatic methods.

Table of Links

Abstract and 1. Introduction

2 Data

2.1 Data Sources

2.2 SS and SI Categories

3 Methods

3.1 Lexicon Creation and Expansion

3.2 Annotations

3.3 System Description

4 Results

4.1 Demographics and 4.2 System Performance

5 Discussion

5.1 Limitations

6 Conclusion, Reproducibility, Funding, Acknowledgments, Author Contributions, and References

SUPPLEMENTARY

Guidelines for Annotating Social Support and Social Isolation in Clinical Notes

Other Supervised Models

3 METHODS

3.1 Lexicon Creation and Expansion

Computational approaches to any NLP task require annotated lexicons and gold-standard data [2]. We collected lexicons for fine-grained categories of social support (SS) and social isolation (SI) using an iterative method that combined manual chart reviews with semi-automatic methods.

3.1.1 Manual Chart Review

Zhu et al. [26] developed a lexicon for identifying SI in the clinical notes of patients with prostate cancer in the context of recovery support. We initially adopted this 24-term lexicon; however, it matched relatively few clinical notes at MSHS and WCM compared with the published report. A list of terms for each category was then created and extensively reviewed by the study team, which included clinical psychiatrists and psychologists. We also manually reviewed 50 notes at each site to find SS and SI keywords to enrich the existing lexicons.
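Checking how many notes a candidate lexicon retrieves can be sketched as a simple term-matching pass. The snippet below is a minimal illustration, not the study's actual pipeline: the terms in `toy_lexicon` and the text in `toy_notes` are invented for this example and are not taken from Zhu et al. [26] or from any site's records, and the helper names `compile_lexicon` and `notes_with_hits` are hypothetical.

```python
import re

def compile_lexicon(terms):
    """Build one case-insensitive pattern matching any term on word boundaries."""
    # Sort longer terms first so multi-word phrases are tried before shorter ones.
    ordered = sorted(set(terms), key=len, reverse=True)
    alternation = "|".join(re.escape(term) for term in ordered)
    return re.compile(r"\b(?:" + alternation + r")\b", re.IGNORECASE)

def notes_with_hits(notes, pattern):
    """Return the indices of notes that contain at least one lexicon term."""
    return [i for i, text in enumerate(notes) if pattern.search(text)]

# Hypothetical lexicon terms and note snippets, invented for illustration only.
toy_lexicon = ["lives alone", "no visitors", "lonely"]
toy_notes = [
    "Patient lives alone and reports feeling lonely.",
    "Supportive spouse present at bedside.",
    "No visitors during this admission.",
]

pattern = compile_lexicon(toy_lexicon)
hits = notes_with_hits(toy_notes, pattern)
print(f"{len(hits)} of {len(toy_notes)} notes matched")
```

A low match count on a pass like this is the kind of signal that would motivate enriching a lexicon through manual review, as described above.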

3.1.2 Semi-automatic Method

The lexicons from the manual chart review described above were then expanded using word embeddings. First, the manually generated lexicons were vectorized using word2vec [39] and Equation 1.
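The general idea of embedding-based lexicon expansion can be sketched as follows. Equation 1 is not reproduced in this section, so the pooling step here is an assumption: a multi-word term is represented by the mean of its token vectors, and candidate words are ranked by cosine similarity. The vectors in `toy_embeddings` are hand-made stand-ins for a trained word2vec model, and the function names `mean_vector`, `cosine`, and `expand` are hypothetical.

```python
import math

def mean_vector(term, embeddings):
    """Average the vectors of a term's in-vocabulary tokens (assumed pooling)."""
    vecs = [embeddings[tok] for tok in term.lower().split() if tok in embeddings]
    if not vecs:
        return None  # no token of the term is in the vocabulary
    dim = len(vecs[0])
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def expand(seed_terms, embeddings, top_n=2):
    """For each seed term, list the top_n most similar vocabulary words."""
    candidates = {}
    for term in seed_terms:
        query = mean_vector(term, embeddings)
        if query is None:
            continue
        seed_tokens = set(term.lower().split())
        scored = [
            (word, cosine(query, vec))
            for word, vec in embeddings.items()
            if word not in seed_tokens  # do not suggest the seed itself
        ]
        scored.sort(key=lambda pair: pair[1], reverse=True)
        candidates[term] = [word for word, _ in scored[:top_n]]
    return candidates

# Toy 3-dimensional vectors, invented for illustration; not a trained model.
toy_embeddings = {
    "lonely": [0.9, 0.1, 0.0],
    "isolated": [0.8, 0.2, 0.1],
    "alone": [0.85, 0.05, 0.1],
    "spouse": [0.1, 0.9, 0.2],
    "caregiver": [0.2, 0.8, 0.3],
    "aspirin": [0.0, 0.1, 0.9],
}

suggestions = expand(["lonely", "spouse"], toy_embeddings, top_n=2)
print(suggestions)
```

In practice the suggested words would still need human review before being added to a lexicon, since nearest neighbors in embedding space are not guaranteed to belong to the same SS or SI category.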

This paper is available on arXiv under the CC BY 4.0 DEED license.


† https://radimrehurek.com/gensim/

Authors:

(1) Braja Gopal Patra, Weill Cornell Medicine, New York, NY, USA (co-first author);

(2) Lauren A. Lepow, Icahn School of Medicine at Mount Sinai, New York, NY, USA (co-first author);

(3) Praneet Kasi Reddy Jagadeesh Kumar, Weill Cornell Medicine, New York, NY, USA;

(4) Veer Vekaria, Weill Cornell Medicine, New York, NY, USA;

(5) Mohit Manoj Sharma, Weill Cornell Medicine, New York, NY, USA;

(6) Prakash Adekkanattu, Weill Cornell Medicine, New York, NY, USA;

(7) Brian Fennessy, Icahn School of Medicine at Mount Sinai, New York, NY, USA;

(8) Gavin Hynes, Icahn School of Medicine at Mount Sinai, New York, NY, USA;

(9) Isotta Landi, Icahn School of Medicine at Mount Sinai, New York, NY, USA;

(10) Jorge A. Sanchez-Ruiz, Mayo Clinic, Rochester, MN, USA;

(11) Euijung Ryu, Mayo Clinic, Rochester, MN, USA;

(12) Joanna M. Biernacka, Mayo Clinic, Rochester, MN, USA;

(13) Girish N. Nadkarni, Icahn School of Medicine at Mount Sinai, New York, NY, USA;

(14) Ardesheer Talati, Columbia University Vagelos College of Physicians and Surgeons, New York, NY, USA and New York State Psychiatric Institute, New York, NY, USA;

(15) Myrna Weissman, Columbia University Vagelos College of Physicians and Surgeons, New York, NY, USA and New York State Psychiatric Institute, New York, NY, USA;

(16) Mark Olfson, Columbia University Vagelos College of Physicians and Surgeons, New York, NY, USA, New York State Psychiatric Institute, New York, NY, USA, and Columbia University Irving Medical Center, New York, NY, USA;

(17) J. John Mann, Columbia University Irving Medical Center, New York, NY, USA;

(18) Alexander W. Charney, Icahn School of Medicine at Mount Sinai, New York, NY, USA;

(19) Jyotishman Pathak, Weill Cornell Medicine, New York, NY, USA.


Published by HackerNoon on 2025/04/01