Supervised Models for Clinical Text: Evaluating SVM and BERT Performance

by @nlp
Too Long; Didn't Read

We performed sentence-level classification using SVM and BERT. The entity-level annotations were converted to sentence-level labels.


Abstract and 1. Introduction

2 Data

2.1 Data Sources

2.2 SS and SI Categories

3 Methods

3.1 Lexicon Creation and Expansion

3.2 Annotations

3.3 System Description

4 Results

4.1 Demographics and 4.2 System Performance

5 Discussion

5.1 Limitations

6 Conclusion, Reproducibility, Funding, Acknowledgments, Author Contributions, and References


SUPPLEMENTARY

Guidelines for Annotating Social Support and Social Isolation in Clinical Notes

Other Supervised Models

OTHER SUPERVISED MODELS

We performed sentence-level classification using SVM and BERT. The entity-level annotations were converted to sentence-level labels.
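As a minimal sketch of this entity-to-sentence conversion (the character-span data format, category names, and example text here are hypothetical, not the paper's actual annotation schema):

```python
# Hypothetical sketch: project entity-level annotations onto sentence-level labels.
# Each annotation is assumed to carry a character span and a category label.

def entities_to_sentence_labels(sentences, entities):
    """sentences: list of (start, end, text) character spans over the note;
    entities: list of (start, end, category) annotations.
    Returns one set of category labels per sentence (any span overlap counts)."""
    labels = []
    for s_start, s_end, _ in sentences:
        cats = {cat for e_start, e_end, cat in entities
                if e_start < s_end and e_end > s_start}  # overlap test
        labels.append(cats)
    return labels

# Toy example (spans and categories are illustrative only)
sentences = [(0, 28, "Patient lives alone at home."),
             (29, 60, "Daughter visits every weekend.")]
entities = [(8, 19, "Social Isolation"), (29, 44, "Social Support")]
print(entities_to_sentence_labels(sentences, entities))
# → [{'Social Isolation'}, {'Social Support'}]
```

A sentence inheriting the union of its overlapping entity labels is one natural way to obtain sentence-level (possibly multi-label) training data from span annotations.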


SVM: Initially, we used two VSM methods, namely term frequency-inverse document frequency (TF-IDF) and word2vec [1], to convert the clinical notes into vectors. However, the word2vec-based system performed better than the TF-IDF-based system, so we report the performance of the word2vec-based system below. We used the same embeddings that we used to identify similar words, and applied Eq. 1 to convert the word vectors into a single sentence vector. The embedding settings were as follows: vector size 300, minimum word count 3, and window size 10. We used a linear kernel in the SVM classifier.
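The averaged-embedding representation and linear-kernel SVM described above can be sketched as follows. This is illustrative only: a small random lookup table with 8-dimensional vectors stands in for the paper's 300-dimensional word2vec embeddings, and the tokens and labels are hypothetical.

```python
import numpy as np
from sklearn.svm import SVC

# Toy stand-in for trained word2vec embeddings (vector size 300 in the paper;
# 8 here for brevity). In practice these come from the fitted word2vec model.
rng = np.random.default_rng(0)
vocab = ["lives", "alone", "isolated", "daughter", "visits", "weekend"]
emb = {w: rng.normal(size=8) for w in vocab}

def sentence_vector(tokens, emb):
    """Average the word vectors of in-vocabulary tokens (Eq. 1-style pooling)."""
    vecs = [emb[t] for t in tokens if t in emb]
    if not vecs:  # no known tokens: fall back to a zero vector
        return np.zeros(len(next(iter(emb.values()))))
    return np.mean(vecs, axis=0)

# Two toy sentences with hypothetical labels
X = np.stack([sentence_vector(["lives", "alone", "isolated"], emb),
              sentence_vector(["daughter", "visits", "weekend"], emb)])
y = [1, 0]  # 1 = social isolation, 0 = social support (illustrative)

clf = SVC(kernel="linear").fit(X, y)
print(clf.predict(X))
```

Mean pooling discards word order but yields a fixed-length input regardless of sentence length, which is what a kernel SVM requires.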



Table S6: Macro-averaged precision (P), recall (R), and F-scores (F) of the SVM- and BERT-based NLP systems for coarse categories on data from WCM only. Here, we split the data in an 80:20 ratio for training and testing.

References

  1. Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. Distributed representations of words and phrases and their compositionality. Advances in neural information processing systems, 26, 2013.


  2. Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. BERT: Pre-training of deep bidirectional transformers for language understanding. arXiv preprint arXiv:1810.04805, 2018.


  3. Jinhyuk Lee, Wonjin Yoon, Sungdong Kim, Donghyeon Kim, Sunkyu Kim, Chan Ho So, and Jaewoo Kang. BioBERT: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics, 36(4):1234–1240, 2020.


  4. Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014.


Table S7: Questions for fine-tuning the LLM for different categories.


Table S8: Macro-averaged precision (P), recall (R), and F-scores (F) for fine- and coarse-grained category classification using FLAN-T5-XL. Here, we used the instructions only and did not fine-tune the model on the examples.



Table S9: Definitions for each ICD code.


Table S10: Erroneous examples from the rule-based NLP systems (RBS) and LLM-based NLP systems.








This paper is available on arxiv under CC BY 4.0 DEED license.

Authors:

(1) Braja Gopal Patra, Weill Cornell Medicine, New York, NY, USA (co-first author);

(2) Lauren A. Lepow, Icahn School of Medicine at Mount Sinai, New York, NY, USA (co-first author);

(3) Praneet Kasi Reddy Jagadeesh Kumar, Weill Cornell Medicine, New York, NY, USA;

(4) Veer Vekaria, Weill Cornell Medicine, New York, NY, USA;

(5) Mohit Manoj Sharma, Weill Cornell Medicine, New York, NY, USA;

(6) Prakash Adekkanattu, Weill Cornell Medicine, New York, NY, USA;

(7) Brian Fennessy, Icahn School of Medicine at Mount Sinai, New York, NY, USA;

(8) Gavin Hynes, Icahn School of Medicine at Mount Sinai, New York, NY, USA;

(9) Isotta Landi, Icahn School of Medicine at Mount Sinai, New York, NY, USA;

(10) Jorge A. Sanchez-Ruiz, Mayo Clinic, Rochester, MN, USA;

(11) Euijung Ryu, Mayo Clinic, Rochester, MN, USA;

(12) Joanna M. Biernacka, Mayo Clinic, Rochester, MN, USA;

(13) Girish N. Nadkarni, Icahn School of Medicine at Mount Sinai, New York, NY, USA;

(14) Ardesheer Talati, Columbia University Vagelos College of Physicians and Surgeons, New York, NY, USA and New York State Psychiatric Institute, New York, NY, USA;

(15) Myrna Weissman, Columbia University Vagelos College of Physicians and Surgeons, New York, NY, USA and New York State Psychiatric Institute, New York, NY, USA;

(16) Mark Olfson, Columbia University Vagelos College of Physicians and Surgeons, New York, NY, USA, New York State Psychiatric Institute, New York, NY, USA, and Columbia University Irving Medical Center, New York, NY, USA;

(17) J. John Mann, Columbia University Irving Medical Center, New York, NY, USA;

(18) Alexander W. Charney, Icahn School of Medicine at Mount Sinai, New York, NY, USA;

(19) Jyotishman Pathak, Weill Cornell Medicine, New York, NY, USA.