Ensemble DeBERTa Models on USMLE Patient Notes Automatic Scoring using Note-based and Character-based approaches

Bowen Long; Fangya Tan; Mark Newman

doi:10.56028/aetr.6.1.107.2023

Keywords:

DeBERTa, DeBERTa-v3-large, LSTM, Ensemble, BERT, USMLE.

Abstract

To improve the accuracy and efficiency of scoring the USMLE Step 2 Clinical Skills examination, we proposed an ensemble model that automatically scores the patient notes written by test takers by identifying the relevant clinical features, a task otherwise performed manually by physician raters. This research used DeBERTa-base, DeBERTa-large, and DeBERTa-v3-large as three base models and ensembled them with two different approaches: Note-based and Character-based. The LSTM Note-based ensemble achieved the best overall performance, with an F1-score of 0.81747 on the validation data, 48% higher than the F1-score of the best-performing base model (DeBERTa-v3-large). Furthermore, its performance remained robust when broken down by clinical case and by fold, and when applied to the test set (accuracy of 0.88737). Finally, applying the ensemble approach to different base models (BERT-base-uncased and BERT-large-uncased) improved the F1-score by 32%. We demonstrated that the ensemble model has strong potential to improve performance on general Natural Language Understanding tasks.
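The abstract does not spell out how the Character-based ensemble combines the base models. The following is a minimal sketch under stated assumptions, not the authors' implementation. It assumes that each base model outputs one probability per character of a patient note, that the ensemble is a weighted average of those probabilities with hypothetical weights, that characters are kept above an assumed 0.5 threshold, and that predictions are scored with a character-level micro-averaged F1. The function names (`ensemble_char_probs`, `probs_to_spans`, `char_micro_f1`) and the toy probabilities are illustrative only. The LSTM Note-based ensemble is not shown.

```python
from typing import List, Sequence, Set, Tuple

Span = Tuple[int, int]  # half-open character range [start, end)


def ensemble_char_probs(model_probs: Sequence[Sequence[float]],
                        weights: Sequence[float]) -> List[float]:
    """Weighted average of per-character probabilities (assumed ensembling rule)."""
    if len(model_probs) != len(weights):
        raise ValueError("one weight per model is required")
    n_chars = len(model_probs[0])
    if any(len(p) != n_chars for p in model_probs):
        raise ValueError("all models must score the same number of characters")
    total = sum(weights)
    return [sum(w * p[i] for w, p in zip(weights, model_probs)) / total
            for i in range(n_chars)]


def probs_to_spans(probs: Sequence[float], threshold: float = 0.5) -> List[Span]:
    """Merge consecutive characters at or above the threshold into spans."""
    spans: List[Span] = []
    start = None
    for i, p in enumerate(probs):
        if p >= threshold and start is None:
            start = i
        elif p < threshold and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(probs)))
    return spans


def spans_to_chars(spans: Sequence[Span]) -> Set[int]:
    """Expand spans into the set of character indices they cover."""
    return {i for start, end in spans for i in range(start, end)}


def char_micro_f1(pred_spans_list: Sequence[Sequence[Span]],
                  true_spans_list: Sequence[Sequence[Span]]) -> float:
    """Micro F1 over characters, pooling TP/FP/FN across all (note, feature) pairs."""
    tp = fp = fn = 0
    for pred_spans, true_spans in zip(pred_spans_list, true_spans_list):
        pred, true = spans_to_chars(pred_spans), spans_to_chars(true_spans)
        tp += len(pred & true)
        fp += len(pred - true)
        fn += len(true - pred)
    return 0.0 if tp == 0 else 2 * tp / (2 * tp + fp + fn)


# Toy per-character probabilities for one 10-character note (made-up numbers).
deberta_base = [0.1, 0.2, 0.8, 0.9, 0.7, 0.1, 0.0, 0.6, 0.3, 0.1]
deberta_large = [0.0, 0.3, 0.9, 0.8, 0.6, 0.2, 0.1, 0.5, 0.4, 0.0]
deberta_v3_large = [0.1, 0.1, 0.7, 0.9, 0.8, 0.1, 0.0, 0.7, 0.5, 0.2]

probs = ensemble_char_probs([deberta_base, deberta_large, deberta_v3_large], [1.0, 1.0, 1.0])
pred = probs_to_spans(probs)
print(pred)  # [(2, 5), (7, 8)]
print(round(char_micro_f1([pred], [[(2, 5)]]), 3))  # 0.857
```

Averaging probabilities before thresholding lets a confident model outvote uncertain ones at each character; in practice the weights and threshold would be tuned on validation folds rather than fixed as here.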