Capturing Individual-level Social Determinants from Clinical Text
Abstract
Knowledge of social determinants of health (SDOH), the nonmedical factors that influence health outcomes, can help providers improve patient care. However, SDOH are often documented in unstructured notes, making them difficult to access. Although previous works have attempted SDOH extraction from clinical notes, most defined SDOH narrowly and focused on the note’s social history (SH) section, where social factors are traditionally documented. Here, we introduce a new SDOH dataset that covers a broad range of SDOH content and is annotated over entire notes. We characterize what, where, and how SDOH information is documented in clinical text, present baseline systems using token classification and generative approaches, and investigate whether training only on the SH section can effectively extract SDOH from the entire note. The final dataset, consisting of 2,007 annotations covering 7 open-ended SDOH domains over 500 notes, will be publicly released to encourage further research in this area.