SOLID: A large-scale semi-supervised dataset for offensive language identification
- Sara Rosenthal
- Pepa Atanasova
- et al.
 
- 2021
- ACL-IJCNLP 2021
Sara Rosenthal is a Staff Research Scientist in the Multilingual NLP group, most recently focusing on Retrieval Augmented Generation. She is also currently the ACL Champion.
She is a co-organizer of the SemEval Conference: https://semeval.github.io/SemEval2024/, https://semeval.github.io/SemEval2025/. She is an Action Editor for TACL and has served as AC for several conferences, such as ACL, NAACL, and LREC-COLING, as well as SAC for NAACL 2022 and on the D&I committee for ACL 2021.
She co-organized several SemEval tasks: 'Statement Verification and Evidence Finding with Tables' in 2021; the popular 'OffensEval: Identifying and Categorizing Offensive Language in Social Media' (https://sites.google.com/site/offensevalsharedtask/) in 2019 and 2020; and the popular 'Sentiment Analysis in Twitter', which ran from 2013 to 2017.
She was previously a research staff member in the Healthcare and Life Sciences group working on Question Answering in the medical domain.
Prior to joining IBM, her interests focused on applying Natural Language Processing to study Sociolinguistics in Social Media. She received her PhD in July 2015 from Columbia University under the advisement of Kathleen McKeown. Her thesis was titled 'Detecting Influencers in Social Media Discussions'. This work included predicting demographics, opinion, agreement, persuasion, claims, and influencers in weblogs, micro-blogs, and discussion forums.
She previously worked on GALE, a Question Answering system involving Machine Translation, where she maintained the pipeline and various services. During her undergraduate degree, she did research on using rule-based expert systems for an intelligent tutoring system at Ramapo College with Dr. Amruth Kumar.