Towards an Open Format for Scalable System Telemetry
Teryl Taylor, Frederico Araujo, et al.
Big Data 2020
In this work, we address the named entity recognition (NER) problem by splitting it into two logical sub-tasks: (1) Span Detection, which extracts mention spans of entities irrespective of entity type; and (2) Span Classification, which classifies the spans into their entity types. Further, we formulate both sub-tasks as question-answering (QA) problems and produce two leaner models that can be optimized separately for each sub-task. Experiments with four cross-domain datasets demonstrate that this two-step approach is both effective and time efficient. Our system, SplitNER, outperforms baselines on OntoNotes5.0, WNUT17, and a cybersecurity dataset, and gives on-par performance on BioNLP13CG. In all cases, it achieves a significant reduction in training time compared to its QA baseline counterpart. The effectiveness of our system stems from fine-tuning the BERT model twice, separately for span detection and classification. The source code can be found at github.com/c3sr/split-ner.
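The two-step decomposition described in the abstract can be illustrated with a minimal Python sketch. It is not the SplitNER implementation: the question templates, the `split_ner` function, and the rule-based `toy_detect` and `toy_classify` stand-ins are all hypothetical, used here only to show how a span detector and a span classifier, each posed as a question over the same text, could be chained. In the actual system, each stage would be a separately fine-tuned BERT model.

```python
from dataclasses import dataclass
from typing import Callable, List, Tuple

Span = Tuple[int, int]  # token indices: start inclusive, end exclusive


@dataclass
class Entity:
    text: str
    span: Span
    label: str


# Hypothetical question templates (assumptions, not taken from the paper).
DETECT_QUESTION = "Which spans in the text mention an entity?"
CLASSIFY_QUESTION = "What is the entity type of '{mention}'?"


def split_ner(
    tokens: List[str],
    detect: Callable[[str, List[str]], List[Span]],
    classify: Callable[[str, List[str], Span], str],
) -> List[Entity]:
    """Run span detection first, then classify each detected span."""
    entities = []
    for start, end in detect(DETECT_QUESTION, tokens):
        mention = " ".join(tokens[start:end])
        question = CLASSIFY_QUESTION.format(mention=mention)
        label = classify(question, tokens, (start, end))
        entities.append(Entity(mention, (start, end), label))
    return entities


def toy_detect(question: str, tokens: List[str]) -> List[Span]:
    """Stand-in detector: treats each run of capitalized tokens as a span."""
    spans, start = [], None
    for i, tok in enumerate(tokens):
        if tok[:1].isupper():
            if start is None:
                start = i
        elif start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(tokens)))
    return spans


# Stand-in classifier: a lookup table in place of a fine-tuned model.
TOY_TYPES = {"Alice": "PERSON", "Acme Labs": "ORG", "New York": "GPE"}


def toy_classify(question: str, tokens: List[str], span: Span) -> str:
    mention = " ".join(tokens[span[0]:span[1]])
    return TOY_TYPES.get(mention, "MISC")


if __name__ == "__main__":
    sentence = "Alice works at Acme Labs in New York".split()
    for entity in split_ner(sentence, toy_detect, toy_classify):
        print(entity)
```

Because the two stages only communicate through the list of spans, either stand-in can be swapped for a trained model without touching the other, which is the property that lets each sub-task be optimized separately.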