Publication
COLING 2020
Short paper
Scalable Cross-lingual Treebank Synthesis for Improved Production Dependency Parsers
Abstract
We present scalable dependency treebank synthesis techniques that exploit advances in language representation modeling, which leverage vast amounts of unlabeled, general-purpose multilingual text. We introduce a data augmentation technique that uses the synthetic treebanks to improve production-grade parsers. The synthetic treebanks are generated by a state-of-the-art biaffine parser enhanced with Transformer-based pretrained models such as Multilingual BERT (M-BERT). The new parser improves LAS by up to two points across seven languages. Trend-line results show that, as the size of the augmented treebank scales, LAS surpasses that of production models trained on the originally annotated Universal Dependencies (UD) treebanks.
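To make the augmentation setup concrete, below is a minimal sketch (not the authors' code) of the kind of pipeline step the abstract describes: a synthetic CoNLL-U treebank, assumed to have been produced by running the M-BERT biaffine parser over unlabeled text, is combined with a gold UD treebank at increasing sizes so that LAS can be tracked against the synthetic data volume. The file names, sample sizes, and sampling strategy are illustrative assumptions.

```python
# Sketch: build augmented training treebanks of increasing size by
# concatenating a gold UD treebank with sampled synthetic sentences.
# File paths and sentence counts are hypothetical placeholders.
import itertools
import random


def read_conllu_sentences(path):
    """Yield CoNLL-U sentence blocks (comment lines plus token lines)."""
    with open(path, encoding="utf-8") as f:
        block = []
        for line in f:
            if line.strip():
                block.append(line)
            elif block:
                yield "".join(block)
                block = []
        if block:
            yield "".join(block)


def write_augmented_treebank(gold_path, synthetic_path, out_path,
                             n_synthetic, seed=0):
    """Write gold sentences plus n_synthetic sampled synthetic sentences."""
    gold = list(read_conllu_sentences(gold_path))
    synthetic = list(read_conllu_sentences(synthetic_path))
    random.Random(seed).shuffle(synthetic)
    with open(out_path, "w", encoding="utf-8") as f:
        for sent in itertools.chain(gold, synthetic[:n_synthetic]):
            f.write(sent + "\n")


if __name__ == "__main__":
    # Hypothetical inputs: gold.conllu is the annotated UD treebank;
    # synthetic.conllu holds parses of unlabeled text from the enhanced parser.
    for n in (10_000, 20_000, 40_000):
        write_augmented_treebank("gold.conllu", "synthetic.conllu",
                                 f"augmented_{n}.conllu", n_synthetic=n)
```

Each augmented file can then be used to retrain the production parser, and evaluating LAS at each synthetic-data size yields the trend lines referred to in the abstract.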