Publication
IJCNN 2020
Conference paper
Creating Corpora for Seq2Seq Tone Rephrasing Using Social Media Posts
Abstract
We present a methodology for using Twitter posts to create a parallel corpus that can be used to train Seq2Seq neural networks for a tone rephrasing task. Because people tend to post texts expressing opinions or emotions of varied intensities about given real-world events, the main idea is to create a corpus containing pairs of posts that have opposite tones but address the same topic. By doing so, we overcome the main limitation of current tone rephrasing methods: the lack of appropriate parallel training corpora. We explore different methods to create the datasets, including some that require a degree of manual labelling. The results show that completely automatic generation from Twitter data yields training datasets that are better than those built with manual intervention, and good enough for Seq2Seq models to outperform non-Seq2Seq models trained with similar data.
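The pairing idea in the abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the `Post` record, the `topic` key (for instance a shared hashtag), the numeric `tone` score, and the `threshold` parameter are all assumptions introduced here for illustration, since the abstract does not say how topic or tone are determined.

```python
from collections import defaultdict
from itertools import product
from typing import NamedTuple


class Post(NamedTuple):
    text: str
    topic: str   # hypothetical topic key, e.g. a shared hashtag
    tone: float  # hypothetical tone score in [-1, 1]


def build_pairs(posts, threshold=0.5):
    """Pair strongly negative posts with strongly positive posts on the same topic."""
    # Each topic maps to a (negatives, positives) pair of text lists.
    by_topic = defaultdict(lambda: ([], []))
    for post in posts:
        negatives, positives = by_topic[post.topic]
        if post.tone <= -threshold:
            negatives.append(post.text)
        elif post.tone >= threshold:
            positives.append(post.text)
        # Posts with a weak tone are left out of the corpus.

    pairs = []
    for negatives, positives in by_topic.values():
        # Every (negative, positive) combination becomes one source/target pair.
        pairs.extend(product(negatives, positives))
    return pairs


sample = [
    Post("The launch was a disaster.", "launch", -0.9),
    Post("What a fantastic launch!", "launch", 0.8),
    Post("The launch happened today.", "launch", 0.1),
    Post("Great match last night.", "match", 0.9),
]
pairs = build_pairs(sample)
```

In this sketch each resulting tuple would serve as one source/target example for a Seq2Seq model. A topic with posts of only one tone, like `"match"` above, contributes no pairs.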