Theoretical and Empirical Advantages of Dense-Vector to One-Hot Encoding of Intent Classes in Open-World Scenarios

Paulo Rodrigo Cavalin; Claudio Santos Pinhanez

LREC-COLING 2024

Conference paper

20 May 2024

Abstract

This work explores the intrinsic limitations of the popular one-hot encoding method in intent classification when out-of-scope (OOS) inputs must be detected. Recent work has shown that OOS detection can improve significantly when intent classes are represented as dense vectors built from domain-specific knowledge. We argue in this paper that such gains are more likely due to the much richer topologies that dense vectors can create, compared to the equidistant class representation assumed by one-hot encodings. We start by demonstrating how dense-vector encodings are able to create OOS spaces with much richer topologies. Then, we show empirically, using four standard intent classification datasets, that knowledge-free, randomly generated dense-vector encodings of intent classes can yield gains of over 20% over one-hot encodings, producing better systems for open-world classification tasks, mostly from improvements in OOS detection.
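The equidistance property mentioned in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the number of classes, the code dimension, and the choice of sampling coordinates from a standard normal distribution are all assumptions made for illustration, since the abstract does not say how the random encodings are generated.

```python
import itertools
import math
import random


def one_hot_codes(num_classes):
    """One code per class: 1 in the class's own position, 0 elsewhere."""
    return [[1.0 if i == j else 0.0 for j in range(num_classes)]
            for i in range(num_classes)]


def random_dense_codes(num_classes, dim, seed=0):
    """Knowledge-free codes (assumed scheme): coordinates drawn i.i.d. from N(0, 1)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)]
            for _ in range(num_classes)]


def pairwise_distances(codes):
    """Euclidean distance between every unordered pair of class codes."""
    return [math.dist(a, b) for a, b in itertools.combinations(codes, 2)]


# Hypothetical setup: 5 intent classes, 8-dimensional dense codes.
one_hot_d = pairwise_distances(one_hot_codes(5))
dense_d = pairwise_distances(random_dense_codes(5, dim=8))

# Every pair of one-hot codes is sqrt(2) apart, so the spread is (numerically) zero.
print("one-hot spread:", max(one_hot_d) - min(one_hot_d))
# Random dense codes sit at varied distances from one another.
print("dense spread:  ", max(dense_d) - min(dense_d))
```

In this sketch, all one-hot class codes lie at the same distance from each other, whereas the random dense codes do not. That variety of inter-class distances is one simple way to read the "richer topologies" the abstract refers to.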
