Semi-supervised learning over heterogeneous information networks by ensemble of meta-graph guided random walks
Abstract
A heterogeneous information network (HIN) is a general representation of many kinds of real-world data. Unlike a traditional homogeneous network, the nodes and edges of an HIN are typed. In many applications, these types must be taken into account to make decisions more semantically meaningful. For applications where annotation is expensive, a natural approach is semi-supervised learning over the HIN. In this paper, we present a semi-supervised learning algorithm constrained by the types in an HIN. We first decompose the original HIN schema into several semantically meaningful meta-graphs consisting of entity and relation types. Random walks guided by these meta-graphs then propagate labels from labeled data to unlabeled data. Given the labels produced by each meta-graph guided random walk, we compare different ensemble algorithms for generating the final label from the evidence provided by all meta-graphs. Experimental results on two knowledge-based text classification datasets show that our algorithm outperforms traditional semi-supervised learning algorithms for HINs.
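The pipeline outlined in the abstract (decompose, propagate, ensemble) can be illustrated with a minimal sketch. It is not the paper's implementation. It assumes a toy HIN of four documents linked to hypothetical "person" and "location" entities, uses two simple meta-paths (Document-Person-Document and Document-Location-Document) as stand-ins for meta-graphs, propagates labels with a standard restart-style random walk update, and combines the results by plain score averaging, which is only one of the possible ensemble choices. All names, matrices, and parameter values below are illustrative assumptions.

```python
# Toy sketch: meta-path guided label propagation plus an averaging ensemble.
# All data and names are hypothetical; this is not the paper's algorithm.

def matmul(a, b):
    """Multiply two dense matrices given as lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]

def transpose(a):
    return [list(col) for col in zip(*a)]

def row_normalize(m):
    """Turn a non-negative matrix into a row-stochastic transition matrix."""
    out = []
    for row in m:
        total = sum(row)
        out.append([v / total if total else 0.0 for v in row])
    return out

def propagate(adj, seeds, alpha=0.8, iters=50):
    """Iterate F <- alpha * P F + (1 - alpha) * Y, with P the normalized adjacency."""
    p = row_normalize(adj)
    f = [row[:] for row in seeds]
    n_classes = len(seeds[0])
    for _ in range(iters):
        pf = matmul(p, f)
        f = [[alpha * pf[i][k] + (1 - alpha) * seeds[i][k] for k in range(n_classes)]
             for i in range(len(seeds))]
    return f

def ensemble_average(score_list):
    """Average the per-class scores produced by each meta-path."""
    n = len(score_list)
    rows, cols = len(score_list[0]), len(score_list[0][0])
    return [[sum(s[i][k] for s in score_list) / n for k in range(cols)]
            for i in range(rows)]

def predict(scores):
    """Pick the highest-scoring class index for each node."""
    return [max(range(len(row)), key=row.__getitem__) for row in scores]

# Hypothetical incidence matrices: rows are 4 documents, columns are entities.
doc_person = [[1, 0], [1, 0], [0, 1], [0, 1]]
doc_location = [[1, 0], [1, 1], [0, 1], [0, 1]]

# Each meta-path induces a document-document graph (number of shared entities).
adj_dpd = matmul(doc_person, transpose(doc_person))
adj_dld = matmul(doc_location, transpose(doc_location))

# Document 0 is labeled class 0, document 3 is labeled class 1; the rest are unlabeled.
seeds = [[1.0, 0.0], [0.0, 0.0], [0.0, 0.0], [0.0, 1.0]]

scores_dpd = propagate(adj_dpd, seeds)
scores_dld = propagate(adj_dld, seeds)
final_labels = predict(ensemble_average([scores_dpd, scores_dld]))
print(final_labels)
```

On this toy data the two meta-paths disagree about document 1: the person-based walk reaches only the class-0 seed, while the location-based walk leans toward class 1. The averaged ensemble sides with the stronger person-based evidence. Averaging is used here only for brevity; voting or a learned combination would replace `ensemble_average` without changing the rest of the sketch.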