Publication
ICDAR 1995
Conference paper

A clustering method and radius tuning by end users

View publication

Abstract

In this paper we describe a top-down clustering method consisting of an intra class step and an inter class step. In the intra class step all the samples for each category are initially divided into a small number of clusters, then the largest cluster is split and its members reallocated. The largest cluster is decided based on a new concept, »Volume» of a cluster that is a hybrid of existing two common criteria for splitting: Number of members in a cluster, and variance of a cluster. In the inter class step recognition is done for all the training set to assign best radius to each prototype. The radii are used as a normalizing factor in the computation of distance metrics. In our experiments we generated a prototype library by clustering characters written by Americans. When we used another training set written by Japanese only for tuning radii of the American library, the recognition rate of Japanese test set increased from 87.9% to 92.1%. The radii can be tuned even by OCR end users when the application domain is quite different from that of the initial clustering by OCR developers.

Date

Publication

ICDAR 1995

Authors

Topics

Share