Publication
Information Systems
Paper
Analysis of collisions when hashing by division
Abstract
In this paper simple formulas are derived for the calculation of overflow records when the division method is used for hashing keys in the following distributions: 1. (1) keys in a file are randomly distributed; 2. (2) some keys in a file form clusters of various lengths and others are randomly distributed. Using these formulas it is shown that the division method indeed produces less collisions then the theoretically perfect randomization method as indicated in some previous experimental results. In addition the paper also shows that the number of collisions depends mainly on the ratio of number of keys in a file to the number of slots available to store these keys. Some numerical examples are given. © 1975.