Automating Domain Squatting Detection Using Representation Learning
Abstract
Registering altered domain names in order to confuse users and conduct malicious activities is one of the most widespread types of attack on the Web, forming a family of techniques known as domain squatting. Detecting these domains is a difficult task, given the large number of possible combinations and the massive and heterogeneous nature of the Web. In this work, we propose a set of models that first learn the distributional regularities of detected squatted domains and then use them to automatically generate realistic modified domains. Our goal is to proactively guide the generation of squatted domains towards malicious domains that exist but have not yet been detected. We conducted an empirical study of both typo-squatting and combo-squatting generation approaches against strong baselines on real-world data, showing their feasibility and providing insights to support proactive defense in the context of cloud security.
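To make the two squatting families concrete, the sketch below enumerates candidates with simple hand-written rules: single-character edits for typo-squatting and keyword concatenation for combo-squatting. It is not the learned generative approach proposed in this work. It only illustrates the kind of exhaustive enumeration whose size motivates learning which candidates are realistic. The function names, the edit operations chosen, and the keyword list are illustrative assumptions.

```python
import string

# Characters allowed when inserting or substituting (assumption: lowercase
# letters and digits only; hyphens are left out to avoid invalid labels).
ALPHABET = string.ascii_lowercase + string.digits


def typo_candidates(label):
    """Return all labels one edit away from `label` (hypothetical helper).

    Covers omission, substitution, insertion, and adjacent transposition.
    """
    out = set()
    for i in range(len(label)):
        out.add(label[:i] + label[i + 1:])              # omission
        for c in ALPHABET:
            out.add(label[:i] + c + label[i + 1:])      # substitution
    for i in range(len(label) + 1):
        for c in ALPHABET:
            out.add(label[:i] + c + label[i:])          # insertion
    for i in range(len(label) - 1):
        # swap two adjacent characters
        out.add(label[:i] + label[i + 1] + label[i] + label[i + 2:])
    out.discard(label)  # the original label is not a squatting candidate
    out.discard("")     # an empty label is not a valid domain label
    return out


def combo_candidates(label, keywords):
    """Return labels that combine `label` with each keyword (hypothetical helper)."""
    out = set()
    for kw in keywords:
        out.update({label + kw, kw + label, f"{label}-{kw}", f"{kw}-{label}"})
    return out


if __name__ == "__main__":
    typos = typo_candidates("example")
    combos = combo_candidates("example", ["login", "secure"])
    print(len(typos), "typo-squatting candidates, e.g.", sorted(typos)[:3])
    print(len(combos), "combo-squatting candidates, e.g.", sorted(combos)[:3])
```

Even for a single short label the typo set contains several hundred strings, and the combo set grows with every keyword added, so ranking candidates by how closely they resemble observed squatted domains becomes the practical problem.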