Talk

A Discrete Diffusion Model for de novo Molecular Generation with Continuous Property Guidance

Abstract

Recent progress in generative modeling has significantly advanced de novo molecular design, with diffusion models emerging as a powerful tool for learning complex data distributions. These models have shown strong performance in continuous domains such as molecular conformation generation, where noise can be smoothly injected and removed. However, applying diffusion to inherently discrete molecular representations, such as SMILES strings or molecular graphs, remains challenging. Although discrete diffusion has shown strong performance in domains like natural language and code generation, existing molecular approaches still rely on continuous relaxations and latent embeddings, techniques that can make it harder to preserve syntactic validity and chemical fidelity when applied to discrete molecular representations. In this work, we introduce a discrete diffusion model that operates directly in token space, enabling native handling of molecular syntax and topology without relaxation to continuous latent spaces.
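To make "operating directly in token space" concrete, here is a minimal sketch of one possible forward (noising) process on SMILES tokens. It assumes an absorbing-state corruption in which each token is independently replaced by a mask symbol with probability t / num_steps; the abstract does not state which corruption process the talk uses, and the `MASK` symbol, the simplified tokenizer, and the linear schedule are all illustrative assumptions.

```python
import random
import re

MASK = "[MASK]"  # hypothetical absorbing token, not taken from the talk

# Simplified SMILES tokenizer (assumption): bracket atoms and the two-letter
# halogens Br/Cl are kept whole; everything else is one character per token.
_TOKEN_RE = re.compile(r"\[[^\]]+\]|Br|Cl|.")


def tokenize(smiles):
    """Split a SMILES string into a list of tokens."""
    return _TOKEN_RE.findall(smiles)


def corrupt(tokens, t, num_steps, rng):
    """Forward noising at step t: mask each token independently
    with probability t / num_steps (assumed linear schedule)."""
    p = t / num_steps
    return [MASK if rng.random() < p else tok for tok in tokens]


rng = random.Random(0)
tokens = tokenize("CC(=O)Oc1ccccc1C(=O)O")  # aspirin
for t in (0, 5, 10):
    # t = 0 leaves the sequence intact; t = num_steps masks every token.
    print(t, "".join(corrupt(tokens, t, num_steps=10, rng=rng)))
```

A denoising network would then be trained to predict the original tokens at the masked positions, so noise never leaves the discrete vocabulary.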

We extend this framework to property-guided generation by conditioning the reverse diffusion process on target values of continuous molecular properties, which removes the need for post hoc filtering. We achieve this via a learned, differentiable guidance mechanism that steers sampling trajectories toward regions of chemical space consistent with desired property profiles. This allows for precise modulation of outputs across a continuous property spectrum while preserving the structural validity and diversity of the generated molecules.
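As a rough illustration of how guidance can steer a single reverse step, the sketch below reweights a denoiser's token logits by how close each candidate's predicted property is to the target. It assumes a classifier-guidance-style rule with a Gaussian penalty; the talk's actual mechanism is not specified in the abstract, and `guided_probs`, `scale`, `sigma`, and the numbers used are hypothetical.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def guided_probs(logits, predicted, target, scale=1.0, sigma=1.0):
    """Combine denoiser logits with a Gaussian log-likelihood term
    that penalizes candidates whose predicted property is far from target.
    scale = 0 recovers the unguided distribution."""
    scores = [
        logit - scale * (pred - target) ** 2 / (2 * sigma ** 2)
        for logit, pred in zip(logits, predicted)
    ]
    return softmax(scores)


candidates = ["C", "N", "O"]      # candidate tokens for one masked position
logits = [2.0, 1.0, 1.0]          # hypothetical denoiser outputs
predicted = [3.0, 1.5, 0.5]       # hypothetical property predictions per candidate

unguided = guided_probs(logits, predicted, target=1.5, scale=0.0)
guided = guided_probs(logits, predicted, target=1.5, scale=2.0)
for tok, p0, p1 in zip(candidates, unguided, guided):
    print(f"{tok}: unguided={p0:.3f} guided={p1:.3f}")
```

Because the target enters as a continuous number, changing `target` shifts the sampling distribution smoothly, which is the kind of tunable control the abstract describes.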

In benchmarks, our method shows significant improvements over baseline latent-variable and autoregressive models in both unconditional generation fidelity and property-constrained generation accuracy. Our results suggest that discrete diffusion models, when coupled with property-guided conditioning, provide a unified and tractable approach to de novo molecular design with tunable control over complex molecular attributes.

Related