Reinforcement learning-guided de novo drug design: A comparative study of RL algorithms for small molecule generation

Authors: Mpofu, Kelvin T; Thwala, Nomcebo L; Thobakgale, Setumo L; Mthunzi-Kufa, P
Dates: 2026-04-15; 2025
ISBN: 979-8-3315-5647-1
DOI: 10.1109/IMITEC67386.2025.11410458
URI: http://hdl.handle.net/10204/14786

Abstract: We present a comparative study on the application of reinforcement learning (RL) algorithms for de novo drug design. Using a custom molecular environment, we benchmarked five RL methods (DQN, PPO, A2C, REINFORCE, and DoubleDQN) for their ability to generate small, drug-like molecules from atomic building blocks. The models were evaluated based on chemical validity, drug-likeness (QED), molecular complexity, compliance with Lipinski’s Rule of Five, and structural similarity to known pharmaceuticals such as Aspirin and Ibuprofen. Among the tested algorithms, REINFORCE and PPO outperformed the others by generating chemically diverse and pharmacologically relevant compounds, achieving the highest QED scores and producing molecules with complex ring structures and higher scaffold novelty. All models successfully generated fully Lipinski-compliant molecules, demonstrating their utility in producing viable drug candidates. This work offers insights into the performance dynamics of RL models in chemical space and provides a foundation for developing AI-driven pipelines for accelerated drug discovery. This study highlights the benchmarking gap in RL-based molecule generation and systematically evaluates five algorithms under identical conditions to identify strengths and trade-offs.

Language: en
Keywords: Reinforcement learning; De Novo Drug Design; Molecular generation; Drug-likeness; QED Score; Lipinski Rule; Deep learning; Artificial Intelligence in Drug Discovery
Type: Conference Presentation
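
The Lipinski compliance criterion mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' evaluation code: the `Descriptors` class and the function names are hypothetical, and the descriptor values for Aspirin and Ibuprofen are approximate literature figures entered by hand as an assumption. In practice such descriptors would typically be computed from a molecular structure with a cheminformatics toolkit. The thresholds are the standard Rule of Five limits (molecular weight at most 500 Da, logP at most 5, at most 5 hydrogen-bond donors, at most 10 hydrogen-bond acceptors).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Descriptors:
    """Hypothetical container for the four Rule of Five descriptors."""
    mol_weight: float   # molecular weight in daltons
    logp: float         # octanol-water partition coefficient
    h_donors: int       # hydrogen-bond donors
    h_acceptors: int    # hydrogen-bond acceptors


def lipinski_violations(d: Descriptors) -> int:
    """Count how many of the four Rule of Five limits are exceeded."""
    checks = [
        d.mol_weight > 500,
        d.logp > 5,
        d.h_donors > 5,
        d.h_acceptors > 10,
    ]
    return sum(checks)


def is_fully_compliant(d: Descriptors) -> bool:
    """'Fully compliant' is taken here to mean zero violations."""
    return lipinski_violations(d) == 0


# Approximate, hand-entered values (assumption, for illustration only).
ASPIRIN = Descriptors(mol_weight=180.16, logp=1.2, h_donors=1, h_acceptors=4)
IBUPROFEN = Descriptors(mol_weight=206.28, logp=3.5, h_donors=1, h_acceptors=2)

if __name__ == "__main__":
    for name, d in [("Aspirin", ASPIRIN), ("Ibuprofen", IBUPROFEN)]:
        print(name, "violations:", lipinski_violations(d))
```

Counting violations rather than returning a single boolean keeps the sketch usable for the looser convention that tolerates one violation, although the abstract reports full compliance for all generated molecules.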