Deep learning framework to predict and generate new fluorescent molecules from experimental data
Abstract
Fluorescent molecules play important roles in biological imaging, diagnostics, and materials science. However, identifying efficient and effective fluorophores remains challenging, as traditional trial-and-error experimentation and in silico computations are both costly and time-consuming. To address this, this thesis presents a deep learning approach that streamlines the discovery process by predicting optical properties and generating novel fluorescent molecules directly from experimental data. The study is based on FluoDB, a publicly available dataset collected from the literature, containing over 55,000 fluorophore–solvent pairs with experimentally measured optical properties. Graph Convolutional Network (GCN) models were trained to predict four key optical properties and captured complex structure–property relationships, achieving R² values ranging from 0.49 to 0.87 across the different targets. A Conditional Variational Autoencoder (CVAE) was also implemented to generate novel fluorescent molecules conditioned on solvent identity and target absorption range. In total, 2,573 valid and structurally diverse molecules were generated, spanning a variety of predicted optical behaviors. Together, the predictive and generative models provide a data-driven approach to accelerate the exploration and design of functional fluorescent materials.
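For readers unfamiliar with the R² metric quoted above, the following is a minimal sketch of how a coefficient of determination is commonly computed for a regression model. It is not taken from the thesis: the function name `r_squared` and the numeric values are hypothetical and chosen only for illustration, and the thesis may have used a library implementation instead.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot.

    Hypothetical helper for illustration; not the thesis code.
    """
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")

    mean_true = sum(y_true) / len(y_true)
    # Residual sum of squares: error left over after the model's prediction.
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    # Total sum of squares: error of always predicting the mean.
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("R^2 is undefined when all true values are identical")
    return 1.0 - ss_res / ss_tot


# Made-up values (not from FluoDB), used only to show the two reference points.
measured = [1.0, 2.0, 3.0]
print(r_squared(measured, [1.0, 2.0, 3.0]))  # perfect predictions
print(r_squared(measured, [2.0, 2.0, 2.0]))  # always predicting the mean
```

Under this definition, a perfect model scores 1 and a model that always predicts the mean of the measured values scores 0, which gives a scale for reading the reported range of 0.49 to 0.87.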
Description
https://orcid.org/0000-0003-0455-5401