Beck, David
Amadu Somah, Annatu
2025-08-01
2025
AmaduSomah_washington_0250O_28555.pdf
https://hdl.handle.net/1773/53454
Thesis (Master's)--University of Washington, 2025

Accurately predicting the aqueous solubility of organic molecules is essential in a wide range of scientific and industrial domains, including drug development, food, and energy storage. This study builds on prior work by Panapitiya et al. by introducing a multi-stage ensemble learning framework that improves the predictive performance of solubility models on the SOMAS dataset. The dataset comprises 11,696 molecules with diverse structural and physicochemical properties, including 2D, 3D, and quantum descriptors. Three base models, namely a Molecular Descriptor Model (MDM), a Graph Neural Network (GNN), and a SMILES model developed by Panapitiya et al., were used and evaluated with RMSE, MAE, R², and Spearman correlation. Among the individual models, MDM achieved the strongest performance, but ensemble methods consistently outperformed the standalone models. Simple averaging improved predictive accuracy, while Optuna-based ensemble weight optimization yielded the best overall results. Additionally, a Mixture of Experts (MoE) architecture was implemented to weight model outputs dynamically based on structural input features, demonstrating strong performance and scalability. This work highlights the value of combining diverse molecular representations with advanced ensemble techniques, providing a robust, adaptive framework for high-accuracy solubility prediction and future data-driven molecular design.

application/pdf
en-US
none
Machine Learning
Solubility
Chemical engineering
Solvation Meta Predictor
Thesis
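
The abstract contrasts simple averaging of the three base models with an ensemble whose weights are optimized. The sketch below illustrates that idea only and is not the thesis implementation. It assumes each base model has already produced predictions on a validation set, uses a plain grid search over non-negative weights that sum to 1 as a stand-in for the Optuna-based search, and scores candidates by RMSE. All function names and the toy numbers are hypothetical and were made up for illustration.

```python
import itertools
import math


def rmse(y_true, y_pred):
    """Root-mean-square error between two equal-length sequences."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def weighted_ensemble(preds, weights):
    """Combine per-model prediction lists into one list using the given weights."""
    n_samples = len(preds[0])
    return [sum(w * p[i] for w, p in zip(weights, preds)) for i in range(n_samples)]


def simplex_grid(n_models, steps):
    """Yield weight tuples on a regular grid: non-negative and summing to 1."""
    for combo in itertools.product(range(steps + 1), repeat=n_models):
        if sum(combo) == steps:
            yield tuple(c / steps for c in combo)


def search_weights(y_val, preds, steps=30):
    """Return the grid weights with the lowest validation RMSE, and that RMSE.

    Grid search is used here only as a dependency-free stand-in for a
    hyperparameter optimizer such as Optuna (an assumption of this sketch).
    """
    best_w, best_score = None, float("inf")
    for w in simplex_grid(len(preds), steps):
        score = rmse(y_val, weighted_ensemble(preds, w))
        if score < best_score:
            best_w, best_score = w, score
    return best_w, best_score


# Made-up validation targets and base-model predictions (illustrative only).
y_val = [-2.0, -1.0, 0.5, 1.0]
mdm = [-1.8, -1.1, 0.4, 1.2]
gnn = [-2.4, -0.7, 0.9, 0.6]
smiles = [-1.5, -1.5, 0.0, 1.5]
preds = [mdm, gnn, smiles]

# Baseline: simple (uniform) averaging of the three models.
avg_rmse = rmse(y_val, weighted_ensemble(preds, (1 / 3, 1 / 3, 1 / 3)))

# Optimized: weights chosen to minimize validation RMSE.
best_w, best_rmse = search_weights(y_val, preds)
```

Because the grid with `steps=30` contains both the uniform weights and the single-model weights, the selected ensemble can do no worse on the validation set than simple averaging or any individual model. Weights chosen this way should still be checked on a held-out test set, since they are fitted to the validation data.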