Aggregated Catalyst Physicochemical Descriptor-Driven Machine Learning for Catalyst Optimization: Insights into Oxidative-Coupling-of-Methane Dynamics and C2 Yields
Abstract
This study addresses the optimization of C2 yields in the oxidative coupling of methane (OCM), a pivotal process for sustainable chemical production. Machine learning (ML) techniques were used to predict C2 yields and to identify the factors that drive catalytic performance. After a comprehensive evaluation across multiple datasets and methodologies, the Extra Trees Regressor emerged as the most effective model. Central to the method were a novel Aggregated Catalyst Physicochemical Descriptor (ACPD) and stratified cross-validation, which addressed feature complexity and target skewness. Hyperparameter optimization using Modified Sequential Model-Based Optimization (SMBO) further improved the model’s performance, yielding optimized R² values of 61.7%, 75.9%, and 92.0% for datasets A, B, and C, respectively, with corresponding reductions in the Mean Squared Error (MSE) and Root Mean Squared Error (RMSE). In addition, SHAP (SHapley Additive exPlanations) analysis clarified the model’s decision-making process by quantifying the relative importance of individual features and their contributions to the predicted yields. Beyond achieving state-of-the-art predictive accuracy, this work deepens the understanding of the underlying chemical dynamics and offers practical guidance for catalyst design and operational optimization. These findings advance data-driven catalysis and support future innovations in sustainable chemical manufacturing.
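As an illustration of how stratified cross-validation can be applied to a skewed continuous target such as C2 yield, the following minimal sketch assigns samples to folds after grouping them into rank-based (quantile) bins. The binning scheme, the function name `stratified_regression_folds`, and its parameters are assumptions made for illustration; the abstract does not specify how stratification was performed in this study.

```python
import random
from collections import defaultdict


def stratified_regression_folds(y, n_splits=5, n_bins=10, seed=0):
    """Return a fold index (0..n_splits-1) for every sample in y.

    Assumption (not from the source): the continuous target is stratified
    by rank-based quantile bins, and each bin is dealt across the folds so
    that every fold covers the whole target range, including the sparse
    high-yield tail.
    """
    n = len(y)
    if n_splits < 2 or n_splits > n:
        raise ValueError("n_splits must be between 2 and len(y)")
    # Keep at least n_splits samples per bin so each fold can draw from it.
    n_bins = max(1, min(n_bins, n // n_splits))

    # Rank samples by target value and cut the ranking into equal-sized bins.
    order = sorted(range(n), key=lambda i: y[i])
    bins = defaultdict(list)
    for rank, idx in enumerate(order):
        bins[rank * n_bins // n].append(idx)

    rng = random.Random(seed)
    folds = [0] * n
    offset = 0  # continue the round-robin across bins to balance fold sizes
    for b in sorted(bins):
        members = bins[b]
        rng.shuffle(members)  # randomize which sample goes to which fold
        for k, idx in enumerate(members):
            folds[idx] = (offset + k) % n_splits
        offset += len(members)
    return folds


if __name__ == "__main__":
    # Synthetic, right-skewed stand-in for a yield column (not real data).
    rng = random.Random(42)
    y = [rng.expovariate(0.2) for _ in range(200)]
    folds = stratified_regression_folds(y, n_splits=5, n_bins=10, seed=1)
    for k in range(5):
        values = [v for v, f in zip(y, folds) if f == k]
        print(k, len(values), round(sum(values) / len(values), 2))
```

The resulting fold indices can be used to build train/test splits for any regressor. Binning by rank rather than by fixed-width intervals keeps the bins equally populated even when most samples cluster at low yields.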