Symbolic Regression Competition

Organiser: Arthur Kordon, Dow Chemical (USA)

 

The objective is to generate a robust symbolic regression model that relates expensive but noisy lab measurements of a chemical composition (output) to 57 cheap process measurements, such as temperatures, pressures, and flows (inputs). The selected equation has to include the inputs to which the output is most sensitive, so some form of variable selection is recommended. If accepted by process engineers, the proposed symbolic regression solution could be implemented in a chemical process monitoring system.

The case is based on data from a real industrial application at Dow Chemical.

 

Competition Winners

  1. Gabriel Kronberger, Upper Austria University of Applied Sciences
  2. Riccardo Poli and Mathew Salvaris, School of Computer Science and Electronic Engineering, University of Essex, UK
  3. Vic Ciesielski and Gayan Wijesinghe, School of Computer Science and Information Technology, RMIT University, Australia

 

The ranking, done by practitioners at Dow Chemical, was unanimous. As emphasized in the rules, "Each model will be scored by statistical performance on test data, its simplicity, and interpretability". The last two criteria are of special importance in manufacturing applications in order to "sell" the model to the final user. One of the reasons that convinced the potential users is that the winner performed variable selection first.

We would like to thank all participants in this pioneering competition and to congratulate the winner.

 

 

Data Set Description

The data set includes 57 measurements of process variables, which are potentially related to the composition. However, not all of them are highly correlated with the output. The data file includes two worksheets, one with training data and one with test data. The input columns are named x1 to x57 and the output column is named y. The training worksheet includes 747 data points and the test worksheet includes 319 data points.
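Since not every input is strongly related to the output, a first screening step can help narrow down the candidates. The sketch below is only an illustration and is not the method used by any entrant: it ranks inputs by the absolute value of their Pearson correlation with y. The helper names (pearson, rank_inputs) and the toy columns are hypothetical, and a linear correlation filter will miss inputs whose effect on y is purely nonlinear.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        # A constant column carries no linear information about y.
        return 0.0
    return cov / math.sqrt(var_x * var_y)


def rank_inputs(columns, y):
    """Rank input columns by |correlation with y|, strongest first.

    columns: dict mapping a column name (e.g. "x1") to its list of values.
    Returns a list of (name, score) pairs.
    """
    scores = {name: abs(pearson(values, y)) for name, values in columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data standing in for the real worksheet columns (hypothetical values).
toy_columns = {
    "x1": [1.0, 2.0, 3.0, 4.0],
    "x2": [1.0, -1.0, 1.0, -1.0],
    "x3": [5.0, 5.0, 5.0, 5.0],
}
toy_y = [2.0, 4.0, 6.0, 8.0]
ranking = rank_inputs(toy_columns, toy_y)
```

With the real data, the columns x1 to x57 from the training worksheet would take the place of toy_columns, and only the top-ranked inputs would be passed on to the symbolic regression run.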

 

Download the Excel file containing the data (1MB).

Rules

Entrants must submit a report describing the model development process, including: the GP parameters used, the variable selection method used, the symbolic regression expression built in the Excel spreadsheet with the data, and key statistical performance metrics, such as R² and RMSE on training and test data. Submissions will be reviewed by a committee of industrial experts. Each model will be scored by statistical performance on test data, its simplicity, and interpretability.
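For reference, the two metrics named in the rules can be computed as in the sketch below. It assumes R² means the coefficient of determination (1 minus the ratio of residual to total sum of squares); the rules do not say which definition the committee uses, and some practitioners report the squared Pearson correlation instead. The function names and sample values are hypothetical.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between measured and predicted values."""
    n = len(y_true)
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)


def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Hypothetical measured values and model predictions.
measured = [1.0, 2.0, 3.0, 4.0]
predicted = [2.0, 3.0, 4.0, 5.0]
example_rmse = rmse(measured, predicted)
example_r2 = r_squared(measured, predicted)
```

Both functions would be applied once to the training worksheet and once to the test worksheet, so that the report shows how well the model generalizes.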

 

Send your entries to the following address:

on or before the deadline. Please remember to indicate "SymReg Competition" in the subject of your email.

 

Important dates

Submission deadline EXTENDED: March 31, 2010

EvoStar Conference: April 7 – 9, 2010

 


Last Updated on Wednesday, 14 April 2010 17:14
 