Laboratory Assistant

This page was developed to show that the SymTree algorithm can perform symbolic regression at low computational cost. The tool can serve as an assistant in teaching laboratories, where measurement experiments are carried out to verify the concepts seen in the classroom. It is written in JavaScript and therefore runs the algorithm in the user's own browser. Try it!

1 - Enter the data

The input data can be typed in or loaded from a .csv file. Every input line must list the input variables first and the measured variable last: x1, x2, ..., y. Optionally, the first line can contain the labels of the variables (check the corresponding box to activate this option).

2 - Choose and run one algorithm

Choose the algorithm to be executed. By default, the algorithms use settings chosen for good performance in browsers, but you can modify their parameters by clicking on the gear in the execution and results area.

3 - Analyze the result

Once processing is complete, the site displays the results of the symbolic regression, which can be manipulated. For a better understanding, the study notes page explains the developed tool.

Data input

The data can be entered manually, or you can upload a .csv file containing your data. If desired, the first line can be used to label the variables. This website has a more detailed page about the input data, with examples to test the algorithm. The website keeps the last data set that was successfully loaded until the end of the session.
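The expected layout (input variables first, measured variable last, optional label line) can be illustrated with a small parser. This is only a sketch: the function name parseCsv and the shape of its return value are hypothetical and are not taken from the site's actual code.

```javascript
// Minimal sketch of reading CSV text laid out as x1, x2, ..., y.
// parseCsv and its return shape are hypothetical, not the site's code.
function parseCsv(text, hasLabels = false) {
  // Split into non-empty lines, then into trimmed cells.
  const rows = text
    .split(/\r?\n/)
    .map((line) => line.trim())
    .filter((line) => line.length > 0)
    .map((line) => line.split(",").map((cell) => cell.trim()));

  // Optionally take the first line as the variable labels.
  const labels = hasLabels && rows.length > 0 ? rows.shift() : null;
  if (rows.length === 0) throw new Error("no data rows found");

  const width = rows[0].length;
  const X = [];
  const y = [];
  for (const row of rows) {
    const values = row.map(Number);
    const invalid =
      row.length !== width ||
      row.length < 2 ||
      row.some((cell) => cell === "") ||
      values.some((v) => Number.isNaN(v));
    if (invalid) throw new Error(`invalid row: ${row.join(",")}`);
    X.push(values.slice(0, -1)); // input variables x1, x2, ...
    y.push(values[values.length - 1]); // measured variable y (last column)
  }
  return { labels, X, y };
}

// Usage: one input variable t and one measurement h, with a label line.
const sample = parseCsv("t,h\n0,10\n1,5.1\n2,-9.6", true);
// sample.labels is ["t", "h"], sample.X is [[0], [1], [2]],
// and sample.y is [10, 5.1, -9.6].
```

Rejecting rows with a different number of columns or with non-numeric cells is a design choice of this sketch; the site may handle such rows differently.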


Execution and results

With the data loaded, choose one of the algorithms below. You can also adjust the parameters of the algorithm by clicking on the gear. Lowering them can reduce the execution time, at the cost of performing a simpler search. The first parameter is the one that most affects the execution time, and a suggested value is indicated.

SymTree (Symbolic Tree)
IT-LS (Interaction-Transformation Local Search)
IT-ES (Interaction-Transformation Evolution Strategies)

Were the results obtained satisfactory?

Symbolic regression is a difficult task because there is a great deal to adjust: not only the free coefficients, but also the form of the function itself. If the results are not satisfactory, it may be that the real equation cannot be represented by the algorithm, that the data is very noisy, or that more data is needed than was provided. Even so, the algorithm is able to present good results in several scenarios.

To rate the expressions found by the algorithm, we use the score. The score measures how well an expression fits the input data and ranges from 0 to 1, where 1 corresponds to a perfect fit. The score is calculated by:

$$\small{ score = \frac{1}{(1 + MAE)} }$$

Where the MAE (mean absolute error) is the mean of the absolute differences between the values of the dependent variable \(\small y_{i} \) and the values predicted by the expression for \(\small X_{i} \):

$$\small{ \text{Mean Absolute Error (MAE)} = \sum_{i=1}^{n}\frac{|y_{i} - \widehat{f}(X_{i})|}{n} }$$

Where n is the number of samples in the data set used.
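The two formulas above can be written as a short sketch. The function names meanAbsoluteError and score are illustrative assumptions, not the names used in the site's code; yPred stands for the values predicted by the expression for each sample.

```javascript
// Sketch of the score defined above; function names are hypothetical.
function meanAbsoluteError(yTrue, yPred) {
  if (yTrue.length === 0 || yTrue.length !== yPred.length) {
    throw new Error("yTrue and yPred must be non-empty and of equal length");
  }
  let total = 0;
  for (let i = 0; i < yTrue.length; i++) {
    total += Math.abs(yTrue[i] - yPred[i]); // |y_i - f(X_i)|
  }
  return total / yTrue.length; // divide by n
}

function score(yTrue, yPred) {
  // score = 1 / (1 + MAE), so a perfect fit (MAE = 0) gives 1.
  return 1 / (1 + meanAbsoluteError(yTrue, yPred));
}

// Usage: a perfect fit scores 1; being off by 1 on every sample scores 0.5.
const perfect = score([1, 2, 3], [1, 2, 3]); // 1
const offByOne = score([1, 2, 3], [2, 3, 4]); // 0.5
```

Because the MAE is never negative, the score stays in the interval from 0 (exclusive) to 1 (inclusive), and it shrinks toward 0 as the error grows.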

SymTree Algorithm

This algorithm starts its search from a solution representing a linear regression. At each iteration it applies the operations of interaction between variables and transformation, generating incrementally more complex functions. More details can be found in the original paper.
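The two operations can be illustrated with a small sketch. It assumes, for illustration only, that each term is stored as a vector of exponents (one per input variable) plus the name of a transformation function applied to the resulting product. The names TRANSFORMS, linearTerms, interact, withTransform and evalTerm are hypothetical, and coefficient fitting and the search itself are omitted.

```javascript
// Illustrative sketch only: a term is t(x1^k1 * x2^k2 * ...).
// All names here are hypothetical, not the site's actual code.
const TRANSFORMS = {
  id: (z) => z,
  sin: Math.sin,
  cos: Math.cos,
  sqrt: Math.sqrt,
  log: Math.log,
};

// Starting point: one identity term per variable, as in a linear regression.
function linearTerms(d) {
  return Array.from({ length: d }, (_, j) => ({
    exponents: Array.from({ length: d }, (_, k) => (k === j ? 1 : 0)),
    transform: "id",
  }));
}

// Interaction: combine two terms by adding their exponent vectors.
function interact(a, b) {
  return {
    exponents: a.exponents.map((k, j) => k + b.exponents[j]),
    transform: "id",
  };
}

// Transformation: apply a different function to the same interaction.
function withTransform(term, name) {
  if (!Object.prototype.hasOwnProperty.call(TRANSFORMS, name)) {
    throw new Error(`unknown transform: ${name}`);
  }
  return { exponents: [...term.exponents], transform: name };
}

// Evaluate one term for a single sample x = [x1, x2, ...].
function evalTerm(term, x) {
  const product = term.exponents.reduce(
    (acc, k, j) => acc * Math.pow(x[j], k),
    1
  );
  return TRANSFORMS[term.transform](product);
}

// Usage: from x1 and x2, build x1*x2 and then sqrt(x1*x2).
const [x1, x2] = linearTerms(2);
const x1x2 = interact(x1, x2);
const rootTerm = withTransform(x1x2, "sqrt");
const productValue = evalTerm(x1x2, [2, 3]); // 6
const rootValue = evalTerm(rootTerm, [2, 8]); // 4
```

Starting from simple terms and repeatedly combining or transforming them is what makes the generated functions grow in complexity step by step.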

IT-LS Algorithm

The algorithm works by creating an initial population of random expressions and selecting the best one among them. It then performs a local search on that expression, changing either its non-linear functions or its exponents, and repeats the process until no modification improves the score any further. The algorithm runs for a maximum of 50 iterations.
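The local search step can be sketched in generic form. The sketch below is not the site's implementation: localSearch, neighbors and scoreOf are hypothetical names, the random initial population is omitted, and it assumes that the best improving neighbor is taken at each iteration, which the description above does not specify.

```javascript
// Generic local search sketch (hypothetical names, best-improvement assumed).
// neighbors(solution) returns candidate modifications; scoreOf rates them.
function localSearch(initial, neighbors, scoreOf, maxIterations = 50) {
  let current = initial;
  let currentScore = scoreOf(current);
  for (let iter = 0; iter < maxIterations; iter++) {
    let best = null;
    let bestScore = currentScore;
    for (const candidate of neighbors(current)) {
      const candidateScore = scoreOf(candidate);
      if (candidateScore > bestScore) {
        best = candidate;
        bestScore = candidateScore;
      }
    }
    if (best === null) break; // no modification improves the score
    current = best;
    currentScore = bestScore;
  }
  return { solution: current, score: currentScore };
}

// Toy usage: the "expression" is just a vector of exponents, and a
// modification changes one exponent by +1 or -1.
const target = [2, -1];
const exponentNeighbors = (ks) =>
  ks.flatMap((k, j) =>
    [1, -1].map((step) => ks.map((v, i) => (i === j ? v + step : v)))
  );
const toyScore = (ks) =>
  1 / (1 + ks.reduce((sum, k, j) => sum + Math.abs(k - target[j]), 0));

const found = localSearch([0, 0], exponentNeighbors, toyScore);
// found.solution is [2, -1] and found.score is 1.
```

In the toy problem the score plays the role of the fit score; in the real tool, rating a candidate would also involve adjusting the free coefficients of the expression.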

IT-ES Algorithm

The algorithm works by creating a population of random expressions and then running an Evolution Strategy \(\small ES - (\mu, \lambda) \) algorithm. This implementation uses mutation only, with the settings \(\small \mu = 150 \), \(\small \lambda = 50 \), and 150 iterations. A more elaborate version of this algorithm was later formalized, and the paper presenting the new proposal for the algorithm can be found here.

Symbolic Regression aims to find the function that generated a data sample. It differs from other types of regression by not restricting the search to a fixed functional form, as linear, polynomial and neural network regression methods do. Normally, Symbolic Regression is done through Evolutionary Algorithms (e.g., Genetic Programming), but these tend to be computationally costly, even on small data sets.

This project aims to show the potential of the SymTree algorithm for low-dimensional data, applying it to known functions from Physics, Mathematics and Engineering.

This project was developed by Guilherme Aldeia as part of his scientific initiation (undergraduate research) project, supervised by Prof. Dr. Fabrício Olivetti de França at the Federal University of ABC.

Contact

Mentor:
Prof. Dr. Fabrício Olivetti de França
folivetti@ufabc.edu.br

Mentee:
Guilherme Seidyo Imai Aldeia
guilherme.aldeia@ufabc.edu.br

Universidade Federal do ABC
Avenida dos Estados, 5001 - Bairro Santa Terezinha, Santo André - CEP: 09210-580
+55 11 4996-0001

2020 Guilherme Seidyo Imai Aldeia