Unexported functions

The module has some built-in auxiliary functions that its external use is not recommended. All implementations are listed in the modules of the library, but only some functions are exported outside the package.

Tree structural nodes

To build the expression trees, the defined structs are used. Those serves as the backbone of the tree, where every node has a different content.

Types and functions

GP_NLS.AbstractNodeType

Abstract type of our symbolic tree nodes. The idea of creating this type is to create subtypes InternalNode and TerminalNode, and then use multiple dispatch to implement functions that should have different behavior when manipulating the expression trees.

Expression trees should not be built using Var, WeightedVar, Const, Func, ERC, but with these nodes, which are intended to be used as the "backbone" of the tree. The backbone is build using the InternalNode and TerminalNode, and its contents should be the ones declared in $NodeContent.jl$.

A terminal node must have as its contents only the types Const, Var, WeightedVar, and an internal nodemust have its contents the type Func.

Notice that the ERC, when selected to be used in a terminal during the creation of a tree, is replaced by a random Const node). There are not any explicit ERC terminal in the trees.

source
GP_NLS.InternalNodeType

Struct to build the internal nodes of the backbone of the tree. Its contents will always be of the func type, a f ::Func function that will necessarily have f.arity children, where f.arity is the arity of the function.

InternalNode(f::Func, children::Vector{AbstractNode}) <: AbstractNode
source
GP_NLS.TerminalNodeType

Struct to build the terminal nodes of the backbone of the tree. Its contents will always be of the terminal type: Union{Const, Var, WeightedVar}.

TerminalNode(terminal::Union{Const, Var, WeightedVar}) <: AbstractNode
source

Auxiliary functions

Implementation of some auxiliary functions that are used to inspect and manipulate trees more generally.

Types and functions

GP_NLS.branches_in_limitsFunction

Finds all branches of any tree node that have a number of nodes less than or equal to allowedSize and a depth less than or equal to allowedDepth . Returns a list with the position of all branches found.

branches_in_limits(
    allowedSize::Int64, allowedDepth::Int64, node::AbstractNode, _point::Int64=1)::Vector{Int64}

The _point parameter being returned is for internal use, and serves to monitor the point of the tree where the candidates were found. This is only meaningful to recursive calls, and outside of the function it does not represent any useful information.

source
GP_NLS.change_at!Method

Takes a point p of type integer (which must be less than or equal to the number of nodes of branch), a branch of type AbstractNode, and any node AbstractNode representing a tree. Returns a modification of the tree by inserting the branch into the tree at position `$p$. This function changes the tree passed as argument.

change_at!(p::Int64, branch::AbstractNode, node::AbstractNode)::AbstractNode

This method is mainly used to modify trees in the crossover operation.

source
GP_NLS.change_children!Method

Function that takes a point p (which must be less than or equal to the number of nodes of branch), a branch of type AbstractNode, and a list of children Vector{AbstractNode}. Returns a modification of the list of children, where the p position node will be replaced by the given branch. This function changes the list of children passed as argument.

change_children!(p::Int64, branch::AbstractNode, children::Vector{AbstractNode})::Vector{AbstractNode}

This function is a helper to change_at!, and itsfor internal use. It is not exported by the module.

source
GP_NLS.copy_treeMethod

Function that takes any node of a tree (AbstractNode) and recursively recreates the structure so that references are not shared between the tree passed and the tree returned, avoiding side effects when manipulating the tree.

copy_tree(node::AbstractNode)::TerminalNode

This function works by having a multiple dispatch for each subtype of AbstractNode.

source
GP_NLS.get_branch_atMethod

Function that takes any node of a tree (AbstractNode) and an integer p (which must be less than or equal to the number of nodes in tree) and returns the branch at position p.

get_branch_at(p::Int64, node::AbstractNode)::AbstractNode
source
GP_NLS.get_depth_atMethod

Takes a point p of type integer (which must be less than or equal to the number of nodes of the node passed) and any node AbstractNode representing a tree, then this function find and return the depth of the subtree at position p.

get_depth_at(p::Int64, node::AbstractNode)::Int64

It's almost like finding the depth of the tree, but when we are interested in finding the depth of a subtree that is at the point p, not the whole tree depth.

source
GP_NLS.which_childrenMethod

Function that takes a set of children of the same node as the array children::Vector{AbstractNode}, and an integer p (which must be less than or equal to number of nodes in children) and finds the child containing the p-th node of the children if it was traversed inorder.

Returns a tuple containing the child which contains the node of index p, and an integer informing the position of the p-th node within the returned (child) subtree.

which_children(p::Int64, children::Vector{AbstractNode})::Tuple{Int64, AbstractNode}
source

Population Initialization

Implementation of different functions to initialize individual trees, as well as functions to create an entire population.

Types and functions

GP_NLS.PTC2Method

Tree creation with the Probabilistic Tree Creator 2 (PTC2) method, described in Two Fast Tree-Creation Algorithms for Genetic Programming, by Sean Luke.

This method looks like Koza's full method, but in addition to respecting a limit of maxDepth depth, it also respects a limit of number of nodes expctdSize.

PTC2 ensures that the depth does not exceed the maximum (in our case, weighted variables count as depth 1), and ensures that the number of nodes does not exceed the expected value added to the highest arity between the functions, that is, $expctdSize + max(arity(f)), f in fSet$.

PTC2(
    fSet::Vector{Func},
    tSet::Vector{Union{Var, WeightedVar, Const, ERC}},
    maxDepth::Int64,
    expctdSize::Int64)::AbstractNode

Here we adopt that the chance to select a $t$ terminal will be uniform for all possible terminals, and the chance to select a function will also follow the same logic.

source
GP_NLS._create_random_terminalMethod

Function that receives the set of terminal contents and creates a random terminal node.

_create_random_terminal(
    tSet::Vector{Union{Var, WeightedVar, Const, ERC}})::TerminalNode

Creating a terminal node involves an additional verification step for the case of ERC, which must be replaced with a constant within the range specified.

source
GP_NLS.fullMethod

Function that creates a tree using the full method, inspired by Koza's original work. Receives a set of fSet::Vector{Func} functions, a set of tSet::Vector{Union{Var, WeightedVar, Const, ERC}} terminals, and a maximum depth of maxDepth::Int64 allowed.

Returns any tree with maximum depth maxDepth and using the contents of past functions and terminals.

full(fSet::Vector{Func}, tSet::Vector{Union{Var, WeightedVar, Const, ERC}},
        maxDepth::Int64)::AbstractNode
source
GP_NLS.growMethod

Function that creates a tree using the grow method, inspired by Koza's original work. Receives a set of fSet::Vector{Func} functions that will be used in the internal nodes, a set of tSet::Vector{Union{Var, WeightedVar, Const, ERC}} terminals, and a maximum depth of maxDepth::Int64 allowed.

Returns any tree with maximum depth maxDepth created using the functions and terminal sets.

 grow(fSet::Vector{Func}, tSet::Vector{Union{Var, WeightedVar, Const, ERC}},
        maxDepth::Int64)::AbstractNode

Note that there is no minimum size, meaning that a single-node tree can be returned. The maximum depth considers weighted variables as a single node.

source
GP_NLS.init_pop_PTC2Method

Function that initializes a population of size popSize using the PTC2 method.

init_pop_PTC2(
    fSet::Vector{Func}, 
    tSet::Vector{Union{Var, WeightedVar, Const, ERC}}, 
    minDepth::Int64,
    maxDepth::Int64,
    expctdSize::Int64,
    popSize::Int64)::Vector{AbstractNode}

Every initialization functions take the same parameters, but not all do of them makes use of all parameters. This is just to unify the call of initialization functions.

source
GP_NLS.init_pop_fullMethod

Function that initializes a population of size popSize using the full method.

init_pop_full(
    fSet::Vector{Func}, 
    tSet::Vector{Union{Var, WeightedVar, Const, ERC}}, 
    minDepth::Int64,
    maxDepth::Int64,
    expctdSize::Int64,
    popSize::Int64)::Vector{AbstractNode}

Every initialization functions take the same parameters, but not all do of them makes use of all parameters. This is just to unify the call of initialization functions.

source
GP_NLS.init_pop_growMethod

Function that initializes a population of size popSize using the grow method.

init_pop_grow(
    fSet::Vector{Func}, 
    tSet::Vector{Union{Var, WeightedVar, Const, ERC}}, 
    minDepth::Int64,
    maxDepth::Int64,
    expctdSize::Int64,
    popSize::Int64)::Vector{AbstractNode}

Every initialization functions take the same parameters, but not all do of them makes use of all parameters. This is just to unify the call of initialization functions.

source
GP_NLS.init_pop_rampedMethod

Function that initializes a population of size popSize using the ramped half-half method.

init_pop_ramped(
    fSet::Vector{Func}, 
    tSet::Vector{Union{Var, WeightedVar, Const, ERC}}, 
    minDepth::Int64,
    maxDepth::Int64,
    expctdSize::Int64,
    popSize::Int64)

Every initialization functions take the same parameters, but not all do of them makes use of all parameters. This is just to unify the call of initialization functions.

source

Non-linear Least Squares optimization

Implementation of auxiliary functions and the nonlinear optimization method of coefficients. All functions declared here are for internal use by the module.

Types and functions

GP_NLS.adaptate_treeMethod

Function that receives a tree and makes the necessary adaptations to be able to apply the optimization with the nonlinear least squares method. Does not modify the arguments. Returns a function that receives X and theta to perform tree evaluation, the initial theta vector (which corresponds to the original coefficients of the tree before the optimization process), and the adapted tree.

adaptate_tree(node::AbstractNode)::Tuple{Function, Vector{Float64}, AbstractNode}

The adaptation is done by adding 4 new nodes in the tree, to create the linear transformation box (it adds an intercept and a slope to the tree). This function The returned function takes as arguments X::Matrix{Float64} and theta::Vector{Float64} to perform the evaluation (evaluate). The H function returned is internally used in an autodiff algorithm to obtain the Jacobian in the optimization method.

This function is for the internal use of the package, and is used in the optimization method.

source
GP_NLS.apply_local_optFunction

Function that receives a node of a tree (AbstractNode), an array with the X::Matrix{Float64} training data, and an array with expected values y::Vector{Float64}, and applies the optimization process of non-linear least squares. Returns an adjusted tree.

apply_local_opt(
    node::AbstractNode, X::Matrix{Float64},
    y::Vector{Float64}, keep_linear_transf_box=false)::AbstractNode

We can choose to return the expression with/without the transformation box. In case it is returned, it is worth noting that the code will not apply mutation or crossover at the nodes of the block.

This function is for the internal use of the package, and is used in the optimization method.

source
GP_NLS.evaluate_replacing_constsFunction

Function that evaluates the tree simulating that the constants were replaced by the theta values. The idea is to simulate the replacement without the need of completely rebuild the tree, aiming to reduce the computational cost when the optimization method performs many iterations.

evaluate_replacing_consts(
    node::Union{TerminalNode, InternalNode}, X::Matrix{Float64},
    theta::Vector{Float64}, c_idx::Int64=0)::Tuple{Vector{Float64}, Int64}

The c_idx variable is used internally to keep track of the indexes of theta that have already been used or not, so that the replacement simulation is done correctly.

This function is for the internal use of the package, and is used in the optimization method. Implements multiple dispatch.

source
GP_NLS.find_const_nodesFunction

Function that takes a node of a tree (AbstractNode) and traverses it recursively, looking for all terminal nodes that have as content a Const or a WeightedVar. This function makes internal use of a list of nodes Vector{AbstractNode}, which is passed by reference to the calls recursively. The recursive calls will add a reference to the nodes found in this list.

find_const_nodes(
    node::AbstractNode, nodes=Vector{AbstractNode}())::Vector{TerminalNode}

This function is for the internal use of the package, and is used in the optimization method.

source
GP_NLS.replace_const_nodesFunction

Function that takes a node of a tree (AbstractNode) and a vector of type theta::Vector{Float64} with the same number of elements as the number of constants from the given tree, and recursively replaces the constants of the tree. This is the update step of the GP-NLS algorithm, where we take new vales for the constants and replace them in the original tree with the elements of theta. This function does not change the arguments passed, and returns a new tree.

The theta must have the same number of the total count of Const and WeightedVar nodes in the tree.

replace_const_nodes(
    node::AbstractNode, theta::Vector{Float64}, _first_of_stack=true)::AbstractNode

The _first_of_stack argument is for internal use of the function. It is used to create a copy of theta where is safe to remove elements without changing the original vector.

This function is for the internal use of the package, and is used in the optimization method.

source

Genetic Programming algorithm

Mutation, crossover and GP implementation.

Types and functions

GP_NLS.crossoverMethod

Crossover function that does a recombination of the two parents passed as an argument, finding a breakpoint in each of the parent trees and swapping the subtree between those breakpoints. It doesn't change the parents. This crossover controls the number of nodes (not the depth) of the tree, avoiding to exceed the maximum value.

crossover(
    fst_parent::AbstractNode,
    snd_parent::AbstractNode,
    maxDepth::Int64,
    maxSize::Int64)::AbstractNode
source
GP_NLS.mutation!Method

Function that implements a traditional substitution mutation in a tree, respecting the maximum past depth. Modifies the tree passed.

mutation!(
    node::AbstractNode,
    maxDepth::Int64,
    maxSize::Int64,
    fSet::Vector{Func},
    tSet::Vector{Union{Var, WeightedVar, Const, ERC}},
    mutationRate::Float64)::AbstractNode

The mutationRate mutation rate should vary by $[0, 1]$ and determines the chance of occurring a mutation (replacing a random point with a new subtree). If no mutation is performed, then the given node is returned.

source
GP_NLS.tourn_selectionMethod

Function that receives two individuals and makes a simple tournament selection. An individual is a (fitness::Float64, node::AbstractNode) tuple with the tree and it fitness, to avoid recalculations all the time.

tourn_selection(
    ind1::Tuple{Float64, AbstractNode},
    ind2::Tuple{Float64, AbstractNode})::Tuple{AbstractNode, Float64}

The return type is a tuple with the winning individual(that is, the individual winner tuple is returned).

source