Updated GeneticEngine to latest version (2024 competition) #189

Status: merged (27 commits, Oct 14, 2024)

Changes shown from 24 commits

Commits (27 total)
c5dc453  Updated GeneticEngine to latest version (alcides, Oct 1, 2024)
e39ef4a  Updated regressors (alcides, Oct 1, 2024)
6a20add  Merge branch 'docker-compose' into docker-compose (gAldeia, Oct 7, 2024)
a040ebc  Merge branch 'docker-compose' into docker-compose (gAldeia, Oct 7, 2024)
4419d22  Merge branch 'docker-compose' into docker-compose (gAldeia, Oct 7, 2024)
3a71ebd  Merge branch 'docker-compose' into docker-compose (gAldeia, Oct 9, 2024)
5247fbb  Updated GeneticEngine pip and setuptools versions (alcides, Oct 10, 2024)
594d1e6  Added all geneticengine dependencies (alcides, Oct 10, 2024)
254f2fb  Added dill dependency (alcides, Oct 10, 2024)
8a44c9c  Upgraded GeneticEngine version (alcides, Oct 11, 2024)
9c98f2f  Bringing back dill (alcides, Oct 11, 2024)
027b1e5  Right GE version (alcides, Oct 11, 2024)
c6128c3  max_time int (alcides, Oct 11, 2024)
42f26b3  Fixed git commit (alcides, Oct 11, 2024)
0970138  Bumped geneticengine (alcides, Oct 11, 2024)
c954499  Bump GeneticEngine (alcides, Oct 11, 2024)
09a4041  Bumped GeneticEngine (alcides, Oct 11, 2024)
6ef8e4f  Reduced GeneticEngine default time to 360s (alcides, Oct 11, 2024)
b5cd6f8  1 second (alcides, Oct 11, 2024)
12608b1  Bumped GeneticEngine (alcides, Oct 11, 2024)
dd3c3cc  Bumped GeneticEngine (alcides, Oct 11, 2024)
da61a2d  Bumped GeneticEngine (alcides, Oct 11, 2024)
7347f42  Bumped GeneticEngine (alcides, Oct 11, 2024)
d0cb2c4  Bump GeneticEngine (alcides, Oct 13, 2024)
f6936b3  Reverted Operon regressor (alcides, Oct 14, 2024)
df52430  Reverted Opteron (alcides, Oct 14, 2024)
dfd6065  Reverted test_population (alcides, Oct 14, 2024)
10 changes: 5 additions & 5 deletions algorithms/geneticengine/environment.yml
@@ -3,9 +3,9 @@ channels:
   - conda-forge
   - defaults
 dependencies:
-  - python=3.9.12
-  - python_abi=3.9
-  - setuptools=62.1.0
-  - pip=22.0.4
+  - python=3.12.6
+  - setuptools=75.1.0
+  - pip=24.2
+  - dill=0.3.9
   - pip:
-    - geneticengine==0.0.18
+    - geneticengine @ git+https://github.com/alcides/GeneticEngine.git@5e7c4ec8807b8299f92e0347443714e16f22e40f
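
The pin moves GeneticEngine from the 0.0.18 PyPI release to a specific git commit. A small, hypothetical smoke check (not part of the PR) can confirm the pinned install resolves inside the rebuilt conda environment; the distribution name `geneticengine` is an assumption about the package metadata.

```python
# Hypothetical post-install check: verify that the git-pinned GeneticEngine
# import works and report the installed distribution version.
import importlib.metadata

import geneticengine  # noqa: F401  (fails here if the pinned install is broken)

print("geneticengine version:", importlib.metadata.version("geneticengine"))
```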
18 changes: 6 additions & 12 deletions algorithms/geneticengine/metadata.yml
@@ -1,16 +1,10 @@
 authors: # the participants
-  - Guilherme Espada, Leon Ingelse, Paulo Santos, Pedro Barbosa, Alcides Fonseca
-email: gjespada@fc.ul.pt, leoningelse@gmail.com, pacsantos@fc.ul.pt, psbarbosa@ciencias.ulisboa.pt, me@alcidesfonseca.com
-name: Genetic Engine
+  - Alcides Fonseca, Guilherme Espada, Leon Ingelse, Eduardo Madeira
+email: me@alcidesfonseca.com, gjespada@fc.ul.pt, leoningelse@gmail.com, jmadeira@lasige.di.fc.ul.pt
+name: Genetic Engine
 description:
-  Genetic Engine is a framework for using Genetic Programming in different contexts. Genetic Engine allows the user to define trees in terms of Classes and Inheritance, as they would in a regular programming environment. Our framework takes care of generating individuals, mutating them and crossing them over. The user also defines a fitness function that takes a tree and returns a fitness score. This often requires writing (or reusing) a tree interpreter, as is customary in these types of approaches.
+  GeneticEngine is an extensible framework for implementing evolutionary algorithms. Users can write their own class structure and select the algorithm they want to run. Each algorithm will generate random trees following the class specification.
+  Currently, GeneticEngine supports several representations (tree-based, grammatical evolution, structured grammatical evolution, stack-based), several algorithms (hill-climbing, Genetic Programming, 1+1) and several operators (Tournament, Lexicase, different mutations and crossovers).
+  A distinctive feature of this framework is that it supports dependent types that semantically constrain the tree generation.
 
-  The main difference between STGP and GGGP is that in STGP the restrictions on which trees are valid are expressed via types, while in GGGP they are expressed using a grammar. Genetic Engine extracts the grammar from the types and their relationships, allowing any technique from GGGP (such as Grammatical Evolution) to be used in STGP.
-
-  The advantages of using STGP are:
-
-  * the user does not need to know grammars, EBNF or any other grammar syntax
-  * there is no need for a textual representation of programs, as trees can be the only representation (à la Lisp)
-  * there is no need to parse a textual program into a tree and then interpret the tree
-  * mutations and recombination are automatically type-safe, whereas in a grammar that type-safety is implicit in the structure of the grammar (and hard to reason about)
 url: "https://github.com/alcides/GeneticEngine/"
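
To make the "class structure as a grammar" idea in the description concrete, here is a minimal illustrative sketch in plain Python. It deliberately avoids GeneticEngine imports, since the framework's own API is not part of this diff; the node classes, the `evaluate` method and the fitness function are all hypothetical.

```python
# Illustrative only: a user-defined class hierarchy standing in for an
# expression grammar, plus a fitness function over the resulting trees.
from abc import ABC, abstractmethod
from dataclasses import dataclass


class Expr(ABC):
    """Root of the expression 'grammar': every production subclasses it."""

    @abstractmethod
    def evaluate(self, x: float) -> float: ...


@dataclass
class Var(Expr):
    def evaluate(self, x: float) -> float:
        return x


@dataclass
class Const(Expr):
    value: float

    def evaluate(self, x: float) -> float:
        return self.value


@dataclass
class Add(Expr):
    left: Expr
    right: Expr

    def evaluate(self, x: float) -> float:
        return self.left.evaluate(x) + self.right.evaluate(x)


@dataclass
class Mul(Expr):
    left: Expr
    right: Expr

    def evaluate(self, x: float) -> float:
        return self.left.evaluate(x) * self.right.evaluate(x)


def mean_squared_error(tree: Expr, xs: list[float], ys: list[float]) -> float:
    """Fitness function: lower is better."""
    return sum((tree.evaluate(x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)


if __name__ == "__main__":
    # A hand-built tree standing in for one generated individual: 2*x + 1
    tree = Add(Mul(Const(2.0), Var()), Const(1.0))
    xs = [0.0, 1.0, 2.0]
    ys = [1.0, 3.0, 5.0]
    print(mean_squared_error(tree, xs, ys))  # 0.0 for the exact model
```

In the framework described above, trees of this shape would be generated, mutated and recombined automatically from the class definitions; here the tree is built by hand only to show the structure.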
10 changes: 10 additions & 0 deletions experiment/methods/geneticengine/metadata.yml
@@ -0,0 +1,10 @@
authors: # the participants
  - Alcides Fonseca, Guilherme Espada, Leon Ingelse, Eduardo Madeira
email: me@alcidesfonseca.com, gjespada@fc.ul.pt, leoningelse@gmail.com, jmadeira@lasige.di.fc.ul.pt
name: Genetic Engine
description:
  GeneticEngine is an extensible framework for implementing evolutionary algorithms. Users can write their own class structure and select the algorithm they want to run. Each algorithm will generate random trees following the class specification.
  Currently, GeneticEngine supports several representations (tree-based, grammatical evolution, structured grammatical evolution, stack-based), several algorithms (hill-climbing, Genetic Programming, 1+1) and several operators (Tournament, Lexicase, different mutations and crossovers).
  A distinctive feature of this framework is that it supports dependent types that semantically constrain the tree generation.

url: "https://github.com/alcides/GeneticEngine/"
162 changes: 26 additions & 136 deletions experiment/methods/geneticengine/regressor.py
@@ -1,156 +1,46 @@
-from audioop import cross
-from typing import Union
-import geneticengine.off_the_shelf.regressors as gengy_regressors
-from sklearn.base import BaseEstimator, TransformerMixin
-from sklearn.model_selection import GridSearchCV
+# This example submission shows the submission of FEAT (cavalab.org/feat).
+from geml.regressors import GeneticProgrammingRegressor
+from geml.regressors import model
+from sklearn.base import RegressorMixin
 
+"""
+est: a sklearn-compatible regressor.
+if you don't have one they are fairly easy to create.
+see https://scikit-learn.org/stable/developers/develop.html
+"""
+est: RegressorMixin = GeneticProgrammingRegressor(
+    max_time=3,  # 8 hrs. Your algorithm should have this feature
+)
 
-class OptimisedGPRegressor(BaseEstimator, TransformerMixin):
-    def __init__(
-        self,
-        population_size = 50,
-        n_novelties = 10,
-        favor_less_deep_trees = True,
-        random_state = 123,
-        max_time = 100,
-        optimisation_dedicated_proportion = 0.2,
-        slack_time = 0.05,
-    ):
-        self.population_size = population_size
-        self.n_novelties = n_novelties
-        self.favor_less_deep_trees = favor_less_deep_trees
-        self.random_state = random_state
-        self.max_time = max_time
-        self.optimisation_dedicated_proportion = optimisation_dedicated_proportion
-        self.slack_time = slack_time
-        self.model = None
-
-
-    def fit(self, X, y):
-        n_elites = [ 5, 10 ]
-        max_depths = [ 10, 15 ]
-        hill_climbings = [ True, False ]
-        mutation_probs = [ 0.01, 0.05, 0.1, 0.2 ]
-        crossover_probs = [ 0.8, 0.9, 0.95 ]
-
-        CVS = 2
-
-        param_grid_size = len(n_elites) * len(max_depths) * len(hill_climbings) * len(mutation_probs) * len(crossover_probs)
-        param_alloted_time = int((self.max_time * self.optimisation_dedicated_proportion) / (param_grid_size * CVS))
-        if param_alloted_time < 1: # For testing
-            n_elites = [ 5 ]
-            max_depths = [ 10 ]
-            mutation_probs = [ 0.01 ]
-            crossover_probs = [ 0.8 ]
-
-            param_alloted_time = 1
-
-        param_grid: Union[dict, list] = {
-            "population_size": [ self.population_size ],
-            "n_elites": n_elites,
-            "n_novelties": [ self.n_novelties ],
-            "max_depth": max_depths,
-            "favor_less_deep_trees": [ self.favor_less_deep_trees ],
-            "seed": [ self.random_state ],
-            "hill_climbing": hill_climbings,
-            "probability_mutation": mutation_probs,
-            "probability_crossover": crossover_probs,
-            "timer_stop_criteria": [ True ],
-            "timer_limit": [ param_alloted_time ],
-            "metric": [ 'r2' ],
-        }
-
-        grid_search = GridSearchCV(gengy_regressors.GeneticProgrammingRegressor(),param_grid,cv=CVS)
-
-        grid_search.fit(X,y)
-        model = grid_search.best_estimator_
-
-        model_alloted_time = int(self.max_time * (1 - self.optimisation_dedicated_proportion - self.slack_time))
-        if "timer_limit" in model.get_params():
-            model.set_params(timer_limit=model_alloted_time)
-
-        model.fit(X,y)
-        self.model = model
-
-        self.sympy_compatible_phenotype = model.sympy_compatible_phenotype
-
-        return model
-
-    def predict(self, X):
-        assert self.model != None
-        y_pred = self.model.predict(X)
-
-        return y_pred
-
-    def score(self, X, y):
-        self.model.score(X, y)
-
-
-est = OptimisedGPRegressor(
-    population_size = 250,
-    n_novelties = 10,
-    favor_less_deep_trees = True,
-)
+def get_population(est) -> list[RegressorMixin]:
+    """
+    Return the final population of the model. This final population should
+    be a list with at most 100 individuals. Each of the individuals must
+    be compatible with scikit-learn, so they should have a predict method.
+
+    Also, it is expected that the `model()` function can operate with them,
+    so they should have a way of getting a sympy string representation.
+
+    Returns
+    -------
+    A list of scikit-learn compatible estimators
+    """
+
+    return est.get_population()
 
-def model(est, X=None):
+def get_best_solution(est) -> RegressorMixin:
     """
-    Return a sympy-compatible string of the final model.
-
-    Parameters
-    ----------
-    est: sklearn regressor
-        The fitted model.
-    X: pd.DataFrame, default=None
-        The training data. This argument can be dropped if desired.
+    Return the best solution from the final model.
 
     Returns
     -------
-    A sympy-compatible string of the final model.
-
-    Notes
-    -----
-    Ensure that the variable names appearing in the model are identical to
-    those in the training data, `X`, which is a `pd.Dataframe`.
-    If your method names variables some other way, e.g. `[x_0 ... x_m]`,
-    you can specify a mapping in the `model` function such as:
-    ```
-    def model(est, X):
-        mapping = {'x_'+str(i):k for i,k in enumerate(X.columns)}
-        new_model = est.model_
-        for k,v in mapping.items():
-            new_model = new_model.replace(k,v)
-    ```
-    If you have special operators such as protected division or protected log,
-    you will need to handle these to assure they conform to sympy format.
-    One option is to replace them with the unprotected versions. Post an issue
-    if you have further questions:
-    https://github.com/cavalab/srbench/issues/new/choose
+    A scikit-learn compatible estimator
     """
 
-    # Here we replace "|" with "" to handle
-    # protected sqrt (expressed as sqrt(|.|) in FEAT)
-    model_str = est.sympy_compatible_phenotype
-
-    return model_str
+    return est.get_best_solution()
 
-def pre_train_fn(est, X, y):
-    """set max_time in seconds based on length of X."""
-    slack = 20
-    if len(X)<=1000:
-        max_time = 3600 - slack
-    else:
-        max_time = 36000 - slack
-    est.set_params(max_time=max_time)
-
-# pass the function to eval_kwargs
-eval_kwargs = dict(
-    pre_train=pre_train_fn,
-    test_params={'max_time': 100,
-                 }
-)
+# define eval_kwargs.
+eval_kwargs = {}
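
The new module reduces the submission to the entry points the benchmark expects: the `est` instance, `get_population`, `get_best_solution`, and the imported `model` helper. The sketch below shows how a harness might exercise that interface; it is hypothetical and not part of the PR. It assumes the repository root is importable, that `geml` is installed, and that `model(est, X)` accepts a fitted estimator plus the training frame. Note that the committed `max_time=3` (seconds) reads like a smoke-test setting despite the "8 hrs" comment.

```python
# Hypothetical driver: exercises the submission interface the way an
# SRBench-style harness might. Import path and `model` signature are assumptions.
import numpy as np
import pandas as pd

from experiment.methods.geneticengine.regressor import (
    est,
    get_best_solution,
    get_population,
    model,
)

# Tiny synthetic regression problem.
rng = np.random.default_rng(0)
X = pd.DataFrame(rng.normal(size=(200, 2)), columns=["x0", "x1"])
y = 3.0 * X["x0"] - X["x1"] + rng.normal(scale=0.1, size=200)

est.fit(X, y)

best = get_best_solution(est)     # one sklearn-compatible estimator
population = get_population(est)  # at most 100 sklearn-compatible estimators
assert len(population) <= 100

print(model(best, X))             # sympy-compatible string of the best model
print(best.predict(X)[:5])        # individuals expose predict() as required
```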
Empty file.
46 changes: 46 additions & 0 deletions experiment/methods/geneticengine_1p1/regressor.py
@@ -0,0 +1,46 @@
# This example submission shows the submission of FEAT (cavalab.org/feat).
from geml.regressors import RandomSearchRegressor
from geml.regressors import model
from sklearn.base import RegressorMixin

"""
est: a sklearn-compatible regressor.
if you don't have one they are fairly easy to create.
see https://scikit-learn.org/stable/developers/develop.html
"""
est: RegressorMixin = RandomSearchRegressor(
    max_time=3,  # 8 hrs. Your algorithm should have this feature
)


def get_population(est) -> list[RegressorMixin]:
    """
    Return the final population of the model. This final population should
    be a list with at most 100 individuals. Each of the individuals must
    be compatible with scikit-learn, so they should have a predict method.

    Also, it is expected that the `model()` function can operate with them,
    so they should have a way of getting a sympy string representation.

    Returns
    -------
    A list of scikit-learn compatible estimators
    """

    return est.get_population()


def get_best_solution(est) -> RegressorMixin:
    """
    Return the best solution from the final model.

    Returns
    -------
    A scikit-learn compatible estimator
    """

    return est.get_best_solution()


# define eval_kwargs.
eval_kwargs = {}
Empty file.
46 changes: 46 additions & 0 deletions experiment/methods/geneticengine_hc/regressor.py
@@ -0,0 +1,46 @@
# This example submission shows the submission of FEAT (cavalab.org/feat).
from geml.regressors import HillClimbingRegressor
from geml.regressors import model
from sklearn.base import RegressorMixin

"""
est: a sklearn-compatible regressor.
if you don't have one they are fairly easy to create.
see https://scikit-learn.org/stable/developers/develop.html
"""
est: RegressorMixin = HillClimbingRegressor(
    max_time=3,  # 8 hrs. Your algorithm should have this feature
)


def get_population(est) -> list[RegressorMixin]:
    """
    Return the final population of the model. This final population should
    be a list with at most 100 individuals. Each of the individuals must
    be compatible with scikit-learn, so they should have a predict method.

    Also, it is expected that the `model()` function can operate with them,
    so they should have a way of getting a sympy string representation.

    Returns
    -------
    A list of scikit-learn compatible estimators
    """

    return est.get_population()


def get_best_solution(est) -> RegressorMixin:
    """
    Return the best solution from the final model.

    Returns
    -------
    A scikit-learn compatible estimator
    """

    return est.get_best_solution()


# define eval_kwargs.
eval_kwargs = {}
Empty file.
46 changes: 46 additions & 0 deletions experiment/methods/geneticengine_rs/regressor.py
@@ -0,0 +1,46 @@
# This example submission shows the submission of FEAT (cavalab.org/feat).
from geml.regressors import OnePlusOneRegressor
from geml.regressors import model
from sklearn.base import RegressorMixin

"""
est: a sklearn-compatible regressor.
if you don't have one they are fairly easy to create.
see https://scikit-learn.org/stable/developers/develop.html
"""
est: RegressorMixin = OnePlusOneRegressor(
    max_time=3,  # 8 hrs. Your algorithm should have this feature
)


def get_population(est) -> list[RegressorMixin]:
    """
    Return the final population of the model. This final population should
    be a list with at most 100 individuals. Each of the individuals must
    be compatible with scikit-learn, so they should have a predict method.

    Also, it is expected that the `model()` function can operate with them,
    so they should have a way of getting a sympy string representation.

    Returns
    -------
    A list of scikit-learn compatible estimators
    """

    return est.get_population()


def get_best_solution(est) -> RegressorMixin:
    """
    Return the best solution from the final model.

    Returns
    -------
    A scikit-learn compatible estimator
    """

    return est.get_best_solution()


# define eval_kwargs.
eval_kwargs = {}
2 changes: 1 addition & 1 deletion experiment/methods/operon/regressor.py
@@ -113,7 +113,7 @@ def callback(self, study, trial):
 
 
 class OperonOptuna(BaseEstimator, RegressorMixin):
-    def __init__(self, name, n_folds=3, n_trials=100, max_time=3600, fixed_params={}, suggest_callback=None, random_state=None):
+    def __init__(self, name, n_folds=3, n_trials=100, max_time=1.0, fixed_params={}, suggest_callback=None, random_state=None):
alcides marked this conversation as resolved.
         self.name = name
         self.n_folds = n_folds
         self.n_trials = n_trials