Project to maximize the neighbors of a 2D protein sequence using genetic algorithm

MPritsch/genetic-algorithm-protein-folding

Genetic algorithm protein folding

This project was developed for a genetic algorithm course at the "Hochschule Darmstadt".

The input is a string of 0s and 1s. The goal is to fold the sequence so that the number of N4 (4-neighborhood) neighbors of the 1s is maximized, without the chain overlapping itself. Folding happens on a 2D plane, and each step takes one of 3 possible directions: straight, left or right.
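To make the objective concrete, here is a minimal sketch of how a fold could be scored. It is not taken from the project's code: the class name `FoldSketch`, the move letters `S`/`L`/`R`, the starting heading (east) and the choice to count only pairs of 1s that touch on the grid without being direct chain neighbors are all assumptions made for illustration.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical helper, not part of the repository.
final class FoldSketch {
    // Unit steps for the headings east, north, west, south (counter-clockwise).
    private static final int[] DX = {1, 0, -1, 0};
    private static final int[] DY = {0, 1, 0, -1};

    // Packs a grid coordinate into a single map key.
    private static long key(int x, int y) {
        return ((long) x << 32) | (y & 0xffffffffL);
    }

    /**
     * Returns the number of grid-adjacent pairs of 1s that are not direct
     * chain neighbors, or -1 if the fold visits a cell twice (overlap).
     * Assumption: one relative move ('S', 'L' or 'R') per bond of the chain.
     */
    static int countContacts(String sequence, String moves) {
        int n = sequence.length();
        if (moves.length() != n - 1) {
            throw new IllegalArgumentException("expected one move per bond");
        }
        int[] xs = new int[n];
        int[] ys = new int[n];
        Map<Long, Integer> occupied = new HashMap<>(); // cell -> chain index
        int x = 0, y = 0, heading = 0;                 // start at origin, facing east
        occupied.put(key(x, y), 0);
        for (int i = 0; i < moves.length(); i++) {
            char m = moves.charAt(i);
            if (m == 'L') {
                heading = (heading + 1) % 4;           // turn left
            } else if (m == 'R') {
                heading = (heading + 3) % 4;           // turn right
            } else if (m != 'S') {
                throw new IllegalArgumentException("unknown move: " + m);
            }
            x += DX[heading];
            y += DY[heading];
            if (occupied.putIfAbsent(key(x, y), i + 1) != null) {
                return -1;                             // chain overlaps itself
            }
            xs[i + 1] = x;
            ys[i + 1] = y;
        }
        int contacts = 0;
        for (int i = 0; i < n; i++) {
            if (sequence.charAt(i) != '1') continue;
            // Look only east and north so that every pair is counted once.
            for (int d = 0; d < 2; d++) {
                Integer j = occupied.get(key(xs[i] + DX[d], ys[i] + DY[d]));
                if (j != null && sequence.charAt(j) == '1' && Math.abs(i - j) > 1) {
                    contacts++;
                }
            }
        }
        return contacts;
    }
}
```

Under these assumptions, `FoldSketch.countContacts("1111", "SLL")` folds the chain into a unit square, and only the first and last 1 form a contact. A fold that runs back into itself, such as `"SLLL"` for a chain of five, is reported as an overlap.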

Possible settings

In the Main class (org.hda.gaf.Main) there are several settings you can modify or comment in:

  • Choose one of the genetic algorithms:

    • Time limited algorithm (e.g. generate for 20s, then stop)
    • Generation limited algorithm (e.g. generate 100 generations, then stop)
  • Selection algorithms:

    • Fitness proportional (Every individual has a chance to get chosen. A higher fitness results in a better chance to get chosen)
    • Tournament fitness proportional (Same as fitness proportional, but individuals are paired in a tournament and one is chosen every round)
    • Tournament best fitness (Individuals are paired in a tournament and the one with the best fitness is chosen)
  • Population size: Number of individuals per generation

  • Mutation rate: Percentage of the genes of the total population that are mutated after selection. A mutation replaces a gene's direction (e.g. a left turn) with a straight, right or left turn.

    E.g. Having 100 genes per individual in a population with 10 individuals means we have a total of 1000 genes. With a mutation rate of 2%, a total of 20 genes are modified every generation.

  • Crossover rate: Percentage of individuals in a population that take part in a crossover every generation. A simple one-point crossover is used (the gene chain is split at an arbitrary point and the split-off part is swapped with a partner from the population).

    E.g. Having a population of 100 individuals and a crossover rate of 20% means that 20 individuals take part in a crossover every generation.

  • Print while generating: Activates a GUI displaying the best protein found so far and some stats. Deactivate it for better performance.
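The settings above can be illustrated with a short sketch. It does not reproduce the project's implementation: the class `GaOperatorsSketch`, all method names, the `S`/`L`/`R` gene letters and the use of non-negative `double` fitness values are assumptions. It shows the arithmetic behind the mutation and crossover rates, a single mutation, a one-point crossover, fitness proportional selection in its common roulette-wheel form and a two-player tournament that picks the better individual.

```java
import java.util.Random;

// Hypothetical helper, not part of the repository.
final class GaOperatorsSketch {
    private static final char[] DIRECTIONS = {'S', 'L', 'R'};

    /** Genes mutated per generation: rate times the total gene count. */
    static int genesToMutate(int populationSize, int genesPerIndividual, double mutationRate) {
        return (int) Math.round(populationSize * genesPerIndividual * mutationRate);
    }

    /** Individuals taking part in a crossover per generation. */
    static int individualsToCross(int populationSize, double crossoverRate) {
        return (int) Math.round(populationSize * crossoverRate);
    }

    /** Replaces the gene at index with a random direction (may stay the same). */
    static String mutate(String genes, int index, Random rng) {
        char[] copy = genes.toCharArray();
        copy[index] = DIRECTIONS[rng.nextInt(DIRECTIONS.length)];
        return new String(copy);
    }

    /** One-point crossover: both children swap their tails at position cut. */
    static String[] onePointCrossover(String a, String b, int cut) {
        return new String[] {
            a.substring(0, cut) + b.substring(cut),
            b.substring(0, cut) + a.substring(cut)
        };
    }

    /**
     * Roulette-wheel selection: index i is picked with probability
     * fitness[i] / sum(fitness). Assumes non-negative fitness values.
     */
    static int fitnessProportional(double[] fitness, Random rng) {
        double total = 0;
        for (double f : fitness) total += f;
        double r = rng.nextDouble() * total;
        for (int i = 0; i < fitness.length; i++) {
            r -= fitness[i];
            if (r < 0) return i;
        }
        return fitness.length - 1; // fallback for rounding or an all-zero population
    }

    /** Tournament of two: the individual with the higher fitness wins. */
    static int tournamentBest(double[] fitness, int i, int j) {
        return fitness[i] >= fitness[j] ? i : j;
    }
}
```

With the numbers from the examples above, `genesToMutate(10, 100, 0.02)` and `individualsToCross(100, 0.20)` both evaluate to 20. The tournament fitness proportional variant is left out because the description does not say how the winner within a pair is drawn.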

Generate jar

You can package this project via Maven:
mvn clean package
The jar should then be available at
target/genetic-algorithm-protein-folding-1.0-SNAPSHOT-jar-with-dependencies.jar

Execute

You can either execute this program using an IDE like IntelliJ or run the jar packaged by Maven:
java -jar target/genetic-algorithm-protein-folding-1.0-SNAPSHOT-jar-with-dependencies.jar

Other projects

The project was also implemented using different parallel computing technologies:

Java Multithreading:
Uses multiple Java threads to share the workload. Each thread exchanges part of its generated population with other threads to allow for more variety.
https://github.com/MPritsch/genetic-algorithm-protein-folding-multithread

MPJ Express:
MPJ is a Java implementation of MPI (Message Passing Interface). It can be used to distribute the computing workload across a cluster with multiple CPUs. Part of the generated population is shared with other MPJ processes to allow for more variety.
https://github.com/MPritsch/genetic-algorithm-protein-folding-mpj

Known Issues

  • There seems to be a bug in the graphical output that sometimes makes it fail on startup. The current workaround is to restart the program until it works properly.
