A New Genetic Algorithm

I have recently been fascinated with the genetic algorithm, which is a method that allows you to solve pretty much any big problem via computer, like finding the most nutritional, affordable lunches for a week for your kids, or what's the best NFL schedule (funny that these problems are provably equivalent).

The idea is simple. generate a bunch of mediocre solutions, kill off (eliminate) the weak ones, use the stronger ones to sire more solutions, mutate a gene (aspect) of some solution or other just to shake things up, and repeat.

The problem is all of the different ways to do it. Should we mutate 5% of the solutions, or 10%? How many solutions should we kill off at a time? How many should we make? There are alot of design decisions to make when using this algorithm.

And then there's efficiency to think about. How should the solutions be represented (e.g., what stats should be encoded)? What data structures should be used? What complexity class is acceptable (has to do with how much time the program takes to run)? These questions may sound unimportant from a business perspective, but it's the difference in an app between 'wow, that was zippy' and 'that sure took a long time to load'.

The 'data structure/data representation' question is particularly important because not only does it deal with how fast the program runs, but how maintable the code base will be in the future. When, as a programmer, I code up a solution to a problem, it is very difficult to change the way the data is represented and stored in the future.

I have given all these problems some thought, and I have come up with (I believe) a very good solution. The data representation is versatile. The data structure is maintainable, efficient, and convenient. Best of all, I have found a convenient way to get rid of all those parameters (i.e., I have adequately answered all of the 'how/what' questions about the algorithm, listed above).

I will post a series of articles detailing what I did to create a genetic algorithm which needs only be given a fitness function and the size of the population of solutions -- nothing else. It is fast, usable in a highly varied number of fields, and its operation is intuitive.

My first post will deal with the data structure / data representation problem. My second will deal with removing the how-many-to-kill-off/how-many-to-reproduce/how-many-to-mutate problem. My third post will tie it all together. Let me know in the comments below to join the conversation and let me know how I'm doing as I go. I will try to answer any questions given.

Comments