Keeping it SimpleNobody likes using a tool with a bunch of dials and switches on it. Would you rather use a drill with five dials on it with labels on those dials that look like they have made-up words on it? And you're thinking, "I just want to drill a hole!"
You are not alone. Most people, when faced with using something like a drill, or washing machine, or algorithm, would much rather use an 'on' switch than anything else.
The thing is, most genetic algorithms that I've come across have five dials on it, which all need to be set properly in order for the algorithm to work:
- The fitness function
- When to stop
- What the kill-off rate will be
- What the mutation rate will be
- The size of the initial population
I believe that the genetic algorithm is only used, either in heavy industry, or academia, precisely because it has five dials and switches. One almost needs an algorithm to solve the problem of how to set these "dials"!
So, a major goal of this new genetic algorithm is to take the algorithm to the masses. It asks the question: "What would a normal person care about?"
A normal person would care about the first two things in the above list:
- the fitness function and
- when to stop.
A "fitness function" is a set of instructions given to the algorithm to score the solutions it finds. To stick with our previous post's example, the "solutions" represent which strategy to employ in getting food for my family on the way home from work: Do I get takeout? Do I get fast food? Do I go home and cook something? etc. It's easy to tell an algorithm "I want food quickly, and I'm willing to spend a little to get good food, too." A statement like this is, in essence, the "fitness function". With this fitness function, the genetic algorithm would score 'getting takeout' high, and 'cooking at home' low. The fitness function is both easy to understand and essential for the algorithm to work, we'll probably keep this as an input to the algorithm as it is.
The question of "when to stop" is also relatively straight forward to answer for the user of the algorithm, but hard for the algorithm itself to answer. Something like "stop when you can't find anything better" or "keep improving your solution until 7am tomorrow" is both an easy and a relative piece of information for a user to give the algorithm, while it is hard for the algorithm to guess at when to stop. See, the genetic algorithm can just keep. on. going. It will find try to find better and better solutions until you tell it to stop. This makes sense. There are a million ways, for example, to get food at the end of a work day. However, I usually know as a user when I want to stop looking for a better solutions. Telling the algorithm when to stop is telling the algorithm when the solution it's trying to find is "good enough". Therefore, this parameter also makes sense for the user to provide.
It's the other three that can be thorns in a user's side. There is no intuition that can help the user determine what 'kill-off rate', 'mutation rate', or 'population size'. These three parameters need to be set and set well, but the only way a user would know how to set them is trial and error. The New Genetic Algorithm is cool, not only because of its back-end efficiency, but its front-end user friendliness: it knows how to set those three parameters for you.
Setting The Parameters
Most genetic algorithms will ask you to specify a kill-off rate: a percentage of your population which should be killed off at every step of the algorithm (each "season", if you will). It asks the question: at each iteration, how many of the "weak" of the population should die? This rate also dictates the number of new solutions created at each step, since in general the algorithm should deal with a fixed population size.
So what should our algorithm choose as its kill-off rate? 5%? 10%? 2%? All of these choices seem so arbitrary. There really isn't a choice I have found that makes more sense than the others. Too many killed off, means you won't get as good of results as you could have. Too few, and it will take a while to get there.
It should be stated here that kill-off rate is kind of related to something which in other algorithms is called "learning rate". The higher the kill-off rate, the faster the solutions will improve; however, the slower the kill-off rate, the better the results will ultimately be, even if it takes a while to get there. The kill-off rate represents the 'price/quality' tradeoff: more quality = more time = smaller kill-off rate; less quality = faster running = greater kill-off rate. What would be really cool is if the algorithm could choose the kill-off rate based on how much time it was given to run (when to stop). That way the user only need specify when to stop and the algorithm knows how long to run.
In that light, and for reasons that will become clear later on in this series, I simply chose to kill of one (
1) solution per step. This essentially means that the population size ultimately decides this kill-off rate. If I have
100solutions in my population, that means the kill-off rate is
1%. If I have only
10, it means my kill-off rate is
10%. This may seem like a bad thing, but what it does is that it allows us to choose kill-off rate based on population size. In other words, by making kill-off rate dependent on population size, we can control how fast the algorithm learns by only regulating one variable (the size of the population). This makes thinking about how to tell the algorithm how fast or slow to learn much easier. It also makes the single step of the algorithm very fast and gives it more predictable results.
A small word should be put here about how creation of new solutions in the population works.
Mutation relies on the fact that, all bit strings of length
nspecify a solution. It flips one of its bits at random. The genetic algorithm doesn't see the solution for a solution; rather, it just sees a bunch of 1's and 0's. This is consistent with nature. Nature doesn't see a bacteria for what it is; it just sees a bunch of DNA. The 1's and 0's are the solution's "DNA". Thus by flipping one of the random bits in the solution, we mutate the solution.
There is a small amount of math involved here, but not bad; just enough to keep things interesting.
We want to choose a mutation rate (the percentage of the population getting this random bit flipped) intelligently. The new algorithm is herein designed to choose it to maximize the diversity of the population.
Here in our new algorithm, we want a highly diverse population. If you have many solutions that are roughly the same, what is the benefit of a population at all?
A Side Note About How We Create New Solutions
In the new algorithm, we will use single crossover to create a new solution. There are lots of ways to create new solutions, but as we intimated in the beginning, we like to keep things simple in this new algorithm. This just means we pick a random number between
n(the size of the solutions -- the number of bits they contain). Call that number
p. The first
pbits come from the first
pbits of one solution (the "mom") and the rest of the bits come from the other solution (the "dad").
Who shall we chose to make new solutions? the two best, or part of the 'party' of the elite of the population? Perhaps we could, but here in our new algorithm, let's keep things simple. We will simply take two random solutions from the population and pair them up to create a new solution.
Why This is Important to Choosing Mutation Rate
Now let's reiterate our problem. We need to choose mutation rate to maximize diversity of the population. In order to ascertain diversity, we need a population sample.
But we just chose two solutions at random to create a new solution. Why don't we just use the two solutions we just chose as a small "sample" of the population of solutions? That way, we only need to retrieve two solutions per iteration, kill one off, create one, and possibly mutate one. Genious! then the iterations can be done very fast.
How do we determine if the population is "diverse"? Simple: we count up how many bits at each position from
nin the bit strings of the mom and dad solutions are the same. The more that are the same, the more "the same" the population is and the more it is due for a mutation. Then, if we do decide to mutate a solution, we will simply mutate one of the solutions and re-insert it back into our population.
We will use this formula to determine the probability of whether we should mutate one of the parents:
P(m) = 4*(1/2*k/n - 1/2)^2
P(m)is the probability that we should mutate one of the solutions,
nis the number of bit positions for solutions, and
kis the number of bit positions where both parents share the same value.
This is a nice, easy-to-compute, bowl-shaped function that actually looks a lot like the information entropy function for two values. That's math-speak for "it worky."
Think about it -- If about half the bits are the same, which you would expect that from two random solutions, the probability of mutation is small, since the two solutions are expected to look pretty different from each other -- around half their positions are different. The closer
0, though, that value goes to
100%, meaning that one of them really should be mutated, since each solution looks too much like the other, or that they look like polar opposites, both types of which we would not expect in a more diverse population.
After we have computed
P(m), then we just get a random number between
0and 1. Call it
r < P(m), then we mutate one of the new solutions and re-introduce it to the population. If it isn't, we simply put both parents back un-mutated.
There! we have chosen a good method for choosing mutation rate dynamically. It is chosen at each iteration, is easy to compute, and makes sense for the goal.
As New Genetic Algorithm designers, it's our job to figure out how best to determine population size. This is the size of the RankedSet discussed in the last post of the number of "indigenous" solutions that we'll keep around. We need a base population if we want to kill some off and create more and still have the genetic algorithm to work. So how do we choose population size?
Remember earlier, when I said that by determining population size and keeping the kill-off rate at one solution, we determine the 'learning rate'? When we have a small population size, say 10, the learning rate/kill off rate is
1 / 10 = 10%. This means the algorithm will improve fast, but also plateu fast.
Why don't we just wait for it to plateu at some improvement level, then when it does, we increase the size of the population? By doing so, we decrease the learning rate, allowing the algorithm to find an even better solution, even if more slowly.
To keep things simple, let's just say this: We'll start off at a population size of
16. When we find that no more "improvement" is to be found within the population, we double population size and continue iterating. We keep doing this until we are supposed to stop, which is something that the user supplies!
Simple, easy, intuitive, and easy to implement. These are the goals of the new genetic algorithm.