Sudoku’s are the little number puzzle’s you find in every newspaper. Solving them can sometimes be quite frustrating. “Why not solve all of them, without even thinking about the puzzle itself?”, I said to myself, so I can finally turn to more productive work. To do so, we use a so called genetic algorithm. But before I show you the solution, I’ll need to explain a little about genetic algorithms in general and tell you a small disillusioning fact.
First of all we need to picture the problem of solving sudokus as an optimization problem. Imagine filling in the empty cells randomly (or kind of randomly) and only after that evaluating how we did. In the second step we’d try to reduce mistakes we made with e.g. swapping numbers in rows. Following this we again count the mistakes. If the number of mistakes we count in your sudoku is zero we have succeeded. Essentially we are optimizing the mistakes in the sudoku.
There exist numerous algorithms for solving optimization problems, but why not copy from the best optimization process we know of, namely evolution. Algorithms which mimic evolution are called genetic algorithms. They are greatly explained in this article, but I’ll shortly go over their key parts:
Individuals: These are the main subjects of our problem. In our case an individual will be a sudoku. At the beginning a lot of sudokus will be randomly created. Theoretically we can picture ourselves as sudoku breeders.
Genes: Each individual is associated with genes. For a sudoku the genes could be the list of all numbers it contains.
Fitness: Some individuals can be fitter than others. For a sudoku we will measure its “fitness” depending on how much mistakes or conflicts it contains.
Reproduction: As in the actual evolution, reproduction plays an important role. In the step of reproduction we take two (or more) existing sudokus and combine their genes in a way to generate an entirely new sudoku. (Note that we are always only allowed to change cells which are blank in the original puzzle.)
Selection: The selection process makes sure that fitter individuals are more likely to be selected for reproduction. So in the reproduction step we don’t just take two arbitrary sudokus, but let them be determined by the selection step which prefers fitter sudokus, that means sudokus with less conflicts. This step mimics the “survival of the fittest” concept. There are plenty of selection algorithms out there, e.g. the Roulette Wheel Selection or the Tournament Selection.
Mutation: This is the concept of changing a gene of an individual randomly. For sudokus mutation could be implemented by changing a number randomly to another number. The likelihood of mutation happening is called “mutation rate”.
Population: A population is a set of individuals. The size of a population strongly influences the behavior of the genetic algorithm.
In a nutshell a genetic algorithm initially creates a random population and evolves it with the use of selection and reproduction, until an individual is created which fulfills our requirements.
It’s pretty awesome that we can use these simple and intuitive concepts to derive a solution for rather complex problems, like sudokus.
Insertion: Sudokus are hard problems. NP-hard and even NP-complete, from which follows that if you knew an efficient way of solving sudokus, you could help curing cancer. But that’s a whole another story. If you want to know more, this video explains it greatly.
The best part is, we don’t even have to to puzzle our head over the problem itself, namely solving the sudoku. We just have to somehow represent sudokus, e.g. with arrays, implement a function for selection and for reproduction and voilà. Till now everything seems fine.
Unfortunately solving sudokus with genetic algorithm is hard. How come? Let me explain it with analogy: Imagine you are in an undulating landscape and have the task to, with blindfolded eyes, find the lowest point. How would you do that? You’d probably feel the area around you and walk in the direction the landscape goes downwards. If you walk into a hollow you’ll stop. It could be that you are at the lowest point, however you don’t know for sure. The same idea applies to our sudokus. We can picture our search for the solution as a hike through a hilly landscape, where the lowest point represents the solution. If the landscape contains only one pit, we’ll surely arrive at the desired destination. However, the more hollows there are, the more difficult it gets. Much to our regret sudokus are problems for which the metaphorical landscape (more formally called “problem space”) contains a lot of hollows (“false minima”).
Nontheless we all know the natural law, that if a waiter tells you that your plate is hot, you can’t resist touching it. So we have quick look what we can do about that hilly landscape.
In the first step we put the genetic algorithm aside and apply a simple sloving method to the puzzle. A detailed description with the title “Solve Sudoku (Without even thinking!)” of this step can be found here. So basically we remain true to idea of solving a sudoku without thinking about it. After this step the sudoku will be partly solved. How much of the puzzle already is solved depends on its difficulty. The less empty cells are left over, the smaller our problem space got and the easier it is for the genetic algorithm to find the solution.
The key part of applying a genetic algorithm to a problem space with a lot of false minima (or false maxima) is, to guarantee a lot of diversity. In practice this means having a large population size, ensuring that the fitness of good individuals doesn’t differ too much of the one of not so good individuals and having a high mutation rate.
Another measure which can be taken is restarting. If the best individual of the population hasn’t changed for let’s say twenty generations (this means we got trapped in a false minima), we simply restart the algorithm, but save the best few individuals. After a few restarts we collected enough elites (best individuals of a population) to create a population composed of them alone. Running the genetic algorithm with the initial population containing former elites is likely to bring the solution.
Presolving the sudoku with a simple method and mixing restarts into the genetic part of the algorithm solves almost all sudokus. A concrete program in Java implementing the concepts described above can be found on my Github repository.