Algorithmics of sudoku


Algorithmics of sudoku

The class of Sudoku puzzles consists of a partially completed row-column grid of cells partitioned into "N" "regions" or "zones" each of size "N" cells, to be filled in using a prescribed set of "N" distinct symbols (typically the numbers {1, ..., "N"}), so that each row, column and region contains exactly one of each element of the set. The puzzle can be solved using a variety of algorithms. This page provides algorithms only.

olving sudokus by backtracking

The basic backtracking algorithm can be adapted to solve sudokus. This is straightforward. Say a "zone" is a subset of "N" boxes of an "N x N" grid, which must contain the numbers from 1 to "N". A standard sudoku contains 27 zones, namely 9 rows, 9 columns and 9 squares that are 3 x 3. In a jigsaw sudoku, the square zones are replaced by zones having irregular boundaries, like a jigsaw piece.

One possible algorithm that uses backtracking to solve such sudokus constructs a graph on N^2 vertices, one vertex for each box of the grid. Two vertices are connected by an edge if there exists a zone containing the two boxes. The problem is then equivalent to coloring this graph with "N" colors, where adjacent vertices may not have the same color. This is done by starting with an empty assignment of colors and assigning colors to vertices one after another, using some fixed order of the vertices. Whenever a color is being assigned, we check whether it is compatible with the existing assignments, i.e. whether the new color occurs among the neighbors of that vertex. If it doesn't, then we may assign it to the vertex and try to process another vertex. We backtrack once all "N" colors have been tried for a given vertex. If all vertices have been assigned a color, then we have found a solution. There are of course much more sophisticated algorithms to solve graph coloring. If the sudoku contains initial data, i.e. some boxes have already been filled, then these go into the color assignment before backtracking begins and the vertex sequence includes only the empty boxes.

The above algorithm was used to solve a 10x10 jigsaw sudoku that was proposed on "Les-Mathematiques.net" A link to the proposal may be found in the section for external links. The first section of the program defines the 10 jigsaw pieces (zones), the second the row and column zones. Thereafter the graph is constructed as an adjacency list. The search procedure prints completed solutions (when all 100 boxes have been assigned). Otherwise it computes the set of colors present among the neighbors of the next vertex to be processed, and recursively tries those assignments that do not conflict with this set. The search starts with an empty assignment.

Exact Cover in solving sudokus

Sudoku can be described as an Exact cover problem. This allows both for a very elegant descriptionof the problem and an efficient backtrack algorithm for solving the problem.

In an exact cover problem, there is given a universe "U" of elements and a collection mathcal{S} of subsets of "U". The task is to find a subcollection mathcal{S}^* of mathcal{S} such that every element in "U" is an element of exactly one set in mathcal{S}^*.

The challenge in applying the exact cover problem to sudoku is to find a definition for the elements of "U" such that every valid sudoku solution must contain every element of "U" exactly once, and to find a definition for the elements of mathcal{S} (subsets of "U") such that if the union of a disjoint collection of these elements gives "U", then the elements specify a completely filled-in sudoku grid which satisfies every constraint.

Let "S" = {"s11", "s12", …, "s19", "s21", …, "s99"} be the set of squares of the sudoku grid.

Let "ri" = {"si1", "si2", …, "si9"} ⊆ "S" be the set of squares belonging to row "i", and let "R" = {"r1", "r2", …, "r9"} be the set of rows.

Let "cj" = {"s1j", "s2j", …, "s9j"} ⊆ "S" be the set of squares belonging to column "j", and let "C" = {"c1", "c2", …, "c9"} be the set of columns.

Clearly, for each row "i" and column "j", {"sij"} = "ri ∩ cj".

Let "bk" be the set of squares belonging to block "k", and let "B" = {"b1", "b2", …, "b9"} be the set of blocks.

Thus, "b1" is the intersection of rows 1 to 3 with columns 1 to 3, "b2" is the intersection of rows 1 to 3 with columns 4 to 6, …, and "b9" is the intersection of rows 7 to 9 with columns 7 to 9. For example,

:"b2" = {"s14, s15, s16, s24, s25, s26, s34, s35, s36"}.

Finally, we are ready to define a universe "U" and a collection mathcal{S} of subsets of "U" which exactly mirror the required constraints on the placement of values in the squares of the sudoku grid. A solution to the exact cover problem is a collection mathcal{S}^* of disjoint subsets of "U" whose union is exactly "U": each element of "U" appears in exactly one element of mathcal{S}^*. How can this be applied to sudoku? We need to find a set of objects each of which appears exactly once in every finished sudoku puzzle. Every square of the grid must appear exactly once. Every row, column, and square must contain each value exactly once. Each of these constraints involves pairs: pairs of rows and columns, pairs of rows and values, pairs of columns and values, and pairs of blocks and values. Our universe is going to be made up of pairs.

Consider the Cartesian products "R×C", "R×V", "C×V", and "B×V". Each contains 81 pairs. For example "R×C" = {"(r1, c1), ..., (r9, c9)"}.The universe "U" is the 324 element union of these four Cartesian products. As it happens, every valid sudoku solution contains exactly these 324 pairs, no more, no less. But this set of pairs does not represent a specific solution. It represents "every" valid solution to the blank sudoku grid.

To represent a specific solution, we need to assign a specific value to each square. Let "Sijkl" denote the subset of "U" containing "(ri, cj), (ri, vl), (cj, vl), and (bk, vl)". This subset denotes the assignment of value "l" to square "sij". The collection mathcal{S} contains exactly those subsets "Sijkl" of "U" in which square "ij" is an element of block "k", i.e., "sij" ∈ "bk". Since "k" is completely dependent on "i" and "j", there are 9×9×9 = 729 such subsets in mathcal{S}.

An incomplete sudoku grid is represented by a disjoint collection mathcal{S}' of subsets "Sijkl" of "U" which does not specify a value for every square. Since the collection is disjoint (each element of "U" appears in at most one subset), we know it does not violate any of the sudoku constraints.

Now, applying the exact cover problem, we find members of mathcal{S} disjoint from each other and from each member of mathcal{S}', resulting in the collection mathcal{S}^* of 81 disjoint four-element subsets of "U" whose union is exactly the 324 element set "U".

This representation of the sudoku problem eliminates all mention of rows, columns, blocks, and values from the implementation. Instead, we always work with the same collection of 729 four-element subsets of a 324 element universe. These can be represented by the integers 0 to 728, together with a function "f(i,j)" which is true if the subsets corresponding to i and j are disjoint. "f" might be implemented using a 7292 = 531441 element (66431 byte) constant bit vector. A specific puzzle is a set of fewer than 81 integers with "f(i,j)" true for each pair "i,j". Solving the puzzle involves repeatedly finding a new integer "k" which passes "f(i,k)" for each "i" in the partially completed puzzle, and backtracking when no such "k" can be found.

Obviously, once we find that a specific "k" fails "f(i,k)" for some "i", we need not consider it again so long as "i" remains in the proposed solution. So we keep a list of "k" values which have not yet failed. The length of this list estimates the amount of work required to discover that the current proposed solution fails. So at each level of recursion, we can look ahead one level to estimate how much work each proposed "k" will involve, and choose the "k" which returns failure in the shortest time.

Although the size of the complete search tree is fixed for a given puzzle, this representation of the problem gives a fast test for failure at each level, and a way of ordering the search so that the smallest subtrees are searched first.

With this approach and an efficient library for solving exact cover problems, one can solve 9×9 sudokus in such a short time on a modern PC that measuring the computation time becomes challenging.

Solving sudokus by a brute-force algorithm

Some hobbyists have developed computer programs that will solve sudoku puzzles using a brute force algorithm. Although it has been established that approximately 6.67 x 1021 final grids exist, using a brute force algorithm can be a practical method to solve puzzles using a computer program if the code is well designed.

An advantage of this method is that if the puzzle is valid, a solution is guaranteed. There is not a strong relation between the solving time and the degree of difficulty of the puzzle; generating a solution is just a matter of waiting until the algorithm advances to the set of numbers that satisfies the puzzle. The disadvantage of this method is that it may be comparatively slow when compared to computer solution methods modeled after human-style deductive methods.

Briefly, a brute force program would solve a puzzle by placing the digit "1" in the first cell and checking if it is allowed to be there. If there are no violations (checking row, column, and box constraints) then the algorithm advances to the next cell, and places a "1" in that cell. When checking for violations, it is discovered that the "1" is not allowed, so the value is advanced to a "2". If a cell is discovered where none of the 9 digits is allowed, then the algorithm leaves that cell blank and moves back to the previous cell. The value in that cell is then incremented by one. The algorithm is repeated until the allowed value in the 81st cell is discovered. The construction of 81 numbers is parsed to form the 9 x 9 solution matrix.

Most Sudoku puzzles will be solved in just a few seconds with this method, but there are exceptions. The following puzzle was designed to be a near worst case situation for solution by brute force (although it is not regarded as a difficult puzzle when solved by other methods).

Solving this puzzle by brute-force requires a large number of iterations because it has a low number of clues (17), the top row has no clues at all, and the solution has "987654321" as its first row. Thus a brute-force solver will spend an enormous amount of time "counting" upward before it arrives at the final grid which satisfies the puzzle. If one iteration is defined as one attempt to place one value in one cell, then this puzzles requires 641,580,843 iterations to solve. These iterations do not include the work involved at each step to learn if each digit entered is valid or not (required for every iteration). Based on the specific construction of the computer code, programmers have found the solution time for this puzzle to be between 30 and 45 minutes with a computer processor running at 3 GHz. Many programmers have developed variations of the brute force algorithm which will solve this puzzle in a minute or less with a 3 GHz computer processor.

One programmer has created charts of the progression of a pointer as it advances through the 81 positions of a sudoku using a brute force algorithm. An example is the chart for the solution to a sudoku "Star Burst Leo" shown here. [ [http://www.flickr.com/photos/npcomplete/ "flickr.com/photos/npcomplete/"] ] [ [http://www.flickr.com/photos/npcomplete/2384354604/"flickr.com/photos/npcomplete/2384354604/"] " (Star Burst Leo sudoku)"]

Solving sudokus via stochastic search / optimization methods

Some researchers have also shown how sudoku can be solved using stochastic -- i.e. random-based -- search. [Lewis, R (2007) "Metaheuristics Can Solve Sudoku Puzzles" Journal of Heuristics, vol. 13 (4), pp 387-401.]

Such a method could work as follows: first, start by randomly assigning numbers to the blank cells in the grid, and calculate the number of errors. Now start to "shuffle" these inserted numbers around the grid until the number of mistakes has been reduced to zero. A solution to the puzzle will then have been found. Approaches for shuffling the numbers include simulated annealing, and tabu search.

The advantage of this type of method is that the puzzle does not have to be "logic-solvable" in order for the algorithm to be able to solve it. In other words, unlike other methods, the puzzles that are given to this algorithm do not have to be specially constructed so that they provide sufficient clues for filling the grid using forward chaining logic only. In fact, the only prerequisite for the stochastic search algorithm to work is that puzzle has at least one solution.

Stochastic-based optimisation algorithms are known to be quite fast, though they are perhaps not as fast as some logic-based techniques with logic solvable puzzles. Depending on the type of instance given to the algorithm, generally 9x9 puzzles will be solved in less than 1 second on a typical year-2000 lap top; 16x16 puzzles will take around 10-15 seconds.

Finally, note that it is also possible to express a Sudoku as an integer linear programming problem. Such approaches seem to gets close to a solution quite quickly, and can then use branching towards the end. The Simplex Algorithm seems able to handle situations with no solutions or multiple solutions quite well.

Algorithmic Search for Symmetrical Sudokus with Few Givens

Computer algorithms work through increasingly more cycles when searching for sudokus with 20 clues or fewer. Indeed puzzles with 17 clues are notoriously difficult to find. When the constraint of symmetry is applied, the expected search time will dramatically increases yet further. [ [http://www.flickr.com/photos/npcomplete/2599486458/"17 Clue Sudoku with Diagonal Symmetry"] ] When working with 19 clues or fewer, there are some classes of symmetry for which no sudoku has been found, and none may exist. [ [http://www.flickr.com/photos/npcomplete/2896377811/"19 Clue Two-Way Symmetry"] ] The table below shows the seven classes of symmetry and whether one or more sudokus has been discovered for the range of 20 clues and fewer.

(asterisk indicates this number of givens does not have a pattern with this class of symmetry)

Two examples of symmetrical sudokus with a small number of givens are shown here:

olving a blank sudoku grid

Although sudoku grids that come with some of their cells pre-filled can often be quite challenging to solve, blank sudoku grids can actually be solved very quickly. Perhaps the easiest way of doing this is to produce the "root solution", which can be achieved using the following very simple, polynomial time algorithm [Lewis, R (2007) "Metaheuristics Can Solve Sudoku Puzzles" Journal of Heuristics, vol. 13 (4), pp 387-401.]

For the standard n2 x n2 grid this algorithm (java-implementation) is as follows:

final int n = 3;final int [] [] field = new int [n*n] [n*n] ;int x = 0;for(int i = 0; i < n; i++, x++) for(int j = 0; j < n; j++, x+=n) for(int k = 0; k < n*n; k++, x++) field [n*i+j] [k] = (x % (n*n)) + 1;

The above procedure produces the following 9x9 sudoku:

+-----------------------+
1 2 3 | 4 5 6 | 7 8 9
4 5 6 | 7 8 9 | 1 2 3
7 8 9 | 1 2 3 | 4 5 6
-------+-------+-------
2 3 4 | 5 6 7 | 8 9 1
5 6 7 | 8 9 1 | 2 3 4
8 9 1 | 2 3 4 | 5 6 7
-------+-------+-------
3 4 5 | 6 7 8 | 9 1 2
6 7 8 | 9 1 2 | 3 4 5
9 1 2 | 3 4 5 | 6 7 8
+-----------------------+

Standard Sudoku solver and enumerator

This section contains a discussion of a modified backtracking algorithm that can be used to create and solve Sudokus even with modest computing resources. A standard Sudoku is an N x N Sudoku where N = k^2 whose zones consist of the rows, columns and k x k subsquares as in the classic 9 x 9 Sudoku.

Key modifications to the algorithm

Conceptually there is one modification only: the backtracking algorithm sorts the vertices by the number of colors already assigned among its neighbors before selecting the next vertex to try. The vertex with the largest number of assigned colors, and hence, the smallest number of choices, is tried first. (There may be more than one such vertex).

The data structures used during the backtracking search are chosen to make this easy and fast, although further optimization is possible. The search state is stored in three data structures: a hash table whose keys are the vertices and whose values are the colors that have been assigned to them. There is an array that contains the vertices that have not yet been assigned a color. Finally, there is a hash table whose keys are again the vertices and whose values are hash tables containing the colors present among the neighbors of the respective vertex, as well as a hint as to who assigned them.

Discussion of the modified algorithm

The algorithm follows this sequence of steps:

* Create the row and column zones, then create the subsquare zones.
* Construct the adjacency matrix of the graph.
* The search routine:
** Print the solution if there are no more vertices to be assigned a color, and return.
** Sort the remaining vertices in descending order of colors present among their neighbors.
** Pick the/a vertex with the largest number of assigned colors.
** Try each of the remaining possible colors recursively.
*** Update the hash table of vertex neighboring colors to reflect the assignment.
*** Update the partial solution to reflect the assignment.
*** Recurse.
*** Remove the color from the partial solution.
*** Undo the color assignments from the neighboring colors hash table.
* Before the search begins, read the initial color assignment.
* Compute the set of vertices to be assigned a color, i.e. not present in the initial assignment.
* Compute the initial state of the hash table of neighbor colors.
* Start the search.

The above algorithm can enter into loops. To detect this, add a hash table that stores seen configurations. When this happens, terminate the computation and indicate FAIL (e.g. by throwing an exception). Repeat with a different seed of the random number generator if desired.


= Example of a back-tracking Sudoku solver (in Ruby programming language) =

def read_matrix matrix = []

(0..8).each { |i
l = readline matrix [i] = [] (0..8).each { |j
matrix [i] [j] = l [j..j] .to_i } } matrixend

def permissible(matrix, i, j) ok = [true,true,true,true,true,true,true,true,true] # Same as another in the column isn't permissible... (0..8).each { |i2
next if matrix [i2] [j] = 0 ok [matrix [i2] [j] - 1] = false } # Same as another in the row isn't permissible... (0..8).each { |j2
next if matrix [i] [j2] = 0 ok [matrix [i] [j2] - 1] = false } # Same as another in the 3x3 block isn't permissible... igroup = (i / 3) * 3 jgroup = (j / 3) * 3 (igroup..(igroup + 2)).each { |i2
(jgroup..(jgroup + 2)).each { |j2
next if matrix [i2] [j2] = 0 ok [matrix [i2] [j2] - 1] = false } } # Convert to the array format... ret = [] (0..8).each { |i2| ret.push(i2 + 1) if ok [i2] } retend

def deep_copy_sudoku(matrix) newmat = [] (0..8).each { |i
newmat [i] = [] (0..8).each { |j
newmat [i] [j] = matrix [i] [j] } } newmatend

def solve_sudoku(matrix) while true options = [] (0..8).each { |i
(0..8).each { |j
next if matrix [i] [j] != 0 p = permissible(matrix, i, j) # If nothing is permissible, there is no solution at this level. return false if (p.length = 0) options.push({:i => i, :j => j, :permissible => p}) } } # If the matrix is complete, we have a solution... return matrix if options.length = 0

omin = options.min { | a, b
a [:permissible] .length <=> b [:permissible] .length }

# If there is an option with only one solution, set it and re-check permissibility if omin [:permissible] .length = 1 matrix [omin [:i] [omin [:j] = omin [:permissible] [0] next end

# We have two or more choices. We need to search both... omin [:permissible] .each { |v
mtmp = deep_copy_sudoku(matrix) mtmp [omin [:i] [omin [:j] = v ret = solve_sudoku(mtmp) if ret != false return ret end }

# We did an exhaustive search on this branch and nothing worked out. return false endend

def print_matrix(matrix) if (matrix = false) print "Impossible " return end

(0..8).each { |i
(0..8).each { |j
print matrix [i] [j] } print " " }end

print_matrix(solve_sudoku(read_matrix()))

Exceptionally difficult Sudokus (Hardest Sudokus)

Some of the sudoku puzzles can only be solved using logic that is too complex for human solvers. Most would describe them as "unsolvable" after exhausting their arsenal of sudoku solving techniques and would proceed to a "trial and error" path to reach a solution.

It was no surprise that computer programmers were interested in this subject, trying to generate even more difficult puzzles or even trying to find new ways to logically solve and rate them.

Rating puzzles that are beyond human capabilities proved to be difficult as different points of view regarding what could be considered a "yardstick" for measuring difficulty resulted in heterogeneous opinions on which puzzle is the "hardest of them all".

Several active discussions on this issue took place on a number of popular sudoku forums since 2005. [ [http://www.sudoku.com/boards/viewtopic.php?t=4212&start=0 "The hardest sudokus"] "On the Sudoku Players' Forums"] [ [http://www.setbb.com/sudoku/viewtopic.php?t=103&start=0&mforum=sudoku "what is the hardest known suduko ?"] "On the Sudoku Programmers Forums"] [ [http://www.sudoku.com/boards/viewtopic.php?t=5336&start=0 "How regular is to generate sudoku with difficulty 9+ SE?"] "On the Sudoku Players' Forums"] Several openly accessible solver programs became popular between users for the purpose of rating and generating such puzzles.

The following is a compilation of the latest hardest sudoku puzzles according to a number of openly accessible solver Programs:

Rating Program: gsf's sudoku q1 (rating) Rating: 99408 Poster: JPF Label: Easter Monster 1.......2.9.4...5...6...7...5.9.3.......7.......85..4.7.....6...3...9.8...2.....1 1 . . | . . . | . . 2 . 9 . | 4 . . | . 5 . . . 6 | . . . | 7 . . ------+-------+------ . 5 . | 9 . 3 | . . . . . . | . 7 . | . . . . . . | 8 5 . | . 4 . ------+-------+------ 7 . . | . . . | 6 . . . 3 . | . . 9 | . 8 . . . 2 | . . . | . . 1

Rating Program: gsf's sudoku q1 (Processing time) Rating: 4m19s@2 GHz Poster: tarek Label: tarek071223170000-052 ..1..4.......6.3.5...9.....8.....7.3.......285...7.6..3...8...6..92......4...1... . . 1 | . . 4 | . . . . . . | . 6 . | 3 . 5 . . . | 9 . . | . . . ------+-------+------ 8 . . | . . . | 7 . 3 . . . | . . . | . 2 8 5 . . | . 7 . | 6 . . ------+-------+------ 3 . . | . 8 . | . . 6 . . 9 | 2 . . | . . . . 4 . | . . 1 | . . .

Rating Program: Nicolas Juillerat's Sudoku explainer 1.2.1 Rating: 11.9 Poster: tarek Label: golden nugget .......39.....1..5..3.5.8....8.9...6.7...2...1..4.......9.8..5..2....6..4..7..... . . . | . . . | . 3 9 . . . | . . 1 | . . 5 . . 3 | . 5 . | 8 . . ------+-------+------ . . 8 | . 9 . | . . 6 . 7 . | . . 2 | . . . 1 . . | 4 . . | . . . ------+-------+------ . . 9 | . 8 . | . 5 . . 2 . | . . . | 6 . . 4 . . | 7 . . | . . .

Rating Program: dukuso's suexrat9 Rating: 4483 Poster: coloin Label: col-02-08-071 .2.4.37.........32........4.4.2...7.8...5.........1...5.....9...3.9....7..1..86.. . 2 . | 4 . 3 | 7 . . . . . | . . . | . 3 2 . . . | . . . | . . 4 ------+-------+------ . 4 . | 2 . . | . 7 . 8 . . | . 5 . | . . . . . . | . . 1 | . . . ------+-------+------ 5 . . | . . . | 9 . . . 3 . | 9 . . | . . 7 . . 1 | . . 8 | 6 . .

Rating Program: dukuso's suexratt (10000 2 option) Rating: 2141 Poster: tarek Label: golden nugget .......39.....1..5..3.5.8....8.9...6.7...2...1..4.......9.8..5..2....6..4..7..... . . . | . . . | . 3 9 . . . | . . 1 | . . 5 . . 3 | . 5 . | 8 . . ------+-------+------ . . 8 | . 9 . | . . 6 . 7 . | . . 2 | . . . 1 . . | 4 . . | . . . ------+-------+------ . . 9 | . 8 . | . 5 . . 2 . | . . . | 6 . . 4 . . | 7 . . | . . .

References

External links

* http://diuf.unifr.ch/people/juillera/Sudoku/Sudoku.html "Sudoku Explainer by Nicolas Juillerat" (Popular for rating sudokus in general)
* http://www.research.att.com/~gsf/man/man1/sudoku.html "sudoku by gsf" (Popular for rating the hardest sudokus amongst other things)
* http://magictour.free.fr/sudoku.htm"suexrat9 by dukuso" (Popular for rating the hardest sudokus)
* http://magictour.free.fr/suexratt.exe "suexratt by dukuso" (Popular for rating the hardest sudokus)


Wikimedia Foundation. 2010.

Look at other dictionaries:

  • Sudoku — proposé par la presse. Le sudoku (prononcé soudokou en français, / …   Wikipédia en Français

  • Sudoku — Not to be confused with Sodoku. A Sudoku puzzle …   Wikipedia

  • Mathematics of Sudoku — The class of Sudoku puzzles consists of a partially completed row column grid of cells partitioned into N regions each of size N cells, to be filled in using a prescribed set of N distinct symbols (typically the numbers {1, ..., N}), so that each …   Wikipedia

  • Sodoku (jeu) — Sudoku Sudoku proposé par la presse. Le sudoku (prononcé / …   Wikipédia en Français

  • Su Doku — Sudoku Sudoku proposé par la presse. Le sudoku (prononcé / …   Wikipédia en Français

  • Su do ku — Sudoku Sudoku proposé par la presse. Le sudoku (prononcé / …   Wikipédia en Français

  • Su doku — Sudoku Sudoku proposé par la presse. Le sudoku (prononcé / …   Wikipédia en Français

  • Sudokus — Sudoku Sudoku proposé par la presse. Le sudoku (prononcé / …   Wikipédia en Français