Can prime numbers be written as the sum of two squares? 13 can for sure (2² + 3²), but for example 3 can’t. Pierre de Fermat came up with a theorem on this topic, for which Don Zagier, an american mathematician gave a proof, which astonishingly is just one sentence long.
Numberphile   made a great video on the one-sentence proof of Zagier, but left out some details. I’ll give an overview over the points already covered in the video of Numberphile but will also explain the left out steps.
The Problem (Theorem)
Pierre de Fermat, a French mathematician of the seventeenth century, thought about under which circumstances primes could be written has the sum of two squares. For example 13 is a prime, which can be written as the sum of two squares, namely 3² and 2². Moreover 17 is a prime and can be written as the sum of 4² and 1². In general Fermat was interested in the question, whether or not for a given prime number, let’s call it p, two other arbitrary numbers, let’s call them a and b, can be found, such that p = a² + b².
The Problem can somehow be imagined as a game: You give me a prime number p. My task is it to tell you whether or not this prime is expressible as the sum of two squares. One possibility to decide whether the answer is “yes” or “no”, is to test all possible combinations for a and b. So, if you give me the prime number 3, I would go through all combinations: 1²+1², 1²+2². As I shot over 3 with 1²+2², I would come to the conclusion that 3 cannot be written as the sum of two squares.
This method is feasible for small prime numbers. However even computers struggle to play the game with this method for very large prime numbers. So it would be much nicer, to have a better strategy. Luckily Fermat wrote down the first hundred prime numbers for which the answer is “yes” and found a surprising pattern. It seemed that exactly those prime numbers p can be written as the sum of two squares, for which p – 1 is divisible by 4. If we check this conjecture with our previous examples 17 and 13, the hypothesis seems to hold. So the theorem, which is proofed in the following paragraphs is:
A prime number p is expressible as the sum of two squares if and only if, p – 1 is divisible by 4.
Unfortunately Fermat did not proof it and also other mathematicians struggled to come up with a proof. About 50 years ago Don Zagier gave a proof, not the first one, but unbelievably in just one sentence. I’ll give a detailed version of the famous one-sentence proof to illustrate all thought processes and all formal checks needed to understand the proof. The original proof will be given after that. In the next few paragraphs “p” is a prime number which satisfies the properties demanded by the theorem. So p is a prime and p – 1 is divisible by 4, which means p – 1 = 4k or p = 4k – 1 for some number k. Our task is it to show, from this assumptions, that there indeed are some numbers “a” and “b” such that p = a² + b².
The ingenious idea
Zagier starts his proof with looking at the solutions of the equation p = x² + 4yz. In the original proof you’ll find . This is just fancy math notation for the set of all solutions. So “S” is the set of all triples, containing numbers “x”, “y” and “z” for which the equation is true.
Don’t puzzle your head over how Zagier came up with the idea to look at exactly this equation. If this step is logical to you and you would have come up with the exact same idea, you probably have a promising career in mathematics ahead.
Recipes for new solutions
In the equation p = x² + 4yz it can be seen that from a given solution (x,y,z), a new solution can be obtained by interchanging “y” and “z”. This is called an involution and noted as (x,y,z) → (x,z,y). It is also apparent that if this process is applied again, one will get back the old solution. This means that the solutions come in pairs. However it could also be that if the involution – the recipe for new solutions – is applied nothing changes, namely if y is equal to z. Now imagine that there is such a solution where “y” is equal to “z”. This would mean we could write p = x² + 4yy or p = x²+(2y)² and voilà, we have written p as the sum of two squares.
The argument above illustrates that the whole question “Is p expressible as the sum of two squares?” boils down to the question “Does (x,y,z) → (x,z,y) have a fixed point (point where nothing changes) on the set S?”.
This is the point where they take a “leap of faith”, as they say, in the Numberphile video. In the video they assume that S contains an odd number of solutions. From this follows that there is a solution (x,y,y), because if for all solutions “y” and “z” were different, the number of solutions would be even. This is follows from the fact that the involution (x,y,z) → (x,z,y) produces the solution in pairs. So because S contains an odd number of solutions, there is a solution (x,y,y) and therefore p is expressible as the sum of two squares.
But why is |S| odd?
If you followed the sketch of the proof to this point, you have almost understood everything there is to understand the original proof. Whats left to show is that the number of solutions in “S” is indeed odd.
For this matter Zagier has another clever trick up his sleeve. He gives another involution on S, which looks slightly more complicated.
As the first involution, also this one produces the solutions in pairs. Moreover this involution has exactly one fixed point. However these properties are not obvious and neither is the question, if it is a valid involution at all. With the involution (x,y,z) → (x,z,y) this was quite clear as you cannot mess up the solution by changing y and z. Fortunately this properties can be checked with just a little school algebra.
Is it a valid involution?
To check this, we simply insert the new solution obtained by the involution into the equation and simplify.
If (x,y,z) is a solution with the property of the first case, the involution gives us the new solution (x+2z, z, y-x-z). Inserting and applying basic algebra proofs that it is really a valid solution.
So, if the first case of the involution has to be applied we indeed get solution to the equation. The same concept proofs this for the other two cases.
Do the solutions come in pairs?
Now, that we now that the involution is really valid, the next step is to proof that the solutions are generated in pairs. In other words, if we apply the involution to a solution and after that apply the involution again, we get back to the solution we started with. Again the property needs to be proofed case by case.
Starting with a solution (x,y,z) which satisfies the condition for the first case, we get a new solution (x+2z, z, y-x-z). This new solution case three has to be applied and we indeed get back to (x,y,z).
For a (x,y,z) in the second case, we get the solution (2y-x, y, x-y+z), which again has the properties of the second case. Applying the involution again produces (x,y,z).
And as desired starting with (x,y,z) in the third case brings us (x-2y, x-y+z, y), which satisfies the condition of the first case. After a second application, we also in this case get back to (x,y,z).
We now have almost everything together to know that the number of solutions to p=x²+4yz is odd. At this point we know that the solutions can be generated in pairs from the involution (x,y,z) → (x,z,y) but also from the more complicated one given above.
How many fixed points are there?
Zagier came up with the more complicated involution not to make this post longer, but because it’s easier to investigate the fixed points of the more complicated looking one. When considering fixed points, we want to know for which solutions the involution spits out the exact same solution we put in. This does not violate the property that the involution generates solutions in pairs, as applying the involution again will once more give back to solution we have begun with.
Note that in the previous step we saw that only in case two of the involution, it spat out a solution which had the same properties as the solution we stuffed in. So if there is a fixed point (and we really want one), it has to be a solution which satisfies the conditions of case two.
Basic algebra reveals that for a fixed point y has to be equal to x.
But x has to be 1, because otherwise p – 1 would not be divisible by 4. Therefore also y has to be 1.
As p is our fixed prime and x and y are fixed, also z has to be fixed. We can put everything we know into an equation and solve for z.
So for our given prime p, there is exactly one fixed point on our involution. So as we have seen the involution produces the solutions in pairs, where the solutions in the pairs are different. Only for one solution this does not hold and the same solution is produced. In other words, the involution generates an even number of solutions (the pairs) and one extra solution (the fixed point). That means that in the end we end up with an odd number of solutions for the equation p = x²+4yz. Which, as described above, is sufficient to prove that p is expressible as the sum of the squares.
The original proof
As to this point all pieces of the once-sentence proof have been given, let’s have a look at the original proof of Zagier which unbelievably is really just one sentence.
The involution on the finite set defined by
has exactly one fixed point, so |S| is odd and the involution defined by (x,y,z) → (x,z,y) also has a fixed point.