-Charles, Charles! Wait a minute! – Paula shouted through the window of the first floor of the institute – Notice: we are going to the SUPERSTAR concert.
Carlos looked at Paula skeptically.
―Paula, sorry, the concert was cancelled. Some duplicate tickets, or they exceeded the capacity… I don’t know.
“Exactly!” Paula shouted. It has been postponed until next week. And in the Technology Classroom they have launched a contest. You have to answer three questions and the prize is two tickets!
Carlos loved SUPERSTAR, so he sat patiently while Paula excitedly unleashed a torrent of information.
―The concert was suspended because they had several problems with ticket validation. You see, each input has a number, and to validate them faster, that number was summarized through a transformation called a function hash or summary function. At the gate of the stadium they had a reader who compared the summary of each entry with the list of valid summaries, to authorize or not the passage.
“I’m starting to get lazy…” Carlos protested. “What is this summary function?”
Paula went on relentlessly. ―A hash function is a mathematical function that takes a value, say a string of numbers of any length, such as the number of an input, and returns a hash, that is, a short sequence of numbers. This process is irreversible, that is, if they give you the summary you cannot go back and recover the starting sequence. On the other hand, with the same input, the function always returns the same output. Paula painted two circles with dots and arrows on the ground of the ride, representing what she was explaining.
These summaries are used for many things. For example, to detect and correct errors when transcribing a large number; that is precisely the role of the letter of our DNI, which is a summary of the digits that precede it. They also serve to detect if someone adds a virus to a program that you download from the Internet: you can check if the summary of the code that you have downloaded matches the one published on the website of the person who offers you the program. If not, someone has modified the file while it was traveling over the network.
“And that’s not good…” Carlos interrupted.
-Nope. Bad signal. Other times, the summaries are used to, for example, speed up processes, imagine that instead of calling us at the institute using our entire enrollment number, they used only the last two digits.
“Well, what a mess they would make,” Carlos jumped in, “because I’m sure there are several of us who have those two digits the same.”
“That’s right!” Paula said, “If the summary function is bad, there are coincidences, which are called collisions, and they shouldn’t be used… that’s exactly what happened to those at the concert!” I’ll explain it to you, it’s quite simple. Each entry had an identifying number, the summary of which was determined using the CUTREHASH function, which divides the original number by 1,024 and takes the remainder of that division as the summary.
“And why just 1,024?”
“Ummm. That is not relevant to solving the problem, but it will be of interest to those who remember binary sequences. As validating the inputs has to be fast, both numbers are represented in binary (with ones and zeros), so that they can be easily read with an optical reader. Since 1,024 is 2 raised to 10, we ensure that the summary number can be represented with at most 10 figures (10 bits).
“I don’t see the problem,” said Carlos, “they had everything very well thought out.”
―Well… not quite… —Paula smiled— to begin with, the capacity of the stadium was 10,900 people, so the correct starting identifiers should be numbers between 1 and 10,900. But with the chosen formula there are far fewer summaries than numbers of entries, so there couldn’t be a different summary for each identifier! Do you remember the Pigeonhole Principle that we saw in class? If you have more pigeons than lofts, some have to share a bed…and that’s exactly what happened to them. At one point, people began to arrive with valid tickets and summaries that had already been validated by the door reader, thereby denying them access.
“My goodness, what a disaster,” Carlos seemed worried, “and did it take them long to realize it?”
“Not really, Carlos, think about it… That’s the first question of the contest.” How many people at most and at least had entered the stadium when the first person with a repeated summary arrived?
– Come on, I think about it. But, if it didn’t take them that long to realize it, they could have held the concert a little later, right?
“They tried to do it. Verifying that using only the summary was a bad method, they decided to validate the inputs in another way, checking two things: that the identifier had not been used before and that its summary was correctly calculated (thus using the hash to correct possible printing errors). But it didn’t work either. Some crooks, aware of the defective validation system, had falsified thousands of entries with their corresponding summaries, from the one with the identifier 10,901, to the one numbered 99,999. And they only realized that when the attendees clearly overflowed the capacity!
And now come the rest of the challenge questions. The second one is easy: Could you find the number of a valid entry whose summary gives the same value as the entry with identifier 4,410? And the third, more complicated one: How many valid tickets and how many in total, counting the forged ones, did they have that 4,410 as a summary? We have to hurry, the prize is two good tickets!
“Well, let’s go!” Now Carlos was convinced. “We’re going to go to the SUPER STAR fixed concert!”
Crypto challenges will be published every 15 days. Readers can leave their solutions and discuss the problem in the comments on this page, so anyone who wants to solve it on their own is advised not to read it until they have cracked the puzzle. You can also email your responses [email protected]. In each new challenge we will publish the solution of the previous one, accompanied by a comment with some original or inspiring ideas that we have received.
Ana Isabel González Tablas Ferreres, @aigtf, is a PhD in Computer Science and professor and researcher of the COSEC group of the Carlos III University of Madrid.
SOLUTION TO THE PREVIOUS CHALLENGE
in the challenge How to fool a sudoku checker, the essential thing in this challenge is to observe the following: if three equal tiles are stacked in each cell, as long as the sudoku is not well solved, the verifier will always detect the problem. The only possibility to fool the verifier is therefore to stack tiles with different numbers.
The proposed sudoku is completed like this:
Therefore, if we knew that the verifier checks first by rows, then by columns and lastly by boxes, one of the many ways to trick it would be:
Where we suppose the chips are stacked and to the left appears the number of the first (lower) chip, in the middle that of the second and to the right that of the upper layer. We have placed three identical tiles in the fixed cells, since they are face up, while with the tiles face down, we have limited ourselves to placing, correlatively, the lowest available tiles, first by rows, then by columns and finally by boxes ( left to right and top to bottom). Some readers (such as Javier and Héctor) have pointed out to us that since there are three checks (rows, columns and boxes) that will be done sequentially and seeing that it is trivial to fill in the sudoku so that there are no repetitions in each of them, the probability of providing a solution that the verifier considers valid when we only have proposals that pass a test (the one with rows, the one with columns and the one with boxes) coincides with the probability of getting the order in which the verification will be done right. This probability is 1/6 (one divided by the number of permutations of three elements).
The cryptographic tool that underlies this challenge are the so-called Zero Knowledge Proofs, known by the acronym ZKP (Zero Knowledge Proof), which were introduced in 1985 by Shafi Goldwasser, Silvio Micali and Charles Rackoff (see The Knowledge Complexity of Interactive Proof-Systems). Informally, a Zero Knowledge Proof is a procedure through which a “prover” convinces a “verifier” of a fact without revealing more information than the truth of said fact. These types of protocols are tremendously useful in cryptography, since, for example, they allow the possession of secret keys to be demonstrated (thus identifying the users of a system) without leaking information about them. Currently, research on ZKPs is booming due to the expansion of cryptocurrencies. In this context, the ZKPs are essential to validate transactions in order to avoid fraud without revealing the movements of an account, thus protecting the privacy of system users.
Our Sudoku example is inspired by Gradwohl, Naor, Pikas y Rothblum (2007), where, in addition to proposing and reviewing other ZKPs for solving Sudokus, the authors analyze in detail a variation of the proposed method in which the verifier randomly selects a card from each box each time that cell intervenes in the verification by rows, columns or boxes. With this verification protocol, which appears in the solution proposed by a reader (David), it can be shown that the probability of fooling a checker with a wrong solution it cannot be greater than 1/9.