Lottery scratch cards and crosswords

In summary

I noticed that the Irish National Lottery sells a scratch card with the potential for an information leakage flaw. I whiled away a pleasant Sunday afternoon investigating, and decided that the card in fact has no flaw. Though a little disappointed at this null result, my investigations netted me a cool €17 tax-free profit. In fact so many of the cards I bought were winners that it became rather embarrassing cashing them in.

The data I gathered from the 26 scratch cards I bought is available here.

The Crossword Doubler

Here is the scratch card that caught my eye:

At the time of purchase, the panel of 20 letters labelled "your letters" is hidden but everything else, including the crossword beneath, is visible. The idea is that you scratch off the panel to reveal "your letters", then count how many of the words in the crossword can be spelled using just these letters. If this count is three words or more, you win money. The card above yielded a payout of €5 via the words "latch", "all", "okay".

The card is produced by Premier Lotteries Ireland on behalf of the Irish National lottery. They advertise it here.

Mohan Srivastava

The card caught my eye because years ago I read about Mohan Srivastava. He famously noticed a flaw in some North American scratch cards that made it possible to identify winning cards with a high degree of accuracy, without scratching them.

The characteristic property of the cards that Srivastava made famous is that the conditions for a card to win involved a combination of:

The flaw was that the visible numbers revealed information about the hidden numbers.

It occurred to me that the Irish National Lottery's crossword-based scratch card had a similar structure, and that there were obvious algorithms for producing the cards which would leak information about the hidden letters.

The crossword parameters

Though the layout varies for each card, the crosswords always contain 19 words, the sum of whose lengths is 97 letters. Furthermore, they always contain the same number of words of each length, according to the following table:
Word length3456789
Frequency4533121

The parameters are well-chosen

Let's call the 20 letters labelled "your letters" the "good" letters, and call a word "good" if it can be spelled using only good letters.

Here are sketches of two algorithms that could be used to generate Crossword Doubler instances:

  1. Select the 20 good letters for this card. Based on the desired payout for this card, decide how many good words the card should have. Randomly choose this many good words from a dictionary. Randomly choose the remaining words, ensuring each one contains at least one bad letter. Restart from scratch if we get stuck at any point.
  2. Defer the decision as to which letters are good or bad, and begin by randomly generating a complete crossword. For each possible payout, verify that there exists a choice of 20 good letters that achieve this payout for this crossword. Reject and restart from scratch if not. Look up the desired payout, and select an appropriate set of 20 good letters.
Algorithm 2 is sound, but algorithm 1 leaks information. For example it could occasionally generate a card for which it possible to be certain it does not pay out above a certain threshold or, conversely, guarantees a win. However, the room for possible biases to matter depends on the parameters of the game. These turn out to limit possible information leakage significantly.

Empirically it is not easy to generate flawed crosswords. Indeed we can associate a bipartite graph to a crossword instance where the two sets of nodes are the letters and the words, with the obvious edge relation. Even at this level of abstraction, where the constraint to be able to realise the graph as a crossword of English words is absent, it is still not so easy to generate graphs that leak significant information (though of course it is possible).

The crossword in the card pictured above is fairly typical. If we take each of the ${26}\choose{20}$ possible choices of good letters, and see how many of the 19 words in the crossword are good, we get the following table:

Good words0123456789
Frequency372492971847323917265783157731450268572067015638
Good words10111213141516171819
Frequency10442606631041523620221581410
I.e., there are 23917 different ways to choose 20 good letters which would result in exactly 3 good words for that card.

Of course there are also many other possible biases, and biased algorithms, e.g., biases in letter frequencies, algorithms which make it more likely that shorter/longer words are good, or for which the bonus word leaks information etc. I considered a number of these but identified no flaw.

The game has rather well-chosen parameters, and besides, it may well use a robust algortihm like 2 above.

Final words

Though I got a null result, I did notice a few rules that the cards appear to obey. For example:

These and other observations are expressed in a script contained in this repo together with all the data from the cards I bought. Perhaps the reader might use this data to spot something I missed?