Lottery scratch cards and crosswords
In summary
I noticed that the Irish National Lottery sells a scratch card with the
potential for an information leakage flaw. I whiled away a pleasant Sunday
afternoon investigating, and decided that the card in fact has no flaw.
Though a little disappointed at this null result, my investigations
netted me a cool €17 tax-free profit. In fact so many of the cards I
bought were winners that it became rather embarrassing cashing them in.
The data I gathered from the 26 scratch cards I bought is available
here.
The Crossword Doubler
Here is the scratch card that caught my eye:
At the time of purchase, the panel of 20 letters labelled "your letters" is
hidden but everything else, including the crossword beneath, is visible.
The idea is that you scratch off the panel to reveal "your letters", then
count how many of the words in the crossword can be spelled using just
these letters. If this count is three words or more, you win money. The card
above yielded a payout of €5 via the words "latch", "all", "okay".
The card is produced by Premier Lotteries Ireland on behalf of the Irish
National lottery. They advertise it
here.
Mohan Srivastava
The card caught my eye because years ago I read about Mohan Srivastava. He
famously noticed a flaw
in some North American scratch cards that made it possible to identify
winning cards with a high degree of accuracy, without scratching them.
The characteristic property of the cards that Srivastava made famous
is that the conditions for a card to win involved a combination of:
- pseudo-random numbers, visible before scratching,
- pseudo-random numbers, hidden but revealed after scratching.
The flaw was that the visible numbers revealed information about the
hidden numbers.
It occurred to me that the Irish National
Lottery's crossword-based scratch card had a similar structure, and that
there were obvious algorithms for producing the cards which would
leak information about the hidden letters.
The crossword parameters
Though the layout varies for each card, the crosswords always contain
19 words, the sum of whose lengths is 97 letters. Furthermore, they
always contain the same number of words of each length, according to the
following table:
Word length | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
Frequency | 4 | 5 | 3 | 3 | 1 | 2 | 1 |
The parameters are well-chosen
Let's call the 20 letters labelled "your letters" the "good" letters, and
call a word "good" if it can be spelled using only good letters.
Here are sketches of two algorithms that could be used to generate
Crossword Doubler instances:
- Select the 20 good letters for this card. Based on the desired payout
for this card, decide how many good words the card should have. Randomly
choose this many good words from a dictionary. Randomly choose
the remaining words, ensuring each one contains at least one bad
letter. Restart from scratch if we get stuck at any point.
- Defer the decision as to which letters are good or bad, and begin by
randomly generating a complete crossword. For each possible
payout, verify that there exists a choice of 20 good letters that
achieve this payout for this crossword. Reject and restart from scratch
if not. Look up the desired payout, and select an appropriate set
of 20 good letters.
Algorithm 2 is sound, but algorithm 1 leaks information. For example it
could occasionally generate a card for which it possible to be certain
it does not pay out above a certain threshold or, conversely, guarantees a
win.
However, the room for possible biases to matter depends on the parameters
of the game. These turn out to limit possible information leakage
significantly.
Empirically it is not easy to generate flawed crosswords.
Indeed we can associate a bipartite graph to a
crossword instance where the two sets of nodes are the letters and the
words, with the obvious edge relation. Even at this level of abstraction,
where the constraint to be able to realise the graph as a crossword of English
words is absent, it is still
not so easy to generate graphs that leak significant information
(though of course it is possible).
The crossword in the card pictured above is fairly typical. If we take each
of the ${26}\choose{20}$ possible choices of good letters, and see how many
of the 19 words in the crossword are good, we get the following table:
Good words | 0 | 1 | 2 | 3 | 4 | 5 | 6 | 7 | 8 | 9 |
Frequency | 3724 | 9297 | 18473 | 23917 | 26578 | 31577 | 31450 | 26857 | 20670 | 15638 |
Good words | 10 | 11 | 12 | 13 | 14 | 15 | 16 | 17 | 18 | 19 |
Frequency | 10442 | 6066 | 3104 | 1523 | 620 | 221 | 58 | 14 | 1 | 0 |
I.e., there are 23917 different ways to choose 20 good letters which would
result in exactly 3 good words for that card.
Of course there are also many other possible biases, and biased algorithms,
e.g., biases in letter frequencies, algorithms
which make it more likely that shorter/longer words are good, or for which the
bonus word leaks information etc. I considered a number of these but
identified no flaw.
The game has rather well-chosen parameters, and besides,
it may well use a robust algortihm like 2 above.
Final words
Though I got a null result, I did notice a few rules that the cards
appear to obey. For example:
- the doubling letter is always good,
- the six-letter bonus word always contains five unique letters,
- the doubling letter is never in the bonus word,
- none of the 26 cards I bought had a crossword using more than 24
letters or fewer than 22.
These and other observations are expressed in a script contained
in this repo
together with all the data from the cards I bought. Perhaps the reader
might use this data to spot something I missed?