Twenty-one girls and twenty-one boys took part in a mathematical contest.Results available here indicate this is not completely trivial: just 4% of contestants were awarded full marks. (I suggest we ignore the fact that it took me eighteen years to come up with the proof discussed below, versus the c.90 minutes available to contestants.)Prove that there was a problem that was solved by at least three girls and at least three boys.

- Each contestant solved at most six problems.
- For each girl and each boy, at least one problem was solved by both of them.

One reason I like this question is that it follows trivially from the following result which, once stated, is easy to prove. Specifically:

Given an $r \times c$ matrix of natural numbers such that each row contains at most $n$ distinct values and each column contains at most $m$ distinct values, there exists a cell whose value appears at least $k$ times, in the row to which it belongs as well as at least $k$ times in the column to which it belongs, where: $$ k = \left\lceil\frac{cr}{cm + rn - \min(cm, rn, c+r)}\right\rceil. $$Let's call this the two-dimensional pigeonhole principle. My proof of it in Coq is available here.

The original IMO question follows from the above with $c = r = 21$ and $m = n = 6$, by factoring it through the following construction. Given an instance of the original IMO question, first, number all the problems. Next construct a $21\times 21$ matrix with rows corresponding to girls, and columns corresponding to boys. A cell in this matrix thus corresponds to a girl-boy pair, and we fill in a value by choosing any one of the problems that they both solved, and writing its number in this cell.

Consider a single column; since $k \le \lceil r/m\rceil$, the usual (one-dimensional) pigeonhole principle tells us that this column contains at least one column-good cell, and thus at most $m-1$ of the column's values appear in column-bad cells. Since each of these bad values appears in at most $k-1$ of the column's cells, the column contains at most $(m-1)(k-1)$ column-bad cells. Since this is true for each column, the matrix contains at most $c(m-1)(k-1)$ column-bad cells.

Similarly the matrix contains at most $r(n-1)(k-1)$ row-bad cells, and thus at most: $$ B = c(m-1)(k-1) + r(n-1)(k-1) $$ cells that are either column-bad or row-bad. Since $$ k - 1 < \frac{cr}{c(m-1) + r(n-1)}, $$ we must have $B < cr$ and thus there exists a cell that is both row-good and column-good, as required.

(In fact the above generalises trivially to show that sufficient conditions for there to exist a cell whose value occurs at least $k$ times in its column as well as at least $l$ times in its row are: $$ \max\left(\frac{m}{r}(k-1), \frac{n}{c}(l-1), \frac{m-1}{r}(k-1) + \frac{n-1}{c}(l-1)\right) < 1. $$ Even the Coq proof could be easily extend to the above.)

The proof aside, I think there is value in staring at just a statement of the result:

` ````
Section Php2dStatement.
Variables (r : nat) (M : matrix nat r).
Definition nunique (l : list nat) := length (nodup Nat.eq_dec l).
Definition min3 (x y z : nat) := min x (min y z).
Definition lmax := fold_right max 0.
Definition freq (l : list nat) (x : nat) := count_occ Nat.eq_dec l x.
Definition c := col_count M.
Definition m := lmax (map nunique (columns M)).
Definition n := lmax (map nunique (rows M)).
Definition k := ceil (r*c) (c*m + r*n - min3 (c*m) (r*n) (c+r)).
Theorem php_2d : r <> 0 -> c <> 0 -> exists x v w i j,
nth_error (rows M) i = Some v /\
nth_error (columns M) j = Some w /\
nth_error w i = Some x /\
nth_error v j = Some x /\
k <= freq v x /\
k <= freq w x.
End Php2dStatement.
```

The only terms above that are not part of Coq or its standard library
are the ceil function, and
those associated with the matrix type, i.e.,
matrix,
columns,
rows. These are all easily defined.
For me, the main takeaways from this exercise were:

- Proving things in Coq is slow work.
- Proving things in Coq is fun and highly addictive!
- It is striking how many combinatorial results that seem obvious or trivial require non-trivial effort to prove formally.
- Coq's standard library is wonderful but has a lot of gaps. I hope to fill in a few in the near future since Coq is open source and welcomes pull requests.
- I am even more in awe of things like the formal proof of Feit-Thompson.
- I am even more excited by initiatives such as The Xena Project and The Formal Abstracts Project.
- I am even more eager to try using Lean.

In the meantime, there is of course a trivial generalisation by dimension. Suppose we have a $d$-dimensional cube of natural numbers of shape $a_1 \times a_2 \times \cdots \times a_d$, and for $1 \le i \le d$, any "column" parallel to the $i$th axis contains at most $n_i$ unique values. Let: $$ k = \min\left( \left\lceil\frac{a_1}{n_1}\right\rceil, \ldots, \left\lceil\frac{a_d}{n_d}\right\rceil, \left\lceil\frac{a_1a_2\cdots a_d}{a_2a_3\cdots a_d(n_1-1) + \cdots + a_1a_2a_{d-1}(n_d-1)}\right\rceil \right), $$ then there exists a cell whose value appears at least $k$ times in each of the $d$ "columns" to which it belongs.

In the generic case, the smallest value in the $\min$ expression for $k$ is the final argument; this may be simplified to: $$ k = \left\lceil\left(\sum_i \frac{n_i - 1}{a_i}\right)^{-1}\right\rceil. $$ I think it might make a good question to pose a carefully stated version of this result where we consider a cube of shape: $$ 100 \times 400 \times 900 \times \cdots \times (10d)^2, $$ and: $$ n_1 = n_2 = \cdots = n_d = 21, $$ and request proof of a value appearing at least 4 times in each dimension. I like this because it meets the genericity conditon and as $d$ grows, $k$ tends to: $$ \left\lceil \frac{30}{\pi^2}\right\rceil $$ and since $30/\pi^2 = 3.03\cdots $, it's fairly tight.

- There exists a value occuring at least $\lceil a/n\rceil$ times.
- There exists a value occuring at most $\lfloor a/n\rfloor$ times.

(Ensure local storage is enabled to save results.)