Polynomials are Good Functions.

In this blog post we present some basic quantitative results that show the polynomials of a fixed degree are good and predictable functions. It is also illustrated with two Olympiad problems.
Let n be a fixed natural number. We consider the set L_n of all polynomials with real coefficients and degree at most n. Suppose, we know some information about a polynomial P\in L_n. Then we can predict some other properties of P. For example, if the magnitude of P is small in [-1,1] then it cannot become too large in a slightly expanded interval, say in [-2,2]. Its slope cannot become abruptly too steep. That is, by knowing some local properties of P we can predict its behavior in nearby neighborhood. The main reason why it happens is that any polynomial in L_n has finite number degrees of freedom. It is fully determined by finite and fixed number of parameters, for example by its coefficients. Below are some quantitative properties in this spirit.

Claim 1. (Markov’s inequality). Let P be a polynomial of degree n. Then

\displaystyle \max_{x\in[-1,1]} |P'(x)|\le n^2\max_{x\in [-1,1]}|P(x)|.

That is, knowing that |P(x)| is not large in [-1,1] implies |P'(x)| also cannot be too large in the same interval.

The following inequality is a generalization for the derivatives of higher order and is also attributed to Markov brothers.

Claim 2. (Markov brothers’ inequality). For any polynomial P of degree n it holds

\displaystyle \max_{x\in[-1,1]}\left|P^{(k)}(x)\right| \le \frac{n^2(n^2-1^2)\cdots (n^2-(k-1)^2)}{1\cdot 3\cdot 5\cdots (2k-1)}\max_{x\in[-1,1]}\left|P(x)\right|

for k=1,2,\dots,n.

Claim 3. Let P be a polynomial of degree n and we know the behavior of P(x) in the interval [-1,1]. Then, we can predict the growth rate of of P outside this interval. It holds

\displaystyle \left|P(x)\right|\leq \left|T_n(x)\right|\cdot \max_{t\in[-1,1]}\left|P(t)\right|, \forall x\notin [-1,1].

where T_n(x) is the Chebyshev polynomial of first kind. This is a famous extremal property of the Chebyshev polynomials. One can see a short proof of this fact in this blog post, as well as some of its applications.

Claim 4. For any polynomial P(x)=a_0x^n+a_1x^{n-1}+\dots a_{n-1}x+a_n we have

\displaystyle |a_0|+|a_1|+\dots+|a_n|\le C\max_{x\in[-1,1]} |P(x)|\qquad(1)

where C is a constant depending only on n.

Proof. There are many possible approaches. We can use Markov brothers’ inequality. We also can use the Langrange interpolation formula for P at some n+1 fixed knots, for example equally spaced in [-1,1], and then estimate the coefficients. I prefer a rather abstract approach, but it shows the ultimate reason why it holds.
Consider the vector space L_n of all polynomials of degree at most n. It can be normed by any of the following two norms

\displaystyle \|p\|_1:= |a_0|+|a_1|+\dots+|a_n|\,;\, \|p\|_2:=\max_{x\in[-1,1]}|p(x)|

where p\in L_n, p(x)=a_0x^n+a_1x^{n-1}+\dots a_{n-1}x+a_n. Since L_n is a finite, n+1 dimensional vector space, any two norms are equivalent. It means there exist constants C_1,C_2>0 depending eventually only on n so that

\displaystyle C_1\|p\|_2\le \|p\|_1\le C_2\|p\|_2,\forall p\in L_n

which proves (1). Of course, the interval [-1,1] is not important, it can be replaced by any other interval, for example [0,1].

Claim 5. For any polynomial P(x)=a_0x^n+a_1x^{n-1}+\dots a_{n-1}x+a_n we have

\displaystyle |a_0|+|a_1|+\dots+|a_n|\le C\int_{-1}^1 |P(x)|\,dx\qquad(2)

where C is a constant depending only on n.

Proof. The same approach, by considering the norm \displaystyle \|p\|_3:=\int_{-1}^1 |P(x)|\,dx.

Two applications.

Problem 1 (Putnam 1995, A5). Prove that there is a constant C such that, if p(x) is a polynomial of degree 1999, then \displaystyle |p(0)|\leq C\int_{-1}^1|p(x)|\,dx.
Solution. Straightforward, by Claim 5.

Problem 2. (Romanian NMO). Let f :\mathbb R \to\mathbb R be a n \geq 2 times differentiable function so that:

\displaystyle \lim_{x \to \infty} f(x) = l \in \mathbb R and \displaystyle  \lim_{x \to \infty} f^{(n)}(x) = 0.

Prove that: \displaystyle \lim_{x \to \infty} f^{(k)}(x) = 0 for k =1, 2, \dots, n - 1 , where f^{(k)} is the k -th derivative of f..

Solution. It’s natural to use Taylor’s expansion.

\displaystyle f(a+t)=f(a)+f'(a)\frac{t}{1!}+\dots +f^{(n-1)}(a)\frac{t^{n-1}}{(n-1)!}+f^{(n)}(\xi)\frac{t^n}{n!}

where \xi\in (a,a+t). Therefore

\displaystyle \frac{f'(a)}{1!}t+\dots +\frac{f^{(n-1)}(a)}{(n-1)!}t^{n-1} = f(a+t)-f(a)-f^{(n)}(\xi)\frac{t^n}{n!}\qquad (3)

We choose a large enough a and vary t in the interval [0,1]. So, the RHS of (3) can be made as small as we want as a\to\infty and we want to show all f^{(k)}(a), k=1,2,\dots,n-1 are also small.
Let us choose some \varepsilon>0 Taking into account (3) and the given conditions, there exists large enough A>0 such that for all a>A and t\in[0,1] it holds

\displaystyle \left|\frac{f'(a)}{1!}t+ \frac{f''(a)}{2!}t^2 +\dots +\frac{f^{(n-1)}(a)}{(n-1)!}t^{n-1}\right|<\varepsilon.

Using Claim 4, we get

\displaystyle \frac{\left|f'(a)\right|}{1!} +\frac{\left|f''(a)\right|}{2!}+\dots +\frac{\left|f^{(n-1)}(a)\right|}{(n-1)!}<C\varepsilon

where C is a constant that may depend only on n. Therefore

\displaystyle \lim_{a\to\infty}f^{(k)}(a)=0, k=1,2,\dots,n-1.

References.
[1] Markov brothers’ inequality.
[2] Putnam 1999, A5.
[3] Problem 3 of Romanian NMO, grade 11.
[4] On an Extremal Property of Chebyshev Polynomials.

Progression-free sets.

Let S\subset \{1,2,\dots,n\}. We say S is progression-free if no three numbers in S form an arithmetic progression. A natural question arises: What is the maximum size of S in terms of n. Let us denote it by r(n). This blog post is an overview (with two proofs) of some classical estimates of the growth rate of r(n). We consider the constructions of Salem-Spencer and Behrend that produce lower bounds on r(n). As we will see, the growth rate of r(n) is less than linear but it’s almost linear. It is still an open problem the exact asymptotic behavior of this value.
Additionally, two Olympiad problems closely related to this question are provided. The first one is problem 5 of IMO 1983 and the second one was given at a Bulgarian TST for IMO 2012.

History.

P. Erdős and P. Turan in [1] established some properties of r(n) and made a conjecture (they rather attributed it to G. Szekeres) that r(n)\asymp n^{\log_3 2}. More specifically, G. Szekeres had conjectured r(\frac{1}{2}(3^k+1))=2^k. This is true for k=1,2,3,4 but as we will see below it fails for large k. A short proof of r(\frac{1}{2}(3^k+1))\ge 2^k was presented in the same paper of Erdos and Turan. The idea is to consider all numbers with k digits base 3 consisting only of digits 0,1 (but not 2). Interestingly, the same idea was used some 50 years later in IMO 1983, problem 5.
Another common conjecture of those years was that there exists a constant \delta such that r(n)\le n^{1-\delta}. In 1942 Salem and Spencer [2] disproved this claim by constructing an example of much larger sets that are progression-free. In 1946 Behrend [3] improved their construction and achieved a sharper lower bound of r(n). In the following years K. Roth proved the growth rate of r(n) was less than linear. More precisely, he proved r(n)\le C\frac{n}{\log \log n} for some absolute constant C.

An easy lower bound.

\displaystyle r(n)\ge Cn^{\log_2 3} for some constant C. This construction is presented in [1]. Let k\in \mathbb{N}. Consider all numbers with m digits base 3 consisting only of digits 0,1 (but not 2). It is easy to see no three of them form an arithmetic progression. Their number is 2^m and the maximum one is 1+3+\dots+3^{m-1}=\frac{1}{2}(3^m+1). Thus, we get r(n)\ge Cn^{\log_3 2}. Here is a IMO problem that is based on the same idea.

Problem 1 (IMO 1983, problem 5). Is it possible to choose 1983 distinct positive integers, all less than or equal to 10^5, no three of which are consecutive terms of an arithmetic progression?

Solution. We use the above construction for m=11. Hence, we obtain a progression-free set of 2^{11}=2024 numbers less or equal to \frac{1}{2}(3^{11}+1)=88574<10^5. \blacksquare

Salem-Spencer’s construction.

As mentioned, there was a conjecture for some period of time that r(n)\le n^{1-\delta} for some 0<\delta<1. Until the result of Salem and Spencer appeared, which showed the growth rate of r(n) was much faster than that of n^{1-\delta} and is “almost” linear. The idea is a refinement of the previous approach and also is based on considering natural numbers with digits base k which comply with certain condition.
Let k and m be natural number which values will be determined later and m is multiple of k. Consider the family X of all ordered sequences (we call them strings) of m digits base k, i.e. the digits of each string in X are 0,1,\dots,k-1. We also want each string in X to have the same number of occurrences of 0,1,\dots,k-1, that is each digit to occur exactly m/k times. It yields

\displaystyle |X|=\frac{m!}{\left[(m/k)!\right]^k}.

Further, we consider the strings in X as numbers base 2k-1. We denote the set of those numbers again as X. The key observation is that if some three numbers a,b,c\in X form an arithmetic progression (in this order) then their digits a_i,b_i,c_i at any position i=1,2,\dots,m also form an arithmetic progression. But this is impossible due to specific condition imposed on X. Indeed, consider the zero digits of c. At each of those positions there must be a zero digit in a and b. So, the zero digits occupy the same places in the three numbers. In the same way, we obtain the same fact for the positions of 1‘s, 2‘s, … , k-1‘s. Therefore, a=b=c. Hence, there is no arithmetic progression in X. Taking appropriate k and m large enough, Salem and Spencer obtained

\displaystyle r(n)\ge \frac{n}{e^{c \ln n/\ln\ln n}}

for some constant c, which has a growth rate faster than \displaystyle \frac{n}{n^\delta} for any 0<\delta<1.

Behrend’s construction.

A couple of years later Behrend achieved a sharper lower bound estimate of r(n). His construction is based on the same idea, but the constraints imposed on X are slightly different and the calculations are even simpler. We provide here the complete argument.
Let k and m be natural number which values will be determined later. Consider the family X of all ordered sequences of m digits base k. Obviously, |X|=k^m. For any x\in X let s(x) denotes the sum of the squared digits of x. Since s(x)\le m(k-1)^2, there exists a subset X'\subset X and r\in\mathbb{N}, r\le m(k-1)^2 with

\displaystyle |X'|\ge \frac{k^m}{m(k-1)^2} \text{ and } s(x)=r, \forall x\in X'

Now, consider the strings in X' as numbers base 2k-1. We claim they are free of arithmetic progressions. Indeed, suppose there exists three such numbers a,b,c\in X' that form an arithmetic progression (in this order). Then their digits a_i, b_i,c_i at any position i, 1\le i\le m also form an arithmetic progression, that is b_i=(a_i+c_i)/2,i=1,2,\dots,m. Therefore

\displaystyle b_i^2=\frac{a_i^2+b_i^2+2ai b_i}{4}\le \frac{a_i^2+b_i^2}{2}.

Note, that this inequality is strict unless a_i=b_i=c_i. Summing up for i=1,2,\dots,m and using a,b,c are distinct numbers, we get

\displaystyle \sum_{i=1}^m b_i^2<\frac{1}{2}\left(\sum_{i=1}^m a_i^2 +\sum_{i=1}^m c_i^2\right).

This contradicts the fact s(a)=s(b)=s(c). Hence, the strings in X' in base 2k-1 do not contain three term arithmetic progression. The corresponding numbers are less than (2k-1)^m and their number is at least \frac{k^m}{m(k-1)^2} . Putting k=2^m we established there exists a set of at least 2^{m^2-m}/m natural numbers less than 2^{m^2+m}. Thus, putting n\approx   2^{m^2+m} we get

\displaystyle r(n)\ge \frac{n}{e^{c\sqrt{\ln n}}}

for some absolute constant c.

Problem 2. (Bulgarian TST for IMO 2012, p6.) Prove that for some natural number n there exist n ^ 9 natural numbers less than n^{10} , so that there are no three of them that form an arithmetic progression.
Solution. We use the above construction to obtain a set of 2^{m^2-m}/m numbers, each one less than 2^{m^2+m}. Now, for large enough m we take n:= \left\lfloor 2^{(m^2+m)/10} \right\rfloor.

The official solution I have seen, gives credit to Behrend and uses the exact construction given in this paragraph.

Roth’s theorem.

The previous two examples provide lower bounds of r(n). In 1953 Klaus Roth proved

\displaystyle r(n)\le C\frac{n}{\log\log n}

for some absolute constant C. It means the growth rate of n is less than linear. For example, given any \varepsilon>0 (it may be very small), any subset of at least \varepsilon n numbers in \{1,2,\dots,n\} always contains an arithmetic progression when n is large enough. There are some improvements of the given bounds but still, there is a gap between the lower and upper bounds of r(n).

References.
[1] Erdős, Paul; Turán, Paul (1936), “On some sequences of integers” (PDF), Journal of the London Mathematical Society, 11 (4): 261–264. PDF
[2] Salem, R.; Spencer, D. C. (December 1942), “On Sets of Integers Which Contain No Three Terms in Arithmetical Progression”, Proceedings of the National Academy of Sciences, 28 (12): 561–563. PDF
[3] Behrend, F. A. (December 1946), “On sets of integers which contain no three terms in arithmetical progression”, Proceedings of the National Academy of Sciences, 32 (12): 331–332. PDF

Matching. Tutte’s Theorem. A Brazilian NMO, 2020.

This problem appeared at some Brazilian national Olympiad and is about a game on a graph. Very nice problem, but I think, problems like this should not be given at high school Olympiads. Because they depend on specific graph theory theorems which are not very popular. (Actually, there is another approach, see the thread in [4]. So, I take my words back! I was also informed, see the comment below the blog post, it was given at the undergraduate level of the Brazilian NMO,15-16 March 2021, [5])
Two players. Each player marks a vertex on his turn, so that the marked vertex must be connected with the vertex marked on the previous move by the other player. No vertex is allowed to be marked twice, the player who cannot move loses. Ok, if the given graph has a perfect matching, the second player always wins. His strategy is very simple, he just marks the vertex that matches the one marked by the previous player. Surprisingly (for me), this condition is also necessary in order the second player to have a winning strategy. That is, if there is no perfect matching, the first player always wins. It may seem easy, but it is not. Here is the full text.

Problem 1 (Brazil National Math Olympiad, 2020). In a spacecraft, there are 2n people, n\in\mathbb{N}. Any two of them are either friends or enemies(the relationship is mutual). Two aliens play the following game: Alternately, each player chooses one person, such that the chosen one (of each round) is a friend of the person chosen by the other player in the previous round. In the first round, the player can choose any person, each person can be chosen in at most once, and the player, who cannot play anymore, loses the game.
Prove that the second player has a winning strategy, if and only if, the 2n people can be split into n pairs, such that the people in each pair are friends.

Perfect matching. Some theory.

In order to solve the problem, we need some theory about perfect matching in general graphs. The main point of this section is to give a constructive characterization of any graph that does not have a perfect matching.
I think Hall’s marriage condition is very popular and commonly used in Olympiad problems, but it is applied only for bipartite graphs. There is another result called Tutte’s theorem, which gives a similar condition for a general graph. But, we will need a stronger and a more constructive result. I follow the notations and results given in the wonderful book of R. Diestel, Graph Theory [1], chapter 2.2. – Matching in general graphs. Fortunately, this chapter is included as a sample chapter by the author, so one can freely download it – see [3]. Let us begin with some notations first.

Notations.
For any graph G and a subset S of its vertices, by G-S we denote the graph that is obtained by G by removing all the vertices in S.
1) For any graph G by \mathcal{C}(G) we denote the set of its (connected) components and by q(G) the number of its odd components, that is, those of odd order.
2) A graph G=(V,E) is called factor-critical if G\neq \emptyset and for each v\in V, by removing the vertex v we get a graph with a perfect matching.
3) We call a vertex set S\subset V matchable to G-S if the bipartite graph that arises when contracting each component C\in \mathcal{C}(G-S) into a single vertex and deleting the edges inside S, contains a matching of S.

Theorem 1. (Theorem 2.2.3. in [3]) For every graph G(V,E) there is a vertex set S\subset V with the following two properties:
1) S is matchable to G-S.
2) All components in \mathcal{C}(G-S) are factor-critical.
Given any such set S, the graph G has a perfect matching if and only if |S|=|\mathcal{C}(G-S)|.

The proof of this theorem is omitted here. One can find it in the mentioned book of R. Diestel [3]. It goes by induction on the number of vertices in G. Only this theorem will be needed in order to solve Problem 1, but I present below another interesting result which generalizes Hall’s marriage theorem.

Theorem 2. (Tutte, 1947, Theorem 2.2.1. [3]) A graph G(V,E) has a perfect matching if and only if q(G-S)\le |S| for any vertex set S\subset V.

Proof of Problem 1. Let G(V,E) be the corresponding graph. We have to prove two things. 1) If G has a perfect matching then the second player has a winning strategy. 2) If G does not have perfect matching then the first player has a winning strategy.
The first claim is obvious, we discussed it at the beginning. The second player always chooses the vertex that matches the one chosen by the other one. The second claim is what makes this problem difficult. We will use theorem 1. Let S be a vertex set that is guaranteed by Theorem 1. Let S=\{v_1,v_2,\dots,v_m\} and C_1, C_2,\dots,C_m be the corresponding components in \mathcal{C}(G-S) that are matched to S, i.e. there exist vertices v'_i\in C_i,i=1,2,\dots,m such that v_iv'_i\in E,i=1,2,\dots,m. Since G does not have a perfect matching, by Theorem 1 there exists C_0\in \mathcal{C}(G-S) which is different from C_i,i=1,2,\dots,m.
The strategy of the first player (call him A and let the other one be B) is as follows. At his first move A chooses some vertex v_0\in C_0. Note that C_0-v_0 has a perfect matching. If B chooses a vertex u inside C_0, the player A responds by choosing its counterpart u' that matches u in C_0-v_0. Suppose, at some moment, B leaves C_0 and chooses v_i\in S. Then A responds by marking v'_i\in C_i. Again, since C_i-v'_i has a perfect matching, upon any vertex u that B chooses inside C_i, the player A responds by marking its counterpart u' that matches u in C_i-v'_i. If it happens B leaves C_i and jumps at vertex v_j\in S, v_j\neq v_i then A sends B into C_j by choosing v'_j. This continues like that and at some point B will not be able to leave some C_k, 1\le k\le m either because all vertices in S have already been marked, or because there is no edge back to S and no other move inside C_k is possible. Thus, the player A wins. \blacksquare

Note that, the condition there are even number of people inside the aircraft (even number of vertices) is a redundant one. That is, it can be omitted. Of course if G has a perfect matching the number of vertices is even. But in case G does not have any perfect matching, the parity of the vertices is irrelevant.

References.
[1] R. Diestel, Graph Theory, Springer-Verlag New York 1997, 2000.
[2] Reinhard Diestel’s home page.
[3] R. Diestel’s Graph Theory – Chapter 2-sample chapter.
[4] A thread in AoPS.
[5] Brazilian National Olympiad, Undergraduate level, 15-16 March, 2021

Is this Number Theory? Rioplatense MO, 2011.

A friend showed this problem to me, still unsolved in AoPS. Before reading on, you may just glance at its statement given below. We define an arithmetic function f(n) which value depends on the sum d(n) of the divisors of a given number n and also on the Euler totient function \varphi(n). We are asked if there ere infinitely many n for which f(n) is a perfect square. So, what to do? If there is one n with such a property, there will probably be an infinite number of them. One possible option is to try to prove that there is no such n. For example, by doing some modular arithmetic, or quadratic residues, or something. Especially, after trying enough values that do not bring perfect squares. By the way, the first n for which f(n) is a perfect square is n=30=2\cdot 3\cdot 5 ! I didn’t go that far in the calculations, I only found it when I almost knew how to solve it.
On the other hand, If one is convinced there are infinitely many such n‘s, which is a more reasonable conjecture, one could try to construct explicitly some kind of examples, which may lead to a pure NT chain of construction. Both approaches were not very fascinating to me, so at first, I was reluctant to think about the problem. Fortunately, the guy was persistent, he had thought and tried several things, and he was quite convinced that the approaches described did not work, and there must be another, to put it loosely: if you bring too many particles in a small room, some of them would bump into one another! In other words, he was convinced this problem had a combinatorial taste. And it indeed had, very tasteful one!
So, back on the title. Yes, I think this problem is a number theory problem, and I welcome the trend of selecting problems which are on the border of various subjects.
I hope you would enjoy this piece!

Problem (Rioplatense Math Olympiad, 2011, p6). For any n\in\mathbb{N}, let d(n) denotes the sum of its positive integers divisors and \varphi (n) – the number of positive integers up to n that are relatively prime to n. Example: d(6) = 12 and \varphi (7) = 6.
Determine whether the set of all positive integers n, such that d(n)\varphi (n) is a perfect square, is finite or infinite.

Solution. Let f(n):=d(n)\varphi(n). It easily follows that f(n_1n_2)=f(n_1)f(n_2) for any n_1,n_2\in\mathbb{N} with (n_1,n_2)=1. We enumerate the prime numbers as p_1<p_2<\dots.

Claim. For any k\in \mathbb{N} there is large enough m\in\mathbb{N}, m>k for which there exists a subset I\subset \{k,k+1,\dots,m\} that satisfies

\displaystyle f\left(\prod_{i\in I} p_i\right)

is a perfect square.

Proof. You may think of k as being fixed and m – large enough which value will be determined later. Note that f(p_i)=(p_i-1)(p_i+1). Since both p_i-1, p_i+1 are even when p_i>2 we conclude that each prime factor of f(p_i) is less than or equal to (p_i+1)/2. Let p_r be the greatest prime less than or equal to (p_m+1)/2. Thus, for any i=k,k+1,\dots,m we have

f(p_i)=p_1^{\alpha_1}p_2^{\alpha_2}\cdots p_r^{\alpha_r}.

where \alpha_1,\alpha_2,\dots,\alpha_r are non negative integers. Let us introduce the following r dimensional binary vectors

b_i:=(\alpha'_1,\alpha'_2,\dots,\alpha'_r), \alpha'_j\in\{0,1\},\alpha'_j\equiv \alpha_j\pmod{2},j=1,2,\dots,r

for any i\in J:=\{k,k+1,\dots,m\}. We will prove |J|>r for m large enough. But let us first see why it helps. So, a pinch of linear algebra comes into use now. We have |J| binary r dimensional vectors and |J|>r. It means they are linearly dependent (as vectors modulo 2, i.e. as vectors in \mathbb{F}_2^{r}.) Hence, for some non empty subset I\subset J it holds

\displaystyle \sum_{i\in I}b_i=\mathbf{0}

which means \displaystyle f\left(\prod_{i\in I} p_i\right) is a perfect square.
It remains to prove |J|>r for sufficiently large m. Clearly |J|=m-k+1=\pi(p_m)-k+1 where \pi(x) denotes the number of primes less or equal to x. On the other hand r is the number of primes less or equal to (p_m+1)/2. By prime number theorem we have \displaystyle \pi(x)\sim \frac{x}{\ln x}. Hence, \displaystyle \pi(p_m)-k+1 \sim \frac{p_m}{\ln p_m}-k+1 and \displaystyle r\sim \frac{ (p_m+1)/2 }{\ln\big(  (p_m+1)/2 \big)} which shows r<m-k+1 for sufficiently large m. \blacksquare

Applying consecutively this claim, we can find as many n as we want for which f(n) is a perfect square.

References.
https://artofproblemsolving.com/community/c6h1500705

Inequality: JBMO Shortlist 2014, A9.

The following problem is a high quality inequality. Not a standard one. I don’t like the standard ones. I mean, all those one has to juggle when applying AM-GM (weighted version), power mean, Cauchy-Schwarz, Holder, etc. It is nothing more than pattern recognition and juggling with a bunch of cheap tricks. I don’t think such inequalities should be given at IMO or any high level math competition. It is my humble opinion, but I think so. It doesn’t mean one can miss these classic inequalities. Everyone who loves mathematics should learn them. They are useful, just like the multiplication table.
Here is the statement of the problem, followed by a solution. In the last section, I outlined something about motivation.

Problem (JBMO 2014 shortlist). Let n be a positive integer and x_1, x_2,\dots, x_n; y_1, y_2, \dots, y_n be real positive numbers, such that x_1+x_2+\dots+x_n=y_1+y_2+\dots+y_n=1. Prove that

\displaystyle |x_1-y_1|+\ldots+|x_n-y_n|\leq 2-\min_{1\leq i\leq n}\frac{x_i}{y_i}-\min_{1\leq i\leq n} \frac{y_i}{x_i}

Solution. In fact, we will prove something sharper: each of the terms \displaystyle  \min_{1\leq i\leq n}\frac{x_i}{y_i} and \displaystyle  \min_{1\leq i\leq n}\frac{y_i}{x_i} can be replaced with a convex combination of those of them which are less than 1 (see (2) below). Some notations first. Let

\displaystyle I=\{i \mid 1\le i\le n, x_i\le y_i\}; J=\{i \mid 1\le i\le n, y_i<x_i\}.

Note that I cannot be empty and J can be empty only when x_i=y_i,i=1,2,\dots,n in which case the inequality is trivial. Thus, we can assume both I,J are not empty sets. Denote \displaystyle k_i:=\frac{x_i}{y_i}, \forall i\in I and \displaystyle \ell_j:=\frac{y_j}{x_j}, \forall j\in J. Let k:=\min\{k_i\mid i\in I\}, \ell := \min\{\ell_j\mid j\in J\}. Clearly \displaystyle k= \min_{1\leq i\leq n}\frac{x_i}{y_i},  \ell=\min_{1\leq i\leq n} \frac{y_i}{x_i} .

Assume i\in I. We have

\displaystyle |x_i-y_i|=y_i-x_i=(x_i+y_i)-2x_i= (x_i+y_i)-2k_iy_i.

In the same way, for any j\in J

\displaystyle |x_j-y_j|=x_j-y_j=(x_j+y_j)-2y_j= (x_j+y_j)-2\ell_j x_j.

Thus, it yields

\displaystyle \sum_{i=1}^n|x_i-y_i|= \sum_{i\in I}\big((x_i+y_i)-2k_iy_i\big)+\sum_{j\in J}\big((x_j+y_j)-2\ell_j x_j\big)=

\displaystyle = \sum_{i=1}^n(x_i+y_i) -2\sum_{i\in I}k_iy_i-2\sum_{j\in J}\ell_jx_j=

\displaystyle =2 -2\sum_{i\in I}k_iy_i-2\sum_{j\in J}\ell_jx_j.

Thus, in order to prove the original inequality, it is enough to prove

\displaystyle  2\sum_{i\in I}k_iy_i+2\sum_{j\in J}\ell_jx_j \ge k+\ell \qquad(1)

We will prove something even stronger. It holds

\displaystyle  2\sum_{i\in I}y_ik_i+2\sum_{j\in J}x_j\ell_j \ge \sum_{i\in I}y_i^*k_i +\sum_{j\in J}x_j^*\ell_j\qquad (2)

where

\displaystyle y_i^*:=\frac{y_i}{\sum_{i\in I}y_i},\forall i\in I \text{ and } x_j^*:=\frac{x_j}{\sum_{j\in J}x_j},\forall j\in J.

Let us first see why (2) is sharper than (1). By definition, \sum_{i\in I} y_i^*=1, \sum_{j\in J} x_j^*=1 . Thus, the RHS of (2) consists of two convex combinations of the points k_i,i\in I, respectively \ell_j, j\in J and apparently

\displaystyle \sum_{i\in I} y_i^*k_i\ge k\,;\, \sum_{j\in J} x_j^*\ell_j\ge \ell.

In order to prove (2) we use

\displaystyle \sum_{i\in I} y_ik_i +\sum_{j\in J}x_j=1

\displaystyle \sum_{j\in J}x_j\ell_j + \sum_{i\in I}y_i=1

Denote

\displaystyle y:= \sum_{i\in I} y_ik_i\,;\, x:=\sum_{j\in J}x_j\ell_j.

Hence, \sum_{j\in J}x_j =1-y\,;\, \sum_{i\in I} y_i=1-x. Putting it back into (2), we need to prove

\displaystyle 2(x+y)\ge \frac{x}{1-y}+\frac{y}{1-x}\qquad (3)

for any x,y satisfying 0<x,y<1, x+y\le 1. Let t=1-x-y, t\ge 0. The inequality (3) is equivalent to

\displaystyle 2(1-t)\ge \frac{x}{x+t}+\frac{y}{y+t}

\displaystyle 2t\le \frac{t}{x+t}+\frac{t}{y+t} \qquad (4)

But, since x+t=1-y\le 1 and y+t=1-x\le 1 we indeed get \frac{t}{x+t}+\frac{t}{y+t} \ge t+t=2t which proves (4) and hence (2) holds.

About motivation.

Let first drop the two correction terms \displaystyle  \min_{1\leq i\leq n} \frac{y_i}{x_i} and \displaystyle  \min_{1\leq i\leq n} \frac{x_i}{y_i} and try to prove \displaystyle \sum_{i=1}^n |x_i-y_i|\le 2. Knowing \sum_i x_i=\sum_i y_i=1 we look to represent |x_i-y_i| as (x_i+y_i)-\text{ "something"}. This brings two cases 1) x_i\le y_i and 2) y_i<x_i. That’s why the sets I,J come into consideration. So, all stuff up to the inequality (1) comes up naturally. Jumping from (1) to (2) took me some time. I tried a few things before that. Actually, I even didn’t introduce the notations k_i,\ell_j, I worked with the minimal ones, k and \ell. So, I came up with

\displaystyle  2k\sum_{i\in I}y_i+2\ell\sum_{j\in J}x_j \ge k+\ell \qquad(1')

It looks promising, because in case k=\ell it obviously holds since

\displaystyle \sum_{i\in I}y_i+\sum_{j\in J}x_j\ge  \sum_{i\in I}x_i+\sum_{j\in J}x_j=1.

But, the way it’s written, it could fail in case k is many times larger than \ell and \sum_{i\in I}y_i is small. This is quite possible. This happened because we dropped the bar (the LHS of (1') ) too much beforehand. So, instead of (1'), a finer estimate needs to be made. That’s how (1) appeared. This time, nothing like this can happen, because if \sum_{i\in I}y_i is small then the average \ell_j should be large because of

\displaystyle \sum_{j\in J}x_j\ell_j + \sum_{i\in I}y_i=1.

So, we are on the right track. Now, look again at the inequality (1). On its left hand side we see two linear combinations of k_i, respectively \ell_j while its right hand side is a sum of their minimum values. Therefore, we could try to improve (1) by replacing k, respectively \ell with some convex combination of k_i‘s, respectively \ell_j‘s which is similar to that in the LHS.
Once we get to (2), the following calculations are natural.

References.
JBMO 2014 SL thread in AoPS.

A Game on circles. Italian TST Problem.

I was sent a problem from some Italian training camp, allegedly taken place in 2017. At least, the male name used in the problem statement was Italian, I think. Probably it was not from their official team selection test, but still I put it this way in the title. For sure, it is an interesting game problem with graph theory taste. I include it here as a perfect example how pairing works in certain types of games.

Problem (Italian training, 2017). Let n\ge 3 be an integer and C_1,C_2,\dots,C_n be circumferences in the plane such that, for each i\ne j;1\le i,j\le n, the circles C_i and C_j have exactly two points of intersection and there is no point common to three distinct circles. There is a coin on each of the intersection points.
Alberto and Barbara play a game: Starting from Alberto, each one takes a coin that is not on a circumference from which a coin has been removed in the immediately preceding round. During the first round, Alberto takes any coin he wants. The one that cannot make a valid move loses.
Determine for which n Alberto has a winning strategy.

Thus, every coin belongs to exactly two circles and every two circles have two coins on their intersection points. It is pretty clear, it’s not a geometry problem. Instead of circles it could have been ovals or some other curves. In order to filter irrelevant objects like circles, let translate the problem into graph theory language.

We have a complete graph K_n with n vertices (the vertices replace the circles). There are two coins on each edge of the graph. Alberto and Barbara play the described game of taking coins, and on each move the player cannot take a coin that is on any edge incident with the edge the previous coin was taken from.

I think, this is a better way to visualize what happens. Since we have a even number of coins, perhaps the second player could always make a move. There is a routine strategy in some of these games and it’s absolutely mandatory one to try it first. The idea is to pair the coins, so that the coins of each pair do not lie on incident edges. If this is possible then the second player, Barbara, wins – she just takes the coin that is paired with that previously taken by Alberto. We prove such pairing is possible when n\ge 4. In case n=3, obviously Alberto wins, since after removing any coin, the other player cannot touch anything. Thus, the answer is, Alberto has a winning strategy only when n=3.

Assume n\ge 4. Let, as usual, E be the set of the edges of K_n. We will construct a bijection f: E\to E with the following properties. For each e\in E the edges e and f(e) are not incident and the same holds for the edges e and f^{-1}(e).
Enumerate the vertices of K_n as 1,2,\dots,n. By convention, for any k\in \{1,2,\dots,n\} and integer i we assume k+i=\ell where \ell\equiv k+i \pmod{n} and 1\le \ell\le n. For any edge (i,j) for which i and j are not consecutive (we assume 1 and n are consecutive), we define f((i,j)):=(i+1,j+1). When i and j are consecutive vertices, i.e. j=i+1 we define f((i,i+1)):= (i+2,i+3). Note that i+3\ne i, since n\ge 4. Now, the mentioned property of f clearly holds. The action of f on the elements of E partitions E into cycles e_1, e_2=f(e_1),e_3=f(f(e_1)),\dots,e_{\ell}=f^{\ell-1}(e_1), e_1=f(e_{\ell}). For each such cycle C=\{e_1,e_2,\dots,e_{\ell}\} let e_i^{1} and e_{i}^2, i=1,2,\dots,\ell be the two coins on e_i. We pair the coin e_i^1 with e_{i+1}^2 for i=1,2,\dots,n. By the mere construction of f, any pair of coins lie on edges that are not incident. As described, Barbara just takes the coin that is paired with the one taken by Alberto and wins.

Turing vs. Gödel.

No, no, I wouldn’t oppose these two great minds. I’ll comment on two of their famous results – Godel’s incompleteness theorem and Turing’s proof that the halting problem is undecidable. They are similar in the sense they are based on the similar ideas. The main point of this blog post is to show Godel’s incompleteness theorem is a short corollary from the Turing’s result on the undecidability of the halting problem. We will prove the both theorems. I promise, there is no need of any deeper understanding of logic other than a notion what a program is and a vague clue what a deduction is.

By “Turing machine” one can imagine a computer with a program that runs on it. This computer has infinite amount of storage (for example, RAM) and can print on infinite roll of paper. It takes as an input, a binary string with finite length, and processes it. While processing the input, the program may produce some output and may eventually halt or this process may run forever. For example, one may write a program in C++ that reads the first symbol of the input and outputs it (prints it) forever, or a program that reads the first symbol, prints it exactly 100 times, and then halts. Both programs would be simple ones, so upon seeing them one can judge whether the process stops or runs forever. But imagine a more complicated program. It may take a long time to study it and decide whether it halts or runs forever! Here is a natural question that arises. Does there exist an algorithm, that given any program and input, decides whether this given program halts on this input? By algorithm I mean a written program. So, could we write a software (maybe a very complicated one) that takes as input: 1) a program P (its source code) and 2) some input (binary string) S, and then decides whether P halts on the input S ?
What Turing proved was that such software could not exist! That is, the halting problem is undecidable.

Turing’s result: the halting problem is undecidable.

Suppose, for the sake of contradiction, there exists a Turing machine T_0 which takes as input pairs (T, S), where T is a Turing machine and S is a string. You may imagine the program (Turing machine) T is represented by its source code, which is a finite string over an alphabet of finitely many symbols. There is some delimiter (a special symbol, or some other convention) that separates the two strings T and S, so that T_0 knows which part of the input is T and which one is S. Further, T_0 runs some inspection on T and S and prints out whether T halts on the input S or it runs forever. We assume that for some pairs (T, S) the first string T may not be a valid Turing machine (valid program). We know how to discern between a valid program and some gibberish. In the latter case T_0 outputs that S is not a valid program.
Let us construct the following Turing machine U. Given any input string S, the machine U simulates the result of T_0 on the input (S,S), that is, U runs T_0 with the input (S,S). In case S is not a valid program, U prints it out. In case S is a valid program, T_0 will find out whether S halts on the input S or runs forever. In the former case U runs forever and in the latter one it halts. In other words, U processes a given string S by doing just the opposite of what S does on the string S. Now, we will finish in just one sentence! Think about what would happen if U takes as an input its own code U. As defined, U on U does just the opposite of what U on U does!! That’s it, we got a contradiction. Hence, there does not exists a Turing machine T_0 that decides the halting problem. \blacksquare

A very short proof. I was astounded upon seeing it for the first time, many years ago as a freshman in college. Not because there is something very involved; it resembles somehow the Cantor’s diagonal argument, and I was also aware of Godel’s results. But because of the limitations it imposes on the human knowledge. The Godel’s theorems are also in the same spirit, but they are more abstract. If there exist unprovable claims in some theory, you may add another axiom and expand the things you can prove. For example, the Euclid’s fifth postulate. Whereas, Turing bluntly proved we are limited. We cannot invent an algorithm that, upon inspection of a given program, decides whether it has a bug (and thus may enter a cycle and run forever) or it is flawless! Alan Turing proved it when he was 24!
We omitted some technical details. For example, we take it for granted what a Turing machine is, because we know what a program is and what a computer is. There were no such things in those times, though. So, Turing invented a construction (later called Turing machine) and showed how it can be described as a mathematical object. He also constructed the so called universal Turing machine U, that takes as an input any Turing machine T (or rather its encoding as a string) and an input S then simulates T over the input S. In now days we encounter this behavior everyday. For example, this is what any operating system does. It loads a program and operates it over an input. It is also what any virtual machine does.

Gödel’s incompleteness theorem.

Suppose we have in our disposal a theory that consists of some finite number of axioms that include the arithmetic properties of the natural numbers. We have deduction rules that allow us make deductions and prove theorems. Suppose this theory is consistent, i.e. we cannot prove (using the rules) both a theorem and its negation. We assume also we can check if a proof of some theorem is a valid one. The last assumption is a natural one, since the proofs must be checkable. Under these assumptions, Godel proved, there exists a claim in this theory that is unprovable and its negation is also unprovable. He constructed a sentence (claim) with a circular reference to itself, something like the one used in the liar paradox: “This claim is unprovable”. So, it is neither provable nor disprovable. In order to do it, Godel map each sentence to a natural number, the so called Godel’s numbers. That’s why, the natural numbers should be included into the theory. The classical proof of Godel is rather long and tedious, because of the technical details. Here, we present a short proof using the Turing’s result.

Suppose on the contrary, any valid claim which is not gibberish, can be either proved or disproved. We will show the halting problem is decidable, which would be contradiction. Take any Turing machine T and an input S. Consider the following two sentences, \mathrm{Th_1:}T eventually halts over the input S” and \mathrm{Th_2:}T doesn’t halt, when it takes S as an input, and runs forever”. The two sentences can be translated into valid claims of our theory about natural numbers. Indeed, we can define what T does as consecutive steps operating over some sets of objects like we have some input, which is an ordered set of symbols and we apply some instruction which depends on the internal state of the machine which leads to recursively defining a sequence of sets. We are asked whether this sequence of sets reaches some final set which we view as a halting state. So there is a proof that proves either \mathrm{Th_1} or \mathrm{Th_2} because one theorem is the negation of the other one. This proof consists of finite number of symbols, so we can enumerate lexicographically all finite strings as candidate proofs and then proceed to check them one after another. Thus, after some time, we’ll find a proof of either \mathrm{Th_1} or \mathrm{Th_2}, thus finding out where (T,S) halts or not. This description can be implemented as an algorithm, hence the halting problem is decidable, contradiction.

Fourier Transform. An Olympiad Problem.

We have already presented (in a previous blog post) how the discrete Fourier transform is useful in some Olympiad problems. Two problems were considered – a Chinese TST problem and a an IMO shortlist problem. Both of them dealt with a partition of the set of all integers into (three or four) sets with a given pattern. So, it’s no doubt why the Fourier transform is applicable to problems like these – it itself is designed to reveal patterns.
In this blog post I present to you a problem from APMO 2013 (Asian Pacific Mathematics Olympiad). Two sets are given, which comply with a pattern that brings them together. This time, we use the continuous Fourier transform (not the discrete one). I will first give the problem statement, followed by the definition of the Fourier transform together with a few basic properties. Finally, the solution is presented.

Problem (APMO 2013, Problem 4). Let a and b be positive integers, and let A and B be finite sets of integers satisfying

(i) A and B are disjoint;
(ii) if an integer i belongs to either to A or to B, then either i+a belongs to A or i-b belongs to B.

Prove that a\left| A \right| = b \left| B \right|. (Here \left| X \right| denotes the number of elements in the set X.)

Fourier Transform.

If you know the basic properties of Fourier transform jump to the solution section. Let f:\mathbb{R}\to\mathbb{C} be integrable function. We define

\displaystyle \hat{f}(\xi):= \int_{-\infty}^{\infty} f(t)e^{-2\pi i\xi t}\,dt\,,\,\xi\in\mathbb{R} \qquad (1)

Thus, the obtained function \hat{f}:\mathbb{R}\to\mathbb{C} is called Fourier transform of function f. Loosely speaking, it may be viewed as \hat{f} being a frequency decomposition of f.

The following property shows how to find the Fourier transform of an offset of f. For any a\in \mathbb{R}

\displaystyle \widehat{f(x+a)}=e^{2\pi i a \xi }\hat{f}(\xi)\qquad (2)

Imagine a river with two banks. Some good functions live on one side, and on the other side (the Fourier side)- their Fourier transforms. Each function is related with its Fourier counterpart. So, shifting a function f by an offset a on one side corresponds to multiplying its Fourier transform by e^{2\pi i a\xi} on the Fourier side.
This picture is common for many other Fourier properties. For example, multiplying any two function on one bank corresponds to taking their convolution on the Fourier side. The Fourier transform also preserves inner product. The inner product of two functions on one bank is the same as the inner product of their transforms on the Fourier side.
There is a way to recover a function knowing its Fourier transform. So called inverse Fourier transform.

\displaystyle f(x)=\int_{-\infty}^{\infty}\hat{f}(\xi)e^{2\pi i x\xi}\,d\xi.

Ok, this information is more than enough to solve this particular problem. Oh, as I see now, we need another fact. For some wide class of functions f, the corresponding Fourier transform \hat{f} is everywhere differentiable. It certainly holds when f is a bounded function with a compact support.

Back to the problem.

The given condition just means that

\displaystyle A\cup B\subset \left(A-a\right)\cup \left( B+b\right)

where, as usual, A-a=\{x-a : x\in A\}. We have |A\cup B|=|A|+|B|, because A and B are disjoint. Therefore, \left|\left(A-a\right)\cup \left( B+b\right)\right|\ge |A|+|B|. Since |A-a|=|A|, |B+b|=|B| it follows A-a and B+b are disjoint and

\displaystyle A\sqcup B= \left(A-a\right)\sqcup \left( B+b\right)\qquad (3)

Let us denote

\displaystyle A_1:=\bigcup_{x\in A} \left[x,x+1\right)\,;\, B_1:=\bigcup_{x\in B}\left[x,x+1\right).

Clearly, a similar condition as (3) holds for A_1,B_1

\displaystyle A_1\sqcup B_1= \left(A_1-a\right)\sqcup \left( B_1+b\right)\qquad (4)

Let f(x) and g(x) be the indicator functions of sets A_1, correspondingly B_1. By (4) we get

\displaystyle f(x)+g(x)=f(x+a)+g(x-b)

Taking the Fourier transforms of the left and right sides, and using (2), it yields

\hat{f}(\xi)+\hat{g}(\xi)=\hat{f}(\xi)e^{2\pi i a\xi} + \hat{g}(\xi) e^{-2\pi i b\xi}

Further, we differentiate with respect to \xi the both sides of the above equality. As mentioned in the above section, \hat{f}(\xi) and \hat{g}(\xi) are smooth.

\displaystyle \hat{f}'(\xi)+\hat{g}'(\xi)=\hat{f}'(\xi)e^{2\pi i a\xi}+2\pi i a\hat{f}(\xi)e^{2\pi i a\xi} + \hat{g}'(\xi) e^{-2\pi i b\xi}-2\pi i b \hat{g}(\xi) e^{-2\pi b \xi}

Putting \xi=0 it yields

\displaystyle a\hat{f}(0)=b\hat{g}(0)

By the mere definition of \hat{f}, \hat{g} (see (1) ) it follows \displaystyle \hat{f}(0)=\int_{-\infty}^{\infty} f(x)\, dx=m(A_1)=|A|, where m(A_1) denotes the measure (total length) of A_1. In the same way, \hat{g}(0)=|B|. Finally, a|A|=b|B|.

References.
[1] Application of Discrete Fourier Transform on Olympiad Problems.
[2] APMO 2013, problem 4 – a thread on AoPS

IMO 2020, Problem 6.

Finally, I decided to comment on this remarkable problem. It stayed for a couple of months in my drafts here. I postponed it several times – at first there was an error I found when writing the proof. After that, some details had to be elaborated. By the way, the most reliable way to verify your own proof is to write it down. Anyway, let’s proceed with the problem itself.

Problem 6 (IMO 2020). Prove that there exists a positive constant c such that the following statement is true:
Consider an integer n > 1, and a set S of n points in the plane such that the distance between any two different points in S is at least 1. It follows that there is a line \ell separating S such that the distance from any point of S to \ell is at least cn^{-1/3}.
(A line \ell separates a set of points S if some segment joining two points in S crosses \ell.)
Note. Weaker results with cn^{-1/3} replaced by cn^{-\alpha} may be awarded points depending on the value of the constant \alpha> 1/3.
Proposed by Ting-Feng Lin and Hung-Hsun Hans Yu, Taiwan.

The Idea.

We fix some \varepsilon<n^{-1/3}, which exact value will be determined later. Let us draw a circle with radius \varepsilon around each point in S and let the polygon P be the convex hull of S. We want to show there is a line \ell that intersects P and does not meet any circle. Denote by p the perimeter of the polygon P. We’ll consider two cases.
The first case is when the interior of P is “too densely” packed with circles of radius \varepsilon (thou, the distance between any two of their centers is at least 1). In this case we do not allow the line \ell to penetrate deeply into the interior of P, because there are “too many” circles inside it and it would meet someone. We rotate the line \ell and allow it to cut off only a “small piece” of the the polygon P.
The second case is when the perimeter of P is “too large”, i.e. the small circles of radius \varepsilon are not tightly packed inside P. It’s dealt in a routine manner. We project P on an appropriate line \ell' (the line on which P has the longest projection) and establish that some line \ell that is perpendicular to \ell' does not meet any circle.
I put more details about the motivation in the last section of this blog post.

Solution.

Let the polygon P be the convex hull of S. Denote by p the perimeter of P. Fix some \varepsilon<n^{-1/3}, which exact value will be determined later.

Suppose some exterior angle \alpha of the polygon P is greater than n^{-1/3}. Then we can chose a line \ell at distance \varepsilon:=n^{-1/3} from that vertex that cuts off a very small corner from the polygon – see at figure 1.

Fig. 1.

Let A'\in AB, C'\in BC, |A'B|=|C'B| and A'C' touches the circle with center B and radius \varepsilon. A straightforward calculation yields \displaystyle |A'B|=\frac{\varepsilon}{\sin{\frac{\alpha}{2}}} or |A'B|\le \pi/4 which means \ell cannot meet any other circle (for sufficiently large n), and we are done. Thus, hereafter are under the following assumption.
Assumption 1: Every exterior angle of P is at most n^{-1/3}.
Further, we consider two cases, and start with the harder one.

First case: p \le n^{2/3}. According to the assumption, every exterior angle of P is less than p^{-1/2}. For any z\in P (further by P we mean the contour of the polygon) let z'\in P be the point at distance \frac{1}{100}p^{1/2} measured clockwise, along the polygon, i.e. the broken line between z and z' has length \frac{1}{100}p^{1/2}. Let h>0 be the maximum distance such that translating the line zz' at distance h toward the direction outside the polygon, it still meets P. Thus, h depends on z\in P. We can parameterize z as being at distance x, 0\le x<p measured clockwise alongside the polygon, starting from some fixed point z_0\in P. Each such 0\le x<p corresponds to a point z\in P and h=h(x). Consider the region D\subset \mathbb{R}^2 defined as

\displaystyle D:=\{(x,t)\in \mathbb{R}^2 : 0\le x<p, 0\le t\le h(x)\}.

Each point (x,t)\in D corresponds to a line \ell(x,t) that intersects P, namely, we take zz' that corresponds to x and translate it at distance t outward P (dotted lines in the fig. 2 below).

Fig. 2.

Denote by L the family of all those lines, i.e.

\displaystyle L:=\left\{ \ell(x,t) : (x,t)\in D\right\}

We claim there is a line \ell\in L that does not meet any circle. In order to prove this, we establish three auxiliary facts.

Claim 1. The area m(D) of D is at least C_1\cdot p for some absolute constant C_1.

Claim 2. For any s\in S let B(s,\varepsilon) be the circle with center s and radius \varepsilon. Let D_s be the subset of D that corresponds to the lines intersecting B(s,\varepsilon). Then

\displaystyle m(D_s)\leq C_2\sqrt{p}\cdot \varepsilon

where C_2 is an absolute constant, and m(X) means the area of X.

Claim 3. The lines in L intersect at most C_3\cdot p circles for some absolute constant C_3.

Before proving the above claims, let us first see how they help us to prove the statement of the problem. Let S'\subset S be the set of those points s\in S for which some line in L intersects B(s,\varepsilon). We have

\displaystyle m\big(\{(x,t)\in D: \ell(x,t)\text{ intersects a circle}\}\big) \le \sum_{s\in S'} m(D_s).

Using claims 2 and 3, we get

\displaystyle \sum_{s\in S'} m(D_s)\le C_3\cdot C_2\cdot p\cdot \sqrt{p}\cdot \varepsilon.

Now, we choose \displaystyle \varepsilon:=\frac{C_1}{2C_2\cdot C_3}n^{-1/3}. Because \sqrt{p}n^{-1/3}\le 1 it yields

\displaystyle m\big(\{(x,t)\in D: \ell(x,t)\text{ intersects a circle}\}\big) \le \frac{C_1}{2}\cdot p

According to claim 1, it holds m(D)\ge C_1\cdot p, therefore there is a point in D which does not belong to the set of “bad” points \{(x,t)\in D: \ell(x,t)\text{ intersects a circle}\} and the result follows. \square

Proof of Claim 1. We can partition the contour of P into pieces, which number is bounded by an absolute constant, in such a way that each piece can be rotated appropriately so that the slope of its leftmost segment is \approx \pi/4 and the slope of its rightmost segment is \approx -\pi/4. Let y=f(x), x\in [x_1,x_2] be the equation of some such rotated piece, where f(x) is a concave function in [x_1,x_2], and f'(x_1)\approx 1, f'(x_2)\approx -1. For any x\in [x_1,x_2] by \Delta(x) we denote the length of the projection of the segment z(z)\,z'(x) on the Ox axis. Due to assumption 1, the slopes at the points z(x) and z'(x) differ no more than 1/100, hence

\displaystyle c_1\sqrt{p}\le \Delta(x)\le \frac{1}{100}\sqrt{p}

for some absolute constant c_1. In order to estimate a lower bound for m(D) we replace zz' with a smaller segment connecting the points z(x)=(x,f(x)) and z_1(x):=(x+\Delta, f(x+\Delta )) where \Delta:=c_1\sqrt{p}. Respectively h(x) is replaced by the maximal distance between (x,f(x)) and the segment zz_1 when x\in (x,x+\Delta). We denote this distance by h_1(x). Clearly h_1(x)\le h(x). We further approximate h_1(x) by the length of the vertical segment that lies on the vertical line through x+\Delta/2 and that is between the graph of f(x) and the segment zz_1 (see figure 3).

Fig. 3.

Its length is \displaystyle h_2(x)=f(x+\Delta/2)-\frac{f(x)+f(x+\Delta)}{2}. Since the slope of zz_1 is not too big, we have h_2(x)\le c_2h_1(x) for some absolute constant c_2 and it boils down to find a lower bound for

\displaystyle \int_{x_1}^{x_2} h_2(x)\,dx=-\frac{1}{2}\int_{x_1}^{x_2}(f(x+\Delta)-f(x+\Delta/2))\,dx+
\displaystyle +\frac{1}{2}\int_{x_1}^{x_2}(f(x+\Delta/2)-f(x))\,dx

Denote \varphi(x):=f(x+\Delta/2)-f(x).

\displaystyle \int_{x_1}^{x_2} h_2(x)\,dx=-\frac{1}{2}\int_{x_1+\Delta/2}^{x_2+\Delta/2}\varphi(x)\,dx+\frac{1}{2}\int_{x_1}^{x_2}\varphi(x)\,dx=

\displaystyle = -\frac{1}{2}\int_{x_2}^{x_2+\Delta/2}\varphi(x)\,dx+\frac{1}{2}\int_{x_1}^{x_1+\Delta/2}\varphi(x)\,dx\ge

\displaystyle \ge \frac{1}{2}\int_{x_2}^{x_2+\Delta/2}-f'(x_2)\frac{\Delta}{2}\,dx+ \frac{1}{2}\int_{x_1}^{x_1+\Delta/2} f'(x_1)\frac{\Delta}{2}\,dx\ge C_1\cdot p.

for an absolute constant C_1.

Proof of Claim 2. Let x_1\in [0,p) be the first point for which z(x_1)z'(x_1) touches B(s,\varepsilon). Take any point z(x_2)\in P at distance slightly more than 2\varepsilon from z'(x_1) taken alongside P. Since the slope of P is changing slowly (due to assumption 1) the segment z(x_2)z'(x_2) does not intersect B(s,\varepsilon) (see figure 4).

Fig. 4.

Therefore, each x, 0\le x<p for which z(x)z'(x) intersects B(s,\varepsilon) satisfy x\in[x_1,x_2). Note that the distance between z(x_1) and z(x_2) alongside the contour of P is less than \frac{2}{100}p, thus x_2-x_1\le \frac{2}{100}p. For each x\in [x_1,x_2] the translations of the segment z(x)z'(x) that still intersect the circle B(s,\varepsilon) lie in some interval with length 2\varepsilon, thus the result follows.

Proof of Claim 3. It boils down to prove m(D)\le C\cdot p for some absolute constant C. The proof of it goes in the same lines as that of Claim1.

Second case: p > n^{2/3} (the circles are not “tightly” packed inside P). Pick a line \ell' such that the projection P' of P on \ell' is the longest possible one. Then |P'|\ge C\cdot p for some absolute constant C (in fact C=1/\pi). For each s\in S construct a circle B(s,\varepsilon) with center s and radius \varepsilon:= \frac{C}{4}n^{-1/3}. The projection of B(s,\varepsilon) on the segment P' is a segment with length at most 2\varepsilon. The sum of those projections is at most \displaystyle 2n\varepsilon=\frac{C}{2}n^{2/3}<\frac{C}{2}p<|P'|. Therefore, there is a point X\in P' such that the line through X and perpendicular to \ell' does not meet any circle B(s,\varepsilon). \blacksquare

Something about motivation.

It’s a natural idea to consider the convex hull P of the given set of points S. So, we are looking for a line that intersects P but does not intersect any of the circles with centers at the points in S and radii \varepsilon, where \varepsilon is not too much lesser than n^{-1/3}. One can try the idea described in the second case of the above argument. It’s a routine idea. So, it works, but only in case the perimeter p of P is greater than n^{2/3} (or C\cdot n^{2/3} for some (small) absolute constant C). The other case, p<n^{2/3}, is more interesting. I tried first a configuration where P is close to a circle. I tried to consider lines that do not penetrate too deep into P.
We take a line \ell that is tangent to the circle P and begin to rotate it keeping it tangent, and at each rotation we translate it at a distance at most 1 toward the center of P. This is our family L of lines. Each line in L can be described with two parameters: the angle of the rotation \varphi, 0\le \varphi<2\pi and the distance t, 0\le t<1 the line is translated towards the center of P. So, the set \{(\varphi, t): \ell(\varphi,t)\in L\} is exactly the rectangle D:=[0,2\pi]\times [0,1] and has area exactly 2\pi. For a fixed small circle B(s,\varepsilon) it can be easily calculated the area of that part of D that corresponds to the “bad” lines intersecting B(s,\varepsilon) (figure 4 may be helpful, just imagine P were a circle). It’s something like \displaystyle \frac{2\varepsilon}{\sqrt{r}} where r is the radius of P. We have \displaystyle \varepsilon <\frac{1}{n^{1/3}}<\frac{1}{\sqrt{p}}=\frac{1}{\sqrt{2\pi r}}. So, the portion of the bad lines for each small circle B(s,\varepsilon) is less than some constant multiplied by \displaystyle \frac{1}{r}. But there are less than \approx 2\pi r points of S in the annulus with width 1 and outer radius r. Hence, if we sum up the areas corresponding to the bad lines for each of these points, and take \varepsilon :=cn^{-1/3} for some small constant c, we get the total bad area is less than 2\pi, therefore some line \ell(\varphi,t), (\varphi,t)\in D does not meet any small circle.
I did these calculations assuming P was a circle and it looked promising. But there are many obstacles if one follows exactly the same approach in general. It seems a translation with a fixed step 1 at each angle of rotation cannot be generalized seamlessly.

Rate of Convergence. A Brazilian Undergrad Competition Problem.

A problem about the rate of convergence of a recursive sequence. Imagine we have a recursive sequence x_{n+1}=x_n(1-x_n) where x_0\in(0,1). Clearly x_n is monotonic decreasing and converges to 0. The point is: how fast?
I include it here because of the curious idea. Not unusual one, but still interesting. Look at the factor 1-x_n. Each time we multiply the previous term by this factor to get the next one. If it were a constant we would have dealt with a geometric progression. But this factor is changing, it is getting closer and closer to 1, thus slowing down the way (x_n) decreases. So, the idea is simple and common. If you have a process that at each step depends on some variable that is changing, then divide this process into stages, where this variable is relatively constant, and treat it at each stage as if the variable is constant. Another point that deserves attention: instead of looking for a lower bound for x_n we take some \varepsilon (for example \varepsilon=1/n) and estimate how many terms of the sequence are larger than \varepsilon. It is an equivalent approach because (x_n) is a decreasing sequence. Here is the original problem.

Problem (Brazilian Undergrad MO 2017 P4). Let \big(a_n\big){n\geq 1} be a sequence of strictly positive terms with \displaystyle \lim_{n\to \infty}a_n=0 such that, for some c>0 and for all n\geq 1,

\displaystyle |a_{n+1}-a_{n}|\leq c\cdot a_n^2.

Prove that there exists d>0 such that \displaystyle a_n \geq \frac{d}{n},\forall n\geq 1.

Solution. Consider the sequence

\displaystyle x_{n+1}=x_n-cx_n^2, n=m,m+1,\dots\qquad (1)

where x_m:=a_m and m is chosen in such a way that a_m<\frac{1}{2c}. Since the function f(x):=x-cx^2 is monotonic increasing in the interval x\in(0,\frac{1}{2c}) and f(x)<x in this interval, it follows (by induction) x_n\le a_n, \forall n\ge m.
Thus, it boils down to prove the claim for the sequence (1). Unfortunately, I do not know an explicit way to describe this sequence. But look at the recurrence relation x_{n+1}=x_n(1-cx_n) and the term 1-cx_n. This term is approaching 1 as x_n\to 0. So, we can view upon x_n as a geometric progression, but with varying ratio which is getting closer to 1 and thus, slows down the rate of decreasing of the geometric progression. The idea is to partition the sequence into stages such that at each stage the ratio 1-cx_n isn’t changing too much, i.e. it remains approximately the same up to multiplying by some absolute constant.
So, the plan is to partition (0,1) into intervals I_j:=\left[2^{-j-1},2^{-j}\right), j=0,1,\dots and to estimate the number of terms of the sequence (x_n) in each interval. Whenever x_n\in I_j the ratio 1-cx_n\in \left[ 1-c2^{-j}, 1-c2^{-j-1}\right] . It means for any x_n\in I_j we have

\displaystyle x_{n+1}\ge \left( 1-c2^{-j}\right)x_n \qquad (2)

Let x_n be the term of the sequence that enters for the first time the interval I_j. It always hits the second half of this interval for large enough j. So, because of (2), the term x_{n+k,k=1,2,\dots} never exits I_j while it holds

\displaystyle \left( 1-c2^{-j}\right)^k\ge \frac{1}{\sqrt{2}}\,,\,k=1,2,\dots

The above inequality holds for any k\le c_12^{j}, where c_1 is a constant depending only on c. Thus, for large enough j, at least c_1\cdot 2^{j} terms of the sequence (x_n) are inside I_j. Summing up the terms over j it follows at least c_2 2^{j} terms are greater than 2^{-j} for large enough j, where c_2 is a constant (depending only on c). Since the sequence (x_n) is monotonically decreasing, the result follows.

Note that in this way we have actually proved the existence of two constants d_1, d_2 such that

\displaystyle \frac{d_1}{n}\le x_n\le \frac{d_2}{n}\,,\,\forall n\in\mathbb{N}.

Thus, the behaviour of \big(x_n \big)_{n=1}^{\infty} is fully characterized.

References.
[1] The thread in AoPS