An Inequality About Orthogonal Functions.

I’ve recently seen the problem that follows. It looked very familiar, but I couldn’t remember why. I guess, it was given as an exercise in a neighboring country – in high school. My personal feeling is that it’s not suitable for high school. Not because it’s about an inequality with integrals. I am for the integrals to be taught in high school, of course without too much attention to make things rigorous. I think, it’s inappropriate because there is a trick whose motivation may lay beyond high school. And without a motivation, it looks like juggling. I don’t think math is juggling. Here is the problem.

Problem. Let f,g be continous real functions in [0,1] satisfying

\displaystyle \int_o^1 f(x)^2\,dx=1\,;\, \int_o^1 g(x)^2\,dx=1\,;\,\int_0^1 f(x)g(x)\,dx=0

Prove that

\displaystyle \left(\int_0^1 f(x)\,dx \right)^2+\left(\int_0^1 g(x)\,dx \right)^2\le 1\qquad(1)

A motivation.

Let me first make some analysis. The condition about continuity is too much to be required. It’s enough the given functions to be square-integrable, i.e. f,g\in L^2[0,1]. By Cauchy-Schwartz we have

\displaystyle \left(\int_0^1 f(x)\,dx\right)^2\le \int_0^1 f(x)^2\,dx\int_0^11^2\,dx=\int_0^1 f(x)^2\,dx.

Note that this inequality is sharp. In case f=1 it becomes an equality. It means for f,g\equiv 1 the LHS of (1) is 2. That is, if we drop the condition of orthogonality \int_0^1 fg\,dx=0 there is no chance to obtain (1). This condition prevents the functions being both optimal in some sense. Consider now the particular case when the supports of both functions are disjoint. That is, suppose f vanishes outside a set F\subset [0,1], g vanishes outside G\subset [0,1] and F\cap G=\emptyset. In this case obviously \int_0^1f(x)g(x)\,dx=0. Again by Cauchy-Schwartz we get

\displaystyle \left(\int_F f(x)\,dx\right)^2\le \int_F f(x)^2\,dx\int_F1\,dx=m(F)\cdot \int_F f(x)^2\,dx=m(F)

where m(F) is the measure/total length of F. Similarly

\displaystyle \left(\int_G g(x)\,dx\right)^2\le m(G)

and thus, (1) holds in this case. Now, let me explain the motivation about the trick that solves it. It involves some knowledge of Fourier analysis, which is not used in the solution that follows. Let e_0=1, e_1(x),\dots be an orthonormal basis in L^2[0,1] (for instance something like \sin 2\pi nx,\cos 2\pi nx,n=0,1,\dots). Then f(x) (resp. g(x)) can be represented as

\displaystyle f(x)=\sum_{i=0}^{\infty} a_ie_i(x), a_i\in\mathbb{R}

\displaystyle g(x)=\sum_{i=0}^{\infty} b_ie_i(x), a_i\in\mathbb{R}

with 1=\int_0^1f(x)^2\,dx=\sum_{i=0}^{\infty}a_i^2 and 1=\int_0^1g(x)^2\,dx=\sum_{i=0}^{\infty}b_i^2. Moreover, \sum_{i=0}^{\infty}a_ib_i=0. Since a_o=\int_0^1f(x)\,dx, b_o=\int_0^1g(x)\,dx we have to prove a_0^2+b_0^2\le 1. That is, we have to prove that \displaystyle \sum_{i=1}^{\infty} a_i^2 and \displaystyle \sum_{i=1}^{\infty} b_i^2 cannot be both too small because if so, then a_0^2+b_0^2 becomes too large and we couldn’t comply with \displaystyle \sum_{i=0}^{\infty}a_ib_i=0.
This is the motivation to represent f and g as f(x)=a_0+f_1(x); g(x)=b_0+g_1(x) where a_0:=\int_0^1f(x)\,dx; b_0:=\int_0^1g(x)\,dx. In this way, f_1 and g_1 are both orthogonal to the constant function.


Let us represent f and g as

\displaystyle f(x)=a_0+f_1(x)\,;\,g(x)=b_0+g_1(x)

where \displaystyle a_0:=\int_0^1f(x)\,dx; b_0:=\int_0^1g(x)\,dx. We have to prove a_0^2+b_0^2\le 1. Note that \int_0^1f_1(x)\,dx=\int_0^1f(x)\,dx-a_0=0 and \int_0^1g_1(x)\,dx=0. By the given conditions we get

\displaystyle \int_0^1 f(x)^2\,dx=\int_0^1\left( a_0^2+2a_0f_1(x)+f_1(x)^2\right)\,dx=a_0^2+\int_0^1f_1(x)^2\,dx.

\displaystyle \int_0^1 g(x)^2\,dx=b_0^2+\int_0^1g_1(x)^2\,dx\qquad(2)

The orthogonality of f and g yields

\displaystyle a_0b_0+\int_0^1f_1(x)g_1(x)\,dx=0

\displaystyle (a_0b_0)^2=\left(\int_0^1f_1(x)g_1(x)\,dx\right)^2\le \int_0^1f_1(x)^2\,dx\int_0^1g_1(x)^2\,dx\qquad(3)

Denote \displaystyle x=\int_0^1f_1(x)^2\,dx\,;\,y=\int_0^1g_1(x)^2\,dx. By (2) and (3) we obtain

\displaystyle a_0^2+x=1; b_0^2+y=1 and a_0^2b_0^2\le xy which yields

\displaystyle a_0^2b_0^2\le (1-a_o^2)(1-b_0^2)

\displaystyle a_0^2+b_0^2\le 1

which is exactly (1).

A graph with infinite chromatic number.

A friend told me this wonderful problem. It was included in a lecture Janos Pach gave in his university. This is an example of an infinite graph that has infinite chromatic number. The chromatic number of a graph G, denoted by \chi(G), (when a finite number) is the minimal number of colors needed to color its vertices such that no two equally colored are connected. Coloring like that is called a proper coloring. So, there is nothing unusual that an infinite graph cannot be proper colored with finite number of colors. For instance, take the set \mathbb{N} as the vertex set and connect every two vertices. This graph cannot be colored in finite number of colors because it has as large cliques as we want. What is unusual, with the example that follows, is that the graph constructed has no triangle. That is, we cannot use an argument like finding as large clique as we want. The famous Erdos result shows that there are graphs with as large girth and chromatic number as we want and we will give an explicit example of a graph without 3-cycle.

Suppose we are given a graph G, let it be a finite graph, and we want to prove some number of colors, say k, are not enough of proper coloring G. What to do? The easiest approach is to find a k+1 clique as a subgraph of G. If it does not exist, for example if G has no triangle, then we should think of something different. A vague idea is to eliminate somehow the colors, one after another. We assume k colors are enough and color the graph appropriately. We fix a color, say “1”, and consider the set of vertices, say V_1 colored in “1”. Then V_1 is independent set, i.e. no two vertices in V_1 are connected. This condition could give some information about the maximum possible number of vertices V_1 could have, or about the structure of V_1. Then we remove V_1 and proceed with the next color. There is a chance that the sets of vertices V_1,V_2,\dots, V_k that are removed, cannot cover all vertices of G if k is too small. This is our hope, and exactly this approach was applied in the second solution. The problem itself follows. Two solutions are explained. The first one was provided by Ilya Bogdanov – I saw it somewhere on stackexchange. The second one is mine.

Problem. Let G be an infinite graph with countably many vertices labeled as v_{i,j}, i,j\in\mathbb{Z}_{>0} . For each positive integers i,j,k the vertices v_{i,j} and v_{k,i+j} are connected. Prove that the vertices of the graph cannot be colored with finitely many colors. (the vertices of a graph can be colored in a set of colors if we can assign a color to each vertex so that no two identically colored vertices are connected.)

First solution.

Below is a picture that represents the structure of G if we arrange it in rows and columns as drawn.

Fig. 1

The vertex A=v_{i,j} is connected with all (“hollow”) vertices of i+j-th row. Assume, for the sake of contradiction, G can be properly colored in finite number of colors. Then there exist two rows (imagine they be j-th and i+j-th rows) colored with the same set of colors. Let the vertex A is colored in the color c. Then in the i+j-th row there are also vertex/vertices colored in c. But since A is connected with all of them, we get a contradiction. \blacksquare
The same idea works for estimating the number of colors needed to properly color a square that consists of the first n rows and columns. Suppose k colors are enough. The number possible subsets of k colors is \binom{k}{1}+\binom{k}{2}+\dots+\binom{k}{k}=2^k-1. So if 2^k-1\log_2 n we could get the same contradiction as above. Thus, k>\log_2 n.

Second solution.

Here is another solution, I came up with. Let G_n be the finite subgraph , induced by the set of vertices V_n:=\{v_{i,j} \mid 1\le i,j\le n\,,\, i+j\le n\}. We may think of the vertices in G_n as the lattice points with coordinates (i,j). Let’s change the coordinate system, that is, we introduce new coordinates (x,y) as follows: x=j, y=i+j. In this new coordinates V_n=\{(x,y)\mid 1\le x,y,\le n, x<y\}. Two points (x_1,y_1), (x_2,y_2) in V_n are connected if x_1=y_2 or x_2=y_1. So, we changed the coordinates in order to have more easy expression that indicates when two vertices are connected. For any X, Y\subset \mathbb{N} we denote by X\overset{<}{\times} Y the set \{(x,y)\mid x\in X,y\in Y, x<y\}.
Let now X_0:=\{1,2,\dots,n\}. With the introduced notation, V_n=X_0\overset{<}{\times} {X_0}. Denote by V_n^1 the set of vertices colored in one of the colors, say, “1”. Let X' and Y' be the sets of x-coordinates, resp. y-coordinates of V_n^1. Then X', Y'\subset X_0 and X'\cap Y'=\emptyset (because otherwise two vertices of V_n^1 would be connected). It means that at least one of the sets X',Y' has at most |X_0|/2=n/2 elements. WLOG let it be Y'. We consider X_1:=X_0\setminus Y'. This set has at least |X_0|/2 elements and the set X_1\overset{<}{\times} {X_1} is free of the color “1”.

We proceed in the same way with X_1\overset{<}{\times} {X_1} and get sets X_j\overset{<}{\times} {X_j}, j=1,2,\dots each one with at least 2^{-j}n elements and the colors 1,2,\dots,j are not present in the induced graph on X_j\overset{<}{\times} {X_j}.
Thus, the chromatic number of G_n is greater than \log_2 n.

AoPS thread of this problem.

IMO 2021, Problem 5.

In the previous blogposts we commented on some problems of this year’s International Mathematical Olympiad (see the links at the end). Here we continue with problem 5. I posted the solution that follows on AoPS forum some two months ago.

Problem 5, IMO 2021. Two squirrels, Bushy and Jumpy, have collected 2021 walnuts for the winter. Jumpy numbers the walnuts from 1 through 2021, and digs 2021 little holes in a circular pattern in the ground around their favorite tree. The next morning Jumpy notices that Bushy had placed one walnut into each hole, but had paid no attention to the numbering. Unhappy, Jumpy decides to reorder the walnuts by performing a sequence of 2021 moves. In the k-th move, Jumpy swaps the positions of the two walnuts adjacent to walnut k.
Prove that there exists a value of k such that, on the k-th move, Jumpy swaps some walnuts a and b such that a<k<b.

A typical problem where everything depends on the point of view! Forget about the trajectory of the swapped walnuts and focus on the disposition of the consecutive numbers. Suppose, at every move we mark the k-th walnut k=1,2,\dots, 2021 and then swap the neighboring ones. Focus on the process of consecutive marking that eventually exhaust all walnuts.
Assume, there is a position in which we are going to mark a walnut and only one of its two adjacent ones is already marked. In this case we are done. Indeed, the walnut, we are going to mark, has a larger number than the neighboring one that has already been marked, and a smaller number than the other one that has not yet been marked. Call this marking a good marking.
We claim, such a position will arise sooner or later, due to the fact 2021 is an odd number. Note that any marking that is not good retains the disposition of the marked walnuts, that is, after the swapping that follows, the disposition of the marked walnuts is the same. Every time we mark a walnut, either it is a good marking or after that there is a block of even number (not zero) consecutive unmarked walnuts surrounded by marked ones. Thus, sooner or later, if we avoid good marking, we would arrive at a position of two consecutive unmarked walnuts, surrounded by marked ones. So, when it comes to mark any of them it would be a good marking.

IMO 2021, Problem 2.
IMO 2021, Problem 6.
IMO 2021, Problem 5 – AoPS thread.

IMO 2021, Problem 6. A Pinch of Linear Algebra.

In this blog post we give a solution to problem 6 of IMO 2021. The method used differs from those I have seen and makes it possible to achieve an even sharper estimate. Let me first recall the problem.

IMO 2021, p6. Let m\ge 2 be an integer, A a finite set of integers (not necessarily positive) and B_1,B_2,\dots,B_m subsets of A. Suppose that, for every k=1,2,\dots,m, the sum of the elements of B_k is m^k. Prove that A contains at least \dfrac{m}{2} elements.

Solution. In fact, we’ll prove something sharper: A contains at least \displaystyle \frac{2m}{3} elements. (it could be even further improved, see end of post) Let A=\{a_1,a_2,\dots,a_n\}. We must prove n\ge 2m/3. Any subset B of A can be represented as an n dimensional vector of zeroes and ones v=(v_1,v_2,\dots,v_n) in an usual way, v_i=0 resp. v_i=1 if a_i\in B resp. a_i\notin B. For each set B_k, k=1,2,\dots,m let v_k be the n dimensional binary vector that corresponds to B_k. The idea is not unusual – for the sake of contradiction assume that n<2m/3 and prove that for some k, 2\le k\le n the vector v_k is a linear combination of the previous vectors v_i, 1\le i<k with coefficients of magnitude less than m i.e.

\displaystyle v_k=\sum_{i=1}^{k-1} \alpha_i v_i\,;\, |\alpha_i|<m,i=1,2,\dots,k-1 \qquad(1)

It implies \displaystyle m^k=\sum_{i=1}^{k-1} \alpha_i m^i; |\alpha_i|<m,i=1,2,\dots,k-1, which is contradiction.
So, it boils down to a pure linear algebra question which has a standalone value.

Derived problem. Let v_1,v_2,\dots,v_m be n dimensional binary vectors and n<\frac{2}{3}m. Then, some of them, say, vector v_k can be represented as in (1).

We will provide a constructive algorithm that finds a representation as in (1). Let A_0 be the n\times m matrix which m columns are the vectors v_1,v_2,\dots,v_m (in this order). Note that m>3n/2. We may assume n_1:=\mathrm{rank(}A_0)=n since if n_1<n we can choose n_1 linearly independent rows of A_0 and proceed with the same arguments. Let us rearrange the columns of A_0 (i.e. vectors v_i ) into a new matrix of columns v_{i_1},v_{i_2},\dots,v_{i_m} so that the first n columns (vectors) are linearly independent.

First stage. Trough series of steps that swaps columns we ensure that each v_{i_k}, k=n+1,n+2,\dots,m is a linear combination of vectors among v_{i_1},\dots,v_{i_n} with lesser indices than i_k.
Initially, we set j=1. On the j-th step we inspect the matrix A_{j-1} with columns v_{i_1},\dots,v_{i_m} (which is some permutation of the initial vectors). Suppose there exists a column v_{i_k}, n+1\le k\le m which is (uniquely) represented as

\displaystyle v_{i_k}=\sum_{r=1}^n \alpha_r v_{i_r}

and additionally for some r, 1\le r\le n with \alpha_r\neq 0 we have i_r>i_k. In this case we swap the vectors/columns v_{i_k} and v_{i_r}, thus obtaining a new matrix A_j. Note that the first n columns of A_j still remain linearly independent. Further, we proceed to the next j+1 step.

Obviously, we cannot infinitely pump columns with lesser indices into the first n columns, so this procedure ends at some moment. Denote the final matrix by A'_0 and let its columns be v_{i_1},\dots,v_{i_m}. This permutation of the initial vectors satisfies the following property, we could call it of increasing order representation. (Don’t search for it, I just invented it)
Definition. Given a system of n dimensional vectors v_{i_1},v_{i_2},\dots,v_{i_m}, where i_1,i_2,\dots,i_m is a permutation of 1,2,\dots,m and m>n we say that this permutation of vectors is of increasing order representation if the following two conditions hold: (i) The first n vectors are linearly independent. (ii) Any vector v_{i_k} where n+1\le k\le m is represented (uniquely) as v_{i_k}=\sum_{r=1}^n \alpha_r v_{i_r}, \alpha_r\in\mathbb{R} and it holds i_k>\max\{i_r \mid \alpha_r\ne 0, r=1,2,\dots,n\}.

Second stage. At this stage we begin to represent consecutively each column v_{i_k}, k=n+1,n+2,\dots m of the matrix A'_0 as a combination of the first n columns. If there is a coefficient with large magnitude in this representation, we swap the corresponding columns. We prove that at some moment all the coefficients of the linear combination will be “small”.
We start with the matrix A'_0 which columns are the vectors v_{i_1},\dots,v_{i_m} and assume i_{n+1}<i_{n+2}<\dots <i_m otherwise we rearrange appropriately these m-n vectors. We apply consecutively for j=1,2,\dots, m-n the following procedure:
Procedure at step j. Let the matrix A'_{j-1} has columns v_{i_1},\dots,v_{i_m}. We write

\displaystyle v_{i_{n+j}}=\sum_{r=1}^n \alpha_r v_{i_r}\qquad (2)

It can be done since the columns of A'_{j-1} form a permutation of vectors of increasing order representation and its first n columns are linearly independent. If all coefficients \alpha_r in (2) have absolute values less than m we stop. If there exists a coefficient \alpha_r, 1\le r\le n with |\alpha_r|\ge m we swap the n+j-th column v_{n+j} with v_{i_r} and thus obtaining a new matrix A'_j. Denote by A'_j[1..n] the square matrix formed by the first n columns of A'_j. Cramer’s rule yields \displaystyle \alpha_r=\frac{\det\left(A'_j[1..n]\right)}{\det\left(A'_{j-1}[1..n]\right)} therefore

\displaystyle \frac{\left|\det\left(A'_j[1..n]\right)\right|}{\left|\det\left(A'_{j-1}[1..n]\right)\right|}\ge m\qquad (3)

Note that the columns of the new matrix A'_j also form a permutation of increasing order representation.

We claim that the described procedure finally stops at some j, 1\le j\le m-n at an instance that satisfies

\displaystyle v_{i_{n+j}}=\sum_{r=1}^n \alpha_r v_{i_r}\,;\, |\alpha_r|\le m, r=1,2,\dots,n

and additionally i_{n+j}>\max\{i_r \mid \alpha_r\ne 0, r=1,2,\dots,n\}.
Indeed, suppose for the sake of contradiction it’s not true. It means (3) holds for j=1,2,\dots,m-n therefore

\displaystyle \frac{\left|\det\left(A'_{m-n}[1..n]\right)\right|}{\left|\det\left(A'_{0}[1..n]\right)\right|}\ge m^{m-n}

Since \displaystyle {\left|\det\left(A'_{0}[1..n]\right)\right|} is a non-zero integer it follows

\displaystyle {\left|\det\left(A'_{m-n}[1..n]\right)\right|} \ge m^{m-n}\qquad (4)

Recall that A'_{m-n}[1..n] is a n\times n matrix, each of which elements is either 0 or 1. For any such matrix A it holds

\displaystyle \left|\det(A)\right|\le n^{n/2}\qquad (5)

This fact was proven in the previous blog post with even a sharper estimate, but let us prove it briefly again. Let u_1,u_2,\dots,u_n be the columns of A viewed as vectors. Each u_i is a n dimensional binary vector, hence \|u_i\|\le \sqrt{n}. Further

\displaystyle \left|\det(A)\right|\le \prod_{i=1}^n \|u_i\| \le n^{n/2}

which proves (5). Putting (4) and (5) together, it yields

\displaystyle n^{n/2}\ge m^{m-n}>n^{m-n}

hence n>2m/3, contradiction. \blacksquare

Further (slight) improvement.

In fact the same technique (inequalities (4) and (5)) allows to further slightly improve the estimate n\ge 2m/3 and get

\displaystyle n\ge \left(\frac{2}{3}+\frac{c}{\log m} \right)m

where c>0 is some absolute constant.

Maximum possible determinant that consists of 0’s and 1’s. IMO 2021 problem 6.

A n\times n matrix A is given with elements equal to 0 or 1. What is the maximum possible value of its determinant? That is, we search for

\displaystyle \max \left\{ \left|\det(A)\right|\mid A=(a_{ij}), a_{ij}\in\{0,1\}, 1\le i,j\le n \right\}\qquad (1)

This question was posed and solved to some extend by Hadamard. Interestingly, this result (even in a weaker form) can be applied to solve IMO 2021 problem 6. I won’t solve that IMO problem in this blog post. It will be done in the next one. Here, we do a short overview of the Hadamard result.

Hadamard’s Inequality.

Suppose we are given an arbitrary n\times n real matrix A. We can interpret its n rows as n vectors v_1,v_2,\dots,v_n\in \mathbb{R}^n. If those vectors are linearly independent then \left|\det(A)\right| is the volume of the parallelepiped spanned by v_1,v_2,\dots,v_n. Particularly, it yields the following inequality

\displaystyle \left|\det(A)\right|\leq \|v_1\|\cdot \|v_2\|\cdots \|v_n\| \qquad (2)

where \|v_i\| is the lenght of v_i. Equality is achieved only in case v_i,i=1,\dots,n are orthogonal vectors. This inequality is callled after J. Hadamard. We can use this fact for a rough estimate (we’ll se why) of (1).

A Rough Estimate.

Let A be a matrix with elements equal to 0 or 1. Let v_1,v_2,\dots,v_n be its row vectors. Since the coordinats of any v_i consists of 0‘s and 1‘s we have

\displaystyle \|v_i\|\le \sqrt{1^2+1^2+\dots+1^2}=\sqrt{n}

Using (2) it yields

\displaystyle \left|\det(A)\right|\le n^{n/2}\qquad(3)

The above estimate is very rough. It cannot be achieved, beacuse for this to happen two conditions should be satisfied simultaneously – 1) each vector v_i must have length \sqrt{n} and 2) The vectors v_i must be orthogonal. These two conditions somehow exclude each other. If each v_i has length \sqrt{n} it means all its entries are 1‘s. But if so, v_i\cdot v_j=n\neq 0, that’s, they cannot be orthogonal. Nevertheless, the inequality (3) is pretty sufficient for some applications, for instance for IMO 2021, problem 6.

Further Improvement.

The main observation is that if A were a matrix with elements 1‘s and -1‘s (instead of 0‘s and 1‘s) the same stimate as (3) also holds, but this time it is sharp, because the conditions 1) and 2) can be both satisfied for some values of n. So, we have to somehow reduce the question for 0,1 determinants to the similar one, but for -1,1 determinants. It can be done as in the following way:

\begin{pmatrix}0 & 1&1\\0&0&1\\1&0&1 \end{pmatrix} \times 2 \to\begin{pmatrix}0 & 2&2\\0&0&2\\2&0&2\end{pmatrix}\to\begin{pmatrix}1&1&1&1\\0&0&2&2\\0&0&0&2\\0&2&0&2\end{pmatrix}\to\begin{pmatrix}1&1&1&1\\-1&-1&1&1\\-1&1&-1&1\\-1&1&-1&1\end{pmatrix}

We start from arbitrary n\times n matrix A of 0‘s and 1‘s. Next, we multiply A by 2, then we construct a (n+1)\times(n+1) matrix by adding an additional first column (1,0,0,\dots,0) and a row that consists only of 1‘s. Then we subtract the first row from all other rows and finish at a matrix A' with elements -1 and 1. In this way, the matrix A is mapped to a matrix A' and it holds

\det (A')=2^n\cdot \det(A).

As said before, we can apply to A' the same idea as in the previous section (estimate (3)) and get \left|\det(A')\right|\leq (n+1)^{(n+1)/2} which yields

\displaystyle \left|\det(A)\right|\le 2^{-n}(n+1)^{(n+1)/2}\qquad (4)

This inequality is sharp for those matrices A' (as constructed above) for which the rows form an orthogonal system of vectors. They are called Hadamard’s matrices. It can be achieved, for instance, when n+1 is a power of two but at least 4. Generally, the question is still open.

IMO 2021, Problem 2. Part 2.

In the previous post we considered the second problem given at this year’s IMO – a strange inequality. Another approach on the same problem is presented here- the second official solution. This method indeed requires some college knowledge, but it’s interesting to present as it shows how to generalize the claim. So, let’s remind the statement.
We have to prove the following inequality holds for any x_i\in \mathbb{R}, i=1,2,\dots,n

\displaystyle f(x_1,\dots,x_n):= \sum_{i=1}^n \sum_{j=1}^n \sqrt{|x_i+x_j|} - \sum_{i=1}^n \sum_{j=1}^n \sqrt{|x_i-x _j|}\ge 0.

I guess this blog post will be viewed mostly by high school students, that’s why I tried to make it as accessible as possible. We cannot avoid some college knowledge, so the notions needed are separated in the following section. Experienced readers can skip it.

Some preliminary facts.

Positive semidefinite matrix. Let A=\big(a_{ij}\big), 1\le i,j\le n be n\times n real matrix which is symmetric, i.e. a_{ij}\in \mathbb{R}, a_{ij}=a_{ji}, \forall i,j; 1\le i,j\le n. We say A is positive semidefinite if

\displaystyle \sum_{i=1}^n\sum_{j=1}^n a_{ij}v_i v_j\ge 0

holds for all v_i\in\mathbb{R}, i=1,2,\dots,n. Knowing the rule of multiplying matrices the above requirement can be written as

\displaystyle \mathbf{v}^T A \mathbf{v}\ge 0

for any vector \mathbf{v}\in \mathbb{R}^n, where \mathbf{v}^T means the transpose matrix of the vertical vector \mathbf{v}.

A vector space with inner product. When you read V is a vector space it means V is a set of objects where any two of them can be summed and any object v\in V can be multiplied by a real number. The rules these operations comply with are the same as when dealing with the vector space \mathbb{R}^n. If additionally an inner product \langle u,v\rangle\in \mathbb{R} is defined for any u,v\in V we say V is a vector space with an inner product. Of course, the operation \langle \cdot, \cdot\rangle should comply with some rules which are the same as in the case when V=\mathbb{R}^n with the usual inner product \displaystyle \langle u,v\rangle =\sum_{i=1}^n u_iv_i.
Example 1. As said, the simplest example of a vector space with inner product is V=\mathbb{R}^n.
Example 2. Let V be the set of all real functions f defined in some interval [a,b] which are square-integrable, that is, \displaystyle \int_a^b f(x)^2\,dx<\infty. This vector space of functions is denoted as L^2[a,b] and the inner product is defined as

\displaystyle \langle f,g\rangle =\int_a^b f(x)g(x)\,dx

You may assume these integrals are Riemann’s, thou in order L^2 to have nice properties one should use the Lebesgue’s definition. But, don’t pay too much attention on the exact definition, for now being the essence is enough.

When is a matrix positive semidefinite? The last needed thing. A symmetric matrix A=\big( a_{ij}\big),1\le i,j\le n is positive semidefinite if and only if there exists a vector space with inner product V and vectors b_i\in V, i=1,2,\dots,n such that

a_{i,j}=\langle b_i,b_j\rangle, \forall i,j, 1\le i,j\le n \qquad(1)

We’ll use only the “if” condition. Its proof is short. Let b_i\in V, i=1,\dots,n and (1) holds. Let B be the matrix which columns are the vectors b_1,b_2,\dots,b_n. Then by (1) it follows A=B^T B therefore

\displaystyle v^T Av=(v^TB^T)(Bv)=\|Bv\|^2\ge 0.

A matrix that satisfies (1) is called Gram matrix of a set of vectors b_i,i=1,\dots,n. So, every Gram matrix is positive semidefinite.

Solution (the second official one).

Consider a matrix A with entries a_{ij},1\le i,j\le n where

a_{ij}= \sqrt{|x_i+x_j|} -  \sqrt{|x_i-x_j|} .



where \mathbf{e} is the vector in \mathbb{R}^n with all coordinates equal to 1 and \mathbf{e}^T is the transpose matrix of the vertical vector \mathbf{e}. So, we must prove the right hand side of (2) is not negative. In fact, something more general holds

\mathbf{v}^TA\mathbf{v} \ge 0, \forall \mathbf{v}\in \mathbb{R}^n \qquad(3)

Note that the matrix A is symmetric. Thus, we have to prove A is positive semidefinite, which (by definition) means A is symmetric and satisfies (3). In order to prove that A is positive semidefinite, it is enough to show that there is an inner product vector space X (over \mathbb{R}) and vectors f_i\in X, i=1,2,\dots,n such that a_{ij}=\langle f_i,f_j\rangle, that is, A is the Gram matrix of some n vectors in X. Let us consider the following family of functions

\displaystyle f_a(x):=\frac{\sin ax}{x^{3/4}}

where a\in \mathbb{R} is a parameter. Clearly, f_a(x)\in L^2(0,\infty), \forall a\in\mathbb{R} since f_a is continuous in (0,\infty), \displaystyle \lim_{x\to 0}f_a(x)=0 and \displaystyle \int_1^{\infty} f_a(x)^2\,dx\le  \int_1^{\infty} x^{-3/2}\,dx<\infty.
Let us calculate

\displaystyle \langle f_a,f_b\rangle=\int_0^{\infty}f_a(x)f_b(x)\,dx

We have

\displaystyle  \langle f_a,f_b\rangle  = \int_0^{\infty}\frac{\sin ax\sin bx}{x^{3/2}}\,dx=\frac{1}{2}\int_{0}^{\infty}\frac{\cos(a-b)x- \cos(a+b)x }{x^{3/2}}\,dx \qquad(4)

We cannot represent the above expression as difference of two integrals because \displaystyle \int_0^A \frac{\cos \lambda x}{x^{3/2}}\,dx is not convergent in any neighborhood of 0 while the difference of the two cosines as in (4) is. So, we use the following trick to separate the two cosine terms

\displaystyle  \langle f_a,f_b\rangle  = \frac{1}{2}\int_0^{\infty}\frac{1-\cos(a+b)x}{x^{3/2}}\,dx- \frac{1}{2}\int_0^{\infty}\frac{1-\cos(a-b)x}{x^{3/2}}\,dx

Let \displaystyle I_{\lambda}:= \frac{1}{2}\int_0^{\infty}\frac{1-\cos\lambda x}{x^{3/2}}\,dx . By changing variable u=|\lambda|x we get

\displaystyle I_{\lambda}=\sqrt{|\lambda|}  \frac{1}{2}\int_0^{\infty}\frac{1-\cos u}{u^{3/2}}\,du = c \sqrt{|\lambda|}

where c is some positive constant. Thus, it yields

\displaystyle  \langle f_a,f_b\rangle =c\left( \sqrt{|a+b|}-\sqrt{|a-b|}\right)

Therefore \displaystyle a_{ij}=\frac{1}{c}\langle f_{x_i},f_{x_j}\rangle.
To recap. Our matrix A turns out to be the Gram matrix (multiplied by a positive constant 1/c) of the vectors f_{x_i}\in L^2(0,\infty), i=1,2,\dots,n. So, A is positive semidefinite

IMO 2021, Problem 2. A Controversial Inequality.

I’d like to discuss some of the problems given at this year’s International Mathematical Olympiad, held virtually in St. Petersburg, Russia. Let’s begin with the most controversial one – the inequality. By controversial I mean the most discussed one, with variety of opinions – from ardent supporters of this kind of problems to ardent opponents. I think, I know why, but let me first tell you that I like this inequality, it is somehow out of the box, and if this trend continues I might change my mind about whether inequalities should be given at math competitions. So, let me present some opinions from a popular math forum

A delightful problem, perhaps the best inequality in the history of IMO!


Not really. Just because is a nice problem it does not mean it is good for this competetion. And I can probably say this is the most inappropriate problem as a medium problem since it has solution which would be perfect as Undergraduate Olympiad


Further, a guy, in order to back up the argument that it’s for Undergrad Olympiads, wrote

well you are not 15-17, are you? Cause when I first went to the IMO on 2018 I had not idea what concave or even continuous function even meant…


Well, it’s not something to brag about that you went to IMO and had no idea what a continuous or concave function was, but well, I see the idea. By the way, in my country this stuff is taught in high school math program, so by no means it should be considered as undergraduate or rocket science. The only thing you need to solve this problem is that a concave function in some interval [a,b] attains its minimum value either at a or at b. And that a sum of concave functions (in some interval) is also concave. That’s it, and this stuff is taught in high school.
Along these lines, I don’t see why Karamata’s or Jensen’s inequalities, for example, are appropriate for IMO, but this one is not. On the contrary, this fact about concave/convex function is much more intuitive than some of the mentioned.
The issue apparently is that this problem is somewhat out of the box. Many people are ok to apply heavy lines of manipulating standard inequalities, but when it comes to some fresh thinking, they are frustrated.
Moreover, the mere idea is not a stroke of genius. So, you subtract the LHS from the RHS one and get a function f(x_1,x_2,\dots,x_n). You want to prove f\ge 0. The first idea that comes into mind is to fix all the variables, except one, and to vary this one variable in order to see when f gets its minimum. In this way, you may reduce the inequality to some special values of the variables. Sometimes, it is convenient to vary appropriately two of the variables and so on. In this particular case we shift all variables by some offset and calculate the value of the offset that makes f minimal. I mean, this is not an unusual idea.
To be honest, the problem is not easy. I tried it, was pissed off at one point and saw the idea in the AoPS thread ([1], see post #7). I even considered the idea of shifting simultaneously all the variables, but somehow didn’t take into account (don’t know why) that the LHS didn’t change from that transformation. Ok, enough of this talk, here is the problem.

Problem 2, IMO 2021. Show that the inequality

\displaystyle \sum_{i=1}^n \sum_{j=1}^n \sqrt{|x_i-x_j|}\leqslant \sum_{i=1}^n \sum_{j=1}^n \sqrt{|x_i+x_j|}

holds for all real numbers x_1,x_2,\dots,x_n.

Solution. Denote the left hand side by A(x_1,\dots,x_n), the right hand side by B(x_1,\dots,x_n) and consider the function


We must prove f(x_1,\dots,x_n)\ge 0. The idea is to move (translate) the whole bunch of points x_i,i=1,\dots,n and see when f gets a minimum value. So, let x'_i:=x_i+t, i=1,2,\dots,n, t\in\mathbb{R}. Note that A does not change by this transformation, that is, A(x'_1,\dots,x'_n)=A(x_1,\dots,x_n). We have

\displaystyle B(t):= B(x'_1,\dots,x'_n)=\sum_{i=1}^n \sum_{j=1}^n \sqrt{|x_i+x_j+2t|}

Consider the function \phi_{i,j}(t):=\sqrt{|x_i+x_j+2t|} as a function of t\in \mathbb{R}. It is a concave in each of the intervals \displaystyle t\in \left(-\infty, -\frac{x_i+x_j}{2}\right) and \displaystyle t\in \left(-\frac{x_i+x_j}{2},\infty\right). Taking into account that sum of concave functions (in some interval) ia also concave, we get that B(t) is concave inside the intervals determined by the points -(x_i+x_j)/2, i,j=1,2,\dots,n.
Now, any function B(t) that is concave in t\in[a,b] takes its minimum value in [a,b] either when t=a or when t=b. It means that B(t) attains its minimum value either when t=-(x_k+x_{\ell})/2 for some 1\le k,\ell\le n or when t=\pm\infty. But since \displaystyle \lim_{t\to\pm\infty}B(t)=\infty, we obtained that the minimum value of B(t) is attained when t=-(x_k+x_{\ell})/2 for some k,\ell. In case k=\ell it yields x'_k=0 and when k\ne \ell we get x'_k=-x'_{\ell}.
To recap, as we move the entire ensemble of points x_i,i=1,\dots,n along the real axis, the minimum of B is attained either when some x'_k hits 0 or when some x'_k,x'_{\ell},k\ne \ell are symmetric about 0.
In both cases, all the terms of f(x'_1,\dots,x'_n) that contain x'_k (respectively x'_k and x'_{\ell}) are canceled no matter of the other values x'_i, i\ne k (resp. x'_i, i\ne k \text{ and } i\ne \ell.) So, we remain with less variables and with a function f with the same pattern. We do the same trick again and again and conclude that f takes its minimum value when some values x_i are zeroes and the others can be partitioned into symmetric pairs about 0. In this case f=0 and the result follows.

Another solution >>

[1] IMO 2021, problem 2, AoPS thread.

Edge Cover of a Bipartite Graph. Part 2.

We continue the previous blog post which relates the size of a minimum edge cover of a bipartite graph with the size of its maximum independent set. Let me briefly recall some of the terms used.
Let G be any graph with vertex set V and edge set E. We call a subset E_1 of E an edge cover of G if for any vertex v in G there exists e\in E_1 that is incident to v i.e. e covers v.
Further we assume G has no isolated vertices, because we cannot cover an isolated vertices with any edge. A subset V_1 of V is called independent if there is no edge connecting two vertices of V_1. Denote by \rho(G) and \alpha(G) correspondingly the size of a minimum edges cover and the size of a maximum independent set. We proved that the following inequality holds

\alpha (G)\le \rho(G)\qquad (1)

This is true for any graph without isolated vertices, we didn’t use that G is bipartite. In case G is bipartite, we actually have:

\alpha(G)=\rho(G)\qquad (2)

In the previous post we proved (2) via Dilworth’s theorem, and here we apply another approach using Hall’s marriage theorem.

A proof using Hall’s marriage condition.

Let G(A,B) be the corresponding bipartite graph with A,B being the two parts of its vertices. We choose an independent set V_1 with |V_1|=\alpha(G) and V_1=A_1\cup  B_1 where A_1\subset A, B_1\subset B. If we prove that there exists an edge cover E_1 with |E_1|=|V_1| it would follow that \alpha(G)\ge \rho(G) and taking into account (1) we conclude that (2) holds. Denote for brevity A'_1:=A\setminus A_1 and B'_1:=B\setminus B_1.

Fig. 1. A_1\cup B_1 is a maximum independent set.

Claim. There is a perfect matching of A'_1 to B_1 and B'_1 to A_1 (see Fig.1).

Proof. The plan is to check that Hall’s marriage condition holds for the two bipartite graphs induced on A'_1, B_1 and B'_1,A_1 and correspondingly. Take the former one G'(A'_1, B_1). As usual, for any S\subset A'_1 we denote by N(S) the set of all vertices in B_1 that have a neighbor in S. Suppose, for sake of contradiction, |N(S)|<|S| for some S\subset A'_1. Consider then the set V'_1:=A_1\cup (B_1\setminus N(S))\cup S. That is, we take V_1=A_1\cup B_1, remove the set N(S) and add the set S. Apparently V'_1 is an independent set since there are no edges between S and B_1\setminus N(S). Moreover, |V'_1|=|V_1|+|S|-|N(S)|>|V_1| which contradicts the maximality of V_1. \blacksquare

Now, consider all the edges of the matching guaranteed by the above claim. For any vertex v in A_1 or in B_1 that is unmatched, we add an arbitrary edge that is incident to v. Thus, we obtain exactly |A_1|+|B_1| edges that cover the vertices of the graph. Therefore

\alpha (G)\ge \rho(G)

and taking into account (1) we proved (2) holds.

<< To the previous post

Edge Cover of a Bipartite Graph. Part 1.

Here is a problem that connects two characteristics of a graph – its minimum edge cover and its maximum independent set. I was told it was given at a university exam, but it could be an Olympiad problem as well, for example as worded in the Problem 1 in the next section. There is nothing beyond the standard knowledge needed to succeed in any decent high school competition. In some two or three post three different approaches will be presented, using three fundamental results, respectively – Dilworth’s theorem, Hall’s marriage theorem and Max-flow min-cut theorem.
Let G be any graph with vertex set V and edge set E. We call a subset E_1 of E an edge cover of G if for any vertex v in G there exists e\in E_1 that is incident to v i.e. e covers v.
Further we assume G has no isolated vertices, because we cannot cover an isolated vertices with any edge. A subset V_1 of V is called independent if there is no edge connecting two vertices of V_1. Denote by \rho(G) and \alpha(G) correspondingly the size of a minimum edges cover and the size of a maximum independent set. Then it holds

\alpha (G)\le \rho(G)

It can be obtained easily by mapping each vertex to an edge that covers it. The details will be provided later. This inequality may be strict, for example when G is a triangle with 3 vertices, we have \rho(G)=2 and \alpha(G)=1. Interestingly, in case G is bipartite, it actually becomes an equality. We will present several approaches for proving this claim. The shortest one (I believe) is via Dilworth’s theorem. Another approach, I came up with, uses Hall’s marriage theorem and the third one – Max-flow min-cut theorem. So, you see, it’s interesting to see three different and basics results that are applicable to the same problem.
A possible Olympiad wordning of this claim follows.

Problem 1. There are two countries with certain number of airports in each. There are air lines between some pairs of the airports belonging to different countries. Every airport is an endpoint of an air line. A company A wants to buy some of the air lines so that it operates at every airport. (We say that A operates at an airport if there is an air line that A owns with an endpoint at that airport) A company B wants to buy some airports provided there are no two of them connected by an air line.
Prove that the minimum number of air lines A can buy to achieve its goal equals the maximum number of airports B could buy.

Translated into graph theory language it reads:

Claim. For any bipartite graph G without isolated vertices we have

\alpha(G)=\rho(G)\qquad (1)

Let E_1 and V_1 be correspondingly a vertex cover and an independent set with |E_1|=\rho(G) and |V_1|=\alpha(G). For each vertex v in V_1 choose a edge e that is incident to (covers) v. This is a mapping V_1\to E_1. Suppose v_1,v_2\in V_1 are mapped to one and the same e\in E_1. It means e connects v_1 and v_2 but this is impossible since V_1 is independent set and no two vertices in V_1 are connected. Thus, we have an injection V_1\to E_1 therefore

\alpha (G)\le \rho(G)\qquad (2)

This is true for any graph without isolated vertices, we didn’t use that G is bipartite.

A proof using Dilworth theorem.

Let G(A,B) be the corresponding bipartite graph with A,B being the two parts of its vertices. We define an order on V=A\cup B. For any v_1, v_2\in V it holds v_1\preceq v_2 if v_1\in A, v_2\in B or if v_1=v_2. It is straightforward to check (V,\preceq) is a partial order. By Dilworth’s theorem the sizes of the largest antichain and the smallest chain decomposition are equal. It remains to see that any antichain in (V,\preceq) is just an independent set in G and any chain decomposition is a vertex cover in G. Indeed, in the defined poset there are no chains with more than two elements. So, any chain is either a single vertex or a pair v_1\prec v_2 which can be viewed as the edge v_1v_2. In the former case, since there are no isolated vertices, we can add to v_1 any vertex it is connected with and by doing so we obtain a edge cover of V. On the other hand any vertex cover can be transformed to a chain decomposition in (V,\preceq) thus these two notions are equivalent.

A proof using Hall’s marriage condition.

I will proceed with this approach in the next blog post, but let me give a concise outline. Let V_1 be an independent set with |V_1|=\alpha(G) and V_1=A_1\cup  B_1 where A_1\subset A, B_1\subset B. If we prove that there exists an edge cover E_1 with |E_1|=|V_1| it would follow that \alpha(G)\ge \rho(G) and taking into account (2) we conclude that (1) holds.
The key idea is that there is a matching of A\setminus A_1 to B_1 and B\setminus B_1 to A_1.

To the next part >>

A Kind of Friendship Paradox.

I saw this problem in a chapter on some combinatorial techniques that a friend of mine has recently released (see [1], 1.8, problem 5). I don’t think the statement is paradoxical, the title was chosen for other reasons. Loosely speaking, the problem states that on a social network, the celebrities cannot be more than other users – the exact statement follows. As it turned out, it was posted in MathOverflow and one of the solutions there was by Terence Tao [2]. He also made a reference to a phenomenon, observed by sociologists, called friendship paradox. It states that most people have fewer friends than their friends have on average [3]. That was the reason for the title. I want to believe that Prof. Tao was just kidding, suggesting how to make the text more gender-neutral (more comments in the last section at the end). The problem is also tightly related to another one, I have already commented on in this blog, and all this together made me create this post. First things first, here is the text.

Problem 1. In a finite, simple graph G with vertex set V, we say that a vertex v is a king if \deg(v) > \deg(w) for each neighbor w of v. Let \mathrm{Kings}(G) denote the set of all non-isolated kings of G. Show that

\displaystyle |\mathrm{Kings}\,(G)| <\frac{|V|}{2}.

Give an example where one has \displaystyle \lim_{|V|\to\infty} \frac{|\mathrm{Kings}\,(G)|}{|V|}=\frac{1}{2}

The solution is based on the following lemma, which can be considered as a standalone problem. One may see a generalization of this lemma in a form given at Bundeswettbewerb Mathematik, 2020, II round.

Lemma. A bipartite graph without isolated vertices G(A,B) is given with |B|>|A|. Then there exist connected vertices a\in A, b\in B such that \deg(a)>\deg(b).
Proof. Note that the word “connected”, which I intentionally bold, is the essential part of the statement that makes it non-trivial.

Let |B|=n,|A|=m and \big\{c_{ij}\big\}, i=1,2,\dots,m, j=1,2,\dots,n be a kind of adjacency matrix of G. That is, we enumerate the vertices from A as a_1,a_2,\dots,a_m, the vertices from B as b_1,b_2,\dots,b_n and we set c_{ij}=1 if a_i is connected with b_j and c_{ij}=0 otherwise. We define another two matrices a_{ij},b_{ij}, i=1,2,\dots,m, j=1,2,\dots,n as follows.

\displaystyle a_{ij}:=\frac{c_{ij}}{\sum_{k=1}^m c_{kj} }

\displaystyle b_{ij}:=\frac{c_{ij}}{\sum_{k=1}^n c_{ik} }

Clearly, \sum a_{ij}=n, \sum b_{ij}=m, hence \sum a_{ij}>\sum b_{ij}. It means there is a cell (i,j), with a_{ij}>b_{ij}. By the definition of a_{ij},b_{ij} it yields

\displaystyle \sum_{k=1}^m c_{kj}< \sum_{k=1}^n c_{ik}

which means \deg(a_i)>\deg(b_j). \blacksquare

Back to the Problem 1. Let B:= \mathrm{Kings}\,(G) and A:=V\setminus  \mathrm{Kings}\,(G) . Suppose for the sake of contradiction |B|>|A| Note that any two kings cannot be connected with an edge, so there are not edges between any two vertices in B. Let us consider the bipartite graph G'(A,B) obtained by removing edges that connect vertices inside A. By our Lemma, there exists a\in A,b\in B with \deg_{G'}(a)>\deg_{G'}(b). It means \deg_G(a)>\deg_G (b), which is impossible, since b is a king.

For the second part of the problem, it’s enough to consider a full bipartite graph G(A,B) with |A|=n-1, |B|=n. Then the set of kings is just A since \deg(a)=n,\forall a\in A and \deg(b)=n-1,\forall b\in B.

More comments.

The statement of the Lemma can be sharpened. It holds \displaystyle  \deg(a)\ge \frac{|B|}{|A|}\deg(b) for some connected vertices a\in A, b\in B. As mentioned, this can be further generalized ([4]).
Another curious moment is the Terence Tao’s comment on the original MathOverflow thread ([2]). He wrote:

This bound can also be viewed as quantifying a variant of the “friendship paradox“. (Based on this connection, I propose “influencer” as a more modern and gender-neutral terminology alternative to “king”.)

I strongly believe that the highly respected mathematician was just joking about this modern “gender-neutral” hysteria. But, many others take it quite seriously. Can you imagine, for example, we have to change the names of some chess pieces. The king will be called “influencer”, but how to call the queen? Maybe a “birth giving influencer” ?! Or, all these good old tales about kings and princesses that are told to every child – they also need to be changed or simply cancelled!?

[1] Illustrations from the standard toolbox.
[2] A MathOverflow thread.
[3] Friendship paradox.
[4] Bundeswettbewerb Mathematik, 2020, II round.
[5] A Korean NMO problem, AoPS thread.