See

https://arxiv.org/abs/2109.08229

Abstract:

"The purpose of this paper is to connect the "policy choice" problem, proposed in Kasy and Sautmann (2021), to the frontiers of the bandit literature in machine learning. We discuss how the policy choice problem can be framed in a way such that it is identical to what is called the "best arm identification" (BAI) problem. By connecting the literature, we identify that the asymptotic optimality of policy choice algorithms tackled in Kasy and Sautmann (2021) is a long-standing open question in the literature. Unfortunately, this connection highlights several major issues with the main theorem. In particular, we show that Theorem 1 in Kasy and Sautmann (2021) is false. We find that the proofs of statements (1) and (2) of Theorem 1 are incorrect, though the statements themselves may be true, though non-trivial to fix. Statement (3), and its proof, on the other hand, is false, which we show by utilizing existing theoretical results in the bandit literature. As this question is critically important, garnering much interest in the last decade within the bandit community, we provide a review of recent developments in the BAI literature. We hope this serves to highlight the relevance to economic problems and stimulate methodological and theoretical developments in the econometric community."

tl;dr: The proofs of all three statements in Theorem 1 are wrong. Statement (3) is definitely false. Statements (1) and (2) over-claim, and no one knows whether they can be proven or are simply false.