The author has yet to respond. Informative post.
Does QJE have comments?
This displays an unfortunate misunderstanding of the terms “population” and “sample”.
A 100% complete census does not give you the statistical population. It’s still a sample.
LOL
If, say, the US population in 2020 is just a sample, from which "statistical population" are they drawn? Parallel universes?
... Yes? In practice this emerges when asymptotics depend on a dimension of the data that is naturally fixed, like US states. But it is more fundamental than that: you observe realizations of a latent DGP, and that is what sampling theory studies. Did you not encounter this in your first-year coursework?
Serious answer: yes sort of. The collection of people living in the USA at any given time is a sample drawn from the underlying distribution of all possible people who could live in the USA. As always, the sample mean (for example) is an approximation of the true mean of the underlying distribution, with (our belief about) the precision and accuracy of that approximation determined by sampling theory.
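The superpopulation logic above can be sketched numerically. This is an illustrative simulation only (the normal DGP and its parameters are invented): it shows the sample mean's typical error shrinking as the sample grows, exactly as sampling theory predicts.

```python
# Illustrative only: a hypothetical latent DGP with a known true mean,
# so we can measure how well the sample mean approximates it.
import numpy as np

rng = np.random.default_rng(0)
TRUE_MEAN, TRUE_SD = 3.0, 2.0  # made-up parameters of the latent DGP

def rmse_of_sample_mean(n, reps=200):
    """Root-mean-squared error of the sample mean over many samples of size n."""
    means = rng.normal(TRUE_MEAN, TRUE_SD, size=(reps, n)).mean(axis=1)
    return float(np.sqrt(np.mean((means - TRUE_MEAN) ** 2)))

# Sampling theory says the RMSE is about TRUE_SD / sqrt(n).
for n in (25, 2_500):
    print(n, rmse_of_sample_mean(n))
```

On this view, each observed "population" (a census, say) is just one row of that simulation: a single finite draw from the DGP.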
The important piece of this is the assumption that the sample at t0 differs from the sample at t1, right?
No.
Frequentist statistics (which is where the topic of this thread originates) does not allow for latent or stochastic DGPs. The parameters are assumed fixed (albeit unknown) quantities, and the only source for variation from sample to sample is, well, the sampling process.
I mean, there's a bit of a "this statement is not true" thing here - if most economists are making the same mistake AY is making by using the default standard errors, then AY's analysis is mostly correct.
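For readers who haven't seen the paper, the Fisher-style randomization test at issue can be sketched minimally. The data and effect size below are invented for illustration; this is a generic permutation test, not AY's actual procedure or data.

```python
# Hedged sketch of a randomization (permutation) test for a binary
# treatment: shuffle the treatment labels and ask how often the shuffled
# difference in means is at least as extreme as the observed one.
import numpy as np

rng = np.random.default_rng(1)
n = 50
treat = np.repeat([0, 1], n // 2)       # 25 control, 25 treated
y = rng.normal(size=n) + 1.5 * treat    # invented treatment effect of 1.5

observed = y[treat == 1].mean() - y[treat == 0].mean()

draws = 2_000
perm_stats = np.empty(draws)
for b in range(draws):
    perm = rng.permutation(treat)       # re-randomize the labels
    perm_stats[b] = y[perm == 1].mean() - y[perm == 0].mean()

# Randomization p-value: share of label shuffles as extreme as observed.
p_value = float(np.mean(np.abs(perm_stats) >= abs(observed)))
```

The appeal is that the test's validity comes from the experiment's own randomization rather than from a standard-error formula.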
The part of the paper where he claims that researchers use sub-optimal methods is correct. But the solution he suggests is needlessly complex. Instead of "Channeling Fisher: Randomization Tests and the Statistical Insignificance of Seemingly Significant Experimental Results", the title should have been "reg y x, vce(hc3) works better than reg y x, vce(robust)" and the text could have been: "see Long and Ervin (2000)".
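For concreteness, the two estimators being compared differ only in how they weight squared residuals in the "sandwich" covariance. A minimal numpy sketch on made-up heteroskedastic data (HC1 is what Stata's vce(robust) reports; HC3 inflates residuals by leverage, which is why Long and Ervin favor it in small samples):

```python
# Illustrative data only. OLS with HC0/HC1/HC3 sandwich standard errors.
import numpy as np

rng = np.random.default_rng(2)
n, k = 30, 2
X = np.column_stack([np.ones(n), rng.normal(size=n)])
y = X @ np.array([1.0, 0.5]) + rng.normal(size=n) * (1 + np.abs(X[:, 1]))

XtX_inv = np.linalg.inv(X.T @ X)
beta = XtX_inv @ X.T @ y
e = y - X @ beta
h = np.einsum("ij,jk,ik->i", X, XtX_inv, X)   # leverage values h_ii

def se(weights):
    """Sandwich SEs with per-observation squared-residual weights."""
    meat = X.T @ (weights[:, None] * X)
    return np.sqrt(np.diag(XtX_inv @ meat @ XtX_inv))

se_hc0 = se(e**2)                  # no small-sample correction
se_hc1 = se(e**2 * n / (n - k))    # Stata's vce(robust)
se_hc3 = se(e**2 / (1 - h) ** 2)   # Stata's vce(hc3)
```

Since HC3's weights dominate HC0's observation by observation, its standard errors are never smaller, and in leveraged small samples they are usually larger than HC1's too.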
That would have been an unusual paper for QJE to publish.
Still a larger contribution than most typical QJE papers
The Bertrand, Duflo, and Mullainathan QJE 2004 clustering paper is basically that.
My dude, latent here means a fixed DGP with unknown parameters. He didn’t say stochastic DGP.
If the DGP has fixed parameters, the only source of uncertainty is the sampling process. That uncertainty shrinks with sample size (recall that CI widths are a decreasing function of n). If your population has finite size N, then a sample of size n = N eliminates sampling uncertainty entirely.
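On the finite-population (design-based) reading, this is the textbook finite-population correction for simple random sampling without replacement; a small sketch of that standard formula:

```python
# Variance of the sample mean under simple random sampling without
# replacement from a finite population of size N with variance S^2:
#   Var(ybar) = (S^2 / n) * (N - n) / (N - 1)
# The correction factor (N - n) / (N - 1) hits zero at n = N:
# a full census has no sampling variance.

def var_sample_mean(pop_var: float, N: int, n: int) -> float:
    return (pop_var / n) * (N - n) / (N - 1)

print(var_sample_mean(4.0, 1_000, 100))    # positive sampling variance
print(var_sample_mean(4.0, 1_000, 1_000))  # exactly 0.0 at n = N
```

The superpopulation camp's rejoinder, as above, is that this only removes design-based uncertainty; uncertainty about the latent DGP's parameters remains.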
No.