Who believes the results from a regression with n=100 anyways, regardless of the standard errors?
Datacolada: highly cited QJE paper falls apart when data is analyzed properly

This displays an unfortunate misunderstanding of the terms “population” and “sample”.
A 100% complete census does not give you the statistical population. It’s still a sample.
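To make the superpopulation point concrete, here is a quick simulation sketch (Python, with a made-up data-generating process): even when you observe the *entire* finite population, that population is itself one draw from an underlying process, so the census-level estimate still varies across draws.

```python
# Sketch of the "superpopulation" view: even a 100% census is one draw
# from an underlying data-generating process, so estimates still vary.
# The DGP and sizes here are invented for illustration only.
import numpy as np

rng = np.random.default_rng(0)
N = 1000          # size of the entire finite population
beta_true = 2.0
estimates = []
for _ in range(500):
    # Each iteration generates a complete "census" from the same process.
    x = rng.normal(size=N)
    y = beta_true * x + rng.normal(size=N)
    # OLS slope on the full census -- no sampling within the population.
    beta_hat = (x @ y) / (x @ x)
    estimates.append(beta_hat)

# The census-level estimates are not identical across draws; the spread
# is roughly 1/sqrt(N), which is exactly what a standard error measures.
print(np.std(estimates))
```

That nonzero spread is the uncertainty a standard error on census data is trying to capture.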
What's the point of sampling theory anyways? Just take a census of the entire population, report point estimates. No standard errors needed.

yeah, no!

These are the things I wish they taught me in grad school econometrics instead of theoretical time series
This is why people with 1-2 years' RA experience doing "trivial" Stata work have become so favored in admissions. The flip side, of course, is that the RA work has also become a highly inefficient and unfair tit-for-tat entry barrier.
The profession really needs to get together and reform the admissions system so that talented undergrads/master's grads can all get direct entry into grad school, with everybody doing (decently paid) RA work during their first two summers.
Too bad the AEA is busy trying to censor people instead.

Also don't forget that in Stata, merge m:m does not do a cross product as you would expect. It does a random m:1 merge. When confronted with this weird behavior, the Stata account on Twitter replied that it is documented in the handbook, with a smiley face, as if it is legitimate so long as it is documented.
I have my complaints about Stata, but to be fair to them on this point, the docs on merge specifically say: "m:m specifies a many-to-many merge **and is a bad idea.**" [emphasis added] From p. 10 here: https://www.stata.com/manuals/dmerge.pdf
They also have a separate command, cross, which does what you are looking for: https://www.stata.com/manuals13/dcross.pdf
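For comparison, here is what most people *expect* "many-to-many" to mean: every matching pair, i.e. a within-key Cartesian product. A small sketch using pandas as a stand-in (toy data, illustration only), which behaves the way Stata's joinby/cross do rather than the way merge m:m does:

```python
# What people *expect* "m:m" to mean: a within-key Cartesian product.
# pandas' merge (like Stata's joinby) produces every matching pair.
# Toy data for illustration only.
import pandas as pd

left = pd.DataFrame({"id": [1, 1, 2], "x": ["a", "b", "c"]})
right = pd.DataFrame({"id": [1, 1, 2], "y": ["p", "q", "r"]})

# Each of the two id==1 rows on the left pairs with each of the two
# id==1 rows on the right: 2*2 + 1*1 = 5 rows total.
full = left.merge(right, on="id")
print(len(full))  # 5
```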

It is embarrassing how people rely on off-the-shelf algorithms to make strong claims and don't even bother to look under the hood. My students taking undergrad metrics have wondered about this issue when comparing R to Stata.
PhD here. When I use felm with fixed effects in R and see the "singleton observation, vcov might be biased" message, I get very worried.
I've raised this with several coauthors before, but none of them give a crap. Probably because Stata only shows a small message for this, whereas in R you need to suppress the warning…
Btw, any ideas on this message?

First things first: do you know what a singleton observation is and why it would matter for computing SEs? Sergio Correia has a paper on his website that explains the issue if you need somewhere to start. In practice, it often makes only a small numerical difference, but it depends on the data.

From what I read, it means you only have 1 obs in a group, and the fixed-effect transformation will zap it. Not sure how that relates to calculating SEs… never taught in metrics class. Is it related to the N used in the vcov calculation once this obs is deleted?

I know that cross and joinby exist. But when merge 1:1, 1:m, and m:1 behave properly, people will simply assume merge m:m behaves the same way; not many will bother to check the manual. At the very least, Stata should show a warning when people attempt merge m:m. Instead it gives some gibberish. I don't think anyone needs a random m:1 merge, which is what merge m:m actually does. Most likely a Stata programmer coded up a buggy version of the merge command at first, and cross and joinby came later as a patch fix.

m:m duplicates all observations in both datasets; the similar code in R data.table would be x[y, allow.cartesian=T]
No, it does not (you would assume it does a Cartesian product). It instead takes the first occurrence of the matched value from both datasets, matches those two, and discards the rest. See, that's why this command is so confusing!
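The contrast the posters describe can be sketched in a few lines, again with pandas as a stand-in and toy data: roughly, pairing rows positionally within each key (first with first, second with second) instead of crossing them. This is an illustration of the described behavior, not a claim about Stata's exact algorithm.

```python
# Rough sketch of the pairing behavior the posters describe for merge m:m:
# within each key, rows are matched positionally rather than crossed.
# Illustration only, not Stata's exact algorithm. Toy data.
import pandas as pd

left = pd.DataFrame({"id": [1, 1, 2], "x": ["a", "b", "c"]})
right = pd.DataFrame({"id": [1, 1, 2], "y": ["p", "q", "r"]})

# Number rows within each key, then match on (key, position):
left["pos"] = left.groupby("id").cumcount()
right["pos"] = right.groupby("id").cumcount()
paired = left.merge(right, on=["id", "pos"])
print(len(paired))  # 3 rows: one pair per position

# A true within-key Cartesian product gives every matching pair instead:
cartesian = left.drop(columns="pos").merge(right.drop(columns="pos"), on="id")
print(len(cartesian))  # 5 rows
```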

Geez

Yes, that's right. The idea is that the "zapped" obs does not contribute any information, so it shouldn't be counted towards N.
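A sketch of what dropping singletons looks like in practice, in the spirit of Correia's procedure (column names and data are made up): with more than one fixed-effect dimension, dropping a singleton for one dimension can create a new singleton in another, so you iterate until none remain.

```python
# Sketch of iterative singleton dropping for two fixed-effect dimensions,
# in the spirit of Correia's procedure. Toy data; column names assumed.
import pandas as pd

df = pd.DataFrame({
    "firm": [1, 1, 2, 3, 3, 3],
    "year": [2000, 2001, 2000, 2000, 2001, 2002],
    "y":    [1.0, 2.0, 3.0, 4.0, 5.0, 6.0],
})

def drop_singletons(data, fe_cols):
    """Iteratively drop observations that are alone in any FE group.

    Dropping a singleton for one dimension can create a new singleton
    in another, so repeat until no group of size 1 remains.
    """
    while True:
        n_before = len(data)
        for col in fe_cols:
            counts = data.groupby(col)[col].transform("size")
            data = data[counts > 1]
        if len(data) == n_before:
            return data

kept = drop_singletons(df, ["firm", "year"])
print(len(df) - len(kept))  # number of singleton obs dropped
```

The effective N for the vcov is then the size of `kept`, not of the original dataset, which is where the bias warning comes from.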

LOL
If, say, the US population in 2020 is just a sample, from which "statistical population" is it drawn? Parallel universes?

Something something causal cruelty to cats involving boxes something