The consistency of maximum likelihood does not require correct specification. If the model is wrong, the estimators converge to the pseudo-true parameters for which the model is closest (in the Kullback-Leibler sense) to the actual distribution.
You talking White's quasi-MLE result?
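That pseudo-true-parameter point can be illustrated directly: fit a deliberately misspecified model and watch the MLE settle at the KLIC minimizer. A minimal sketch in Python (stdlib only; the Gamma-vs-exponential pairing and all numbers are my own illustrative choices, not anything from the thread):

```python
import random

random.seed(1)
n = 100_000
# True DGP: Gamma(shape=2, scale=1), so E[X] = 2
xs = [random.gammavariate(2.0, 1.0) for _ in range(n)]

# Deliberately misspecified model: Exponential(lam). Its MLE is 1/sample mean,
# and the KLIC-minimizing pseudo-true value matches the mean: lam* = 1/E[X] = 0.5
lam_hat = n / sum(xs)
print(round(lam_hat, 3))
```

The exponential model is wrong, but the QMLE still converges to the best-matching exponential, here the one with the correct mean.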
Show me a real-world example where the LPM coefficients and the probit marginal effects are meaningfully different. I’ve been running it both ways for 20 years and haven’t had it happen yet.
Dave Giles actually gave an example of different ME.
https://davegiles.blogspot.com/2012/06/another-gripe-about-linear-probability.html
As an aside, the binary dep var just implies some nonlinear RHS; there's no theorem that says it's a linear index embedded in the standard normal CDF. That is a modeling choice, nothing more.
I am pretty sure it is easy to find discrepancies as you move toward the edge of your sample, or out of it. So you can have a horrible time doing counterfactual analysis.
Should be on every applied micro reading list.
Stopped reading when he referred to OLS as a “model.” My undergrads don’t make that mistake.
He actually didn't.
I guess your reading comprehension can use some work. It's true that his writing makes it a bit jumbled, but here is the opening sentence:
"So you're still thinking of using a Linear Probability Model (LPM) - also known in the business as good old OLS - to estimate a binary dependent variable model?"
What is ambiguous about "also known in the business as"? He clearly connects the "model" with the OLS estimator. Are you yet another person with a PhD who doesn't know the difference between a model and an estimation method?
The LPM can be estimated a million different ways. OLS is the most popular. But it's an estimator.
Oh, and I did continue reading, and it actually gets worse. Notice in his simulation he doesn't tell us how x(i) is generated. If he generated it as normal, for example, OLS on the LPM would have gotten the average partial effect EXACTLY -- as discussed in Papa Wooldridge. Another very strange thing about the simulation is that the parameter changes with the sample size. I literally cannot think of a serious simulation where this is true, other than weak instruments and local power analysis, and those are for different purposes. When one is trying to establish consistency, the population parameter is fixed, and one takes different sample sizes to see what happens. The parameter changing with n, when we don't know how x(i) was generated, is BS.
Finally, note he focuses on the parameter that no one uses any more: the marginal effect evaluated at the mean of x. He mentions the APE but then does not fill us in. As Papa Wooldridge discusses, it is the APE that OLS often gets close to, and that is the parameter people care about.
If you generate x(i) along with the error and draw random samples, OLS on the LPM will get pretty close to the correct APE, except in cases where the zeros and ones are extreme. Giles doesn't show us otherwise. How about you take a crack at it?
There is a valid point to be made, but it gets lost in all the incorrect and misleading statements about the LPM. We ALL agree the LPM is, at best, an approximation. Viewing probit, logit, and so on as approximations, these actually might work better. So Angrist is wrong because he talks about needing distributional assumptions for probit and logit to work well. And those who say the LPM should never be used are also wrong. They are different approximations to the true model. They often both give very similar answers for the APEs. That's an empirical fact.
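Taking a crack at the normal-x case: a minimal pure-Python sketch (my own coefficient picks; no claim that these match Giles's simulation) showing that when x(i) is normal, the OLS slope from the LPM lines up with the probit APE:

```python
import math
import random

random.seed(0)
n = 200_000
b0, b1 = 0.2, 0.5  # illustrative probit coefficients, my own choice

def Phi(z):
    # standard normal cdf via erf
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def phi(z):
    # standard normal pdf
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

# x(i) ~ N(0,1); y generated from the probit DGP
x = [random.gauss(0.0, 1.0) for _ in range(n)]
y = [1.0 if random.random() < Phi(b0 + b1 * xi) else 0.0 for xi in x]

# Simulated target APE: b1 * E[phi(b0 + b1*x)]
ape = b1 * sum(phi(b0 + b1 * xi) for xi in x) / n

# OLS slope on the LPM: cov(x, y) / var(x)
mx, my = sum(x) / n, sum(y) / n
cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n
var = sum((xi - mx) ** 2 for xi in x) / n
ols_slope = cov / var

print(round(ape, 3), round(ols_slope, 3))
```

Both numbers come out around .176 here; with normal x the two estimands coincide, which is exactly the Stoker/Wooldridge point.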
Don't get your panties in a bunch. The way I understood it, the opening sentence referring to the LPM as "good old OLS" was simply making fun of reg monkeys.
https://www.econjobrumors.com/topic/just-run-ols
Yeah, no. Giles is exactly the kind of econometrician who isn't careful about the difference. And you're being very Trump-like now in trying to cover yourself.
The “panties in a bunch” line is what I see from numbskulls on sports chat boards.
"Giles is exactly the kind of econometrician who isn't careful about the difference."
I doubt that. And if you unclutch your pearls and read past the tongue-in-cheek opening sentence, you'll see that he has a point.
What is "tongue-in-cheek" about the first line? It's a misstatement, period.
For those who may want to actually have an exchange about the merits of the arguments, I did a couple of my own simulations. Both use n = 1,000; X1 is continuous, X2 is binary, and the true model is probit. The fraction of ones for y is about 0.71, so it's unbalanced between 0 and 1 but not severely. I was lazy and did not compute the APEs using the true model; I simulated them instead. Those are the first two rows, so the targets are .170 and -.106. The next two rows are OLS, the next two are probit (correctly specified), and the last two are logit. The LPM does a slight bit worse for X2, the binary variable. The difference is minuscule, though.
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
_sim_1 | 1,000 .1703783 .0019922 .1640993 .1760383
_sim_2 | 1,000 -.1060871 .0013656 -.1101066 -.1020451
-------------+---------------------------------------------------------
_sim_3 | 1,000 .1706838 .010197 .1402745 .2008945
_sim_4 | 1,000 -.1101421 .0326668 -.2199216 .0047391
-------------+---------------------------------------------------------
_sim_5 | 1,000 .1707449 .010157 .1407128 .2030353
_sim_6 | 1,000 -.1058333 .0305385 -.2148732 .0053963
-------------+---------------------------------------------------------
_sim_7 | 1,000 .1707403 .0102388 .1394787 .201867
_sim_8 | 1,000 -.1055594 .0306578 -.212868 .0046626
The LPM does worse if X1 does not have a normal distribution. Below is the case where X1 has a gamma distribution. Now the OLS APE estimate, .136 on average, is well below the target of .181. It is more precise, but that's not really a good thing when it's heading to the wrong value. The APE for X2, the binary variable, is still pretty good. As expected, logit does better in terms of bias than the LPM estimated by OLS; it is almost as good as the correctly specified probit.
-------------+---------------------------------------------------------
_sim_1 | 1,000 .1805721 .0020317 .174343 .1870508
_sim_2 | 1,000 -.1125521 .0013799 -.116972 -.1085556
-------------+---------------------------------------------------------
_sim_3 | 1,000 .1364672 .0108293 .100426 .1749405
_sim_4 | 1,000 -.1086773 .034895 -.2078895 .0061342
-------------+---------------------------------------------------------
_sim_5 | 1,000 .1809469 .0147779 .1332158 .2267856
_sim_6 | 1,000 -.1136122 .032515 -.2020213 -.0012704
-------------+---------------------------------------------------------
_sim_7 | 1,000 .1849735 .015013 .1354962 .2303285
_sim_8 | 1,000 -.1136084 .0323011 -.2013026 -.0006991
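The gamma case can be replicated in miniature. A pure-Python sketch (my own parameters, so the numbers will not match the table; x drawn from Gamma(2,1) rather than normal) in which the OLS/LPM slope visibly misses the simulated APE:

```python
import math
import random

random.seed(0)
n = 200_000
b0, b1 = -1.0, 0.5  # illustrative probit coefficients, my own choice

def Phi(z):
    # standard normal cdf via erf
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

def phi(z):
    # standard normal pdf
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

# x(i) ~ Gamma(shape=2, scale=1): skewed, so the normal-x equivalence breaks
x = [random.gammavariate(2.0, 1.0) for _ in range(n)]
y = [1.0 if random.random() < Phi(b0 + b1 * xi) else 0.0 for xi in x]

# Simulated target APE: b1 * E[phi(b0 + b1*x)]
ape = b1 * sum(phi(b0 + b1 * xi) for xi in x) / n

# OLS slope on the LPM: cov(x, y) / var(x)
mx, my = sum(x) / n, sum(y) / n
cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y)) / n
var = sum((xi - mx) ** 2 for xi in x) / n
ols_slope = cov / var

print(round(ape, 3), round(ols_slope, 3))  # the two no longer agree
```

With these choices the OLS slope lands noticeably below the APE, the same direction of bias as in the gamma table above.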
You are putting the rabbit in the hat by making Probit the "correct model." All we know is that it's nonlinear in a way that precludes predicted values outside the unit interval. A more neutral horse race would be a more arbitrary nonlinear RHS.
Which is why I showed logit, too. It does better than the LPM in the second case. I could try something more exotic but that's not going to make the LPM look better than probit.
To answer the earlier question, those are the average partial effects. The first two rows are the target APEs.
For anyone who may still be interested: I used an underlying Cauchy distribution here. The LPM does worse than probit and logit by a considerable margin; logit is best. It's fun to actually generate evidence rather than having uninformed people just blab about their biases.
Variable | Obs Mean Std. Dev. Min Max
-------------+---------------------------------------------------------
APE_1 | 1,000 .1271792 .0018158 .1197657 .1336143
APE_2 | 1,000 -.082663 .0012366 -.0870246 -.078119
-------------+---------------------------------------------------------
LPM_1 | 1,000 .0823026 .0066124 .0624322 .1032305
LPM_2 | 1,000 -.0756259 .0392855 -.1866878 .0744701
-------------+---------------------------------------------------------
PRB_1 | 1,000 .1036423 .0100665 .0720769 .12999
PRB_2 | 1,000 -.0788115 .0385733 -.1877821 .0708532
-------------+---------------------------------------------------------
LGT_1 | 1,000 .1108997 .0097709 .0801822 .1382814
LGT_2 | 1,000 -.0807284 .0381572 -.1885448 .0702107
Remember we care about ME. OLS most certainly will do a good job recovering those.
LPM recovers a constant as an estimate of the ME, based on a linear approximation to a non-linear model (which obviously should have a non-constant ME). Sounds pretty useless to me.
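To see the non-constant ME concretely under a probit DGP (coefficients here are my own illustrative picks): the true marginal effect of x is b1*phi(b0 + b1*x), which shrinks as the index moves into the tail, while the LPM reports a single constant.

```python
import math

def phi(z):
    # standard normal pdf
    return math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)

b0, b1 = 0.2, 0.5  # illustrative probit coefficients

def marginal_effect(x):
    # true probit marginal effect of x at a point: b1 * phi(b0 + b1*x)
    return b1 * phi(b0 + b1 * x)

print(round(marginal_effect(0.0), 3), round(marginal_effect(3.0), 3))  # 0.196 0.047
```

The effect at x = 3 is roughly a quarter of the effect at x = 0, which is the heterogeneity a single LPM coefficient averages over.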
This usually works well for estimating average treatment effects. And it can easily accommodate an IV setup where you need to instrument for treatment. Also handles fixed effects.
So it's not useless. Just depends on what the question and underlying model are.
Yes, you can estimate an average treatment effect for a model that is known to be wrong.
You can accomodate IVs and fixed effects for a model that is known to be wrong.
But to economists, it is more important to be able to run IVs, fixed effects, etc. than to use a model that is correct. Because those get you the pubs.
No model is correct, by definition.
That Logit beat LPM when Probit is the DGP is hardly a revelation. Logit is used because its functional form closely approximates Probit.
Why not make the DGP a 2nd or 3rd order polynomial and run the race that way?
Why would you use a 2nd or 3rd order polynomial to represent a true DGP model of probability?
"No model is correct, by definition."
But some models are less wrong than others. Chances are neither the LPM nor probit/logit is a perfectly accurate representation of the DGP, but at least the latter are consistent with the (obviously known) limiting behavior of the dependent variable.