I saw a few people citing it in their papers, but is it a valid model? It seems purely to help you get statistical significance.
Economics Job Market Rumors » Econometrics Job Rumors
Rare-event logistic model?
(11 posts)-
Posted 1 month ago #
-
Totally fine in theory (although I cannot vouch for the above work). Logistic-link GLMs are used heavily in medicine and toxicology to predict rare events (e.g., death after low exposure to a toxic chemical). More than model form, endogeneity should be more of a concern.
Posted 1 month ago # -
It's funny that this guy is such a big deal in poli sci, when as an econometrician he would be distinctly third rate. The whole paper's just some trivial finite-sample bias correction.
Posted 1 month ago # -
I saw a few people citing this paper when they have a "rare event" in their data. But to correct for "bias", shouldn't we have some idea of the proportion of 1's in the population to begin with (say it's 5% in the population, by census data, but the 1's make up only 2% of the sample)? The interesting thing is that many of the empirical papers that use this method don't even mention how they get the proportion of 1's in the population.
Posted 1 month ago # -
This type of correction is pretty standard in any paper in which the sample over-represents a rare event. (Economists tend to talk about "choice-based" or "endogenous stratified" sample designs.) Without the correction, the marginal effects will be way too big. I would say that this is a big deal indeed. King may be a political scientist, but his paper is nice and clear, and it presents an easy-to-understand explanation of the fix (which does require knowing or estimating the proportion of 1's in the population). Rather than helping people get statistical significance, the fix is actually more likely to make their results seem less economically significant. So people aren't using this to dress up their paper. They're using it because it's a necessary fix if you want the right marginal effects.
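To put a rough number on "way too big", here is a minimal Python sketch. It assumes a made-up one-covariate logit (the values of `b0_pop`, `b1`, and the sampling rates are hypothetical, not taken from any paper) and uses the standard result that keeping 1's with probability `s1` and 0's with probability `s0` shifts the logit intercept by ln(s1/s0) while leaving the slope alone.

```python
import math

def inv_logit(z):
    """Logistic function: maps a linear index to a probability."""
    return 1.0 / (1.0 + math.exp(-z))

def marginal_effect(b0, b1, x):
    """Logit marginal effect dP/dx = b1 * p * (1 - p), evaluated at x."""
    p = inv_logit(b0 + b1 * x)
    return b1 * p * (1.0 - p)

# Hypothetical population model: the event is rare at x = 0.
b0_pop, b1 = -4.0, 0.8

# Hypothetical choice-based design: keep every 1, keep 5% of the 0's.
s1, s0 = 1.0, 0.05
b0_sample = b0_pop + math.log(s1 / s0)  # intercept a logit on the sample converges to

me_population = marginal_effect(b0_pop, b1, 0.0)
me_uncorrected = marginal_effect(b0_sample, b1, 0.0)

print("marginal effect, population intercept:", round(me_population, 4))
print("marginal effect, uncorrected intercept:", round(me_uncorrected, 4))
print("inflation factor:", round(me_uncorrected / me_population, 1))
```

The slope is the same in both calls; the whole difference comes from evaluating p(1 - p) at the wrong baseline probability.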
Posted 1 month ago # -
The papers that cite King et al. are looking at something like this: suppose you have X buyers and Y sellers, and you want to see what makes a given buyer buy from a given seller instead of elsewhere. So we have X * Y possible combinations (which can easily reach into the millions), out of which 1000 were observed (the 1's) in your data. Then they say that they use -relogit- to estimate the model. What makes me wonder is how the authors would have known what the "population proportion of 1's" really is in this case, and why they never mention the parameters they used to correct for this bias in those papers. If the parameter of interest was indeed statistically significant in an old-fashioned logistic model (with proper corrections for clusters etc.), why would they resort to -relogit-?
I have no grudge against King's paper - what he meant to do was propose a method to sample the data better (and more cheaply). But I feel that in some applications the method was misused.
Posted 1 month ago # -
^^ Not entirely true. See pp. 90-91 of Maddala for a discussion. When using the logit model, only the intercept is affected by stratified sampling of the binary response. Beta coefficients are unchanged.
The logit is a beautiful thing.
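For anyone who wants to check the intercept-only claim without opening Maddala, here is a small exact calculation in Python (no simulation). Everything numeric in it is a hypothetical assumption: the population coefficients `b0` and `b1`, the sampling rates `s1` and `s0`, and the four equally likely x values. The correction used is the usual prior correction, subtracting ln[((1 - tau)/tau) * (ybar/(1 - ybar))] from the intercept, where tau is the population share of 1's and ybar the sample share.

```python
import math

def logit(p):
    return math.log(p / (1.0 - p))

def inv_logit(z):
    return 1.0 / (1.0 + math.exp(-z))

def sampled_prob(p, s1, s0):
    """P(y=1 | x, kept in sample) when 1's are kept w.p. s1 and 0's w.p. s0."""
    return s1 * p / (s1 * p + s0 * (1.0 - p))

def prior_corrected_intercept(b0_hat, tau, ybar):
    """Subtract ln[((1 - tau)/tau) * (ybar/(1 - ybar))] from the sample intercept."""
    return b0_hat - math.log(((1.0 - tau) / tau) * (ybar / (1.0 - ybar)))

# Hypothetical population model and sampling design.
b0, b1 = -4.0, 0.8
s1, s0 = 1.0, 0.05
xs = [-1.0, 0.0, 1.0, 2.0]  # assumed equally likely in the population

# Log-odds of y=1 within the selected sample, at each x.
sample_logits = [logit(sampled_prob(inv_logit(b0 + b1 * x), s1, s0)) for x in xs]

# The sample log-odds are still linear in x: read off slope and intercept.
slope_sample = (sample_logits[2] - sample_logits[1]) / (xs[2] - xs[1])
intercept_sample = sample_logits[1]  # value at x = 0

# Population and sample shares of 1's implied by the design.
tau = sum(inv_logit(b0 + b1 * x) for x in xs) / len(xs)
ybar = s1 * tau / (s1 * tau + s0 * (1.0 - tau))

intercept_corrected = prior_corrected_intercept(intercept_sample, tau, ybar)

print("slope in sample:", slope_sample)
print("intercept in sample:", intercept_sample)
print("intercept after prior correction:", intercept_corrected)
```

Under these assumptions the slope comes back equal to `b1` up to floating-point error, and the corrected intercept comes back equal to `b0`, which is the point about beta coefficients being unchanged. It also shows why tau has to come from somewhere: without it the correction term cannot be computed.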
Posted 1 month ago # -
More than model form, endogeneity should be more of a concern.
--
Truer words were never spoken on EJMR. Some of you are blinded by fancy models and understand fuck all about identification.
Posted 1 month ago # -
And some of you are blinded by zero-bias identification and care fuck all about risk, and then use ridiculous instruments anyway, producing a toxic stew of bullshit, when simple things like OLS would have been only slightly biased and clearly better under quadratic (or any other reasonable) loss.
Posted 1 month ago # -
^ TESTIFY!
Posted 1 month ago #