I want to test whether there is a gender gap in the likelihood of filing a claim. I have microdata on millions of individuals, including their characteristics and whether they filed a claim in the last 12 months. The data come from several insurance companies.
I was thinking of running a logit regression as follows:
Pr(claim = 1) = F(b0 + b1*female + b2'X2 + b3'X3)
where F is the logistic CDF, X2 is a vector of individual characteristics, and X3 is a vector of firm characteristics.
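To make the specification concrete, here is a minimal sketch of how I would estimate it in Python with statsmodels. Everything in it is an assumption for illustration: the data are simulated, the column names (age standing in for X2, firm_size standing in for X3, firm as the insurer identifier) are hypothetical, and clustering the standard errors by insurer is just one option I am considering, not something I am sure is right.

```python
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf


def simulate_claims(n=20000, n_firms=10, seed=0):
    """Simulate hypothetical claim data (illustration only, not real data)."""
    rng = np.random.default_rng(seed)
    firm = rng.integers(0, n_firms, size=n)        # insurer identifier
    firm_size = rng.normal(size=n_firms)[firm]     # stand-in for X3
    female = rng.integers(0, 2, size=n)            # 1 = female
    age = rng.uniform(18, 80, size=n)              # stand-in for X2
    # Assumed "true" index, with a gender gap of -0.3 on the log-odds scale.
    index = -2.0 - 0.3 * female + 0.01 * (age - 50) + 0.2 * firm_size
    prob = 1.0 / (1.0 + np.exp(-index))            # logistic CDF
    claim = rng.binomial(1, prob)
    return pd.DataFrame(
        {"claim": claim, "female": female, "age": age,
         "firm_size": firm_size, "firm": firm}
    )


def fit_claim_logit(df):
    """Fit the logit and cluster standard errors by insurer (an assumption)."""
    model = smf.logit("claim ~ female + age + firm_size", data=df)
    return model.fit(
        disp=False,
        cov_type="cluster",
        cov_kwds={"groups": df["firm"].to_numpy()},
    )


if __name__ == "__main__":
    data = simulate_claims()
    result = fit_claim_logit(data)
    # b1 is the log-odds gap; exp(b1) is the female/male odds ratio.
    print(result.params["female"], np.exp(result.params["female"]))
```

The coefficient on female is b1 on the log-odds scale, so exp(b1) is the odds ratio of filing a claim for women relative to men, holding X2 and X3 fixed.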
My questions:
1. Do I need to control for other variables, such as time trends, macro variables, the number of cars in the county, etc.?
2. Do I have an endogeneity bias? I was thinking that a taste for risk might be correlated with one of the variables in X2.
3. Are there any papers you would recommend reading?