The Boston housing prices dataset has an ethical problem: as
investigated in [1], the authors of this dataset engineered a
non-invertible variable "B" assuming that racial self-segregation had a
positive impact on house prices [2]. Furthermore the goal of the
research that led to the creation of this dataset was to study the
impact of air quality but it did not give adequate demonstration of the
validity of this assumption.
The scikit-learn maintainers therefore strongly discourage the use of
this dataset unless the purpose of the code is to study and educate
about ethical issues in data science and machine learning.
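To make "non-invertible" concrete, here is a minimal sketch. It assumes the formula from the dataset's own column description, B = 1000(Bk - 0.63)^2, where Bk is the proportion of Black residents by town. The function names `b_feature` and `candidate_proportions` are hypothetical helpers for illustration and are not part of scikit-learn.

```python
import math


def b_feature(bk: float) -> float:
    """Compute B from a proportion Bk in [0, 1], assuming B = 1000 * (Bk - 0.63)**2."""
    return 1000.0 * (bk - 0.63) ** 2


def candidate_proportions(b: float) -> list:
    """Return every proportion Bk in [0, 1] that maps to the given B value."""
    root = math.sqrt(b / 1000.0)
    # The parabola is symmetric around 0.63, so there are up to two solutions.
    candidates = (0.63 - root, 0.63 + root)
    return [p for p in candidates if 0.0 <= p <= 1.0]


# Two different proportions, equally far from the 0.63 threshold,
# produce (up to floating-point rounding) the same B value.
print(b_feature(0.53), b_feature(0.73))

# Going backwards from B therefore yields two candidate proportions.
print(candidate_proportions(b_feature(0.53)))
```

Under this assumed formula, the ambiguity only applies when both roots fall inside [0, 1]. A B value large enough to push the upper root above 1 has a single valid proportion, so only part of the column is truly unrecoverable.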
Boston housing price dataset removed from scikit-learn 1.2 due to ethics concerns
-
The Boston housing data set was ostensibly compiled by (the grad students and/or assistants of) David Harrison Jr. (Harvard) and Daniel L. Rubinfeld (National Bureau of Economic Research) for analysis in the paper “Hedonic housing prices and the demand for clean air” (Journal of Environmental Economics and Management 5, 81–102 (1978), referred to hereafter as HHP). This report discusses features of air quality that may have affected the median prices in the 1970s housing market of the Boston Standard Metropolitan Statistical Area (SMSA).
-
https://medium.com/@docintangible/racist-data-destruction-113e3eff54a8
"Harrison and Rubinfeld appear to have decided on a threshold of 63% at which to switch the regime of price decline to price increase (i.e. a so-called “ghetto threshold”)."
They sound based.
-
tbf they were writing in the 70s... even the n-word was normal back then