"Multicollinearity" is the technical/statistical term for re-using the same information unintentionally. There's a fairly clear explanation for the term and why it can be such a big problem in making financial decisions on this webpage. (Also note on the webpage a number of different ways to graph data.)
We ran into this problem earlier when trying to put together a cluster of indicators to describe the gay/lesbian demographics in our community. Several folks had put together different kinds of measures and indices (like Richard Florida's Gay Index), but when we looked closer, all of them were relying on the same core data set from Census information about unmarried same-sex households. Had we tried to use several "different" measures to confirm each other, we would have been guilty of multicollinearity -- we would have a result with several graphs that appear to support each other, but that's only because they're all based on the same information.
When putting together community indicators, then, we probably want to watch out for this problem. Too often when we try to build indicator clusters, or constellations, we run the risk of trying to say too much with the data if the data sets are repeated within the cluster.
This makes metadata -- data about the data, or information about how the data was collected and transmitted -- critically important. Without understanding where the numbers come from, we risk multicollinearity -- and now I've repeated the term enough times you're starting to feel comfortable with it. Try it out in conversation and let me know what happens.
Home insurance companies dropping customers
-
Because of a warming planet with more wildfires and hurricanes, it’s
growing more…
*Tags:* climate, insurance, New York Times
8 minutes ago
0 comments:
Post a Comment