"If you torture the data long enough, it will confess." — Ronald Coase, Nobel Prize Winner in Economics, 1991.

Imagine a below-average shooter (the original story has a Texan) randomly firing at the side of a barn, and then painting bull's-eyes around the tightest clusters of holes made by his gunshots. He then showcases his 'marksmanship' to the world, and gets widely appreciated for his skills. This scenario is known as the Texas Sharpshooter Fallacy.

In another version of this fallacy, the shooter first paints multiple bull's-eyes on the side of a barn, and then fires a bullet at them. It is obvious that he will hit one of these many targets. After that, he erases all the other bull's-eyes he had painted, and proudly displays the one with the bullet as proof of his sharpshooting skills.

If your first impression is that this scenario has nothing to do with the Data Science practice, think again. This fallacy is one of the most widely prevalent Data Science practitioner mistakes.

The first version of the fallacy (also related to the Clustering Illusion) involves establishing specific patterns after weak or even negligible data analysis, and then 'processing or transforming' the available data and 'structuring' new theories to force-fit them into those patterns. The rationale for such behaviour is generally attributed to the absence of adequate data for analysis, behavioural biases, over-reliance on past results and experiences, and intellectual laziness.

The second version of the fallacy (also known as P-hacking) pertains to conducting multiple tests to prove or disprove certain hypotheses, but reporting the results of only those tests with favourable (low) p-values, while largely ignoring the results of the others. The rationale here is generally attributed to intellectual fraud, behavioural biases and, at times, honest errors.

Both new and experienced Data Scientists are susceptible to the two aspects of the Texas Sharpshooter Fallacy, and must take adequate measures to guard against them. There has been a lot of research and innovation, particularly in this decade, to address the effects of this fallacy. As the field of Data Science receives greater scrutiny over time (which, I think, is essential for both technological and process maturity), practitioners are bound to exercise more caution to prevent its adverse influence in their Data Science projects. Advertisers often use this fallacy to their advantage by cherry-picking data clusters to suit their arguments, or by establishing patterns to fit existing perceptions.

A 1996 study on the effects of smoking on women revealed higher mortality rates for non-smokers. This is obviously counter-intuitive and extremely surprising. However, upon segregating the data into different age groups, the results revealed that smokers in all but one of the categories had higher mortality rates. The above study is a classic case of a phenomenon called Simpson's Paradox, where the trends seen in different groups of data are reversed after the data is aggregated. An important implication of this paradox is that causal inferences drawn from observational data can potentially lead to flawed analysis and insight generation. Hence, effective validation mechanisms must be enforced to rule out the presence of this paradox. I have observed that even experienced Data Scientists sometimes find this paradox challenging to grasp while addressing real business cases (even though they may have understood the phenomenon in theory).

In 1969, the addition of a new road in Stuttgart, Germany had an adverse effect on the traffic situation of the area, and things returned to normal only after the new road was closed. In 1990, the closure of a street in New York City led to a reduction in traffic congestion in that area. The same phenomenon was witnessed when a road was closed in Seoul, South Korea, around 2005. These are surprising observations, because the addition of a new road is expected to de-congest the traffic of the surrounding area, and the closure of an existing road is expected to increase the congestion on the other roads. These cases reflect a phenomenon from Game Theory called Braess' Paradox. While this paradox has traditionally influenced areas like electric power grids and traffic management, it also affects an emerging area of Data Science called Reinforcement Learning. Even experienced Data Scientists find it challenging to address this paradox while building multi-agent reinforcement learning models for autonomous systems.

Another practitioner mistake is the use of misleading (or distorted) graphs that lead to flawed data representations and inaccurate conclusions.
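The P-hacking version of the fallacy described above is easy to demonstrate with a simulation: run many A/B tests on data where no real effect exists, and a predictable fraction of them will still look 'significant'. Below is a minimal sketch in Python; it assumes a simple two-proportion z-test on simulated coin flips, and all names and numbers are illustrative rather than drawn from any real study.

```python
import math
import random

def z_test_pvalue(successes_a, successes_b, n):
    """Two-sided p-value for a two-proportion z-test (normal approximation),
    with n observations in each arm."""
    p_a, p_b = successes_a / n, successes_b / n
    p_pool = (successes_a + successes_b) / (2 * n)
    se = math.sqrt(2 * p_pool * (1 - p_pool) / n)
    if se == 0:
        return 1.0
    z = abs(p_a - p_b) / se
    # Normal CDF via the error function: Phi(z) = 0.5 * (1 + erf(z / sqrt(2)))
    return 2 * (1 - 0.5 * (1 + math.erf(z / math.sqrt(2))))

random.seed(42)
n, trials = 1000, 500
false_positives = 0
for _ in range(trials):
    # Both arms are fair coins: there is genuinely no effect to find.
    a = sum(random.random() < 0.5 for _ in range(n))
    b = sum(random.random() < 0.5 for _ in range(n))
    if z_test_pvalue(a, b, n) < 0.05:
        false_positives += 1

# Roughly 5% of the null tests cross the 0.05 threshold by chance alone.
print(f"{false_positives} of {trials} no-effect tests look 'significant'")
```

Reporting only those lucky tests, while discarding the rest, is exactly the sharpshooter erasing his unlucky bull's-eyes.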
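The reversal at the heart of Simpson's Paradox can be reproduced with a tiny dataset. The numbers below are made up for illustration (they are not the figures from the 1996 study), but they follow the same shape: smokers fare worse within every age group, yet look healthier in the aggregate, because non-smokers are concentrated in the older, higher-mortality group.

```python
# (deaths, group size) per age group and smoking status -- illustrative numbers.
groups = {
    "younger": {"smoker": (16, 800), "non_smoker": (2, 200)},
    "older":   {"smoker": (60, 200), "non_smoker": (200, 800)},
}

def rate(deaths, total):
    return deaths / total

# Within each age group, smokers have the higher mortality rate.
for name, g in groups.items():
    s, ns = rate(*g["smoker"]), rate(*g["non_smoker"])
    print(f"{name}: smoker {s:.1%} vs non-smoker {ns:.1%}")

# Aggregating across age groups reverses the trend.
agg = {}
for status in ("smoker", "non_smoker"):
    deaths = sum(g[status][0] for g in groups.values())
    total = sum(g[status][1] for g in groups.values())
    agg[status] = deaths / total
print(f"aggregate: smoker {agg['smoker']:.1%} vs non-smoker {agg['non_smoker']:.1%}")
```

The aggregate comparison is confounded by age, which is why conditioning on the right grouping variable (or validating against it) matters before drawing causal conclusions from observational data.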
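Braess' Paradox itself can be checked with a few lines of arithmetic on the standard textbook network (the numbers below are the classic illustrative ones, not data from the Stuttgart, New York, or Seoul cases): 4,000 drivers travel from A to B over two symmetric routes, and a 'free' shortcut is then added.

```python
N = 4000            # drivers from A to B
FIXED = 45          # travel time on the wide, uncongested roads (C->B and A->D)

def congested(t):
    """Travel time on a congestible road carrying t drivers."""
    return t / 100

# Before the shortcut: routes A-C-B and A-D-B are symmetric, so at
# equilibrium drivers split evenly, 2,000 per route.
before = congested(N // 2) + FIXED          # 20 + 45 = 65 minutes

# After adding a zero-cost link C-D: for any individual driver,
# A-C costs at most 40 (< 45), so it beats A-D; likewise D-B beats C-B.
# Every driver therefore takes A-C-D-B, loading both congestible roads fully.
after = congested(N) + 0 + congested(N)     # 40 + 0 + 40 = 80 minutes

print(f"before the new road: {before} min; after: {after} min")
```

Everyone's selfishly optimal choice leaves everyone worse off, which is why removing a road (as in New York and Seoul) can genuinely improve traffic, and why reward design in multi-agent reinforcement learning has to account for such equilibria.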