Commonly used regression methods, I will argue, are bad data dredging methods that do not severely, or appropriately, test their results. I argue that various traditional and proposed methodological norms, including pre-specification of experimental outcomes and error probabilities for regression estimates of causal effects, are unnecessary or illusory in application.

Statistics wants a number, or at least an interval, to express a normative virtue: the value of data as evidence for a hypothesis, how well the data push us toward the true or away from the false. Good when you can get it, but there are many circumstances where you have evidence and yet no number or interval to express it other than phony numbers with no logical connection to truth guidance. Kepler, Darwin, Cannizzaro, and Mendeleev had no such numbers, but they severely tested their claims by combining data dredging with severe testing. There is good data dredging and bad data dredging. Finite-sample error probabilities are illusory.

1. Good data dredging is serious severe testing.
2. Good data dredging should accompany every empirical study in the …
3. Historically, data dredging has been key to fundamental scientific discoveries.
4. The best hypotheses are often not conceived of.
5. Lots of modern science considers large numbers of variables.
6. The number of possible causal relations grows superexponentially with the …
7. There are sometimes more possible causes of a phenomenon than can be …
8. Many experiments require adjustments for common causes.
9. Statistical experience is with bad data dredging.

- Error probabilities for causal inference presuppose causal relations that have …
- Good data dredging methods are available and improving, and can and have …
- Examining data to "generate" hypotheses: bringing them to notice and …
- Evaluating the hypotheses on the data: taking that data to support or confirm or …
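The superexponential-growth point above can be made concrete. The number of directed acyclic graphs on n labeled variables is given by Robinson's recurrence (OEIS A003024); the sketch below is a standard enumeration, not taken from the slides:

```python
from functools import lru_cache
from math import comb

@lru_cache(maxsize=None)
def num_dags(n: int) -> int:
    """Count DAGs on n labeled nodes via Robinson's recurrence (OEIS A003024)."""
    if n == 0:
        return 1
    # Inclusion-exclusion over the k nodes with no incoming edges.
    return sum(
        (-1) ** (k + 1) * comb(n, k) * 2 ** (k * (n - k)) * num_dags(n - k)
        for k in range(1, n + 1)
    )

for n in range(1, 7):
    print(n, num_dags(n))  # 1, 3, 25, 543, 29281, 3781503
```

Already at six variables there are 3,781,503 candidate DAGs, which is one reason principled computerized search, rather than hand enumeration of hypotheses, matters.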
ABSTRACT: "Data dredging," searching non-experimental data for causal and other relationships and taking that same data to be evidence for those relationships, was historically common in the natural sciences; the works of Kepler, Cannizzaro, and Mendeleev are examples. Nowadays "data dredging," using data to bring hypotheses into consideration and regarding that same data as evidence bearing on their truth or falsity, is widely denounced by both philosophical and statistical methodologists. Notwithstanding, it is routinely practiced in the human sciences using "traditional" methods, various forms of regression for example. The main thesis of my talk is that, in the spirit and letter of Mayo's and Spanos' notion of severe testing, modern computational algorithms that search data for causal relations severely test their resulting models in the process of "constructing" them. My claim is that in many investigations, principled computerized search is invaluable for reliable, generalizable, informative scientific inquiry. The possible failures of traditional search methods for causal relations, multiple regression for example, are easily demonstrated by simulation in cases where even the earliest consistent graphical model search algorithms succeed. These and other examples raise a number of issues about using multiple hypothesis tests in strategies for severe testing, notably the interpretation of standard errors and confidence levels as error probabilities when the structures assumed in parameter estimation are uncertain.
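One way to see the abstract's contrast between regression and graphical model search is a confounding simulation: when an unmeasured common cause drives both X and Y, the regression slope of Y on X is far from zero even though X has no causal effect on Y at all. This is a minimal sketch with all numbers (effect sizes, sample size, seed) chosen purely for illustration, not a reproduction of any simulation from the talk:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000

# Latent common cause L; X has NO causal effect on Y.
L = rng.normal(size=n)
X = L + rng.normal(size=n)        # L -> X
Y = 1.5 * L + rng.normal(size=n)  # L -> Y (no arrow from X)

# OLS slope of Y on X: cov(X, Y) / var(X).
slope = np.cov(X, Y)[0, 1] / np.var(X)
print(f"estimated 'effect' of X on Y: {slope:.2f}")  # close to 0.75, true effect is 0
```

The standard error attached to such a slope quantifies sampling noise around the wrong quantity; it says nothing about the misspecified causal structure, which is the sense in which those error probabilities are illusory here. Constraint-based searches in the tradition of the early consistent graphical model algorithms instead exploit patterns of conditional independence that plain regression never examines.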