Essential insights from Hacker News discussions

Everything is correlated (2014–23)

This discussion revolves around the pervasive nature of correlations, the interpretation of statistical significance, and the philosophical implications of these concepts for understanding the world.

The Pervasiveness of Correlation

A central theme is the idea that in a complex system like the universe, almost everything is correlated to some degree. This is humorously illustrated with a quote from The Hitchhiker's Guide to the Galaxy:

"Since every piece of matter in the Universe is in some way affected by every other piece of matter in the Universe, it is in theory possible to extrapolate the whole of creation — every sun, every planet, their orbits, their composition and their economic and social history from, say, one small piece of fairy cake." — senko

This idea is echoed in other philosophical and spiritual traditions, such as Buddhism's concept of dependent origination and David Bohm's concept of implicate order.

"In Buddhism we have dependent origination: https://en.wikipedia.org/wiki/Prat%C4%ABtyasamutp%C4%81da" — prox

"Also the concept of implicate order, proposed by the theoretical physicist David Bohm." — lioeters

However, some users point out that not all correlations are meaningful or indicative of direct causation. Commenters highlight the existence of spurious correlations and emphasize understanding the underlying mechanisms rather than merely observing statistical relationships.

"Correlation doesn't mean causation." — Anonymous commenter observing a general sentiment.

The Misinterpretation of Statistical Significance

A significant portion of the discussion focuses on the common misuse and misunderstanding of "statistical significance." Many users express frustration that statistical significance is often conflated with practical meaning or importance.

"People interpret "statistically significant" to mean "notable"/"meaningful". I detected a difference, and statistics say that it matters. That's the wrong way to think about things." — simsla

Several commenters explain that statistical significance primarily indicates how likely results at least as extreme as those observed would be if the null hypothesis were true, and that it is heavily influenced by sample size.

"Significance testing only tells you the probability that the measured difference is a "good measurement". With a certain degree of confidence, you can say "the difference exists as measured"." — simsla

"The problem is basically that you can always buy a significant result with money (large enough N always leads to ”significant” result). That’s a serious issue if you see research as pursuit of truth." — bjornsing

"It's standard to set the null hypothesis to be a measure zero set (e.g. mu = 0 or mu1 = mu2). So the probability of the null hypothesis is 0 and the only question remaining is whether your measurement is good enough to detect that." — ants_everywhere

The sentiment is that large sample sizes can make even minuscule effects statistically significant, leading to potentially misleading conclusions if effect size is not also considered.

"The p-value is basically: (effect size) / (noise / sqrt(n))" — mustaphah
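The quoted ratio is the familiar test statistic from which the p-value is computed, and it makes the sample-size effect easy to see: hold the effect size fixed and let n grow, and the denominator shrinks until almost anything becomes "significant". The sketch below (my own illustration, not from the thread) runs a two-sample t-test on a practically negligible 0.01-standard-deviation difference at increasing sample sizes; it assumes NumPy and SciPy.

    # The same negligible effect drifts from "not significant" toward "highly
    # significant" as n grows, because noise/sqrt(n) shrinks while the effect
    # size stays fixed.
    import numpy as np
    from scipy import stats

    rng = np.random.default_rng(1)
    true_effect = 0.01  # a 0.01 standard-deviation shift: practically nothing

    for n in (100, 10_000, 1_000_000):
        a = rng.normal(loc=0.0, size=n)
        b = rng.normal(loc=true_effect, size=n)
        result = stats.ttest_ind(a, b)
        print(f"n = {n:>9}   p = {result.pvalue:.3g}")

With these sample sizes the tiny shift typically only clears the 0.05 threshold at the largest n, even though the underlying difference is exactly as unimportant as before.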

The Role and Limitations of Statistics and Modeling

The discussion touches upon the fundamental role of statistics in discovering new knowledge from experience and experiments, especially when direct logical deduction is insufficient.

"Logic does not let you learn anything new. All logic allows you to do is restate what you already know. Fundamental knowledge comes from experience or experiments, which need to be interpreted through a statistical lens because observations are never perfect." — kqr

However, the limitations of statistical models, especially in complex and potentially "noisy" environments, are also acknowledged. Commenters raise the challenge of inferring causality from correlation, as well as the potential for models to be shaped by political considerations and to obscure underlying biases.

"The concerns about how models work are deeper than the statistical challenges of creating or interpreting them. For one thing, all the degrees of freedom we include in our model selection process allow us to construct models which do anything that we want." — nathan_compton

There is a recognition that while statistical tools are powerful, they are not arbiters of absolute truth; a critical perspective, including an understanding of experimental design and potential confounding factors, remains crucial.

"Statistical analysis is a tool, not an arbiter of Truth. We are trying to exist in the mess of a world we live in, and we're going to be using every possible advanced tool we have in our arsenal to do that, and the standard model of science is probably the best tool we have." — scoofy

The Nature of Causality and Self-Reinforcing Cycles

Several users explore the nature of causality, particularly in the context of self-reinforcing cycles where cause and effect can become intertwined or mutually reinforcing.

"For example, eat a lot and you will gain weight, gain weight and you will feel more hungry and will likely eat more." — jongjong

This complexity, where feedback loops are common, is suggested as a reason why humans might intuitively struggle with traditional, linear notions of causality.

"I'm convinced that the reason humans intuitively struggle to figure out causality is because the vast majority of causes and effects are self-reinforcing cycles and go both ways." — jongjong

The difficulty in isolating a single cause in such systems is highlighted, especially in fields where controlled experiments are challenging.
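As a rough illustration of such a loop, the sketch below (invented numbers, purely illustrative) couples the eating/weight example from the thread so that each variable drives the other; looking only at the two correlated series, there is no single cause to point to.

    # A self-reinforcing cycle: appetite raises intake, intake raises weight,
    # and weight feeds back into appetite. All coefficients are made up.
    appetite, weight = 1.1, 70.0  # start slightly above a 'neutral' appetite of 1.0
    for week in range(10):
        intake = appetite * 2_000             # more appetite -> more calories eaten
        weight += 0.002 * (intake - 2_000)    # calorie surplus -> weight gain
        appetite += 0.02 * (weight - 70.0)    # weight gain -> still more appetite
        print(f"week {week}: weight = {weight:.1f} kg, appetite = {appetite:.2f}")
    # The two series end up tightly correlated, but each is both cause and
    # effect of the other, so a single linear "X causes Y" story does not fit.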

The Author and the Blog's Approach

The detailed and extensive nature of the author's work is noted, with some users expressing admiration for the ability to produce such "treatises."

"This is such a massive article. I wish I had the ability to grind out treatises like that. Looking at other content on the guy's website, he must be like a machine." — Evidlo

There's also a slight critique of the author's "rationalist" style, with one user suggesting it can include "weird political bullshit" and a mixing of statistical, social, moral, and philosophical issues in a potentially misleading way.

"This is such a bizarre sentence. The way its tossed in, not explained in any way, not supported by references, etc." — nathan_compton (referring to a sentence about algorithmic bias)

The blog's design itself also garners positive comments.

"Not commenting on the topic at hand, but my goodness, what a beautiful blog. That drop cap, the inline comments on the right hand side that appear on larger screens, the progress bar, chef's kiss. This is how a love project looks like." — ricardobayes