There’s the implicit expectation in AI and data science that if we could just “Have All The Data,” then we could build our model, no problem. The best way to understand why this isn’t always sufficient is to consider the past as a series of indifferently well-designed experiments, not just an impartial collection of data. Even if you have full access to all the data “collected” by history (from deep data sets to somebody’s game logs), that data was produced under conditions shaped by human decisions, and those decisions weren’t always (or rather, rarely were) diverse enough to count as a good experimental setup.
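To make the point concrete, here is a toy simulation, not from the original note. Every name and number in it (`TRUE_EFFECT`, `historical_policy`, the outcome formula) is a hypothetical assumption for illustration. In the made-up world, "history" takes an action only when the context already looks favourable, so comparing logged outcomes by action mixes the effect of the context with the effect of the action. A coin-flip (randomized) policy, the kind of diversity a real experiment provides, recovers the true effect.

```python
import random
import statistics

# Hypothetical: how much taking the action really changes the outcome.
TRUE_EFFECT = 1.0


def outcome(x: float, action: int, rng: random.Random) -> float:
    # Hypothetical world: the outcome depends strongly on the context x,
    # only modestly on the action itself, plus Gaussian noise.
    return 10.0 * x + TRUE_EFFECT * action + rng.gauss(0.0, 1.0)


def historical_policy(x: float, rng: random.Random) -> int:
    # "History" as a poorly designed experiment: the action was taken
    # exactly when the context already looked good. (rng is unused but
    # kept so both policies share one signature.)
    return 1 if x > 0.5 else 0


def randomized_policy(x: float, rng: random.Random) -> int:
    # A proper experiment: the action is a coin flip, independent of x.
    return rng.randint(0, 1)


def naive_effect(policy, n: int, seed: int = 0) -> float:
    """Mean outcome of logged rows with action=1 minus rows with action=0."""
    rng = random.Random(seed)
    treated, untreated = [], []
    for _ in range(n):
        x = rng.random()  # context, uniform on [0, 1)
        a = policy(x, rng)
        y = outcome(x, a, rng)
        (treated if a == 1 else untreated).append(y)
    return statistics.fmean(treated) - statistics.fmean(untreated)


if __name__ == "__main__":
    for n in (1_000, 100_000):
        print(n,
              round(naive_effect(historical_policy, n), 2),
              round(naive_effect(randomized_policy, n), 2))
```

Under these assumptions the historical estimate settles around 6 (the context gap of 5 plus the real effect of 1) no matter how large `n` gets, while the randomized estimate converges toward 1. Collecting more of the same history shrinks the noise but not the bias; only a more diverse "experiment" fixes that.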

www.joshbeckman.org/notes/263624674