In Europe, copyright laws continue to vary from country to country, in spite of many years’ pressure towards harmonisation.
One of the biggest questions this raises is whether having the right to read a work gives you the right to data mine it. In terms of literature, you might think the right to mine could be the less restrictive one, if anything, as the use of and market for mining are so different from a linear reading mode. I don’t know of anyone who takes a nice big set of unstructured data (rather than a novel) with them on a beach holiday for fun, though I am sure such people exist (albeit with sand in their laptops). But no one is sure whether or when the rights to these modes of use will be harmonised, largely because the original rules and customs date to the era of print, and the tension between the unknown benefit of relaxing them and an unknowable future profit model that could emerge from data mining is unresolved.
This is a problem for literary scholars, but also for historians and others working with cultural data. Which leads me to the observation that the very beauty and utility of humanities data creates a two-tier system of science, in which legal hurdles hinder research in some disciplines (those that co-own their data) or for some methodologies, but not others. There are surely some medical or other data sets that are viewed as having equivalent market value to a best-seller, but even there, the mining model would be the only reading paradigm.
So how would data-driven research in other disciplines be different if the raw materials of their research were sold in airports, hung on the walls of galleries, or revered as the founding documents of a nation?