As discussed in the earlier post this week “How can data simultaneously mean what is fabricated and what is not fabricated?”, the term ‘raw data’ points toward an illusion, to something that is naturally or even divinely given, untouched by human hand or interpretive mind. The very organicism of the metaphor belies the work it has to do to hide the fact that progression form data to knowledge is not a clean one: knowledge also produces data, or, perhaps better said, knowledge also produces other knowledge. We need to get away from the idea of data as somehow ‘given’ – Johanna Drucker’s preference for the term ‘capta’ over ‘data’ is hugely useful in this respect.
Raw data is, however, an indicator for where we begin. Human beings have limited lifespans and limited capacities for experimentation and research, sensitive as we are to what John Guillory referred to as the ‘clock time’ of scholarship. Therefore to make scientific progress, we start on the basis of the work of others. Sometimes this is recognised overtly, as in citation of secondary sources. Sometimes it is perhaps only indirectly visible, as in the use of certain instruments, datasets, infrastructures or libraries, that themselves (to the most informed and observant) belie the conditions in which particular knowledge was born. In this way, ‘raw data’ merely refers to the place where our organisation started, where we removed what we found from the black box and began to create order from the relative chaos we found there.
Do big data environments make these edges apparent? Usually not. In the attempt to gain an advantage over ‘clock time,’ systems hide the nature, perhaps not of the chaos they began with, but of how that chaos came to be, whose process of creating ‘capta’ gave origin to it in the first place. Like the children’s rhyme about the ‘House that Jack Built’, knowledge viewed backwards toward its starting point can perhaps be seen recursing into data, but going further, that data itself emerges from a process or instrument that someone else created through a process that began elsewhere, with something else, at yet another, far earlier point. At our best, humanists cling to the provenance of their ideas and knowledge. Post-modernism has been decried by many as the founding philosophy of the age of ‘alternative facts’, as an abandonment of any possibility for one position to be right or wrong, true or untrue. In the post-truth world, any position can offend, and attempt at defence result in being labelled a ‘snowflake.’ But Lyotard’s work on The Post-Modern Condition wasn’t about this: it was about recognising the bias that creeps in to our assumptions once we start to generalise at the scale of a grand narrative, a truth. In this sense, we perhaps need to move our thinking from ‘raw data,’ to ‘minimally biased data,’ and from the grand narratives of ‘big data’ to the nuances of ‘big enough data.’ If we can begin speaking with clarity about that place where we begin, we will have a much better chance of not losing track of our knowledge along the way.