Common wisdom holds that data are recorded facts and figures; for an example, see Kroenke and Auer, Database Processing, slide 8. This is not astonishing if one considers the meaning of the term “raw data”. Most often it refers to unprocessed instrument data at full resolution: the output of a machine that has differentiated between signal and noise, as if the machine had not been conceived by human beings and designed according to needs defined by them. The entanglement of data and facts rests on a mechanical understanding of objectivity, in which instruments record signals that humans can easily identify as something that has been counted, measured, or sensed. “Raw data” is seen as the output of that operation, as facts gained by experiment, experience, or collection. Data products derived from these machine outputs seem to inherit their ‘factual’ status, even if they have been processed to a high degree in order to enable modelling and analysis. Turning against this distinction between “raw” and “cooked” data and stressing that data are a result of human intervention, Geoffrey Bowker coined the phrase “‘Raw data’ is an oxymoron”, which provided the title of a book edited by Lisa Gitelman. But common sense does not draw such precise distinctions; it preserves the aura of data as something that always precedes processing.
Data and facts are not the same. In a contribution to the anthology edited by Gitelman, Daniel Rosenberg explains the etymology of both terms as well as their differences. Facts are ontological; they depend on the logical operation true/false. When they are proven false, they cease to be termed “facts”. Data can have something in common with facts, namely their “out there-ness”, a reference apparently beyond the arbitrary relation between signifier and signified. “False data is data nonetheless”, as Rosenberg puts it, and he points out that “the term ‘data’ serves a different rhetorical and conceptual function than do sister terms such as ‘facts’ and ‘evidence’.” But what exactly is the rhetorical function of the term “data”? Rosenberg’s answer is that “data” designates something that is given prior to argument. Again, this brings the term “data” close to the term “fact”: in argumentation, both terms take on the role of proof, of something that substantiates. In which settings is this rhetoric particularly relevant?
As Steve Woolgar and Bruno Latour pointed out a generation ago, facts are social constructions, purged of the residues of the process in which they were created: “the process of construction involves the use of certain devices whereby all traces of production are made extremely difficult to detect”, they wrote in “Laboratory Life”. There is a process at work here that can be compared to ‘datafication’. In a laboratory, the term “fact” can simultaneously assume two meanings. At the frontier of science, the scientists themselves know about the constructed nature of statements and are aware of their relation to subjectivity and artificiality; at the same time, these statements are referred to as things “out there”, i.e. as objective facts. And it is in the very interest of science, aiming at “truth effects”, to make the artificiality of factual statements forgotten and, as a consequence, to have facts taken for granted. The analogy between the double nature of these statements and the distinction between “raw” and “cooked” data is obvious; what holds for “facts” holds for “data”. With respect to the latter, the process of dividing and classifying phenomena into numbers obscures ambiguity, conflict, and contradiction; what is left over from this process, “data”, is completely cleansed of irritating factors, and no trace remains of the purifying process itself. Therefore “data” deny their fabricatedness; and in struggles over the legitimacy of insight or in applications for resources, they serve their purpose in the very same way as “facts” do.