When Christine Borgman (2015) uses the term “native data”, she is referring to data in its rawest form, with context information such as communication artefacts included. In terms of NASA’s EOSDIS Data Processing Levels, “native data” even precede level 0, meaning that no cleaning has been performed at all. Scientists who begin their analysis at this stage do not face any uncertainty about what this context information is: it is simply the recordings and output of instruments, predetermined by the configuration of those instruments. NASA researchers may therefore count themselves lucky to obtain this kind of reliable context information.
Humanists and social scientists start from a quite different position. Anthropologists, for example, would probably use the term “emic” for their field research data. “Emic” stands in contrast to “etic”, and the pair is derived from the distinction in linguistics between phonemics and phonetics: “The etic viewpoint studies behavior as from outside of a particular system, and as an essential initial approach to an alien system. The emic viewpoint results from studying behavior as from inside the system” (Pike 1967: 37). An example of the emic viewpoint might be the correspondences between pulses and organs in Chinese medical theory (see picture below) or the relation of masculinity to maleness in a particular cultural setting (MacInnes 1998).
For anthropologists, the emic context thus depends on the particular cultural background of their research participants. Dissociated from this cultural background and transferred into an etic context, data may become incomprehensible. Take Kosovo, for example: a sovereign state from an emic point of view, but recognized by only 111 UN member states. In this transition from emic to etic context, the etic context obviously becomes an imposed context.
Applied to libraries, archives, museums and galleries, it might be equally important to know the provenance and original use of the resources, that is, their emic context. What functions did the materials have for their author or creator? Knowing the “experience-near” and not only the “experience-distant” meanings of materials would increase their information content and transparency. One could also say that providing this additional “emic” metadata enables traceability to the source context and guarantees the credibility of the data. From an operational viewpoint, however, it would recreate the problem of standards and of making data findable.
If we move up to the next level, the metadata of each GLAM institution could be said to be emic, since they reflect how the curators in that institution understand the structure of their data. Currently, over a hundred different metadata standards are in use. Again, aggregating several metadata standards into a unified one creates the same problem: a transfer from an emic standard (an institution’s inherent metadata standard) into an etic one.
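The aggregation problem can be made concrete with a minimal sketch of a metadata crosswalk. Everything in it is an assumption made for illustration: the institutions, the field names, the sample record and the shared schema are invented and do not correspond to any real standard. The sketch maps a local ("emic") record onto a shared ("etic") schema and returns separately whatever the shared schema has no place for.

```python
from typing import Dict, Tuple

# Hypothetical crosswalks: local ("emic") field name -> shared ("etic") field name.
# Institution names and field names are invented for this illustration.
CROSSWALKS: Dict[str, Dict[str, str]] = {
    "museum_a": {"object_title": "title", "maker": "creator", "made": "date"},
    "archive_b": {"unit_title": "title", "originator": "creator", "unit_date": "date"},
}


def to_shared_schema(
    institution: str, record: Dict[str, str]
) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Map a local record onto the shared schema.

    Returns a pair (mapped, unmapped): the fields that have an equivalent
    in the shared schema, and the local fields that have none.
    """
    crosswalk = CROSSWALKS[institution]
    mapped: Dict[str, str] = {}
    unmapped: Dict[str, str] = {}
    for field, value in record.items():
        if field in crosswalk:
            mapped[crosswalk[field]] = value
        else:
            # No equivalent in the shared schema: this context would be lost
            # if only the mapped record were kept.
            unmapped[field] = value
    return mapped, unmapped


# Invented sample record with one field that only makes sense locally.
record = {
    "object_title": "Example object",
    "maker": "Unknown",
    "original_use": "teaching aid",
}
mapped, unmapped = to_shared_schema("museum_a", record)
print(mapped)
print(unmapped)
```

In this toy case the `original_use` field, which carries the emic context, has no counterpart in the shared schema. Whether such fields are dropped, kept alongside the mapped record or negotiated into the shared standard is exactly the question of imposition versus convergence.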
So what is the solution? Unless GLAM institutions are willing to accept an imposed standard, there remains only the possibility of mutual convergence and, ultimately, an inter-institutional consensus.
Borgman, Christine L. (2015) Big Data, Little Data, No Data. Scholarship in the Networked World. Cambridge: MIT Press.
MacInnes, John (1998) The End of Masculinity. The Confusion of Sexual Genesis and Sexual Difference in Modern Society. Buckingham: Open University Press.
Pike, Kenneth L. (1967) Language in Relation to a Unified Theory of the Structure of Human Behavior. The Hague: Mouton.
Featured image was taken from http://www.europeana.eu/portal/de/record/9200105/wellcome_historical_images_L0038821.html