Data. A really short history

“One man’s noise is another man’s signal.” Edward Ng

 

Data are “givens”. They are artefacts. They do not pre-exist in the world, but come into the world because we humans want them to. If you were to imagine an intimate relationship between humans and data, you would say that data do not exist independently of humans. Hunters and gatherers collect the discrete objects of their desire, put them in order, administer them, store them, use them, throw them away, forget where they were stored and at times rediscover those forgotten places, for example during a blithe snowball fight. At that time data were clearly divided from non-data – edible things were separated from inedible ones, helpful items from uninteresting ones. This selectivity probably owed much to the dependence on the shifting availability of resources across a wider space.

Later on, when mankind came out of the forests, things became much more complicated. With the advent of agriculture and especially of trade, data became more meaningful. Furthermore, rulers became interested in knowledge about their sheep and therefore asked some of their accountants to keep records of their subordinates – the oldest known census dates back to around 700 B.C. As mathematics grew more sophisticated from the 17th century (probability) and the 19th century (statistics) onwards, data about people, agriculture, and trade began to pile up, and it became more and more difficult to distinguish between relevant data (“signal”) and irrelevant data (“noise”). The distinction between the two simply became a matter of the question with which the data at hand were consulted.

With the advent of the industrial age, the concept of mechanical objectivity was introduced, and the task of data creation was delegated to machines constructed to collect the items in which humans were interested. Now data were available in huge amounts, and the need for organizing and ordering them became even more pressing. This is where powerful schemes came into force: selection processes, categorizations, classifications, standards; some variables prioritized as signal, others reduced to noise, thereby creating systems of measurement and normativity intended to govern societies. These schemes have been insightfully investigated in the book “Standards and their stories. How quantifying, classifying, and formalizing practices shape everyday life.”*

It was only later, at the beginning of the post-industrial age, that an alternative to this scheme- and signal-oriented approach was developed: simply piling up everything that might be of any interest, a task also often delegated to machines because of their patient effortlessness. The agglomeration of such masses presupposes that storing is not a problem, neither in spatial nor in temporal terms. The result of this approach is nowadays called “Big Data” – the accumulation of masses of (mostly observational) data for no specific purpose. Collecting noise in the hope of signal, without defining what noise and what signal is. Fabricating incredibly large haystacks and assuming there are needles in them. Data as the output of a completely economised process with its classic capitalistic division of labour, including the alienation from their sources.

What is termed “culture” often resembles these haystacks. Archives are haystacks. The German Historical Museum in Berlin is located in the “Zeughaus”, literally the “house of stuff”,** stuffed with the remnants of history, a hayloft of history. Libraries are haystacks as well; if you are not bound to the indexes and registers called catalogues, if you are free to wander around the shelves of books and pick them out at random, you might get lost in an ocean of thoughts, in countless imaginary worlds, in intellectual universes. This is the fun of explorers, conquerors and researchers: never getting bored by routine, always discovering something new that feeds your curiosity. And it is here, within this flurry of noise and signal, within the richness of multisensory, kinetic and synthetic modes of access, that it becomes tangible that in culture noise and signal cannot be thought of apart from the environment out of which they were taken, that both are the product of human endeavours, and that data are artefacts that cannot be understood without the context in which they were created.

 

* Martha Lampland, Susan Leigh Star (eds.), Standards and Their Stories: How Quantifying, Classifying, and Formalizing Practices Shape Everyday Life, Ithaca: Cornell University Press 2009.
** And yes, it should be translated correctly as “armoury”.

Big Data and Biases

Big Data are, by definition, data too big to be inspected by humans. Their bigness has consequences: they are so large that typical applications for storing, processing and analysing them are inadequate. Processing them is often a challenge for a single computer; thus, a cluster of computers has to be used in parallel. Or the amount of data has to be reduced by mapping an unstructured dataset onto a dataset whose individual elements are key-value pairs; mathematical analyses can then be performed on a reduced selection of these key-value pairs (“MapReduce”). Even though Big Data are not collected in response to a specific research question, their sheer largeness (millions of observations across many variables) promises to provide answers relevant for a large part of a society’s population. From a statistical point of view, large sample sizes make almost any difference look significant; the effect size becomes the more important measure. On the other hand, large does not mean all; one has to be aware of the universe covered by the data. Statistical inference – conclusions drawn from data about the population as a whole – cannot easily be applied, because the datasets are not established in a way that ensures they are representative. Bias in Big Data may therefore, ironically, come from missing datasets, e.g. on those parts of the population which are not captured in the data collection process.
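
To make the key-value idea concrete, here is a minimal single-machine sketch of the MapReduce pattern described above, counting words in a couple of invented text snippets; real frameworks such as Hadoop or Spark distribute the same map, group and reduce steps across a cluster.

```python
from collections import defaultdict
from functools import reduce

# Toy corpus; any unstructured text would do.
documents = [
    "one man's noise is another man's signal",
    "noise and signal are a matter of the question asked",
]

# Map: emit a (word, 1) key-value pair for every word in every document.
pairs = [(word, 1) for doc in documents for word in doc.split()]

# Shuffle: group all emitted values by their key (the word).
grouped = defaultdict(list)
for word, count in pairs:
    grouped[word].append(count)

# Reduce: aggregate the values for each key.
word_counts = {word: reduce(lambda a, b: a + b, counts)
               for word, counts in grouped.items()}

print(word_counts)  # e.g. {"noise": 2, "signal": 2, ...}
```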

But biases may also arise in the process of analysing Big Data. This, too, has to do with the substantial size of the datasets; standard software may be inadequate to handle them. Beyond parallel computing and MapReduce, the use of machine learning seems to provide solutions. Machine learning designates algorithms that can learn from and make predictions on data by building a model from sample inputs. It is a type of artificial intelligence in which the system learns from lots of examples; results – such as patterns or clusters – become stronger with more evidence. This is why Big Data and machine learning seem to go hand in hand. Machine learning can roughly be divided into A) analytic techniques which use stochastic data models, most often classification and regression in supervised learning; and B) predictive approaches, where the data mechanism is unknown, as is the case with neural nets and deep learning. In both cases biases may result from the processing of Big Data.

A) The goal of statistical modelling is to find a model which allows quantitative conclusions to be drawn from data. It has the advantage that the data model is transparent and comprehensible to the analyst. However, what sounds objective (since it is ‘based on statistics’) need neither be correct (if the model is a poor emulation of reality, the conclusions may be wrong) nor fair: the algorithms may simply not be written in a manner which defines fairness, or an even distribution, as a goal of the problem-solving procedure. Machine learning then commits disparate mistreatment: the algorithm optimizes its discrimination over the whole population, but it does not look for a fair distribution of that discrimination. ‘Objective decisions’ in machine learning can therefore be objectively unfair. This is the reason why Cathy O’Neil has called an algorithm “an opinion formalized in code”[1] – it does not simply provide objectivity, but works towards the (unfair) goals for which it was written. But there is a remedy; it is possible to develop mechanisms for fair algorithmic decision making. See for example the publications of Krishna P. Gummadi from the Max Planck Institute for Software Systems.
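
As an illustration of disparate mistreatment, here is a minimal sketch with invented data and a purely hypothetical decision rule (not any real system): a decision procedure can look respectable on the population as a whole while making far more false positive errors for one group than for another.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 10_000
# Invented group membership and invented true outcomes.
group = rng.choice(["A", "B"], size=n, p=[0.8, 0.2])
truth = rng.integers(0, 2, size=n)

# Hypothetical decision rule: it flips the correct answer far more often
# for the minority group B than for the majority group A.
error_rate = np.where(group == "A", 0.05, 0.30)
decision = np.where(rng.random(n) < error_rate, 1 - truth, truth)

print(f"overall accuracy: {(decision == truth).mean():.2f}")  # looks respectable

for g in ["A", "B"]:
    negatives = (group == g) & (truth == 0)               # people who should get a 0
    fpr = (decision[negatives] == 1).mean()               # but receive a 1
    print(f"false positive rate, group {g}: {fpr:.2f}")   # very unequal across groups
```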


Example of an algorithm, taken from: Pang-Ning Tan, Michael Steinbach, Vipin Kumar, Introduction to Data Mining, Boston 2006, p. 164.

B) In recent years, powerful new tools for Big Data analysis have been developed: neural nets and deep learning algorithms. The goal of these tools is predictive accuracy; they are hardware- and data-hungry, but have their strength in complex prediction problems where it is obvious that stochastic data models are not applicable. The approach is therefore designed differently: what is observed is a set of x’s that go in and a subsequent set of y’s that come out. The challenge is to find an algorithm f(x) such that for future x in a test set, f(x) will be a good predictor of y. The focus does not lie on the model by which the input x is transformed into the output y; it does not have to be a stochastic data model. Rather, the model is unknown, complex and mysterious – and irrelevant. This is the reason why accurate prediction methods are referred to as complex “black boxes”; at least with neural nets, ‘algorithms’ are treated as a synecdoche for “black box”. Unlike with stochastic models, the goal is not interpretability but accurate prediction. And it is here, on the basis of an opaque data model, that neural nets and deep learning extract features from Big Data and identify patterns or clusters which have been invisible to the human analyst. It is fascinating to see that humans do not decide what those features are. The predictive analysis of Big Data can identify and magnify patterns hidden in the data. This is the case with many recent studies, such as the facial recognition system recognizing ethnicity developed by the company Kairos, or the Stanford study inferring sexual orientation by analysing people’s faces. What emerges is that automatic feature extraction can amplify human bias. A lot of the talk about “biased algorithms” stems from these findings. But are the algorithms really to blame for the bias, especially in the case of machine learning systems with a non-transparent data model?
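
A small sketch of what “patterns invisible to the analyst” can mean in practice: using synthetic data and scikit-learn’s k-means (a deliberately simple stand-in for far more opaque deep learning systems), the algorithm sorts observations into groups that no human has defined beforehand.

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(42)
# Synthetic observations with three hidden "types" nobody has labelled.
data = np.vstack([
    rng.normal(loc=[0.0, 0.0], scale=0.5, size=(100, 2)),
    rng.normal(loc=[5.0, 5.0], scale=0.5, size=(100, 2)),
    rng.normal(loc=[0.0, 5.0], scale=0.5, size=(100, 2)),
])

# The analyst only chooses the number of clusters; the grouping itself
# is found by the algorithm, not defined by a human.
labels = KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(data)
print(np.bincount(labels))  # roughly [100, 100, 100]: three groups, found automatically
```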

This question leads us back to Big Data. There are at least two ways in which the data used predetermine the outcomes. The first is Big Data with built-in bias which is then amplified by the algorithm; simply go to Google image search and search either for “CEO” or for “cleaner”. The second is the difference between the data sets used as training data for the algorithm and the data analysed subsequently. If, for example, there are no African American faces in a training set for facial recognition, you simply don’t know how the algorithm will behave when applied to images with African American faces. Therefore, the appropriateness and the coverage of the data set are crucial.
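
A minimal sketch of such a coverage check, with invented figures and a hypothetical attribute name: comparing how an attribute is distributed in the training data versus in the data the model will later be applied to makes under-covered groups visible before deployment.

```python
import pandas as pd

# Invented figures and a hypothetical attribute name ("skin_tone"),
# purely to illustrate the coverage problem described above.
training = pd.DataFrame({"skin_tone": ["light"] * 950 + ["dark"] * 50})
deployment = pd.DataFrame({"skin_tone": ["light"] * 600 + ["dark"] * 400})

coverage = pd.concat(
    [
        training["skin_tone"].value_counts(normalize=True).rename("training"),
        deployment["skin_tone"].value_counts(normalize=True).rename("deployment"),
    ],
    axis=1,
)
print(coverage)
#        training  deployment
# light      0.95         0.6
# dark       0.05         0.4   -> the training set barely covers "dark" faces
```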

The other point lies with data models and the modelling process. Models are always contextual, be they stochastic models with built-in assumptions about how the world works; or be they charged with context during the modelling process. This is why we should reflect on the social and historical contexts in which Big Data sets have been established; and the way our models and modelling processes are being shaped. And maybe it is also timely to reflect on the term “bias”, and to recollect that it implies an impossible unbiased ideal …

 

[1] Cathy O’Neil, Weapons of Math Destruction: How Big Data Increases Inequality and Threatens Democracy, New York: Crown 2016, p. 53.

The K-PLEX project at the European Big Data Value Forum 2017

Mike Priddy (DANS) represented the K-PLEX project at the European Big Data Value Forum 2017 conference in a panel on privacy-preserving technologies. Read here about the statements he made in answering the three questions posed to the panel.


Question 1: There is an apparent trade-off between big data exploitation and privacy – do you agree or not?

  • Privacy is only one part of Identity. There needs to be respect for the individual’s right to build their identity upon a rich and informed basis.
  • The right not to know should also be considered. People have a right to their own levels of knowledge.
  • Privacy is broader than the individual. Confidential data exists in, and can affect, family, community, and company/organisations. The self is relational, not merely individual; it produces social facts and consequences.
  • Trust in data use & third party use – where should the accountability be?
  • There is the challenge of transparency versus accountability; just making all data available may obfuscate the accountability.
  • Accountability versus responsibility? Where does the ethical responsibility lie with human & non-human actors?
  • Anonymisation is still an evolving ‘science’ – the effectiveness of anonymising processes is not always well and broadly understood. Aggregation may not give the results that users want or can use; it may protect the individual, but not necessarily a community or family.
  • Anonymity may be an illusion; we don’t understand how minimal the data may need to be in order to expose identity. Date of birth, gender and region are enough to be disclosive for the majority of a population (see the sketch after this list).
  • Individuals, in particular young or vulnerable individuals, may not be in a position to defend themselves.
  • This means that big data may need to exclude communities & people with niche problems.
  • Black boxes of ML & neural nets don’t allow people to understand or track use, misuse or misinformation – wrong assertions being made: you cannot give informed consent under these conditions.
  • IoT and other technologies (facial recognition) mean that there is possibly no point at which informed consent can be given.
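
As a rough illustration of the point about date of birth, gender and region, here is a minimal k-anonymity-style check on a handful of invented records: counting how many people share each combination of these quasi-identifiers shows how quickly records become unique, and therefore potentially disclosive.

```python
import pandas as pd

# Invented records, purely for illustration.
people = pd.DataFrame({
    "date_of_birth": ["1983-05-12", "1983-05-12", "1990-11-02", "1975-01-30", "1990-11-02"],
    "gender":        ["F",          "M",          "F",          "M",          "F"],
    "region":        ["Utrecht",    "Utrecht",    "Berlin",     "Berlin",     "Berlin"],
})

# Count how many records share each combination of the three quasi-identifiers.
group_sizes = people.groupby(["date_of_birth", "gender", "region"]).size()
print(group_sizes)

# Records in a group of size 1 are unique, i.e. potentially re-identifiable.
unique_share = (group_sizes == 1).sum() / len(people)
print(f"share of records unique on these three attributes: {unique_share:.0%}")
```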

Strategies for meeting these issues:

  • There are well-established strategies for dealing with the disclosure of confidential data in the social sciences and official statistics: output checking, off-the-grid access, remote execution (with testable data), secure rooms, etc. Checks and balances are needed (a pause) before data goes out – this is a part of oversight and governance.
  • Individuals should be able to see when these processes are triggered, and decide if it is disclosive and whether that is appropriate.
  • More information about how data is used, shared and processed must be made available to the data creator (in a way they can use it).
  • Meeting the ISO 27001 standard in your data handling and procedures within your organisation is a good start.

Question 2: Regarding the level of development, privacy preserving big data technologies still have a long way to go – do you agree or not?

  • Biases are baked in. There isn’t enough differentiation between kinds of data: mine, yours, raw, cleaned, input, output – data is seen as just data and processed without narrative or context. We need not just privacy by design; we need humanity at the centre of design and respect for human agency.
  • Too often we are only concerned about privacy when it becomes a problem: privacy/confidentiality is NOT an obsolete concept.

Question 3: Individual application areas differ considerably with regard to the difficulty of meeting the privacy requirements – do you agree or not?

  • The problem is the way the question is formulated. By looking at application areas we are basically saying the problem is superficial. It is not. It is fundamental.
  • It has become very hard to opt out of everything. We cannot cut all of our social ties because of network effects.
  • Technology is moving faster than society can cope with and understand how data is being used. This is not a new phenomenon; we can see similar challenges in the historical record.
  • Privacy needs to be understood as a public good; there must be a right to be forgotten, but also a right not to be recorded.
  • Data citizenship is needed: citizens need to be involved enough, and able to make better decisions, about providing confidential/personal data and about what happens to their data – what it means and what happens when you fill in that form.

 

On the aura of Big Data

People who know very little about technology seem to attribute an aura of “objectivity” and “impartiality” to Big Data and analyses based on them. Statistics and predictive analytics give the impression, to the outside observer, of being able to reach objective conclusions based on massive samples. But why exactly is that so? How has it come about that societal discourse has ascribed this particular aura to Big Data analyses?

Since most people conceive of Big Data as tables filled with numbers which have been collected by machines observing human behavior, there are at least two points intermingled in this peculiar aura of Big Data: The belief that numbers are impartial and preinterpretive, and the conviction that there exists something like mechanical objectivity. Both concepts have a long history, and it is therefore wise to consult cultural historians and historians of science.

With respect to the claim that numbers are theory-free and value-free, one can consult the book “A History of the Modern Fact”[1] by Mary Poovey. Poovey traces the history of the modern epistemological assumption that numbers are free of an interpretive dimension, and she points to the story of how description came to seem separate from interpretation. In analyzing historical debates about induction and by studying authors such as Adam Smith, Thomas Malthus, and William Petty, Poovey points out that “Separating numbers from interpretive narrative, that is, reinforced the assumption that numbers were different in kind from the analytic accounts that accompanied them.” (XV) If nowadays many members of our societies imagine that observation can be separated from analysis and that numbers guarantee value-free description, this is the result of the long historical process examined by Poovey. But from an epistemological point of view this is not correct, because numbers are interpretive – they embody theoretical assumptions about what should be counted, they depend on categories, entities and units of measurement established before counting has begun, and they contain assumptions about how one should understand material reality.

The second point, mechanical objectivity, has been treated by Lorraine Daston and Peter Galison in their book “Objectivity”, which contains a chapter of the same name.[2] Daston and Galison focus on photography as a primary metaphor for the objectivity ascribed to a machine. Using this example, they describe mechanical objectivity as “the insistent drive to repress the willful intervention of the artist-author, and to put in its stead a set of procedures that would, as it were, move nature to the page through a strict protocol, if not automatically.” (121) The authors see two intertwined processes at work: on the one hand, the separation of the development and activities of machines from the human beings who conceived them, with the result that machines were attributed freedom from the willful interventions that had come to be seen as the most dangerous aspects of subjectivity; and on the other hand, the development of an ethics of objectivity, which called for a morality of self-restraint meant to keep researchers from interventions and interferences such as interpretation, aestheticization, and theoretical overreaching. Thus machines – be they cameras, sensors or electronic devices – have become emblematic of the elimination of human agency.

If the aura of Big Data is based on these conceptions of an “impartiality” of numbers and of data collected by “objectively” working machines, there remains little space for human agency. But this aura is evidence of a false consciousness, the consequences of which can easily be seen: if analyses based on Big Data are taken as ground truth, it is no wonder that no space is opened up for public discussion, for decisions made independently by citizens, or for a democratically organized politics in which the processes where Big Data play an important role are actively shaped.

[1] Mary Poovey, A History of the Modern Fact. Problems of Knowledge in the Sciences of Wealth and Society, Chicago / London: The University of Chicago Press 1998.

[2] Lorraine Daston, Peter Galison, Objectivity, New York: Zone Books 2007.

Statistical Modeling: The Two Cultures

In this article Leo Breiman describes two approaches in statistics: One assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown.

Statisticians in applied research consider data modeling the template for statistical analysis and, within their range of multivariate analysis tools, focus on discriminant analysis and logistic regression for classification and on multiple linear regression for regression. This approach has the advantage of producing a simple and understandable picture of the relationship between the input variables and the response. But the assumption that the data model is an emulation of nature is not necessarily right and can lead to wrong conclusions.

The algorithmic approach uses neural nets and decision trees, with predictive accuracy as the criterion for judging the quality of the results of an analysis. This approach does not apply data models to explain the relationship between input variable x and output variable y, but treats this relationship as a black box. Hence the focus is on finding an algorithm f(x) such that for future x in a test set, f(x) will be a good predictor of y. While this approach has driven major advances in machine learning, it lacks interpretability of the relationship between predictor and response variables.
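
A small sketch of the two cultures side by side, on synthetic data and using scikit-learn (an illustration, not Breiman’s own experiments): the data-modelling culture fits a logistic regression whose coefficients can be read and interpreted, while the algorithmic culture fits a random forest and judges it purely by held-out predictive accuracy.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

# Synthetic data standing in for "nature"; neither model knows how it was generated.
X, y = make_classification(n_samples=2000, n_features=10, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Culture 1: a stochastic data model with readable coefficients.
data_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
# Culture 2: an algorithmic model judged only by predictive accuracy.
algo_model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_train, y_train)

print("logistic regression coefficients:", np.round(data_model.coef_[0], 2))
print("logistic regression test accuracy:", round(data_model.score(X_test, y_test), 3))
print("random forest test accuracy:      ", round(algo_model.score(X_test, y_test), 3))
# The forest often predicts at least as well, but offers no coefficients to interpret.
```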

This article was published in 2001, when the term “Big Data” was not yet on everyone’s lips. But by delineating two different cultures of analyzing data and weighing the pros and cons of each approach, it makes the differences between big data analysis and stochastic data modelling understandable even to laypeople.

Leo Breiman, Statistical Modeling: The Two Cultures. In: Statistical Science, Vol. 16 (2001), No. 3, pp. 199–231. Freely available online.

On assistance and partnership with artificial intelligence

After mobile devices and touchscreens, personal assistants will be “the next big thing” in the tech industry. Amazon’s Alexa, Microsoft’s Cortana, Google Home, Apple’s HomePod – all these voice-controlled systems come into your home, driven by artificial intelligence, ready for dialogue. Artificial intelligence currently works best when provided with clear tasks and clearly defined goals. That can be a goal defined by the user (“Please turn down the music, Alexa!”), but generally the goals of the companies offering personal assistants dominate: they want to sell. And there you find the differences between these personal assistants: Alexa sells the whole range of products marketed by Amazon, Cortana eases access to Microsoft’s soft- and hardware, Google Home has its strengths with smart home devices and the internet of things, and Apple’s HomePod … well, urges you into the hall of mirrors created by Apple’s Genius and other flavour enhancers.

Beyond well-defined tasks, artificial intelligence is bad at chatting and assisting. If you are looking for a partner, for someone to talk to, the predefined goals are missing. AI lacks the world knowledge needed for such a task, nor is it capable of ensuring the appropriateness of answers in a conversation or the similarity of mindsets which is the basis of friendship.

But this is exactly what is promised by the emotion robot “Pepper”. This robot saves the emotions it collects from its human interaction partners on a shared server, a cloud. All of the existing Pepper robots are connected to this cloud. This way the robots, which are already autonomous, collectively “learn” how to improve their emotional reactions. Their developers also work with these data.

If you think through the idea of “Pepper”, you have to ask yourself what end this robot is supposed to serve – as a replacement partner for human beings, caring for their emotional well-being? How is this conceived of? How does a robot know how to contribute to human well-being? Imagine a human couple where he is choleric and her role is constantly to calm him down (the contrapuntal approach). Or another couple which is constantly quarrelling – he shouts at her, and she yells back – and which judges this to be the normal state and their quarrelling an expression of well-being (the homeopathic approach). Can a robot decide which ‘approach’ is best? Simply imagine what would happen in a scenario where a person – let’s call him ‘Donald’ – buys a new emotion robot – whom we call ‘Kim’. Certainly this is not the kind of world we’re looking for, is it?

With personal assistants, it seems to be a choice between the devil and the deep blue sea: either you are reduced to a consumer, or you are confronted with some strange product without openly defined goals, with which you cannot exchange at eye level. So the best choices we have are either to abstain from using these AIs, or to participate in civil-society dialogues with tech companies and in policy debates about the use of AI.

Analogue secrets

Annual leave and holidays seem to encourage regressive impulses: almost involuntarily one stops in front of a souvenir shop to inspect the rotating displays of postcards. They definitely have a charm of their own – small pieces of cardboard with the aura of authenticity, indicating in a special way that “I was here”, as if there weren’t enough selfies and WhatsApp posts to support that claim. Quickly written and quickly sent, these relics of a faraway time in which messages were handwritten seem to provide proof that someone has really been far away and happily come back, enriched by experiences of alterity.

This aura of the analogue, which offers a brief insight into the inner life of a person far away on their route, has been exploited by an art project for many years. Postsecret.com provides post office boxes in many countries of the world, to which people can send a postcard anonymously revealing a personal secret. A simple and ingenious idea: one can utter something which cannot be told by word of mouth, nor could it be written down, because one would need an addressee who would draw their own conclusions. But Postsecret.com makes this secret public, and perhaps the cathartic effect (and affect) of finally getting rid of something which was a heavy burden can be enjoyed in anonymity.

Can we conceive of a digital counterpart to this analogue art project? Only if there are still users who believe in anonymity on the net. But the truth is that literally every activity on the net leaves traces – and why should I reveal secrets on the net and thus become, in one way or another, susceptible to blackmail?

Praise for the good old hand-written postcard of confidence: Every mailing is an original, every self-made postcard is unique.

Patina pleases

Everything new is blank. New things display their integrity and an undamaged, immaculate surface. Design objects are icons of youth and timelessness. They demand attention in a special way, because their newness is always at risk. Traces like scratches rob them of their spotlessness and indicate a loss of aura. The ageing of these objects triggers a certain horror in their owners, because it shows the passing of newness and a loss of control. Consumer electronics is a particularly instructive example here, and this in spite of a miraculous ability of electronic content – the ability NOT to age.

But ageing is precious. Aged objects, which can be found at flea markets, in museums or in antique shops, are often seen as precious. They represent the past, time, and history, and we appreciate the peculiarities of their surface, which has grown over decades. We are not disturbed by dust, grease, grime, wax, scratches, or cracks – on the contrary, the patina of objects represents their depth and their ability to exhibit their own ageing process. The attention of the observer focuses on the materiality of the object; the patina of the surface and the deformations gained over time mark its singularity and individuality. To be precise: the fascination of the observer focuses more on the signs of ageing than on the object itself.


Schoolbag from a flea market

In the digital world, we don’t find comparable qualities. Ageing would mean that data have become corrupt, unreadable, unusable and therefore worthless. For objects digitized by cultural heritage institutions, this is a catastrophe; it means loss. With consumer electronics, it’s similar: a smartphone is devalued by scratches; no gain in singularity is noted. With software, it is even worse. Software ageing is a phenomenon in which the software itself keeps its integrity and functionality, but if the environment in which it works, or the operating system, changes, the software will create problems and errors, and sooner or later it will become dysfunctional. Ageing here means an incapability to adapt to the digital environment, and surprisingly this happens without wear or deterioration, since it is not data corruption which causes this ageing. In this respect, the process of software ageing can be compared to the ageing of human beings: they lag behind the times or become uncoupled from the social world they live in; they lose connectivity, the ability to keep themselves up to date, and the skill to exchange with their environment.

With analogue objects, this is not the case. They provoke sensual pleasures without reminding the observer of the negative aspects of human ageing. Even if they have become dysfunctional and useless, they keep the dignity and aura of time, inscribed into their body and surface. They keep their observers at a beneficial distance, which opens up space for imagination and empathy. The observer is free to imagine the history of these objects and their capability to endure long stretches of time without vanishing – certainly a faculty which human beings do not possess. What remains are the characteristics of dignified ageing. While the nasty implications of ageing are buried in oblivion, analogue objects evoke a beauty of ageing.

Why Messiness Matters

As always in a field where different conceptions are present, there exist differing understandings of what ‘tidy’ or ‘clean’ data on the one hand, and ‘messy’ data on the other might be.

On a very basic level, and starting from the notion of data arranged in tables, Hadley Wickham has defined ‘tidy data’ as data where each variable is a column, each observation is a row, and each type of observational unit is a table. Data not following these rules are understood as messy. Moreover, data cleaning is a routine procedure applied to gathered data in statistics and in private companies: assessing the number and importance of missing values, correcting obvious inconsistencies and errors, normalization, deduplication, etc. Common examples are non-existing or incorrect postal codes, a number which wrongly indicates an age (in comparison to the birth date), the conversion of names from “Mr. Smith” to “John Smith”, etc.
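
A minimal sketch of Wickham’s tidy-data rules, with invented figures: in the “messy” table below one variable (the year) is hidden in the column headers; melting the table yields one column per variable and one row per observation.

```python
import pandas as pd

# A "messy" table with invented visitor numbers: the years sit in the header,
# so one variable (year) is spread across several columns.
messy = pd.DataFrame({
    "country": ["NL", "DE", "IE"],
    "2015": [120, 340, 80],
    "2016": [130, 360, 85],
})

# Tidy form: each variable is a column, each observation is a row.
tidy = messy.melt(id_vars="country", var_name="year", value_name="visitors")
print(tidy)
#   country  year  visitors
# 0      NL  2015       120
# 1      DE  2015       340
# ...
```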

In the sciences, the understanding of ‘messy’ data largely depends on what is understood as ‘signal’ and what is seen as ‘noise’. Instruments collect a lot of data, and only the initiated are able to distinguish between the two. Cleaning data, and thus peeling out the relevant information in order to arrive at scientific facts, is a standard procedure here as well. In settings where the data structure is human-made rather than technically determined, ‘messy’ data remind scientists to reassess whether the relationship between the (research) question and the object of research is appropriate; whether the classifications used to form variables are properly conceived of, or whether they inappropriately limit the phenomenon in question; and what kind of data can be found in that garbage category of the ‘other’.

In the humanities, this is not as straightforward as it is in the sciences. Two recent publications provide examples. Julia Silge and David Robinson’s book “Text Mining with R” (2017) bears the subtitle “A tidy approach”. Very much like Wickham, they define ‘tidy text’ as “a table with one-token-per-row”. Brandon Walsh and Sarah Horowitz present in their “Introduction to Text Analysis” (2016) a more differentiated approach to what the ‘cleaning’ of ‘messy’ data might look like. They introduce their readers to the usual problems of cleaning dirty OCR; the standardization and disambiguation of names (their well-chosen example is Arthur Conan Doyle, who was born as Arthur Doyle but used one of his given names as an addendum to his last name); and the challenges posed by metadata standards. All that seems easy at first glance, but think of Gothic typefaces (there exist more than 3,000 of them), pseudonyms or heteronyms, or camouflage printings published during the Inquisition or under political repression. Now you can imagine how hard it can be to keep your data ‘clean’.

And there is, last but not least, another conception of ‘messiness’ in the humanities. It lies in the specific cultural richness, or polysemy, or voluptuousness of the data (or the sources, the research material) in question: a text, an image, a video, a theatre play or any other object of research can be interpreted from a range of viewpoints. Humanists are well aware of the fact that the choice of a theory or a methodological approach – the ‘grid’ which provides order to the chaos of what is being examined – never yields an exhaustive interpretation. It is the ‘messiness’ of the data under consideration which provides the foundation for alternative approaches and research results, which is responsible for the resistance to interpretation (and, with Paul de Man, to theory) – and which continuously demands an openness towards seeing things in another way.

Critical Data Studies: A dialog on data and space

This article was published by Craig M. Dalton, Linnet Taylor, and Jim Thatcher: ‘Critical Data Studies: A dialog on data and space’. In: Big Data & Society, Vol. 3 (2016), Issue 1, pp. 1–9.

Abstract:
In light of recent technological innovations and discourses around data and algorithmic analytics, scholars of many stripes are attempting to develop critical agendas and responses to these developments (Boyd and Crawford 2012). In this mutual interview, three scholars discuss the stakes, ideas, responsibilities, and possibilities of critical data studies. The resulting dialog seeks to explore what kinds of critical approaches to these topics, in theory and practice, could open and make available such approaches to a broader audience.

The article is available online at: http://journals.sagepub.com/doi/abs/10.1177/2053951716648346