Blog

KPLEX Presented at DH 2019 Conference

Jörg Lehmann and Jennifer Edmond were very pleased to have the chance to present some of the lessons learned from the KPLEX project to an engaged audience at the DH 2019 conference on 12th July 2019.  The paper was entitled “Digital Humanities, Knowledge Complexity and the Six ‘Aporias’ of Digital Research,” and explored a number of the cultural clashes we found between the perspectives represented in our interviews.  While DH was never a planned audience for our results, the response convinced us that there is still much to mine from our interviews and insights!

The slides from the presentation can be viewed here.

ACDH LECTURE 4.1 – What can Big Data Research Learn from the Humanities?


Jennifer Edmond
Director of the Trinity College Dublin Centre for Digital Humanities and Principal Investigator on the KPLEX Project

One of the major terminological forces driving ICT development today is that of ‘big data.’ While the phrase may sound inclusive and integrative, ‘big data’ approaches are in fact highly selective, excluding, as they do, any input that cannot be effectively structured, represented, or, indeed, digitised. Data of this messy, dirty sort are precisely the kind that humanities and cultural researchers deal with best, however.  In particular, knowledge creation and information management approaches from the humanities shed light on gaps such as: the manner in which data that are not digitised or shared become ‘hidden’ from aggregation systems; the fact that data are human-created, and lack the objectivity often ascribed to the term; and the subtle ways in which complex data almost always become simplified before they can be aggregated. Humanities insight also exposes the problematic discursive strategies that big data research deploys, strategies that can be seen reflected not only in the research outputs of the field, but also in many of the urgent challenges our digitised society faces.

The lecture is available to view here: https://www.youtube.com/watch?v=E2vdFBo9wB4

The Future of History

If you went to an archive today and looked for someone’s personal papers, what would you expect to find? Notebooks, letters, photographs, calendars, the documentation of printed publications, drafts of articles or books with hand-written comments, and the like. But what about born-digital content? In the best case, you will find a backup of the hard drive of the person you are researching. The letters of earlier times have given way to emails, SMS and WhatsApp messages, Facebook entries and Twitter posts; publications may have become online articles, PDF files, and blog contributions distributed all over the net. And that’s the point: what may once have been part of somebody’s personal papers in paper format may nowadays have become part of Big Data. Yes, Big Data: they do not consist only of incredibly large tables whose rows and columns are filled with numbers; a good part of Big Data consists simply of text files with social media content (Facebook, Twitter, blogs, and so on).

At first glance, this sounds astonishing. The ‘private’ character of personal papers seems to have vanished; the proportion of content available in the public sphere seems to have grown. It seems. But this is not surprising; we are reminded of one of the most influential studies on this topic, Jürgen Habermas’ “Structural Transformation of the Public Sphere”. What Habermas analyses there are the constant changes and shifts of the border between private and public. His examination starts in the late 18th and early 19th century, with the formation of an ideal type of bourgeois discourse marked by what Habermas calls “Räsonnement”. This reasoning aims at arguing, but also, in its pejorative sense, at grumbling. The study begins with bourgeois members of the public meeting in salons, coffee houses, and literary round tables, pursuing reasoned exchange by contributing to journals, and practicing subjectivity, individualism, and sentimentalism by writing letters and diaries destined either to be published (think of Gellert, Gleim, and Goethe) or to become part of a personal archive. Habermas traces these developments far into the 20th century, where his book ends with the opposition between public and private characteristic of that time: employment is part of the public space, while leisure time is dedicated to private activities; letter-writing and reading have become much less important, and only the high bourgeoisie keep their own libraries; mass media enhance the passivity of consumerism. This can also be read from personal papers: the functional differentiation of a modern society created supposed experts for Räsonnement – journalists, politicians, and publicists – who deliver opinion formation as a service, while editors and scholars professionalise the critique of politics. Habermas is openly critical of the mass media and their potential for manipulation, since they reduce citizens to recipients without agency.

The last edition of Habermas’ book was printed in 1990. Since that time, a lot has changed, especially with the emergence of the internet. The border between public and private has shifted, and the societal-political commitment of citizens has changed. Social media grant citizens an incredible degree of agency and empower them. Hyperdigital hipsters work in cafés, co-working spaces or start-ups, without the private leisure time characteristic of the 20th century. Digital media connect people across large distances and form new transnational collectives. The anthropologist Arjun Appadurai has spoken of “diasporic public spheres” in this respect – small groups of people discussing face to face in pubs have been transformed into “communities of sentiment” grumbling at politics. Formerly silent recipients have mutated into counter-publics; the sentimental bourgeois has become an enraged citizen. Habermas would not have liked this development, since his ideal type of Räsonnement does not fit current realities, and what he overlooked – the existence of large parts of society made up of people who do not participate in mass-media discourse because they do not want to – nowadays feeds, for example, right-wing populism.

The Facebook network as a new public sphere

This latest transformation of the public sphere has consequences for archivists as well as historians. Archivists should regard social media content as part of personal archives, and thus have to grapple with data management and storage problems. Historians (at least historians of the future) have to become familiar with quantitative analysis in order, for example, to examine Twitter networks and determine the impact of the Alt-Right movement on the presidential election in the U.S. Born-digital content can therefore be seen as a valuable part of personal papers. And from this point of view, there is certainly a lot that historians can contribute to discussions on Big Data.
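
By way of illustration only – the user names, the tiny network, and the use of retweets as a proxy for influence are all assumptions made here, not findings of the KPLEX project – a minimal sketch of what such a quantitative look at a Twitter network might involve, using Python and the networkx library:

```python
# A minimal, hypothetical sketch: which accounts sit at the centre of a
# (tiny, invented) retweet network? Requires the networkx library.
import networkx as nx

# Each pair means "the first user retweeted the second"; in a real study these
# edges would be extracted from an archived collection of tweets.
retweets = [
    ("user_a", "user_b"),
    ("user_c", "user_b"),
    ("user_d", "user_b"),
    ("user_d", "user_e"),
    ("user_e", "user_a"),
]

G = nx.DiGraph()
G.add_edges_from(retweets)

# In-degree centrality: roughly, the share of other accounts that retweeted
# each user -- a crude proxy for influence within this invented network.
for user, score in sorted(nx.in_degree_centrality(G).items(),
                          key=lambda item: -item[1]):
    print(f"{user}: {score:.2f}")
```

Real analyses would of course involve millions of tweets and far more careful modelling; the point is only that the basic operations are within reach of a historian willing to learn a little scripting.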

 

Jürgen Habermas, The Structural Transformation of the Public Sphere: An Inquiry into a Category of Bourgeois Society. Cambridge: Polity 1989.

Arjun Appadurai, Modernity At Large: Cultural Dimensions of Globalization. Minneapolis: University of Minnesota Press 1996.

 

What the stories around data tell us

The strange thing about data is that they don’t speak for themselves. They need to be embedded into an interpretation to become palatable and understandable. This interpretation may be an analytic account, like the narrative synthesis offered after performing a regression. It may also be a story in a more conventional sense – a story of conquest, mastery, submission, or revelation, of the kind that results from the usual storytelling used in marketing.

The funny thing about data and stories is that it is easier to create a story out of data than to extract data out of a story already told. It is easy to conceive of narratives built on top of data: companies like Narrative Science automatically create market reports or sports reports out of the data they receive on a daily basis. On the other hand, it is difficult to imagine extracting statistical data about a soccer game from the up-to-the-minute account of the same game presented on a website.

But data form a peculiar basis for stories. Think of the data which are collected when you visit a website – a typical source of Big Data. Such websites collect data on where you go, where you click, how long you stay, and so on; typically, these are behavioural data. What data scientists can get out of them are correlations; these data do not allow them to grasp the causal mechanisms behind the observed behaviour. They cannot see the whole person behind the behaviour; they cannot tell, for example, what customers feel during their visit to the website, why they reacted the way they did – or where the visitors see value in the offers presented to them. Stories based on data thus come with a reduction of perspective, a reduction which perhaps cannot be avoided, since data are themselves the product of a process of estrangement typical of capitalism. The narrative might attenuate or conceal the limitations of the data, but it will not be able to reach far beyond the restrictions they impose.
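
To make this limitation concrete, here is a small sketch with entirely invented numbers (the column names and figures are assumptions for illustration, not real website data): it computes exactly the kind of correlations such behavioural data afford – and nothing more.

```python
# Hypothetical clickstream data for a handful of website visits.
import pandas as pd

visits = pd.DataFrame({
    "seconds_on_page": [12, 45, 230, 8, 310, 95, 60],
    "clicks":          [1,  2,   7,  0,   9,  3,  2],
    "purchased":       [0,  0,   1,  0,   1,  0,  0],
})

# Pairwise correlations between the behavioural variables.
print(visits.corr())

# A strong correlation between time on page and purchasing says nothing about
# why the visitor stayed, what they felt, or where they saw value -- the causal
# mechanism behind the behaviour remains out of reach.
```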


But there is more to data presented in narratives than a mere reduction of perspective. Unlike data collected in scientific disciplines like psychology and anthropology, which might enable representative statements about population groups, the results of Big Data analyses grant a shift in perspective. By performing classifications, grouping people according to their preferences, assessing the creditworthiness of customers, and so on, Big Data allow human beings to be viewed from the perspective of a market. And in their ability to shape the offers presented on a website in real time and to adapt prices according to the IP address from which the website is accessed, and in their ability to build up systems of gratification that reward those user actions deemed opportune by the infrastructure, data grant a point of view onto customers which further strengthens commodification and economic governance. The fact that this point of view is equivalent to the perspective of the market becomes especially visible in the narratives accompanying these data.

Identities and Big Data

One of the promises of Big Data (and of the neural nets used to mine them) is their potential to reveal patterns of which we were not aware before. “Identity” in Big Data might either be a variable contained in the dataset. This implies classification, and a feature which cannot easily be determined (like some cis-, trans-, inter-, and alia-identities) might have been put into the ‘garbage’ category of the “other”. Or identity might arise from a combination of features that was unknown beforehand. This was the case in a study which claimed that neural networks are able to detect sexual orientation from facial images. The claim did not go unanswered; a recent examination of this study by Google researchers showed that differences in culture, rather than in facial structure, were the features responsible for the result. Features that can easily be changed – like makeup, eyeshadow, facial hair, or glasses – had been underestimated by the authors of the study.
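
A small sketch may make the first of these two cases tangible. The categories and responses below are invented for illustration; the point is only how a classification scheme fixed in advance forces everything it cannot accommodate into a catch-all “other”.

```python
# Hypothetical example: a dataset whose schema only foresees two categories.
import pandas as pd

responses = ["female", "male", "non-binary", "female", "genderfluid", "male"]

allowed = {"female", "male"}  # the classification decided before collection
coded = pd.Series(
    [r if r in allowed else "other" for r in responses],
    dtype="category",
)

print(coded.value_counts())
# Everything the schema did not foresee -- non-binary, genderfluid -- is now
# indistinguishable inside "other", and that loss cannot be undone downstream.
```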

The debate between these data analysts exposes insights well known to humanists and social scientists. Identities vary with context: depending on the situation, a person may say “I am a mother of three children”, “I am a vegan”, or “I am a Muslim”. In fields marked by strong tensions and polarisations, identity statements can come close to confessions, and it might be wise to deliberate carefully about whether it is opportune to give the one answer or the other: “I am British”, “I am European”.

It is not without irony that it is easy to list identities that have been important throughout the past few centuries. Clan, tribe, nation, heritage, sect, kinship, blood, race – this is the typical stuff ethnographers and historians work on. Beyond the ones just named, identities like family, religion, culture, and gender are currently intensely debated in our postmodern, globalised world. Think of the discussions about cultural identity in a world characterised by migration; and think of gender identity as one of the examples which has only recently split into new forms and created a new vocabulary that tries to capture this new changeability: genderfluid, cisgender, transgender, agender, genderqueer, non-binary, two-spirit, etc.

Androgynous portrayal of Akhenaten, the first ruler to introduce monotheism in Ancient Egypt

It is obvious that identity is not a stable category. This is the irony of identity: the promise of an identical self dissolves in time and space, and any attempt to isolate, fix and homogenise an identity is doomed to failure. In postmodernity, identities are instead constructed out of interactions with the social environment – through constant negotiation, distinction and recognition, exclusion and belonging. Mutations and transformations are the expression of the tensions between the vital elements that characterise identities. The path from Descartes’ “Cogito, ergo sum” to Rimbaud’s “I is another” is one of the best-known examples of such a transformative process.

Humanists and social scientists are experts in providing thick descriptions and the larger contexts in which identities can be located. They are used to relating to one another the different resources that feed into identities, and they are capable of building the contexts in which the relevant features and the relationships between them are embedded. In view of these strengths, it is astonishing that scholars with such a background do not speak with more confidence about the power of their methods and do not involve themselves in Big Data debates about concepts as elusive as “identity”.

Data. A really short history

“One man’s noise is another man’s signal.” Edward Ng

 

Data are “givens” – and yet they are artefacts. They do not pre-exist in the world, but come into the world because we, the human beings, want them to do so. If you were to imagine an intimate relationship between humans and data, you would say that data do not exist independently of humans. Hunters and gatherers collect the discrete objects of their desire, put them in order, administer them, store them, use them, throw them away, forget where they were stored and at times uncover those forgotten places again, for example during a joyful snowball fight. At that time data were clearly divided from non-data – edible things were separated from inedible ones, helpful items from uninteresting ones. This selectivity may have been reinforced by dependence on the shifting availability of resources across a wider space.

Later, when mankind came out of the forests, things became much more complicated. With the advent of agriculture and especially of trade, data became more meaningful. Rulers, moreover, became interested in knowledge about their flock and therefore asked some of their accountants to keep records of their subordinates – the oldest census dates back to 700 B.C. As mathematics became more sophisticated from the 17th century (probability) and the 19th century (statistics) onwards, data about people, agriculture, and trade began to pile up, and it became more and more difficult to distinguish relevant data (“signal”) from irrelevant data (“noise”). The distinction between the two simply became a matter of the question with which the data at hand were consulted.

With the advent of the industrial age, the concept of mechanical objectivity was introduced, and the task of data creation was delegated to machines constructed to collect the items humans were interested in. Now data were available in huge amounts, and the need to organise and order them became even more pressing. It is here that powerful schemes came into force: selection processes, categorisations, classifications, standards; some variables prioritised as signal while others were reduced to noise, thus creating systems of measurement and normativity intended to govern societies. These have been insightfully investigated in the book “Standards and their stories. How quantifying, classifying, and formalizing practices shape everyday life.”*

It was only later, at the beginning of the post-industrial age, that an alternative to this scheme- and signal-oriented approach was developed: simply piling up everything that might be of any interest, a task also often delegated to machines because of their patient tirelessness. The agglomeration of such masses presupposes that storage is not a problem, in either spatial or temporal terms. The result of this approach is nowadays called “Big Data” – the accumulation of masses of (mostly observational) data for no specific purpose. Collecting noise in the hope of signal, without defining what is noise and what is signal. Fabricating incredibly large haystacks and assuming there are needles in them. Data as the output of a completely economised process with its classic capitalist division of labour, including the alienation of the data from their sources.

What is termed “culture” often resembles these haystacks. Archives are haystacks. The German Historical Museum in Berlin is located in the “Zeughaus”, literally the “house of stuff”,** stuffed with the remnants of history – a hayloft of history. Libraries are haystacks as well; if you are not bound to the indexes and registers called catalogues, if you are free to wander around the shelves of books and pick them out at random, you might get lost in an ocean of thoughts, in countless imaginary worlds, in intellectual universes. This is the fun of explorers, conquerors and researchers: never to be bored by routine, always to discover something new that feeds your curiosity. And it is here, within this flurry of noise and signal, within the richness of multisensory, kinetic and synthetic modes of access, that it becomes tangible that in culture noise and signal cannot be thought of without the environment from which they were taken, that both are the product of human endeavours, and that data are artefacts that cannot be understood without the context in which they were created.

 

*Martha Lampland, Susan Leigh Star (Eds.), Standards and their stories. How quantifying, classifying, and formalizing practices shape everyday life. Ithaca: Cornell Univ. Press 2009.
** And yes, it should be translated correctly as “armoury”.

Dr Jennifer Edmond on RTÉ Brainstorm


A piece by Dr Jennifer Edmond, Principal Investigator of the KPLEX project, Director of Strategic Projects in the Faculty of Arts, Humanities and Social Sciences, Trinity College Dublin, and co-director of the Trinity Centre for Digital Humanities, has been published on the RTÉ Brainstorm website.  In this insightful piece, Dr Edmond addresses some of the challenges of talking about Big Data, an important theme currently being explored by the KPLEX project team.

You can read the full article here: https://www.rte.ie/eile/brainstorm/2018/0110/932290-the-problem-with-talking-about-big-data/