Blog

Ways of Being in a Digital Age.

I’m just back from a few days in Liverpool, where I attended the “Ways of Being in a Digital Age” review conference at the University of Liverpool. This was the TCD working group’s first trip abroad for KPLEX dissemination, and my first ever trip to Liverpool.

UoL’s “Ways of Being in a Digital Age” was a massive scoping review of how digital technology mediates our lives, and of how technology and social change co-evolve and impact on each other. This conference came at the conclusion of the project, and invited participation in the form of papers that addressed any of the seven project domains: Citizenship and politics, Communities and identities, Communication and relationships, Health and wellbeing, Economy and sustainability, Data and representation, Governance and security. Naturally we hopped right into the “Data and Representation” category.

I was presenting a paper I co-wrote with Jennifer (Edmond, KPLEX’s PI) and, like most of my KPLEX activities thus far, I also used the platform as an opportunity to include as many funny data memes as I could reasonably fit into a 20-minute PowerPoint presentation. Which, by the way, is A LOT.

Our paper was titled “Digitising Cultural Complexity: Representing Rich Cultural Data in a Big Data environment”, and in it we drew on many of the issues we’ve discussed thus far on the blog: data definitions; the problems brought about by having so many different types of data all classified under the same term (data); data and surveillance; data and the humanities; and the “aura” of big data, whereby the possibilities of big data are talked up and inflated so that it seems like an infallible “cure-all”, when in fact it is anything but. And, most importantly, why complexity matters, and what happens when we allow alternative facts to take precedence over data.

The most exciting thing (from my perspective) was that we got to talk about some of our initial findings, based on interviews I conducted with a group of computer scientists who very generously gave me some of their time over the summer, and on a more recent data mining project that is still underway but is already producing some really exciting results. After all this desk research and talk about data over the last 9 months or so, the KPLEX team as a unit is in the midst of collecting data of our own, and that’s exciting. Below is a photo montage of my experience of the WOBDA conference, which mainly consists of all the different types of coffee I drank while there, along with some colossal pancakes I had for breakfast. I also acquired a new moniker, from the hipster barista in a very lovely coffee shop that I frequented twice during my two-day stay. On the receipt, the note she wrote to find me was “Glasses lady.” 🙂

[Photo montage]

To err is human – but computers can make mistakes too

Imagine an automated rating of CVs to decide whom to grant a scholarship or which job candidate to hire. This is not science fiction. Big private companies increasingly rely on such mechanisms to make hiring decisions. They analyse data about their employees to find the underlying patterns of success. The characteristics of job candidates are then matched with those of successful employees, and the system recommends the candidates with the most similar characteristics. Much less time and effort is needed to choose the “right” candidates from a pool of promising applicants. Certainly, the human resources department has to reflect on what characteristics to choose and how to combine and weight them, but the recommendations based on the analysis of big data seem to be very efficient.
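To make the matching logic concrete, here is a minimal, purely illustrative sketch in Python: the feature columns, the candidate profiles and the weighting are all invented for the example, and the similarity measure (cosine similarity to the average profile of past hires) is just one plausible choice among many.

```python
# A hypothetical sketch of the matching logic described above: candidates
# are ranked by how similar their (entirely made-up) feature vectors are
# to the average profile of past "successful" employees.
import numpy as np

# Invented feature vectors: [years_experience, num_degrees, num_prior_jobs,
# num_social_networks]. Choosing and weighting these columns is exactly the
# step the HR department has to reflect on.
successful_employees = np.array([
    [5.0, 1, 2, 2],
    [7.0, 2, 3, 1],
    [4.0, 1, 1, 2],
])
candidates = {
    "candidate_a": np.array([6.0, 1, 2, 2]),
    "candidate_b": np.array([2.0, 3, 5, 4]),   # a non-standard profile
}

profile = successful_employees.mean(axis=0)    # the "success" template

def cosine_similarity(a, b):
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

# Whoever resembles past hires most closely comes out on top.
ranking = sorted(candidates.items(),
                 key=lambda kv: cosine_similarity(kv[1], profile),
                 reverse=True)
for name, vector in ranking:
    print(f"{name}: similarity {cosine_similarity(vector, profile):.3f}")
```

The design choice here already illustrates the problem discussed next: a similarity-to-past-hires criterion rewards conformity to the existing workforce by construction.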

Making automatic, algorithm-produced predictions about an individual person by analyzing information from many people is problematic in several ways. First of all, it requires, among other things, large datasets to avoid bias. Second, the demand for standardized CVs means that non-normative ways of presenting oneself are excluded a priori. Third, assume that the characteristics of high achievers change over time: the system will continue (at least for some time) to formulate recommendations based on past experience. Such a static model of prediction will be unable to detect potentially innovative candidates with divergent experiences and skills, and thus discriminates against individuals with non-standard backgrounds and motivations. A last problem is that all the data available to the model come from the people who were accepted in the first place and who proved successful or unsuccessful thereafter. Nothing is known about the career paths of the applicants who were rejected.

“In this context, data – long-term, comparable, and interoperable – become a sort of actor, shaping and reshaping the social worlds around them” (Ribes and Jackson 2013: 148). Although taken from ecological research on stream chemistry data, this statement applies equally to the problem of automatic recommendation systems.

Even worse, not only the CV but also the footprint one leaves in cyberspace might serve as the basis of decision-making. The company Xerox used data it had mined from its (former) employees to define the criteria for hiring new staff for its 55,000 call-centre positions. The applicants’ data gained from the screening test were compared against the significant, but sometimes unexpected, criteria detected in that analysis. In the case of Xerox, for example, “employees who are members of one or two social networks were found to stay in their job for longer than those who belonged to four or more social networks”.


Whether the social consequences of these new developments can be attributed to humans alone or also to computers is highly controversial. Luciano Floridi (2013) makes the point that we should differentiate between the accountability of (artificial) agents and the responsibility of (human) agents. Does the algorithm discussed above qualify as an agent? Floridi would argue yes, because “artificial agents could have acted differently had they chosen differently, and they could have chosen differently because they are interactive, informed, autonomous and adaptive” (ibid. 149). So even if “it would be ridiculous to praise or blame an artificial agent for its behavior, or charge it with a moral accusation” (ibid. 150), we must acknowledge that artificial agents, as transition systems, interact with their environment, that they can change their state independently, and that they are even able to adopt new transition rules by which they change their state.

The difference between accountability and responsibility should be kept in mind, so that attempts to delegate responsibility to artificial agents can be exposed. When artificial agents malfunction, the engineers who designed them are expected to re-engineer them so that they no longer cause harm. And in the case of recruitment decisions, companies should be very careful about how they proceed. There is no single recipe for success.


Floridi, Luciano (2013) The Ethics of Information. Oxford: Oxford University Press.

Ribes, David / Jackson, Steven J. (2013) Data Bite Man: The Work of Sustaining a Long-Term Study. In: Lisa Gitelman (ed.), “Raw Data” Is an Oxymoron. Cambridge: The MIT Press, 147-166.

Featured image was taken from: https://cdn.static-economist.com/sites/default/files/images/print-edition/20170610_FND000_0.jpg

On the aura of Big Data

People who know very little about technology seem to attribute an aura of “objectivity” and “impartiality” to Big Data and to analyses based on them. Statistics and predictive analytics give the impression, to the outside observer, of being able to reach objective conclusions based on massive samples. But why exactly is that so? How has it come about that societal discourse ascribes this particular aura to Big Data analyses?

Since most people conceive of Big Data as tables filled with numbers which have been collected by machines observing human behavior, there are at least two points intermingled in this peculiar aura of Big Data: The belief that numbers are impartial and preinterpretive, and the conviction that there exists something like mechanical objectivity. Both concepts have a long history, and it is therefore wise to consult cultural historians and historians of science.

With respect to the claim that numbers are theory-free and value-free, one can consult the book “A History of the Modern Fact”[1] by Mary Poovey. Poovey traces the history of the modern epistemological assumption that numbers are free of an interpretive dimension, and she tells the story of how description came to seem separate from interpretation. In analyzing historical debates about induction and by studying authors such as Adam Smith, Thomas Malthus, and William Petty, Poovey points out that “Separating numbers from interpretive narrative, that is, reinforced the assumption that numbers were different in kind from the analytic accounts that accompanied them.” (XV) If nowadays many members of our societies imagine that observation can be separated from analysis and that numbers guarantee value-free description, this is the result of the long historical process examined by Poovey. But from an epistemological point of view this is not correct, because numbers are interpretive – they embody theoretical assumptions about what should be counted, they depend on categories, entities and units of measurement established before counting has begun, and they contain assumptions about how one should understand material reality.

The second point, mechanical objectivity, has been treated by Lorraine Daston and Peter Galison in their book on “Objectivity”, which contains a chapter of the same name.[2] Daston and Galison focus on photography as a primary metaphor for the objectivity ascribed to a machine. Alongside this example, they describe mechanical objectivity as “the insistent drive to repress the willful intervention of the artist-author, and to put in its stead a set of procedures that would, as it were, move nature to the page through a strict protocol, if not automatically.” (121) Both authors see two intertwined processes at work: on the one hand, the separation of the development and activities of machines from the human beings who conceived them, with the result that machines were attributed freedom from the willful interventions that had come to be seen as the most dangerous aspects of subjectivity; on the other hand, the development of an ethics of objectivity, which called for a morality of self-restraint in order to keep researchers from interventions and interferences such as interpretation, aestheticization, and theoretical overreaching. Thus machines – be they cameras, sensors or electronic devices – have become emblematic of the elimination of human agency.

If the aura of Big Data is based on these conceptions of an “impartiality” of numbers and of data collected by “objectively” working machines, there remains little space for human agency. But this aura rests on a false consciousness, the consequences of which are easy to see: if analyses based on Big Data are taken as ground truth, it is no wonder that no space opens up for public discussion, for decisions made independently by citizens, or for a democratically organized politics in which the processes where Big Data play an important role are actively shaped.

[1] Mary Poovey, A History of the Modern Fact. Problems of Knowledge in the Sciences of Wealth and Society, Chicago / London: The University of Chicago Press 1998.

[2] Lorraine Daston, Peter Galison, Objectivity, New York: Zone Books 2007.

New Wine into Old Wineskins

The communication of scientific outputs, or in other words the way narratives relate to data, has received much attention in previous KPLEX posts. Questions such as “Do the narratives elaborate on the data, create narrative from the data, or do the narratives reveal the latent richness inherent in the data?” have been raised. These fundamental questions touch upon the very heart of the scientific enterprise. How do we try to grasp the complexity of a phenomenon, and how do we translate our insights and findings into clear language?

Debates in anthropology might be revealing in this regard. The “reflexive turn” since the 1970s has led anthropologists to ask themselves whether it is possible to create an objective study of a culture when their own biases and epistemologies are inherently involved. Until then they had produced a kind of “realist tale”, with a focus on the regular and on the junction of observations with standard categories. Only those data had been allowed that supported the analysis; the underanalyzed or problematic had been left out. Anthropologists’ worries and efforts had revolved around the possible criticism that their work was “’opinion, not fact’, ‘taste, not logic’, or ‘art, not science’” (Van Maanen 1988: 68). Then a plenitude of new tales emerged that openly discussed the accuracy, breadth, typicality, or generality of their own cultural representations and interpretations. Grouped under headings like “confessional tale”, “impressionist tale”, “critical tale”, “literary tale” or “jointly told tale”, all these kinds of narratives open up new ways of description and explanation. Issues such as serendipity, errors, misgivings and limiting research roles, for example, are taken up by confessional writers (Karlan and Appel 2016).

How far has this discussion progressed in the sciences? A differentiation analogous to the one between “real events (Geschichte) and the narrative strategies (Historie) used to represent, capture, communicate, and render these events meaningful” has to some extent taken place. Still, the process of constructing scientific facts often happens in a black box and is not revealed to the reader, as the famous study by Bruno Latour and Steve Woolgar has shown. As in the humanities, in the sciences too different kinds of statements – ranging from taken-for-granted facts through tentative suggestions and claims to conjectures – contribute to the establishment of “truth”. The combination of researchers, machines, “inscription devices”, skills, routines, research programs, etc. leads to the “stabilisation” of statements: “At the point of stabilisation, there appears to be both objects and statements about these objects. Before long, more and more reality is attributed to the object and less and less to the statement about the object. Consequently, an inversion takes place: the object becomes the reason why the statement was formulated in the first place. At the onset of stabilisation, the object was the virtual image of the statement; subsequently, the statement becomes the mirror image of the reality ‘out there’” (Latour and Woolgar 1979: 176 f.).


So, with regard to the sciences, the questions raised before (“Is it possible for a narrative to become data-heavy or data-saturated? Does this impede the narrative from being narrative?”) have to be answered in the negative. Discursive representations are always implicated when some version of truth is conveyed. In terms of reflexivity there is still room for improvement, e.g. putting the focus not only on the communication of startling facts but also on non-significant results. This would certainly help the practitioners of science to better understand the scope and explanatory power of their disciplinary methods and theories. Hope remains that, unlike in the parable of new wine in old wineskins, the sciences will withstand these changes and not burst.


Karlan, Dean S./ Appel, Jacob (2016) Failing in the field: What we can learn when field research goes wrong. Princeton: Princeton University Press.

Latour, Bruno/ Woolgar, Steve (1979) Laboratory Life: The Construction of Scientific Facts. Princeton: Princeton University Press.

Van Maanen, John (1988) Tales of the Field: On Writing Ethnography. Chicago: The University of Chicago Press.

Featured image was taken from: https://i.pinimg.com/originals/22/bc/0a/22bc0a8c4573701d9df70f51a971388a.jpg

Statistical Modeling: The Two Cultures

In this article Leo Breiman describes two approaches in statistics: One assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown.

Statisticians in applied research treat data modeling as the template for statistical analysis and, within their range of multivariate tools, focus on discriminant analysis and logistic regression for classification and on multiple linear regression for regression. This approach has the advantage of producing a simple and understandable picture of the relationship between the input variables and the response. But the assumption that the data model is an emulation of nature is not necessarily right and can lead to wrong conclusions.

The algorithmic approach uses models such as neural nets and decision trees, and takes predictive accuracy as the criterion for judging the quality of the results of an analysis. This approach does not apply a data model to explain the relationship between the input variables x and the output variable y, but treats this relationship as a black box. Hence the focus is on finding a function f(x) such that, for future x in a test set, f(x) will be a good predictor of y. While this approach has driven major advances in machine learning, it sacrifices interpretability of the relationship between the predictor variables and the response.
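As a minimal sketch of the contrast on synthetic data, assuming Python with scikit-learn is available (nothing below comes from Breiman’s article itself): a logistic regression stands in for the data-modeling culture, a random forest for the algorithmic one, and both are judged by predictive accuracy on a held-out test set.

```python
# Two cultures on the same synthetic classification problem.
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score

X, y = make_classification(n_samples=2000, n_features=20, n_informative=8,
                           random_state=0)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Data-modeling culture: assume a simple stochastic model linking x and y;
# the fitted coefficients give an interpretable picture of the relationship.
data_model = LogisticRegression(max_iter=1000).fit(X_train, y_train)

# Algorithmic-modeling culture: treat the x -> y mechanism as a black box
# and search for a function f(x) that predicts well on unseen data.
algo_model = RandomForestClassifier(n_estimators=200, random_state=0).fit(
    X_train, y_train)

for name, model in [("logistic regression", data_model),
                    ("random forest", algo_model)]:
    acc = accuracy_score(y_test, model.predict(X_test))
    print(f"{name}: test accuracy {acc:.3f}")
```

On a given dataset either model may win on accuracy; the point of the sketch is that only the first yields coefficients one can read as a picture of the relationship.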

This article was published in 2001, when the term “Big Data” was not yet on everyone’s lips. But by delineating two different cultures of data analysis and weighing the pros and cons of each approach, it makes the differences between big data analysis and stochastic data modeling understandable even to laypeople.

Leo Breiman, Statistical Modeling: The Two Cultures. In: Statistical Science, Vol. 16 (2001), No. 3, 199-231. Freely available online here.

On assistance and partnership with artificial intelligence

After mobile devices and touchscreens, personal assistants are set to be “the next big thing” in the tech industry. Amazon’s Alexa, Microsoft’s Cortana, Google Home, Apple’s HomePod – all these voice-controlled systems come into your home, driven by artificial intelligence, ready for dialogue. Artificial intelligence currently works best when provided with clear tasks and clearly defined goals. That can be a goal defined by the user (“Please turn down the music, Alexa!”), but generally the goals of the companies offering the personal assistants dominate: they want to sell. And there you find the differences between these personal assistants: Alexa sells the whole range of products marketed by Amazon, Cortana eases access to Microsoft’s software and hardware, Google Home has its strengths with smart home devices and the internet of things, and Apple’s HomePod … well, urges you into the hall of mirrors created by Apple’s Genius and other flavour enhancers.

Beyond well-defined tasks, artificial intelligence is bad at chatting and assisting. If you are looking for a partner, for someone to talk to, the predefined goals are missing. AI lacks the world knowledge needed for such a task, and it cannot ensure the appropriateness of answers in a conversation or the similarity of mindsets that is the basis of friendship.

But this is exactly what is promised by the emotion robot “Pepper”. This robot stores the emotions it collects from its human interaction partners on a shared server, a cloud. All existing Pepper robots are connected to this cloud. In this way the robots, which already act autonomously, collectively “learn” how to improve their emotional reactions. Their developers also work with these data.

If you think through the idea of “Pepper”, you have to ask yourself what end this robot is meant to serve – as a replacement partner for human beings, caring for their emotional well-being? How is this conceived? How does a robot know how to contribute to human well-being? Imagine a human couple where he is choleric and her role is to constantly calm him down (a contrapuntal approach). Or another couple that is constantly quarrelling: he shouts at her, she yells back; this couple judges this to be their normal state and their quarrelling an expression of well-being (a homeopathic approach). Can a robot decide which ‘approach’ is best? Simply imagine what would happen in a scenario where a person – we call him ‘Donald’ – buys a new emotion robot – whom we call ‘Kim’. Certainly this is not the kind of world we’re looking for, is it?

With personal assistants, it seems to be a choice between the devil and the deep blue sea: either you are reduced to a consumer, or you are confronted with some strange product without openly defined goals, with which you cannot exchange at eye level. So the best choices we have are either to abstain from using these AIs, or to participate in civil society dialogues with tech companies and in policy debates about the use of AI.

No Surprises, Please

As researchers of the social, we are often preoccupied with ways in which knowledge is governed and controlled in order not to upset hegemonic narratives, but we are reminded every day that anyone can produce knowledge. Inspirational stories of unorthodox investigators and inventors making surprising discoveries abound. They solve problems with one weird trick. Trainers hate them. When particular methods of knowledge creation catch the popular imagination, they stir us to wonder at the achievements of human enquiry and the possibilities of collective endeavour. Citizen-scientists’ efforts to map the universe offer a welcome break in headlines reminding us of humanity’s penchant for self-destruction, as well as evoking a sense of awe at the scale of achievement possible when a critical mass of committed, anonymous volunteers chip away at raw material to carve out a work of staggering complexity.

The allure of stumbling upon a breakthrough that puts experts’ ‘persistent plodding’ (Wang 1963: 93) to shame fosters fervour for emergent tools like the Ngram Viewer, which led Rosenberg to comment that ‘briefly, it seemed that everyone was ngramming’ (2013: 24). When the means of knowledge production are seemingly in the hands of the people, we are tantalised by a fantasy of taking power from the elites who would otherwise govern it, but of course, we are still using the master’s tools. Whether the master is Google – whose Google Books project has captured about a fifth of the world’s library, a third of those books being made available through the Ngram Viewer – or any other mediator, the citizen-researcher should be wary of the black box, and the weird tricks they conjure from it.

[Figure: an Ngram]

 

Seasoned researchers may consider themselves hyper-aware of dominant discourses, and no-one takes up a position thinking they’ve been duped into the values they hold dear. When virtue-signalling brands like Innocent and Lush cute-bomb us with faux-naïve descriptions of their purity and messages from their workforce of dedicated artisans, we all like to think we can see through their studied informality to the processes of mass production. Borgman (2015) writes of the ‘magic hands’ of specialised, local, expert knowledge production, as distinct from that which can be replicated on an industrial scale. Both have their place, and there is an enduring belief that there are some areas for which the small-scale artisan’s skills are irreplaceable; but which end of the spectrum is most likely to throw up surprises that challenge accepted thinking?

We like to think there is a difference between human, humane craft and computerised, robotic task fulfilment, and we all had a good laugh when that police robot fell down those stairs to its watery grave because we like to think the human perspective adds a special je ne sais quoi beyond the competence of machines. So even where hundreds of artisanal citizen-data-harvesters come together to produce a multi-perspective synth of Venice’s Piazza San Marco, the inherent complexity of this technological mediation cannot be equated with the singular, human perspective of Canaletto’s artistic rendering.

We are wary of the mediation of technology. We therefore allow technology to serve us, to answer the questions we had conceived it to answer, but we are still uncomfortable with the implications of allowing it to suggest new questions. Uricchio points out that journalistic pronouncements on the potentially dystopic applications of new technology have become a trope. The Algorithm, referred to as a synecdoche for various black boxes, evokes a vision of a merciless god to be feared and worshipped:

The recent explosion of headlines where the term ‘algorithm’ figures prominently and often apocalyptically suggests that we are re-enacting a familiar ritual in which ‘new’ technologies appear in the regalia of disruption. But the emerging algorithmic regime is more than ‘just another’ temporarily unruly new technology. (Uricchio, 2017: 125)

[Image: The Data Deluge]

So could the right mix of data and algorithms disrupt our looping endlessly on the same track, elevating us above the Matthew Effect to a higher plane of enlightenment? Is this our era-defining opportunity to emerge from the data deluge with a trove of knowledge, the munificence of knowing exactly where to look? Well, possibly, but only if that possibility is already within us, or at least, within those of us creating the algorithms. As Bowker reminds us:

Our knowledge professionals see selfish genes because that’s the way that we look at ourselves as social beings—if the same amount of energy had been applied to the universality of parasitism/symbiosis as has been applied to rampant individualistic analysis, we would see the natural and social worlds very differently. However, scientists tend to get inspired by and garner funding for concepts that sit “naturally” with our views of ourselves. The social, then, is other than the natural and should/must be modeled on it; and yet the natural is always already social. (Bowker 2013: 168)

Uricchio (2017: 126) is also sceptical, noting that the ‘dyad of big data and algorithms can enable new cultural and social forms, or they can be made to reinforce the most egregious aspects of our present social order’; yet he is more hopeful that the computational turn has the power to surprise us:

The new era has yet to be defined, and it is impossible to know how future historians will inscribe our trajectory. Of course, the ‘newness’ of this regime comes with the danger that it will be retrofitted to sustain the excesses and contradictions of the fast-aging modern, to empower particular individual points of view, to control and stabilize a master narrative. But it also offers an opportunity for critical thinking and an imaginative embrace of the era’s new affordances. And for these opportunities to be realized, we need to develop critical perspectives, to develop analytical categories relevant to these developments and our place in them. (Uricchio, 2017: 136)

Our critical capacity is therefore our indemnity against the seduction of surprising discoveries, helping us to judge and accept that which is novel and valid. By interrogating the implications of new areas of inquiry from the start, we can avoid the danger of our creations escaping our control and serving undesirable ends. If we use the computational turn as an opportunity to strengthen critical thought, we might just find our way through the new complexities of knowledge without any nasty surprises.

[Image: Frankenstein]

 

Borgman, C. L. (2015). Big Data, Little Data, No Data: Scholarship in the Networked World. Cambridge, Massachusetts: MIT Press.

Bowker, G. (2013). Data Flakes: An Afterword to “Raw Data” Is an Oxymoron. In Gitelman, L. (ed.), “Raw Data” Is an Oxymoron. Cambridge, Massachusetts: MIT Press.

Rosenberg, D. (2013). Data Before the Fact. In Gitelman, L. (ed.), “Raw Data” Is an Oxymoron. Cambridge, Massachusetts: MIT Press.

Uricchio, W. (2017). Data, Culture and the Ambivalence of Algorithms. In Schäfer, M. T. and van Es, K. (eds.), The Datafied Society: Studying Culture through Data. Amsterdam: Amsterdam University Press.

Wang, H. (1963). Toward Mechanical Mathematics. In Sayre, K. M. and Crosson, F. J. (eds.), The Modeling of Mind. South Bend, IN: Notre Dame University Press.

Analogue secrets

Annual leave and holidays seem to encourage regressive impulses: almost involuntarily, one stops in front of a souvenir shop to inspect the rotating displays of postcards. They definitely have a charm of their own – small pieces of cardboard with the aura of authenticity, indicating in a special way that “I was here”, as if there weren’t enough selfies and WhatsApp posts to support that claim. Quickly written and quickly sent, these relics of a faraway time when messages were handwritten seem to provide proof that someone has really been far away and happily come back, enriched by experiences of alterity.

This aura of the analogue, which offers a brief glimpse into the inner life of a person far away on their travels, has been exploited for many years by an art project. Postsecret.com provides post office boxes in many countries of the world to which people can send a postcard anonymously revealing a personal secret. A simple and ingenious idea: one can utter something that cannot be told by word of mouth, nor written down, because one would need an addressee who would draw their own conclusions. But Postsecret.com makes this secret public, and perhaps the cathartic effect (and affect) of finally getting rid of something that was a heavy burden can be enjoyed in anonymity.

Can we conceive of a digital counterpart to this analogue art project? Only if there are still users who believe in anonymity on the net. But virtually every activity on the net leaves traces – and why should I reveal secrets on the net and thus become, in one way or another, susceptible to blackmail?

In praise of the good old handwritten postcard of confidence: every mailing is an original, every self-made postcard is unique.

What’s behind a name?

Could a complete worldwide list of all the names of streets, squares, parks, bridges, etc. be considered big data? Would an analysis of the frequencies and the spatial distribution of these names tell us anything about ourselves?

Such a comparative analysis would miss important information, especially the historical changes of names and the cultural significance embedded therein.

The Ebertstraße in Berlin has changed its name several times: in the 19th century it became Königgrätzer Straße after the Prussian victory over Austria at the Battle of Königgrätz; during the First World War it was renamed Budapester Straße; in 1925 it was given the name Friedrich-Ebert-Straße in memory of the first President of the Weimar Republic. Shortly after the Nazis took over in Germany, the street was renamed Hermann-Göring-Straße after the newly elected President of the Reichstag. Only in 1947 was the street finally renamed Ebertstraße.
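As a small, hypothetical Python sketch of the point made above: a flat frequency count over current street names sees each name only once, while a dated record of the same street keeps the renamings visible. The year values below are approximate and serve only the example.

```python
# Hypothetical sketch: a frequency count over *current* street names versus
# a temporal record that preserves the renaming history.
from collections import Counter

# What a naive frequency analysis of today's map would see:
current_names = ["Ebertstraße", "Mohrenstraße", "Budapester Straße"]
print(Counter(current_names))          # each name counts once, history invisible

# A temporal record of the same street, following the renamings above
# (years approximate and illustrative only).
ebertstrasse_history = [
    (1867, "Königgrätzer Straße"),
    (1915, "Budapester Straße"),
    (1925, "Friedrich-Ebert-Straße"),
    (1933, "Hermann-Göring-Straße"),
    (1947, "Ebertstraße"),
]

def name_in_year(history, year):
    """Return the name the street bore in a given year (None if before the record)."""
    name = None
    for start, current in history:
        if year >= start:
            name = current
    return name

print(name_in_year(ebertstrasse_history, 1930))   # Friedrich-Ebert-Straße
print(name_in_year(ebertstrasse_history, 1940))   # Hermann-Göring-Straße
```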

The nearby Mohrenstraße, on the other hand, has borne its name since the beginning of the 18th century. One of the myths about the origin of the street name refers to African musicians who played in the Prussian army. Debates about changing the street name continue, and university departments located on that street have in the meantime chosen to use the spelling Møhrenstraße.


So even if street names are not as rich a form of cultural data as the Mona Lisa, they convey meaning that has been formed, changed and negotiated over a long period of time.

One advantage of dealing with street names rather than maps is that street name data are more reliable than maps, which have often been manipulated and distorted for military or other reasons.

But in order to reveal the history of street names one should not restrict oneself to the evidence on, about and of street names but dig into the events, processes, narratives and politics related to the context of origin. The HyperCities project has set up a digital map that allows “thick mapping”.

Certainly such research will itself lead to the creation of narratives – narratives that might be biased overall – but in the face of historical events, is any objective account possible at all?