Blog

New Wine into Old Wineskins

The communication of scientific outputs or in other words the way narratives relate to data has received much attention in previous KPLEX posts. Questions such as “Do the narratives elaborate on the data, create narrative from the data, or do the narratives reveal the latent richness inherent in the data?” have been raised. These fundamental questions touch upon the very heart of the scientific enterprise. How do we try to grasp the complexity of a phenomenon and how do we translate our insights and findings into clear language?

Debates in anthropology might be revealing in this regard. The “reflexive turn” since the 1970s led anthropologists to ask themselves if it was possible to create an objective study of a culture when their own biases and epistemologies were inherently involved. So far they had produced some kind of “realist tale” with a focus on the regular and the junction of observations with standard categories. Only those data had been allowed that supported the analysis; the underanalyzed or problematic had been left out. Anthropologists’ worries and efforts had revolved around the possible criticism that their work was “’opinion, not fact’, ‘taste, not logic’, or ‘art, not science’” (Van Maanen 1988: 68). Then a plentitude of new tales emerged that openly discussed the accuracy, breadth, typicality, or generality of the own cultural representations and interpretations. Put under headings like “confessional tale”, “impressionist tale”, “critical tale”, “literary tale” or “jointly told tale” all these kinds of narratives open up new ways of description and explanation. Issues such as serendipity, errors, misgivings and limiting research roles for example are taken up by confessional writers (Karlan and Appel 2016).

How far has this discussion progressed in the sciences? A differentiation analogue to the one between “real events (Geschichte) and the narrative strategies (Historie) used to represent, capture, communicate, and render these events meaningful” has to some extent taken place. Still the process of constructing scientific facts often happens in a black box and is not revealed to the reader, as the famous study by Bruno Latour and Steve Woolgar has shown. Like in the humanities also in the sciences different kinds of statements – ranging from taken-for granted facts through tentative suggestions and claims to conjectures – contribute to the establishment of ”truth”. The combination of researchers, machines, “inscription devices”, skills, routines, research programs, etc. leads to the “stabilisation” of statements: “At the point of stabilisation, there appears to be both objects and statements about these objects. Before long, more and more reality is attributed to the object and less and less to the statement about the object. Consequently, an inversion take place: the object becomes the reason why the statement was formulated in the first place. At the onset of stabilisation, the object was the virtual image of the statement; subsequently, the statement becomes the mirror image of the reality ‘out there’” (Latour and Woolgar 1979: 176 f.).

New wine in old skins

So with regard to the sciences the questions that had been raised before: “Is it possible for a narrative to become data-heavy or data-saturated? Does this impede the narrative from being narrative?” have to be negated. Discursive representations are always implicated when conveying some version of truth. In terms of reflexivity there is still some room for improvement, e.g. putting the focus not only on the communication of startling facts but also on non-significant results. This would certainly help the practitioners of science to get to know better the scope and explanatory power of their disciplinary methods and theories. Hope remains that unlike in the parable of new wine in old wineskins the sciences will stand these changes and not burst.


Karlan, Dean S./ Appel, Jacob (2016) Failing in the field: What we can learn when field research goes wrong. Princeton: Princeton University Press.

Latour, Bruno/ Woolgar, Steve (1979) Laboratory Life: The Construction of Scientific Facts. Princeton: Princeton University Press.

Van Maanen (1988) Tales of the Field: On Writing Ethnography. Chicago: The University of Chicago Press.

Featured image was taken from: https://i.pinimg.com/originals/22/bc/0a/22bc0a8c4573701d9df70f51a971388a.jpg

Statistical Modeling: The Two Cultures

In this article Leo Breiman describes two approaches in statistics: One assumes that the data are generated by a given stochastic data model. The other uses algorithmic models and treats the data mechanism as unknown.

Statisticians in applied research consider data modeling as the template for statistical analysis and focus within their range of multivariate analysis tools on discriminant analysis and logistic regression in classification and multiple linear regression in regression. This approach has the plus that it produces a simple and understandable picture of the relationship between the input variables and response. But the assumption that the data model is an emulation of nature is not necessarily right and can lead to wrong conclusions.

The algorithmic approach uses neural nets and decision trees; predictive accuracy as criterion to judge the quality of the results of analysis. This approach does not apply data models to explain the relationship between input variable x and output variable y, but treats this relationship as a black box. Hence the focus is on finding an algorithm f(x) such that for future x in a test set, f(x) will be a good predictor of y. While this approach has seen major advances in machine learning, it lacks interpretability of the relationship between prediction and response variables.

This article has been published in 2001, when the word “Big Data” was not yet in everybody’s mouth. But by shaping two different cultures to analyzing data and balancing pros and cons of each approach, it makes the differences of big data analysis in contrast to stochastic data models understandable even to laymen.

Leo Breiman, Statistical Modeling: The Two Cultures. In: Statistical Science, Vol. 16 (2001), No. 3, 199-231. Freely vailable online here.

On assistance and partnership with artificial intelligence

After mobile devices and touchscreens, personal assistants will be “the next big thing” in the tech industry. Amazon’s Alexa, Microsoft’s Cortana, Google Home, Apple’s HomePod  – all these voice-controlled systems come to your home, driven by artificial intelligence, ready for dialogues. Artificial intelligence currently works bests when provided with clear tasks, where clear goals are defined. That can be a goal defined by the user (“Please turn down the music, Alexa!”), but generally the goals of the companies offering personal assistants dominate: They want to sell. And there you find the differences between these personal assistants: Alexa sells the whole range of products marketed by Amazon, Cortana eases access to Microsoft’s soft- and hardware, Google Home has its strengths with smart home devices and the internet of things, and Apple’s HomePod … well, urges you into the hall of mirrors which has been created by Apple’s Genius and other flavour enhancers.

Beyond well-defined tasks, artificial intelligence is bad at chatting and assisting. If you are looking for a partner, for someone to talk to, the predefined goals are missing. AI lacks the world knowledge needed for such a task, nor is it capable to provide for the appropriateness of answers in a conversation or for the similarity of mindsets which is the basis of friendship.

But this is exactly what is promised by the emotion robot “Pepper”. This robot saves the emotions it collects from its human interaction partners on a shared server, a cloud. All of the existing Pepper robots are connected to this cloud. This way the robots, which are already autonomous, collectively “learn” how to improve their emotional reactions. Their developers also work with these data.

If you think through the idea of “Pepper”, you have to ask yourself to which end this robot should serve – as a replacement of a partner for human beings, caring for their emotional well-being? In which way is this conceived of? How does a robot know how to contribute to human well-being? Imagine of a human couple, where he is a choleric, and her role is to constantly quieten him down (contrapuntal approach). Or another couple which is constantly quarrelling, he shouts at her, and she yells back; this couple judges this to be the normal state and their quarrelling as an expression of well-being (homeopathic approach). Can a robot decide which ‘approach’ is the best? Simply imagine what would happen in a scenario where you have a person – we call him ‘Donald’ – who buys a new emotion robot – whom we call ‘Kim’. Certainly this is not the kind of world we’re looking for, isn’t it?

With personal assistants, it seems to be a choice between the devil and the deep blue sea: Either you are being reduced to a consumer; or you’ll be confronted with some strange product without openly defined goals, with which you can’t exchange at eye level. So the best choices we have is to either abstain from using these AIs; or to participate in civil society dialogues with tech companies on policy debates about the use of AI.

No Surprises, Please

As researchers of the social, we are often pre-occupied with ways in which knowledge is governed and controlled in order not to upset hegemonic narratives, but we are reminded every day that anyone can produce knowledge. Inspirational stories of unorthodox investigators and inventors making surprising discoveries abound. They solve problems with one weird trick. Trainers hate them. When particular methods of knowledge creation catch the popular imagination, they stir us to wonder at the achievements of human enquiry and the possibilities of collective endeavour. Citizen-scientists’ efforts to map the universe offer a welcome break in headlines reminding us of humanity’s penchant for self-destruction, as well as evoking a sense of awe at the scale of achievement possible when a critical mass of committed, anonymous volunteers chip away at raw material to carve out a work of staggering complexity.

The allure of stumbling upon a breakthrough that puts experts’ ‘persistent plodding’ (Wang 1963: 93) to shame fosters fervour for emergent tools like the Ngram Viewer, which led Rosenberg to comment that ‘briefly, it seemed that everyone was ngramming’ (2013: 24). When the means of knowledge production are seemingly in the hands of the people, we are tantalised by a fantasy of taking power from those elites who would otherwise govern it, but of course, we are still using the master’s tools. Whether the master is Google – whose Google Books capture about a fifth of the world’s library, and which made a third of those available through the Ngram Viewer – or any other mediator, the citizen-researcher should be wary of the black box, and the weird tricks they conjure from it.

Blue Line
An Ngram

 

Seasoned researchers may consider themselves hyper-aware of dominant discourses, and no-one takes up a position thinking they’ve been duped into the values they hold dear. When virtue-signalling brands like Innocent and Lush cute-bomb us with faux-naïve descriptions of their purity and messages from their workforce of dedicated artisans, we all like to think we can see through their studied informality to the processes of mass production. Borgman (2015) writes of the ‘magic hands’ of specialised, local, expert knowledge production and that which can be replicated on an industrial scale. Both have their place and there is an enduring belief that there are some areas for which the small-scale artisan’s skills are irreplaceable, but which end of the spectrum is most likely to throw up surprises that challenge accepted thinking?

We like to think there is a difference between human, humane craft and computerised, robotic task fulfilment, and we all had a good laugh when that police robot fell down those stairs to its watery grave because we like to think the human perspective adds a special je ne sais quoi beyond the competence of machines. So even where hundreds of artisanal citizen-data-harvesters come together to produce a multi-perspective synth of Venice’s Piazza San Marco, the inherent complexity of this technological mediation cannot be equated with the singular, human perspective of Canaletto’s artistic rendering.

We are wary of the mediation of technology. We therefore allow technology to serve us, to answer the questions we had conceived it to answer, but we are still uncomfortable with the implications of allowing it to suggest new questions. Uricchio points out that journalistic pronouncements on the potentially dystopic applications of new technology have become a trope. The Algorithm, referred to as a synecdoche for various black boxes, evokes a vision of a merciless god to be feared and worshipped:

The recent explosion of headlines where the term ‘algorithm’ figures prominently and often apocalyptically suggests that we are re-enacting a familiar ritual in which ‘new’ technologies appear in the regalia of disruption. But the emerging algorithmic regime is more than ‘just another’ temporarily unruly new technology. (Uricchio, 2017: 125)

Crystal Dome
The Data Deluge

So could the right mix of data and algorithms disrupt our looping endlessly on the same track, elevating us above the Matthew Effect to a higher plain of enlightenment? Is this our era-defining opportunity to emerge from the data deluge with a trove of knowledge, the munificence of knowing exactly where to look? Well possibly, but only if that possibility is already within us, or at least, within those of us creating the algorithms. As Bowker reminds us:

Our knowledge professionals see selfish genes because that’s the way that we look at ourselves as social beings—if the same amount of energy had been applied to the universality of parasitism/symbiosis as has been applied to rampant individualistic analysis, we would see the natural and social worlds very differently. However, scientists tend to get inspired by and garner funding for concepts that sit “naturally” with our views of ourselves. The social, then, is other than the natural and should/must be modeled on it; and yet the natural is always already social. (Bowker 2013: 168)

Uricchio (2017: 126) is also sceptical, noting that the ‘dyad of big data and algorithms can enable new cultural and social forms, or they can be made to reinforce the most egregious aspects of our present social order’; yet he is more hopeful that the computational turn has the power to surprise us:

The new era has yet to be defined, and it is impossible to know how future historians will inscribe our trajectory. Of course, the ‘newness’ of this regime comes with the danger that it will be retrofitted to sustain the excesses and contradictions of the fast-aging modern, to empower particular individual points of view, to control and stabilize a master narrative. But it also offers an opportunity for critical thinking and an imaginative embrace of the era’s new affordances. And for these opportunities to be realized, we need to develop critical perspectives, to develop analytical categories relevant to these developments and our place in them. (Uricchio, 2017: 136)

Our critical capacity is therefore our indemnity against the seduction of surprising discoveries, helping us to judge and accept that which is novel and valid. By interrogating the implications of new areas of inquiry from the start, we can avoid the danger of our creations escaping our control and serving undesirable ends. If we use the computational turn as an opportunity to strengthen critical thought, we might just find our way through the new complexities of knowledge without any nasty surprises.

Frankenstein

 

Borgman, C. L. (2015). Big data, little data, no data: Scholarship in the networked world. Cambridge, Massachusetts: MIT press.

Bowker, G. (2013). Data Flakes: An Afterword to “Raw Data” Is an Oxymoron. in Gitelman, L. (2013). “Raw Data” Is an Oxymoron. Cambridge, Massachusetts: MIT press.

Rosenberg, D. (2013). Data Before the Fact. in Gitelman, L. (2013). “Raw Data” Is an Oxymoron. Cambridge, Massachusetts: MIT press.

Uricchio, W. (2017). Data, Culture and the Ambivalence of Algorithms. in Schäfer, M.T. and van Es, K. (2017). The Datafied Society: studying culture through data. Amsterdam: Amsterdam University Press.

Wang, H. (1963). Toward Mechanical Mathematics. in The modelling of Mind, ed. K.M. Sayre & F.J. Crosson. South Bend, IN: Notre Dame University Press.

Analogue secrets

Annual leaves and holidays seem to support regressive impulses: Nearly unvoluntarily one stops in front of a souvenir shop to inspect rotating displays with postcards on it. They definitely have a charm of their own – small-sized cardboards with the aura of authenticity, indicating in a special way that “I was here”, as if there weren’t enough selfies and WhatsApp-posts to support that claim. Quickly written and quickly sent, these relicts of a faraway time in which messages were handwritten, postcards seem to provide for a proof that someone has really been far away and happily came back, enriched by experiences of alterity.

This aura of the analogue, which provides a short insight into the inner life of a person distantly on his route, is being exploited by an art project since many years. Postsecret.com provides post office boxes in many countries of the world, where people can send in their postcard, anonymously revealing a personal secret. A simple and ingenious idea: One can utter something which can not be told by word of mouth; nor could it be written down, because one would need an addressee who would draw his conclusions. But Postsecret.com makes this secret public, and maybe the cathartic effect (and affect) to finally got rid of something which was a heavy burden can be enjoyed in anonymity.

Can we conceive of a digital counterpart of this analogue art project? Only if there are still users there who believe in anonymity in the net. But it is true that literally every activity in the net leaves traces – and why should I reveal secrets in the net and thus become in one way or another susceptible to blackmail?

Praise for the good old hand-written postcard of confidence: Every mailing is an original, every self-made postcard is unique.

What’s behind a name?

Could a complete worldwide list of all the names of streets, squares, parks, bridges, etc. be considered as big data? Would the analysis of frequencies and the spatial distribution of these names tell us anything about ourselves?

Such a comparative analysis would miss important information, especially the historical changes of names and the cultural significance embedded therein.

The Ebertstraße in Berlin had changed its name several times: in the 19th century it became the Königgrätzer Straße after the Prussian victory over Austria at the Battle of Königgrätz, during the First World War it was renamed in Budapester Straße, in 1925 it got the name Friedrich-Ebert-Straße in memorial of the first President of the Weimar Republic. Shortly after the Nazi took over in Germany the street was renamed in Hermann-Göring-Straße after the newly elected President of the Reichstag. Only in 1947 the street was finally renamed back to Ebertstraße.

The close-by Mohrenstraße on the other hand bears its name since the beginning of the 18th century. One of the myths on the origin of the street name stems from African musicians who played in the Prussian army. Debates on changing the street name remain and University departments which are located in that street chose to use Møhrenstraße in the meantime.

dav

So, if even if street names are not as rich cultural data as the painting of Mona Lisa, they convey meaning that has been formed, changed and negotiated over a long period of time.

The advantage in dealing with street names and not with maps is that street name data are more reliable than maps which often have been manipulated and distorted for military or other reasons.

But in order to reveal the history of street names one should not restrict oneself to the evidence on, about and of street names but dig into the events, processes, narratives and politics related to the context of origin. The HyperCities project has set up a digital map that allows “thick mapping”.

Certainly such a research will lead to the creation of narratives itself – that might be biased overall – but in the face of historical events is there any objective account possible at all?

The Dream of Knowing the Future

Predicting the future has always been a concern for positivist scientists. The theories and models they constructed claimed not only to represent general laws but also to forecast the prospective outcomes of long-term processes. Take for example the predecessor of sociology, Auguste Comte, who tried to explain the past development of humanity and predict its future course. Even Karl Marx, who criticized the limited conception of cause underlying positivist natural laws, developed a theory of history for Western Europe that saw socialism and communism as the final stages after epochs of slavery, feudalism and capitalism. In addition to description and explanation the predictive power was and is crucial for the scientific worth of theories.

Predictions about the future often rely on a combination of historical data and interpretations informed by current theories. A review of 65 estimates of how many people the earth can support for instance show how widely these differ: “The estimates have varied from <1 billion to >1000 billion”. The review also shows the different methods used for estimating human carrying capacity. The first estimation in the 17th century which stated 13.4 billion people extrapolated the number of inhabitants in Holland to the earth’s inhabited land area. Estimations from the 20th century are based on food and water supply and individual requirements thereof.

Recent estimations rely on computer models that integrate data and theories related to growth. Different scenarios are developed that estimate the maximum global population at about 9 billion people in the 21st century and then either to collapse or to adapt smoothly to the carrying capacity of the earth.

Population-Projections

Estimations of this kind should be viewed with caution, because the information provided is incomplete. We might have some idea about the desirable level of material well-being and the physical environments we want to live in, but we cannot foresee the technologies, economic arrangements or the political institutions in place in fifty or eighty years. These mechanisms do not operate independently but interact and produce feedback loops. The awareness of dangers and risks alone won’t necessarily change predominant policies. Human behavior and the underlying fashions, tastes and values (on family size, equality, stability and sustainability) are too complex to be predicted accurately.

Let’s try then a more modest example! What about predicting the potential outbreak of a disease? Google Flu Trends was a program that aimed for better influenza forecasting than the U.S. Centers for Disease Control and Prevention. From 2008 onwards internet searches for information on symptoms, stages and remedies were analyzed in order to predict where and how severely the flu would strike next. The program failed. Big data inconsistencies and human errors in interpreting the data are held responsible for not predicting the flu outbreak in the United States in 2013, the worst outbreak of influenza in ten years. Another recent example is the Ebola epidemic in West Africa in 2014. The U.S. Centers for Disease Control and Prevention published a worst-case prediction with 1.4 million people infected. The World Health Organization predicted a 90% death rate from the disease, in retrospect the rate is about 70%. The data and the model based on initial outbreak conditions turned out inadequate for projections. Disease conditions and human behavior changed too quickly for humans and algorithms to keep up.

OK, then how about sales forecasting, a comparatively easy task? Mass-scale historical data has served eBay and other companies to measure the benefit of search advertising. In a simple predictive model clicks were counted to predict sales: “Although a click on an eBay ad was a strong predictor of a sale – consumers typically purchased right after clicking – the experiment revealed that a click did not have nearly as large a causal effect, because the consumers who clicked were likely to purchase, anyway”.

This shows us that data alone are not enough for prediction, one needs to know about causal effects and context information. Additionally, purely data-driven approaches tend to produce models and algorithms that are overfit to the idiosyncrasies of particular circumstances. What theories and models can deliver is not knowledge of the future but at best the ability to rule out a range of futures as unrealistic.


Featured image was taken from: http://www.bigpicexplorer.com/idealworld/population.htm

Patina pleases

Everything new is blank. New things display their integrity and an undestroyed, immaculate surface. Design objects are icons of youth and timelessness. They demand attention in a special way, because their newness is always at risk. Traces like scratches rob them of their stainlessness and indicate a loss of aura. The ageing of these objects triggers a certain horror in their owners, because it shows the passing of newness and a loss of control. Consumer electronics is a particularly instructive example here, and this is in spite of a miraculous ability of electronic content – the ability NOT to age.

But ageing is precious. Aged objects, which can be found on flea markets, in museums or antique shops are often seen as precious. They represent the past, time, and history, and we appreciate the peculiarities of their surface, which has grown over decades. We are not disturbed by dust, grease, grime, wax, scratches, or cracks – on the contrary, the patina of objects represents their depth and their ability to exhibit their own ageing process. The attention of the observer focuses on the materiality of the object, the patina of the surface and the deformations gained in time mark their singularity and individuality. To be precise: The fascination of the observer focuses more on the signs of ageing, rather than on the object itself.

schoolbag

Schoolbag from a flu market

In the digital world, we don’t find comparable qualities. Ageing would mean that data have become corrupt, unreadable, unusable and therefore worthless. For objects digitized by cultural heritage institutions, this is a catastrophe; it means loss. With consumer electronics, it’s similar: A smartphone is devalued by scratches. No gain in singularity is noted. With software, it is even worse. Software ageing is a phenomenon, where the software itself keeps its integrity and functionality, but if the environment, in which this software works, or the operating system changes, the software will create problems and mistakes, and sooner or later it will become dysfunctional. Ageing here means an incapability to adapt to the digital environment, and surprisingly this happens without wear or deterioration, since it is not data corruption which causes this ageing. In this respect, the process of software ageing can be compared to the ageing of human beings: They drag behind time or are uncoupled of the social world they live in; they lose connectivity, the ability of actualisation, and the skill to exchange with their environment.

With analogue objects, this is not the case. They provoke sensual pleasures without reminding the observer of the negative aspects of human ageing. Even if they have become dysfunctional and useless, they keep the dignity and aura of time, inscribed into their body and surface.  They keep their observers at a beneficial distance, which opens up space for imagination and empathy. The observer is free to visualise to himself the history of these objects and their capability to endure long time distances without vanishing – certainly a faculty which human beings do not dispose of. What remains are the characteristics of dignified ageing. While the nasty implications of ageing are buried in oblivion, analogue objects evoke a beauty of ageing.

Veracity and Value: Two more “V” of Big Data

So far we have learnt about the most popular three criteria of big data: volume, velocity and variety. Jennifer Edmond suggested adding voluptuousness as fourth criteria of (cultural) big data.

I will now discuss two more “V” of big data that are often mentioned: veracity and value. Veracity refers to source reliability, information credibility and content validity. In the book chapter “Data before the Fact” Daniel Rosenberg (2013: 37) argued: “Data has no truth. Even today, when we speak of data, we make no assumptions at all about veracity”. Many other scholars agree with this, see: Data before the (alternative) facts.

What has been questioned for “ordinary” data seems to hold true for big data. Is this because big data is thought to comprise statistical population data, not just data of a sample? Does the assumed totality of data reveal the previously hidden truth? Instead of relying on a model or on probability distributions, we could now assess and analyse data of the entire population. But apart from the implications for statistical analysis (higher chances of getting false-positives, need for tight statistical significance levels, etc.) there are even more fundamental problems with the veracity of big data.

 

emoji

Take the case of Facebook emoji reactions. They have been introduced in February 2016 to give users the opportunity to react to a post by tapping either Like, Love, Haha, Wow, Sad or Angry. Not only is the choice of affective states very limited and the expression of mixed emotions impossible but the ambiguity in using these expressions themselves is problematic. Although Facebook reminds its users: “It’s important to use Reactions in the way it was originally intended for Facebook — as a quick and easy way to express how you feel. […] Don’t associate a Reaction with something that doesn’t match its emotional intent (ex: ‘choose angry if you like the cute kitten’)”, we do know that human perceptions as well as motives and ways of acting and reacting are manifold. Emojis can be used to add emotional or situational meaning, to adjust tone, to make a message more engaging to the recipient, to manage conversations or to maintain relationships. Social and linguistic function of emojis are complex and varied. Big Data in the case of Facebook emoji reactions then seems to be as pre-factual and rhetorical as “ordinary” data.

Value now refers to social and economic value that big data might create. When reading documents like the European Big Data Value Strategic Research Innovation Agenda one gets the impression of economic value dominating. The focus is directed to “fuelling innovation, driving new business models, and supporting increased productivity and competitiveness”, “increase business opportunities through business intelligence and analytics” as well as to the “creation of value from Big Data for increased productivity, optimised production, more efficient logistics”. Big Data value is not speculative anymore: “Data-driven paradigms will emerge where information is proactively extracted through data discovery techniques and systems are anticipating the user’s information needs. […] Content and information will find organisations and consumers, rather than vice versa, with a seamless content experience”.

Facebook emoji reactions are just an example of this trend. Analysing users’ reactions allows not only for “better filter the News Feed to show more things that Wow us” but probably also to change consumer behavior and sell individualized products and services.


Featured image was taken from Flickr.