Data practice, data science

Big Data has been attracting growing professional interest, and this has sparked intellectual curiosity and a sense of hope for exciting research within STS. In this piece I survey the discussions about data science and data practices at the recent EASST conference in Barcelona, asking what opportunities they offer for building a research practice at the confluence of STS and digital data. Some propose to bring the STS practices of research and analysis to data science by bringing either leadership or critique to the field. Others have taken the alternative path of embracing digital practices, taking up digital tools and programming to pursue an STS agenda by other means.

In my account of this year’s EASST conference in Barcelona I would like to focus on STS studies of data practices, and on the different perspectives I encountered at the conference on how STS may engage with the professional worlds of digital data. I obtained my PhD in Human-Centered Computing in 2014 in the US, where I studied professional knowledge in the making of software. After my PhD I returned to Europe, and I have been thinking of EASST conferences as opportunities for finding my way into the academic community in Europe. This time I came to the conference from Hungary with the financial support kindly provided by EASST, for which I feel honored and grateful. I presented my postdoctoral research on digital methods, carried out at ITU. My recent academic path has involved a lot of wayfinding and criss-crossing between places, countries, social worlds and their concerns, and the issue of finding my way into the professional worlds of digital data as a social scientist was most acute for me as I arrived at the conference.

With all the talk about and interest in big data and data science, there is a growing sense of social build-up, and I feel that I share with other STS scholars the sentiment that it is hard to pass by all this commotion without feeling intellectual curiosity and a sense of hope for exciting research. The social sciences have been caught up in a movement in which objective accounts by impartial onlookers at the sidelines have been giving way to the involved and perspectival accounts of the participant, and I could sense a corresponding eagerness to be part of the digital data game. At the same time, the discussions also made it clear that these positions are still being explored by STS practitioners. If digital data presents itself as an opportunity (to play on a different metaphorical register which is more akin to the field itself), it is equally a challenge to find out how we can dwell in social science and digital data at the same time. This challenge has a reflexive edge to it insofar as our understanding of the constitution of these new domains plays into the STS position that we seek to outline from within. Big data and data science are emerging at the confluence of the knowledge work of data analysis and digital technology, and I would like to argue that significantly different epistemic positions are outlined depending on whether the digital character of data practices is given emphasis.

 

Fig. 1: A critical making hackathon by Gabby Resch at the University of Toronto exploring the quantification of toilets by means of behavioral and residue data
Courtesy of University of Toronto, Faculty of Information

 

My discussion draws on two sessions: a roundtable on ‘The Potential Futures of Data Science’, which took place at the very beginning of the conference, and the three-part track entitled ‘Critical data studies’ on the last day. The data science roundtable was hosted by Brian Beaton from CalPoly, and repeated a similar arrangement held with the same scholars at the 4S conference in Denver last year. It attracted a surprisingly large audience, who were willing to chip in cheerfully with their considered opinions despite the early morning hour. The three-part track broadened the theme from data science to computational data practices at large, while big data cast its shadow over both venues. The contributions on the last day were for the most part case studies of professional work practices around digital data, which provided the empirical fodder for a slower-paced discussion.

Overall, the discussions and presentations made a convincing case that there is a broad sweep of STS research on new professional practices around data. The empirical work presented on the last day was especially diverse, looking at, among others, visualization practices in elementary particle physics, modeling practices for informing policy among economists, algorithmic sense-making among data scientists, the use of data as evidence in health care, and the curation of large-scale databases across cultural institutions. Diversity within the field was discussed by several contributors, who pointed to a divide between academia and industry (David Ribes), a distinction between emerging practices of social data and the historical continuities in the natural sciences (Paul Edwards), and differences between large and small scale data practices within the latter (Irene Pasquetto and Ashley E. Sands). It is also clear that our research implies taking part in different professional settings and communities beyond the fields we study, for example in STS, in policy and in education.

In the face of this diversity, my own question of wayfinding became translated to the problem of unity and relevance: what brings us together and with whom when we apply the STS lens to professional data practices?

I would like to start with the hype that characterizes big data and data science. These labels were adopted as unifying themes for the track and the roundtable, respectively, while participants also acknowledged that in talking about these areas we are dealing with moving targets, open-ended signifiers which are driven by evangelism, boosterism or veiled financial and political interests. One approach was to render STS itself into a formative agent within this arena. Brian Beaton proposed, somewhat provocatively, to think about what a takeover of data science by STS would be like. He used the witty argument that (I paraphrase) we have been here for longer, and we have all the right tools for making sense of social practice. I understood him to mean that big data and data science are surprisingly new developments, which are seeking to make sense of their own position in the scientific arena. STS has been working on making sense of exactly these kinds of situations, and we have developed considerable expertise in this. While the fantasy of such a takeover deeply resonates with some part of my intellectual self, I jotted down in my notes the immediate reaction that this would not be possible, because digital data is already entangled in large-scale institutional contexts, which, together with technologies like databases and tools of analysis, create a powerful regime of practices. While gaining professional agency has enormous appeal, and it resonates with the call for doing STS by other means, we should be wary of a wholesale adoption of these open signifiers as the heuristic framing of research. In this regard, I particularly appreciated Andrew Clement’s short intervention that (and I am paraphrasing again) the emergence of data science is driven by those who seek control without a clear idea of how control may be achieved, and who are soliciting help from a new cast of professionals, the data scientists, to make sense of data for this purpose.

Meanwhile, I also encountered examples of doing STS by other means which were exploring new avenues for understanding the role of STS within the digital data domain. I like to think of these approaches as qualified versions of insiderism, because they share with digital professionals the orientation to making, but pursue it within an STS framing. Another way of characterizing them is to say that they appropriate the nitty-gritty of technological work practices around digital data for an STS agenda, engaging in some sort of take-over of digital practices. An emphasis on the digital character of data practices comes to the fore, and this lends these positions a distinct epistemic character. I would like to report on two approaches which made a strong impression on me because they practice this silent, everyday form of take-over from within: the critical information practice of Yanni Loukissas, Matt Ratto and Gabby Resch, and the STS take on digital data analysis that was brought to this conference by Tommaso Venturini, Anders Kristian Munk and Mathieu Jacomy. It was Resch and Venturini who talked about the respective approaches.

Critical data practice is a curriculum that has been developed to engage students in practice-based reflection around data. Paraphrasing Gabby Resch, critical data practice means that participants do actual data science with current digital tools, such as MapReduce and Pandas, but they also do Derrida and think about Derrida’s discussion of the archive. Data often comes to data science as a given, in the form of a database, and the authors have organized digital workshops which tackle this assumption and put the making of data and databases in focus. In these workshops, students are called on to invent their own apparatuses for data collection, they clean and aggregate the data, and they are invited to reflect on the tactics they use in this process for making data regular.

 

Fig. 2: Working with network visualizations at a data sprint in Oxford
Courtesy of Tommaso Venturini

 

Venturini talked about how researchers in STS picked up the method of social network analysis and came to grapple with its limitations for pursuing STS questions. ANT proposes, for example, that networks become actors, and this would require a mode of analysis where node and network are reversible. Network analysis has no ready-made models and tools that could support such a reversible approach. In the face of this and other limitations, Venturini and his colleagues have outlined a research agenda for visual network analysis, which appropriates the computational apparatus for visualizing networks towards STS ends. One example is the ForceAtlas2 algorithm and its implementation in the open source network visualization tool Gephi. This algorithm makes social features like clustering and density more salient in network visualizations. In visual network analysis, advancing the STS agenda becomes possible through partnering with computers and engaging in the nitty-gritty of software development.

Venturini and Resch have shown a path where STS appropriates digital data practice for its own theoretical and critical agenda. It is a path for doing STS by other means. This is in stark contrast with the approach which would bring the empirical and theoretical STS toolkit to enlighten or critique the agenda of data science. In fact, critical data practice and visual network analysis participate in figuring out digital data and giving a face and a name to it, each in its own way. In this, they are similar to the scientists and professionals in the STS case studies presented at the conference. Their data practices are in sync with their work practices, which are varied and local. If we can talk about unity, it is at the level of digital practice.

I find that there is something powerful in the proposition to embrace digital practice for doing STS. It feels like a much-awaited opportunity to do social science by other means, and it appeals to the ethnographer’s mandate to turn into an insider without entirely going over to the other side.