A few thoughts about data journalism from Mozfest 2015

A few days ago I attended my very first Mozfest. Now I understand why survival guides have been written for this event. It’s so huge and busy you can easily get lost and overwhelmed! Just as I did with the Daten Labor conference a few weeks ago, I want to write down some interesting observations here.

The importance of programming languages and computational thinking in data journalism

I talked with a lot of coding journalists and technologists working in journalism. What I found striking was the number of self-taught programmers. For many, the journey into data journalism began with a personal interest in programming. This got me thinking: When we talk about data journalism we talk a lot about, well, data. But I have never read about the importance of high-level programming languages like Python or Ruby, whose frameworks and libraries make programming relatively easy to learn and fast to do. It seems hard to imagine that data and programming would play such a big role in journalism today without them. It also became apparent that Stack Overflow contributes a lot to this development. Out of curiosity I asked several coding journalists: What would happen if Stack Overflow was suddenly gone? The answers ranged from “my productivity would drop considerably” to “I don’t want to think about it”. It seems that the ubiquitous availability of data alone would not have been enough to make data journalism such a big deal. Only in combination with developments that made programming relatively easy to learn and compatible with journalistic workflows could data journalism become more widespread.

In relation to the role of modern programming languages, something that occurred to me both at the Mozfest and the Daten Labor conference was the importance of computational thinking for data journalism. Computational thinking means “formulating problems and their solutions so that the solutions are represented in a form that can be effectively carried out by an information-processing agent” (Wing 2010). It involves abstraction, modularization, “and automation via algorithms to enable scale” (Diakopoulos 2016). At the Daten Labor conference, a journalist told me that the most significant change in her newsroom was not so much working with data but the automation of repetitive tasks. At Mozfest, Phillip Smith, who has experience with teaching journalists how to code, told me that the first step for many aspiring data journalists is to think about which parts of their work could be automated to get a grasp of how coding is useful for them. Without the ability to think computationally, it is difficult for journalists to integrate working with data into their workflows. Some researchers in Journalism Studies like to distinguish between data journalism and ‘computational journalism’ (e.g. Coddington 2015), but I don’t think this distinction holds in practice – computational thinking easily leads to working with data, while data journalism without computational thinking seems very limited. Maybe we should talk less about a ‘data revolution’ and more about a revolution in programming affordances and computational thought?

Transparency in data journalism: A luxury?

As at the Daten Labor conference, the transparency and reproducibility of data journalism and science was an important theme. There was a session about Jupyter notebooks which was attended by journalists, technologists, and scientists. There was also one about the difficulties and pitfalls of working with data in journalism. For scientists, transparency and reproducibility are (or should be!) a natural concern to ensure the validity of the research. Journalists want to ensure that their claims are true as well, but their relationship to transparency and reproducibility seems to be a bit less straightforward. There was an interesting comment by Sisi Wei of ProPublica at a panel discussion: the number one fear of every journalist is to have to correct something, number two is getting scooped. The approach of ProPublica is to be very closed before publication and as open as possible afterwards. But to cope with fear number one, a lot of work has to be invested to make sure there are no errors. Moreover, the data and the code need to be published with sufficient documentation to prevent misuse and to ensure others can use them. I think there is a danger here that transparency ends up being practiced only by larger newsrooms with more resources, while smaller ones prefer to ‘play it safe’ due to fear number one. However, Sisi also suggested that the openness of larger investigative newsrooms like ProPublica allows local journalists to use the available data and tools to create stories for their own areas. This could help to mitigate the issue. Further help could come from hyperlocal websites and civic tech applications provided by NGOs such as mySociety. More routines and established guidelines for data reporting might also help. Still, it seems we are far from having transparency and reproducibility as the norm in data journalism.

References

Coddington, Mark. 2015. ‘Clarifying Journalism’s Quantitative Turn’. Digital Journalism 3 (3): 331–48. https://doi.org/10.1080/21670811.2014.976400.

Diakopoulos, Nicholas. 2016. ‘Computational Journalism and the Emergence of News Platforms’. In The Routledge Companion to Digital Journalism Studies, edited by Scott Eldridge II and Bob Franklin. https://citeseerx.ist.psu.edu/document?repid=rep1&type=pdf&doi=b401e0517826465d88212a4e38c42e51f6679479.

Wing, Jeannette M. 2010. ‘Computational Thinking: What and Why?’ Pittsburgh. https://www.cs.cmu.edu/~CompThink/resources/TheLinkWing.pdf.