Daten Labor 2015: Data journalism and science
26 Oct 2015
Last week I visited the Daten Labor conference in Germany, which brought together data journalists and scientists and offered workshops. It was a great opportunity to meet both established and aspiring data journalists and to get an impression of the ‘state of the art’ in Germany. It’s impossible to do justice to this conference in a single blog post, but I’ll try to collect some interesting observations here.
1. Data journalists just want to be journalists
A panel discussion about how data journalism and science could profit from each other showed that scientists and data journalists have different views on what data journalism is or should be about. Some of the scientists on the panel were hoping that data journalists could help to explain the meaning and the consequences of big data technologies to the public and thereby hold Google accountable (I mention Google here because Google – and not governments – was the example used in the discussion). In other words: data journalists should be involved in what Nicholas Diakopoulos has called algorithmic accountability, the investigation of algorithms used by companies like Google or by governments. This implies that data journalists and scientists should work closely together and that data journalists’ primary task is communicating the insights to the public. Recognizing that journalists and scientists work on different time scales and under different constraints, scientists should provide journalists with tools that help them to profit from scientific insights during their investigations.
Data journalists were skeptical about these ideas and engaged in some boundary work. They don’t see themselves as ‘Algorithmists’, the term Mayer-Schönberger and Cukier (2013, 180–82) use to describe those who should review algorithms to prevent abuse. They seem to see data journalism primarily as a continuation of traditional journalistic work practices, i.e. as a set of new techniques to find and tell stories and not as a new way to mediate between scientific research and the public. Moreover, the data most journalists work with is descriptive and relatively small in size, and the analysis performed on it was described as rather simple (finding minima, maxima, outliers and so forth). In their view, a deep insight into statistics or the inner workings of algorithms is not necessary for the vast majority of data journalism performed today. If special knowledge is required, it is common practice to ask external experts.
Nevertheless, it became clear that scientists and journalists could cooperate for mutual benefit – journalists could profit from the insights of researchers by using their tools, while scientists could test their theories and algorithms with the data and real use cases provided by journalists.
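To give an impression of how simple the analysis mentioned above (minima, maxima, outliers) can be, here is a minimal Python sketch. It is my own illustration, not code shown at the conference: the `describe` helper, the rent figures and the common 1.5 × IQR rule for flagging outliers are all assumptions chosen for the example.

```python
import statistics


def describe(values):
    """Return the minimum, the maximum and IQR-based outliers of a list of numbers."""
    # Quartiles via the standard library (default 'exclusive' method).
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    # Assumed rule of thumb: anything beyond 1.5 * IQR from the quartiles is an outlier.
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return {
        "min": min(values),
        "max": max(values),
        "outliers": [v for v in values if v < low or v > high],
    }


# Hypothetical data: rent per square metre in nine city districts.
rents = [8.2, 8.9, 9.1, 9.4, 9.8, 10.3, 10.9, 11.5, 24.0]
summary = describe(rents)
print(summary)
```

A district flagged as an outlier here would be a starting point for a story, not a finding in itself; the reporting still has to explain why the value is unusual.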
2. Transparency and reproducibility in data journalism
An important theme for both scientists and journalists was the transparency and reproducibility of how data journalists gain their insights and produce their stories. Considering that researchers have been struggling to ensure the transparency and reproducibility of their own work for centuries without finding a perfect solution, journalists and scientists should cooperate on this issue to find new ways suitable for today’s technological affordances. While all agreed that using open source tools and sharing the code behind the story is a good idea (at least in theory, see the next point), the question remains how to make sure that others can really recreate and evaluate the development process. One possible solution discussed at the conference was Jupyter notebooks. Jupyter notebooks combine the writing of text with the writing and execution of software code. This means that you can start a notebook with a long text-only introduction followed by a code snippet together with its output, followed again by some discussion of the code and so forth – you can check this example to see what it looks like. Jupyter notebooks are intended to be used in an iterative and explorative way and make every step taken by the writer/developer transparent.
I think it was quite remarkable that Fernando Perez, the creator of IPython and now a leading developer of the Jupyter project, gave a keynote about his work at the conference. He showed an interesting example of how this technology has been used in journalism: Brian Keegan’s extensive critique of FiveThirtyEight, which used a Jupyter notebook to recreate and question the work of journalists. There were a few journalists at the conference who already use Jupyter notebooks for their own work. For them, the value of these notebooks was not only transparency, but also increased productivity: being able to easily recreate the steps taken on a dataset a few weeks ago saves a lot of time and makes collaboration easier.
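To make the alternation of text and code described above a bit more concrete, here is a rough Python sketch that assembles a tiny notebook by hand. It assumes the version 4 notebook JSON layout (a list of `markdown` and `code` cells); the helper names, the cell contents and the figures inside them are made up for illustration, and in practice you would create notebooks in the Jupyter interface rather than like this.

```python
import json


def markdown_cell(text):
    """A text cell: explanation, context, discussion."""
    return {"cell_type": "markdown", "metadata": {}, "source": text}


def code_cell(source):
    """A code cell that has not been executed yet, so it has no outputs."""
    return {
        "cell_type": "code",
        "execution_count": None,
        "metadata": {},
        "outputs": [],
        "source": source,
    }


# Hypothetical notebook: introduction, one analysis step, then discussion.
notebook = {
    "nbformat": 4,
    "nbformat_minor": 4,
    "metadata": {},
    "cells": [
        markdown_cell("# Rents by district\nWhere the data comes from and what we ask of it."),
        code_cell("rents = [8.2, 9.1, 24.0]\nmax(rents)"),
        markdown_cell("The highest value stands out; the next step checks whether it is an error."),
    ],
}

# Serialize to JSON; saved under a name ending in .ipynb, this could be opened in Jupyter.
serialized = json.dumps(notebook, indent=1)
print([cell["cell_type"] for cell in notebook["cells"]])
```

The point is only that the narrative and every analysis step live in the same file and in the same order, which is what makes the process retraceable for someone else.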
Will Jupyter notebooks be a big thing in data journalism? I have some doubts about that…
3. Data journalism is full of experimentation and uncertainties
It became clear that data journalism is far from being an established form of news reporting in Germany, both in terms of organizational structures and in the way it is practiced. This already starts with defining what data journalism actually is – it is telling that several keynotes and workshops started by providing a definition. Organizationally, the way data journalism is integrated in newsrooms varies quite a bit and newspapers are still experimenting a lot. In some cases, there is a distinct team that only does time-consuming investigative stories, while in other cases data journalism is more integrated into daily news reporting. In many cases, however, it seems that there is no clear structure and data journalism projects are organized ad hoc.
When it comes to how data journalism is practiced by individual data journalists, I tweeted this:
[Embedded tweet, October 24, 2015]
To be clear, it’s not that I think data journalists don’t know what they are doing. What I mean is the way they talk about how they work and what level of expertise they have. There seems to be a lot of insecurity and experimenting involved in ‘doing’ data journalism. One expression of that is the reliance on finding information on Google. Julius Tröger, who works at the Berliner Morgenpost and gave a workshop on mapping data, gave the ultimate advice for dealing with ‘geeky stuff’: search for the name of the tool you want to use on Google and add ‘for journalists’. Some journalists told me that they constantly search on Google or Stack Overflow while they’re coding to slowly work through the issues they run into. It also seems common to build up a set of re-usable scripts over time. Probably as a result of that, journalists tended to describe their code as ‘lousy’ or ‘dirty’. It seems that the average data journalist is far from being able to examine and critique the algorithms behind big data, something scientists were hoping for. Moreover, this insecurity about their own coding skills made some journalists hesitant about sharing their code online, which implies a bit of a conflict between theory and practice: while all journalists agreed on the importance of transparency and reproducibility, I’m not sure how many of them would be willing to share their research process as a Jupyter notebook (or in some other form).
To sum up, what I take from this conference is:
- For journalists, data journalism is not a revolutionary new form of journalism, but a set of techniques to extend the traditional journalistic work of finding and telling stories.
- The lack of transparency and reproducibility is recognized as an issue; it will be interesting to see which solutions become more widespread in the future.
- Data journalism in Germany is still in its infancy and is marked by experimentation and insecurity.
As I mentioned above, these are only a few observations. If you understand German, make sure to check the conference documentation provided by the organizers.
Mayer-Schönberger, Viktor, and Kenneth Cukier. 2013. Big Data: A Revolution That Will Transform How We Live, Work and Think. London: Murray.