Civic Hacking and Journalism How data is changing civic engagement

Scraping the global civic tech community on GitHub, part 2

After I shared the post about my little GitHub experiment on mySociety’s community mailing list, it became much more popular than I anticipated. In the meantime, I got some help compiling a more complete list of civic tech organizations on GitHub and spent all day scraping. Will the more complete list yield different results? Let’s see!

I want to emphasize again that GitHub is an inaccurate proxy to describe the global civic tech community. Individuals or groups who are not using GitHub’s social features (such as following or starring) are underrepresented in this data. Moreover, when we talk about civic tech on a global scale we are not only talking about developers. Naturally, activist groups are not using GitHub as much so they are underrepresented as well. Nevertheless, there are some interesting tendencies that this data reveals. If you’re interested, you can find my scraper here and the data used for this post here.

I also highly recommend checking out Liliana Bounegru’s work on using GitHub to study data journalism and data activism with Digital Methods – this article heavily borrows from it!

The follower network

First, let’s recreate the follower network with the new dataset. Each node is a user; its color indicates the organization he or she belongs to, while the size of the node reflects the number of followers (the more followers, the bigger the node):

Download HiRes version
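To make the encoding above concrete, here is a minimal sketch of how such a graph could be assembled from scraped data. It is not the scraper linked above: the function name `build_follower_graph`, the input shapes and the `"unaffiliated"` label are assumptions for illustration, and the follower count here only covers followers that appear in the dataset.

```python
from collections import Counter


def build_follower_graph(follows, membership):
    """Build node attributes and a deduplicated edge list.

    follows:    iterable of (follower, followed) login pairs (assumed shape)
    membership: dict mapping login -> organization (assumed shape)
    """
    # Deduplicate edges so a pair scraped twice is only counted once.
    edges = sorted(set(follows))

    # Node size: how many users in the dataset follow this user.
    follower_counts = Counter(followed for _, followed in edges)

    users = set(membership) | {user for edge in edges for user in edge}
    nodes = {
        user: {
            # Node color: the organization the user belongs to.
            "organization": membership.get(user, "unaffiliated"),
            "followers": follower_counts[user],
        }
        for user in users
    }
    return nodes, edges
```

The resulting `nodes` and `edges` could then be exported to whatever graph tool does the layout; the layout step itself is not covered here.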

maxogden still has a very central position and is best connected across different continents. In my last post, I suggested that this is due to his extensive use of GitHub’s social features and because he develops one of the most popular tools among civic hackers, dat. Meanwhile maxogden commented on it in mySociety’s community mailing list:

I’m guessing the reason I appear as a ‘hub’ is that I’ve had the privilege of being able to travel to the UK and Taiwan multiple times and meet the civic hacking communities there, as well as getting to contribute to open source civic hacking projects for the last 5 years more or less full time starting at Code for America in 2010. Hopefully over the next 5 years a lot more people will get the opportunity to get paid to work on civic software and travel to meet other communities!

I think his comment is very interesting because it points out how economic factors influence the structure of the network. It also underlines that the dominant European civic tech groups are stationed in the UK.

Similar to the previous version we can see how the community groups up by region:

Why are US groups and g0v relatively separate from the rest? One possible reason is size: g0v and Code for America are probably the biggest civic tech organizations, at least when it comes down to the number of developers. This means they have more connections among each other and thus get clustered in the graph. Whether there is really something like a ‘filter bubble’ going on I cannot tell.

The contributor network

As Zarino rightly pointed out, we might get a more realistic picture of actual community ties when we look at who contributes to repos. I suggest that we can best understand the follower network above as a proxy for exchange (of ideas etc.), while the contributor network is more a proxy for collaboration. This time the color of the edges illustrates which organization a repo belongs to, while the size of the nodes reflects how many repositories a user has contributed to (the more repositories, the bigger the node):

Download HiRes version

This network is a lot busier. I should point out that I had to filter out nodes with a low degree to allow the remaining nodes to group up nicely. To be precise, every user who has contributed to fewer than five repos is filtered out.
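The filter described above can be sketched as follows, assuming the contributions are available as (user, repo) pairs. The function name `filter_contributors` and the input shape are hypothetical; only the threshold of five repos comes from the text.

```python
def filter_contributors(contributions, min_repos=5):
    """Keep only edges of users who contributed to at least `min_repos` repos.

    contributions: iterable of (user, repo) pairs (assumed shape)
    """
    edges = sorted(set(contributions))

    # Collect the distinct repos each user has contributed to.
    repos_by_user = {}
    for user, repo in edges:
        repos_by_user.setdefault(user, set()).add(repo)

    # Users below the threshold are dropped from the graph entirely.
    kept = {user for user, repos in repos_by_user.items() if len(repos) >= min_repos}
    return [(user, repo) for user, repo in edges if user in kept]
```

Note that this removes the users only; a repo stays in the graph as long as at least one remaining user contributed to it.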

  1. First, there are some important similarities to the follower network: US groups are relatively close and separate from the rest, while groups from Europe, Latin America, Australia and elsewhere are close to each other at the bottom. However, we can see that Sunlight Labs has closer ties to Europe than Code for America.
  2. When we look at collaboration, g0v seems to have many more ties to European groups than to the US.
  3. It’s interesting how the African groups split up here. In the follower network above they were close to each other and to the European groups. Here, Ushahidi is far outside on the right, while Code for Africa and Code for South Africa are much closer to the European and Latin American groups. It seems there is a lot of exchange, but not much collaboration between Ushahidi and groups from Europe or elsewhere.
  4. An Asian group which was almost invisible in the follower network is much more prominent here: Neo from Singapore. Despite being geographically close to g0v (relatively speaking) they do not seem to collaborate very much. Moreover, Neo seems to be closer to US groups, while g0v is closer to Europe.
  5. maxogden lost his central position. When we look at which repos he contributes to, he is much closer to the US groups than to other groups around the world (you can find his node above the ‘Sunlight Labs’ label).
  6. In case anybody wonders: Rufus Pollock aka rgrp from Open Knowledge contributed to the most repos.
  7. In the middle, the different clusters seem to be connected by a relatively thin web of tools which are popular among civic hackers but do not necessarily belong to any civic tech organization (like Discourse).

Let’s check GitHub’s starring feature again:

  1. As before, the left side lists the repositories owned by the civic tech organizations sorted by the number of stars. This means it shows which civic tech repositories are most starred among GitHub users in total, including those who are not part of any civic tech organization. Surprisingly, we have a new winner: Neo’s Ruby Koans tops Recline. Besides that, not much has changed: the most popular civic tech repositories are a mix of data tools, tutorials, and ‘proof of concept’ exemplars of civic tech applications.
  2. The right side again shows the repositories which have been starred by the members of the civic tech organizations in our dataset, regardless of whether the starred repositories are owned by civic tech organizations or not. No big changes here either: the most popular repos among civic hackers are tools that help with developing websites and working with data. As mentioned before, the popularity of impress.js and reveal.js indicates that presenting ideas and experiences at conferences or workshops is very common, which suggests that civic tech is still a relatively new field with a lot of experimentation. I am still wondering about the popularity of Discourse.
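The two rankings compared above answer different questions, which a short sketch may make clearer. It assumes repository records carry the `full_name` and `stargazers_count` fields that GitHub's API returns for repositories; the function names, the input shapes and the default cut-off of 20 are assumptions for illustration.

```python
from collections import Counter


def top_owned_repos(repos, n=20):
    """Left side: organization-owned repos ranked by their total star count.

    repos: list of dicts with 'full_name' and 'stargazers_count' (assumed shape)
    """
    return sorted(repos, key=lambda repo: repo["stargazers_count"], reverse=True)[:n]


def top_starred_by_members(stars_by_member, n=20):
    """Right side: repos ranked by how many organization members starred them.

    stars_by_member: dict mapping login -> list of starred repo names (assumed shape)
    """
    counts = Counter()
    for starred in stars_by_member.values():
        # Use a set so each member counts at most once per repo.
        counts.update(set(starred))
    return counts.most_common(n)
```

The first ranking counts stars from all GitHub users, while the second only counts stars given by members in the dataset, which is why a repo can rank high in one list and be absent from the other.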

Locations and differences in civic tech around the world

Last but not least, I recreated the map showing the location of different civic hackers around the world. Again, GitHub allows users to specify their location in whatever way they want, if they specify their location at all. Often, only the home country or the continent is mentioned, which means that this map is inaccurate but shows some general tendencies:

Again, not much has changed. We have a few more civic hackers in Mexico, the Middle East and Asia, but my previous analysis still holds, so I will just repeat it here: Although civic tech is increasingly global, this map shows how much it is a Western phenomenon. This is reflected in the interviews I had with members of mySociety, where it was pointed out that the UK websites are a “magnitude busier and perhaps more successful” than in other places (especially in Africa) because they had ten years to grow.

Given that GitHub is a platform for developers, this map also seems to underline some of the other comments from my interviews about the cultural differences in civic tech around the world. The dominance of developers in Europe and the US might be due to the fact that civic tech has stronger roots in the technology scene in these areas. By contrast, civic tech in Latin America is driven more strongly by activist groups who have discovered how useful civic tech applications can be to support their cause.
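Because the location field is free text, any map like the one above depends on a normalization step. The sketch below shows one simple way to do it with a hand-written lookup table; the table `COUNTRY_ALIASES`, its entries and the function names are assumptions for illustration, and a real pipeline would more likely rely on a geocoding service.

```python
from collections import Counter

# Hypothetical lookup table: lower-cased fragments mapped to a country name.
COUNTRY_ALIASES = {
    "uk": "United Kingdom",
    "united kingdom": "United Kingdom",
    "london": "United Kingdom",
    "taiwan": "Taiwan",
    "taipei": "Taiwan",
    "usa": "United States",
    "san francisco": "United States",
    "chile": "Chile",
    "santiago": "Chile",
}


def normalize_location(raw):
    """Map a free-form location string to a country, or None if unknown."""
    if not raw:
        return None
    parts = [part.strip().lower() for part in raw.replace("/", ",").split(",")]
    # The country tends to come last ("London, UK"), so check from the end.
    for part in reversed(parts):
        if part in COUNTRY_ALIASES:
            return COUNTRY_ALIASES[part]
    return None


def count_by_country(locations):
    """Count users per country; unresolved or missing locations are grouped."""
    return Counter(normalize_location(raw) or "unknown" for raw in locations)
```

Entries that only name a continent, or nothing at all, end up in the `"unknown"` bucket, which is one reason the map can only show general tendencies.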

Feedback welcome!

As mentioned above, this was a little experiment to get a grasp of the civic tech community on a global scale. I would love to hear from people involved in this community how they read this information and whether my interpretations somewhat match reality.

(Outdated) Mapping the civic tech community on GitHub

Check updated version here

How can we describe the global civic tech community? To date, it’s pretty hard to find answers to this question given that there is not even a consensus on how to define civic tech. However, there are some interesting proxies to explore this community. One of them is GitHub as most civic tech projects and developers are using it. Another one is the Poplus community, which is a deliberate attempt to create a ‘global federation for civic tech’.

I took this list of Poplus members and added a few organizations which were mentioned in the interviews I had with members of mySociety. Then I searched each organization on GitHub and ended up with this list of accounts:

mysociety
poplus
everypolitician
sinar
opennorth
okfn
codeforamerica
Code-for-All
okfde
openaustralia
ushahidi
sunlightlabs
datauy
congresointeractivo
ciudadanointeligente
govtrack
MuckRock
g0v
civio
openkratio
KohoVolit
regardscitoyens
teampopong
openpolis
TEDICpy
e-democracy
azavea

I then wrote a GitHub scraper to gather some information about the activities of the members of these organizations. I should note directly that GitHub is just an inaccurate proxy to describe this community. To illustrate this with a very specific example, I talked with Mark Longair who is a senior developer at mySociety. He has worked on many projects over the years and is an active member of the Poplus community – but this is poorly reflected in my data because he is not making much use of GitHub’s social features such as following other users or starring repositories. Therefore, these findings should be met with skepticism. Nevertheless, I think a few interesting tendencies surfaced.
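For readers who want to try something similar, here is a minimal sketch of what such a scraper could look like. It is not the scraper used for this post. It assumes GitHub's public REST endpoints for organization members, user followers and starred repositories, and the function names and output shape are made up for illustration. Unauthenticated requests are rate-limited, and the members endpoint only returns publicly visible members.

```python
import json
from urllib.request import Request, urlopen

API_ROOT = "https://api.github.com"


def fetch_json(url):
    """Fetch one page from the GitHub REST API and decode it as JSON."""
    request = Request(url, headers={
        "Accept": "application/vnd.github+json",
        "User-Agent": "civic-tech-sketch",  # GitHub rejects requests without a user agent
    })
    with urlopen(request) as response:
        return json.load(response)


def paginate(path, fetch=fetch_json, per_page=100):
    """Collect all items of a paginated list endpoint."""
    items, page = [], 1
    while True:
        batch = fetch(f"{API_ROOT}{path}?per_page={per_page}&page={page}")
        items.extend(batch)
        if len(batch) < per_page:  # a short page means this was the last one
            return items
        page += 1


def scrape_org(org, fetch=fetch_json):
    """Return followers and starred repos for each public member of `org`."""
    members = [member["login"] for member in paginate(f"/orgs/{org}/members", fetch)]
    return {
        login: {
            "followers": [user["login"] for user in paginate(f"/users/{login}/followers", fetch)],
            "starred": [repo["full_name"] for repo in paginate(f"/users/{login}/starred", fetch)],
        }
        for login in members
    }
```

The `fetch` parameter exists so that the network call can be swapped out, for example with a cached or authenticated fetcher, without touching the rest of the code.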

The follower network

I generated a follower network to see how these organizations are connected with each other and which individuals are best connected within the larger civic tech community, i.e. who has the most connections across different organizations. This is the result (with the size of the nodes reflecting how many followers a user has):

  1. The most striking result is the key position of maxogden. One reason: He develops some of the most popular tools among civic hackers, especially dat (see below). Another, simpler explanation is that he makes extensive use of GitHub’s social features.
  2. It’s interesting to see how the different organizations group up by regions. In the upper right we have Asian groups, especially g0v (green). At the bottom is the US with Code for America (red) being the dominant actor. Most interestingly, at the upper left we have a mix of mostly European and Latin American groups, but also some groups from Canada or Australia. This might be unsurprising considering that the Poplus federation was founded by mySociety from the UK and Ciudadano Inteligente from Chile. Still, it’s curious that European and Latin American groups seem so well connected, while North American and Asian groups are relatively separate (with the exception of maxogden, who is well connected to every continent).
  3. At the far left is the African NGO Ushahidi, which only has a few connections to European groups. I would have expected them to be better connected. Maybe this is due to GitHub being an inaccurate proxy to illustrate these larger structures.
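One way to put a number on "best connected across different organizations" is to count, for each user, how many other organizations they are linked to through follow relationships. The sketch below does that under simplifying assumptions: each user belongs to exactly one organization, follows are given as (follower, followed) pairs, and the function name `cross_org_reach` is hypothetical. It is not how the graph above was produced.

```python
def cross_org_reach(follows, membership):
    """Rank users by the number of other organizations they are connected to.

    follows:    iterable of (follower, followed) login pairs (assumed shape)
    membership: dict mapping login -> organization (assumed shape)
    """
    reach = {}
    for follower, followed in follows:
        org_a = membership.get(follower)
        org_b = membership.get(followed)
        # Only count ties between members of two different known organizations.
        if org_a is None or org_b is None or org_a == org_b:
            continue
        reach.setdefault(follower, set()).add(org_b)
        reach.setdefault(followed, set()).add(org_a)

    # Highest reach first; ties are broken alphabetically for stable output.
    return sorted(
        ((user, len(orgs)) for user, orgs in reach.items()),
        key=lambda item: (-item[1], item[0]),
    )
```

A user who follows, or is followed by, members of many different organizations ends up at the top of this list, regardless of how many followers they have in total.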

To get a sense of which repositories are most popular among civic hackers, I looked at GitHub’s ‘starring’ feature:

  1. The left side lists the repositories owned by the civic tech organizations sorted by the number of stars. This means it shows which civic tech repositories are most starred among GitHub users in total, including those who are not part of any civic tech organization. Most popular by far is Recline, a library for ‘building data applications in pure Javascript and HTML’. Ushahidi appears twice in the top 20, which indicates again that the follower network above is a bit off. In general, it’s interesting to see how the most popular civic tech repositories are a mix of data tools, tutorials, and ‘proof of concept’ exemplars of civic tech applications.
  2. The right side shows the repositories which have been starred by the members of the civic tech organizations listed above, regardless of whether the starred repositories are owned by civic tech organizations or not. A bit surprisingly, there are no civic tech repositories in the top 20. Besides that, the results are pretty much what one would expect: tools that help with developing websites and working with data. The popularity of impress.js and reveal.js indicates that presenting ideas and experiences at conferences or workshops is very common. I suggest that this is also an expression of civic tech being a relatively new field with a lot of experimentation. What I could not figure out is the popularity of Discourse, an open source discussion platform.

Locations and differences in civic tech around the world

GitHub allows users to specify their location. However, users are free to do that in whatever way they want, if they specify their location at all. Often, users just mention their home country or in some cases the continent they live on. It goes without saying that the resulting map is inaccurate, but good enough to show the general direction:

Although civic tech is increasingly global, this map shows how much it is a Western phenomenon. This is reflected in the interviews I had with members of mySociety, where it was pointed out that the UK websites are a “magnitude busier and perhaps more successful” than in other places (especially in Africa) because they had ten years to grow.

Given that GitHub is a platform for developers, this map also seems to underline some of the other comments from my interviews about the cultural differences in civic tech around the world. The dominance of developers in Europe and the US might be due to the fact that civic tech has stronger roots in the technology scene in these areas. By contrast, civic tech in Latin America is driven more strongly by activist groups who have discovered how useful civic tech applications can be to support their cause.