Notes on Working in Public - Nadia Eghbal
The book was released in 2020 and is a good starting point for analyzing the global trend around open source projects. In it, Nadia Eghbal explores open source software through different lenses: she examines its economics using what is known about the commons (citing Elinor Ostrom more than once), and she extrapolates ideas from how we value public infrastructure.
The second part of the book focuses less on groundwork and more on the practical aspects of the maintenance costs of open source projects. It gives plenty of examples of specific projects and how their developers manage the interaction with users. The book narrows down to the single aspect that acts as a bottleneck: the attention developers can pay to requests and contributions.
It all starts with a refreshing look at what open source has become. It rightly points out that open source is not what it used to be: many projects no longer carry the political statements they once did. This is reflected in the choice of permissive licenses such as MIT or BSD instead of the GPL. It is a very interesting point that can also be looked at from the perspective of choosing technology based on the incentives behind it. However, there are some exceptions, see: Open source is a political act.
The book briefly discusses whether GitHub is at odds with open source values, in the sense that it concentrates development and maintenance on a single corporate-owned platform. The author concludes that convenience and lower costs (GitHub is fundamentally free of charge) made developers go for the low-effort approach even if they were reluctant at first.
One of the keys to GitHub's success is the idea of a community. Because developers are already familiar with the tool, the entry barrier is quite low. New contributions follow the same process in many different projects, effectively standardizing the workflow. On the other hand, this destroys the idea of the commons, since projects no longer have clear boundaries.
The book also develops a framework to categorize different types of open source projects based on community engagement. The author places every project in one of four categories: Federations, Clubs, Stadiums, and Toys. The key difference between them is the ratio between users and contributors. Federations have plenty of contributors and users. Clubs are projects in which there is parity (i.e. most users are also contributors). Stadiums are projects with a large following but few contributors, and Toys are projects developed by a single person for personal use.
Depending on the type of project, the roles and relationships of the people involved will be different. Something I hadn't considered before is the interaction between Stack Overflow and GitHub. Stack Overflow allows issues to be resolved without ever reaching the core contributors: sometimes users answer each other's questions.
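The four categories can be read as a two-by-two grid. The sketch below is only my own simplification, not something taken from the book: the function name classify_project and its two boolean inputs (whether a project has many contributors and whether it has many users) are hypothetical, and real projects will rarely fall so cleanly on one side of a threshold.

```python
def classify_project(many_contributors: bool, many_users: bool) -> str:
    """Map a project onto one of the four categories described above.

    Both inputs are hypothetical simplifications: deciding what counts
    as "many" is left to whoever applies the framework.
    """
    if many_contributors and many_users:
        return "Federation"  # plenty of contributors and plenty of users
    if many_contributors:
        return "Club"        # parity: most users are also contributors
    if many_users:
        return "Stadium"     # large following, few contributors
    return "Toy"             # a single person building for personal use


if __name__ == "__main__":
    for flags in [(True, True), (True, False), (False, True), (False, False)]:
        print(flags, classify_project(*flags))
```

The point of the grid is that the same maintenance strategy cannot work in every quadrant: a Stadium has to protect a few people from a very large audience, while a Club barely distinguishes between the two groups.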
The second part of the book focuses on exploring what the work of an open-source maintainer looks like, and it opens more questions than it answers. It builds directly on the groundwork of the first half, after establishing what open source is. One of the first topics is how to measure the value of code. The book argues that there is no single way to assign value to open-source software. The number of contributions or contributors gives only a partial picture, the number of other projects that depend on a piece of code does not show whether it could easily be replaced by another, and so on.
Besides value, there are costs associated with open source. Even though the marginal costs are virtually zero for software that can be distributed online, there are also maintenance costs of open source projects, which relate to the time developers need to dedicate to the community. The book pays attention to what it calls extractive contributions to open source projects, meaning contributions whose value to the project is lower than the value the maintainer could have provided by spending that time on something else.
The entire analysis is built on the assumption that the attention from developers is a limited resource which is rivalrous and non-excludable. For developers, therefore, lowering the maintenance burden of open source projects becomes a goal in itself, and to this end they start using automation methods such as bots, continuous integration, tests, templates, etc.
The final few chapters of the book are centered around the governance of open source projects. Developers can extract much less value from maintaining a project than from creating it. Therefore, there is not a lot of incentive to keep working on code that requires too much time. How much time and effort it takes to maintain a program, however, depends on how the project is managed. Not all projects are collective efforts. The book advances an interesting idea called a one-way mirror, in which the discussion is public but the ability to participate in it is limited.
Taking into account the ideas laid out at the beginning of the second part, the book explores the role of money in open source. First, it explains why companies would pay for open source, citing examples such as improving their public image, guaranteeing the maintenance of critical infrastructure, or improving code quality. Developers who are not directly employed by those companies can leverage the need for attention in order to secure funding.
The final part of the book explores the possibility of individuals funding developers. This is very similar to what is defined as "the passion economy". People create tools because they enjoy it or because they gain reputation. There are many similarities between modern developers and influencers. The conclusions of the book really point to some form of equivalence between Instagrammers and coders.
What I've learned and what I'd discuss
There is no doubt that the book is well researched. The source material includes, among other things, podcasts and YouTube videos that I personally would never have listened to. The excerpts are very valuable since they paint a panorama of a very broad topic. There are quotes from Bill Gates in the 1970s all the way to Taylor Swift.
Relating open source to the commons or to infrastructure is particularly interesting. The discussion of the commons is happening in other circles as well, especially regarding scientific results and climate change, and adding open-source projects to the pool is a great contribution. However, I do think there are gaps in the arguments that could have been explored.
For example, the book never takes the perspective of a consumer of code, who is also a consumer of creator attention. Often, a creator also requires attention on a different project. The people behind a Python library may require attention from core Python developers, creating a dual role (and perhaps a dual conflict) that may help untangle the burden of maintaining projects.
Having worked at GitHub, I imagine the author had access to a wealth of data. It is a pity that it was not used more thoroughly to analyze the current scenario. For example, the book has a table with the average number of direct contributors and dependencies of the top 50 packages on different platforms. Sadly, we don't get the median, which is less susceptible to extremes (e.g. a single package with 10000 contributors and 49 with 1).
However, I wonder if this is true for other languages, tools, and applications. I don't have the same feeling in the Python community (although it is changing). Not everyone who is building open source code is creating websites. But those making websites are the ones with the largest exposure (React, VueJS, Babel). I have the feeling that the book offers little fine-grained analysis and that projects were averaged together. Averaging of this kind is prone to weighting biases, and perhaps specific communities were over-represented.
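To make the difference concrete, here is a small sketch using the made-up numbers from the example above (one package with 10000 contributors and 49 packages with a single one). These are not figures from the book, only the hypothetical extreme case.

```python
from statistics import mean, median

# Hypothetical contributor counts: one outlier and 49 single-author packages.
contributors = [10_000] + [1] * 49

average = mean(contributors)   # pulled up by the single outlier
middle = median(contributors)  # reflects the typical package

print(f"packages: {len(contributors)}")
print(f"mean:     {average}")
print(f"median:   {middle}")
```

The mean comes out at roughly 201 contributors per package even though 49 of the 50 packages have exactly one, while the median stays at 1. Reporting both numbers side by side would show at a glance how skewed each platform is.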
Things I missed in the book
At some point the book discusses Linus's law, but it misses a very important example: the Heartbleed bug that affected OpenSSL. Software is part of our global infrastructure, and open source has a crucial role in it. What the Heartbleed bug meant is hard to summarize in a few sentences. I honestly expected a thorough discussion in the book. OpenShift, Apache, Nginx, Qt, and the different distributions of Linux are all examples of open-source programs at the base of many of the things we take for granted. These tools are used by companies ranging from Amazon to Facebook to Volvo. SpaceX uses Electron to build the user interfaces with which the astronauts interact.
The book also does not explore the responsibilities that developers have when releasing code. When the book compares software to bridges, it misses the opportunity to discuss that if a bridge collapses, the engineer who signed the blueprints will be scrutinized. Open source software does not follow the same rules, and therefore what is delivered can be of dubious quality. Licenses normally include a sentence making this explicit: "The software is provided as is". Can the world afford to continue building on top of unregulated responsibilities? Developers may not have an appropriate background and still release security-related tools. How do we vouch for them beyond popularity?
I regret that the book mentions projects such as Matplotlib only in the very last pages, although I was surprised by the selection of Astropy as an early example of a club type of project. Matplotlib, ImageJ, and NumPy are projects that, at least in part, are financed by public money from different countries. The Chan Zuckerberg Initiative has been funding open-source projects for a while, as has the NIH, and other agencies around the world are sustaining research-software development in some critical areas. However, any hint at what open source means for the scientific community (and its funding schemes) is missing.
I missed any kind of reflection on companies that work directly on open source software. For example, Red Hat and openSUSE are two business competitors that contribute human hours to some overlapping projects. Anaconda is missing from the book although it plays a central role in the numerical and scientific computing community (academic and corporate). Android is based on Linux and is open source, but it operates in ways very different from what we expect from Ubuntu, for example. Their business models, contributions to society, and role in open source as a whole are, sadly, not discussed.
The final thing I wanted to reflect on is the poor gender balance of the examples chosen. I haven't tracked down every quote, but I ended up with the feeling that the book focuses almost exclusively on male-led projects. The programming world has never been a gender-diverse environment, but I had hoped that this book would find ways to tip the scale towards a more inclusive environment.