I just returned from OSDI/ATC 2022, the collocated systems conferences, and I still find myself disturbed by something one of the keynote speakers said. During the questions after his talk about improving the security of open source, Eric Brewer was asked by a professor whether he thought graduate students should curate their software. Brewer answered that they should focus on finishing their graduate studies instead. I disagree.
Strangely enough, what got me started on this thread was code.org, a non-profit organization that wants to bring “CS into the classroom.” That does seem like a good idea. I introduced some friends to computing through the game of Adventure, and they went on to have good careers in IT.
Another site is much more blunt. Answering the question of why kids should be taught to code, it says:
What about open source? Aren’t there millions of folks working together to create the software that we all use? I took a look at a recent study by the Linux Foundation and Harvard into who produces the most popularly used open source software. The Open Source Software – Application Libraries report is 162 pages long, and I did not read the entire report. I focused on some summaries, one of which I want to share with you now. This is from page 18, point 3, in the Lessons Learned section:
Reviewing 49 of the top 50 non-npm projects from our lists, for commits in the year 2021, it was found that 23% of projects had one developer accounting for more than 80% of the lines of code (LOC) added. Further, 94% of projects had fewer than ten developers accounting for more than 90% of the LOC added. These findings are counter to the typically held belief that thousands or millions of developers are responsible for developing and maintaining FOSS projects. At a higher level, it was found that 136 developers were responsible for more than 80% of the LOC added to these 50 FOSS projects.
So much for Eric Raymond’s “enough eyeballs” that would uncover and repair bugs in code. The reality is much starker: a relatively small number of developers are responsible for writing and maintaining the code that millions of developers use. The one project left out of this analysis was excluded because it is an outlier, and not in a good way: it hides the identities of its committers.
Eric Brewer’s keynote was about the trustworthiness of the supply chain for open source software. I had mistakenly assumed that most of the software large corporations use internally would have been developed in house. Instead, the norm has become using open source as the basis for internal projects while not contributing back to those projects. Someone at the conference, perhaps Brewer, showed a letter sent by lawyers to the maintainer of an open source project, demanding that he fix a problem they had found “at once” or face legal consequences. Imagine taking someone’s software, using it for free, and then demanding support, and demanding it immediately, when very large software companies often take more than two months to produce a patch. Perhaps that company, and others like it, should consider paying open source developers, at least the key committers, instead of paying lawyers to write demand letters.
Where I part ways with Brewer is his suggestion that graduate students not be expected to curate the code they produce. Granted, many research projects that result in published papers rely on code that will never be reused. But certainly not all of them. At OSDI ’21, 84% of the accepted papers included artifacts. At OSDI ’22 and ATC ’22, more people were involved in examining artifacts, including both code and data, than served on the program committees. This suggests growing attention to code quality, at least to the point of being able to reproduce the experimental results reported in a paper.