Does Team Size Impact Code Quality?
There are a few common questions that we get when speaking with Code Climate customers. They range from “What’s the Code Climate score for the Code Climate repo?” to “What’s the average GPA of a Rails application?” When I think back on the dozens of conversations I’ve had about Code Climate, one thing is clear - people are interested in our data!
To that end, this is the first post in a series intended to explore Code Climate’s data set in ways that can deepen our understanding of software engineering, and hopefully satisfy some curiosity along the way. We have lots of questions to ask of our data set, and we know that our users have plenty of their own, but we wanted to start with one that’s fundamental to Code Climate’s mission, and that we’re asked a lot:
Does Team Size Impact Code Quality?
Is it true that more developers means more complexity? Is there an ideal number of developers for a team? What can we learn from this? Let’s find out!
The Methodology
There are a variety of ways to interpret this question, even through the lens of our data set. Since this is the first post in a series, and we’re easing into this whole “data science” thing, I decided to start with something pretty straightforward: plotting author count against Code Climate’s calculated GPA for each repo.
Author count is calculated by counting the distinct authors appearing in commits over a given period, in this case the last 30 days. The GPA comes from the most recent quality report available for a repo. Extracting this data took just a few simple lines of Ruby against our database, plus an export to CSV for processing elsewhere.
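As a rough illustration of that author-count calculation (sketched in R rather than the Ruby we actually used; the commits.csv file and its repo_id, author, and date columns are hypothetical stand-ins for our schema):

# Count distinct commit authors per repo over the last 30 days.
# commits.csv and its column names are hypothetical stand-ins.
commits <- read.csv("commits.csv", stringsAsFactors = FALSE)
recent  <- subset(commits, as.Date(date) >= Sys.Date() - 30)
author_counts <- aggregate(author ~ repo_id, data = recent,
                           FUN = function(a) length(unique(a)))
names(author_counts)[2] <- "AuthorCount"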
I decided to limit the query to our private repositories, because that’s where the business impact of the answer to this question seems most significant. Analyzing private repositories also lets us see many more Rails applications than are available in open source; besides, the questions around open source repos are manifold and deserve to be treated separately.
Please note that the data I’m presenting is completely anonymized, aggregate data from which no private information can be extracted.
Once I had the data in an easily digestible CSV format, I began plotting it in a program called Wizard, which makes statistical exploration dead simple. Wizard confirmed a negative correlation between the size of a team and its Code Climate GPA, but the graphs it produces aren’t customizable enough to display here. From there I went to R, where a few R-wizard friends of mine helped me tune the handful of lines that transform the CSV into the graph below:
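For the curious, here’s a minimal sketch of what those lines might look like, assuming ggplot2 and the GPA and AuthorCount column names you’ll see in the summaries below (the CSV file name is a placeholder):

library(ggplot2)

observations <- read.csv("observations.csv")   # placeholder file name

ggplot(observations, aes(x = AuthorCount, y = GPA)) +
  geom_point(alpha = 0.2) +                    # fade heavily overlapping repos
  geom_smooth(method = "lm") +                 # the linear fit line
  labs(x = "Authors (last 30 days)", y = "Code Climate GPA")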
The fit line above indicates that there is indeed an inverse relationship between team size and code quality. The slope isn’t very steep, but it suggests a decline of more than a full GPA point is possible when team sizes grow beyond a certain point. Let’s dig a little deeper into this data.
The graph is based on data from 10,000 observations, each a distinct repo with its latest available GPA and author count. The statistical distributions of the two columns show that the data needed some filtering. The GPA summary, for example, seems quite reasonable:
> summary(observations$GPA)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  0.000   2.320   3.170   2.898   3.800   4.000
As you would expect, a pretty healthy distribution of GPAs. The author count, on the other hand, is pretty out of whack:
> summary(observations$AuthorCount)
   Min. 1st Qu.  Median    Mean 3rd Qu.    Max.
  1.000   1.000   2.000   3.856   4.000 130.000
From the 3rd quartile to the max, the values run from 4 all the way up to 130! That’s far too wide a spread, so I corrected for it a bit by capping author count at 50 in the graph above. This still shows the trend clearly, without letting the outliers blow the model out entirely. After working on this graph with the full data set and showing it to a couple of people, some themes emerged:
- The data should be binned, with 10+ authors being the largest team size
- Lines or bars could be used to show the “density” of GPAs for a specific binned team size
Some CSV and SQL magic made it straightforward to bin the data, which can then be plotted in R by applying a density estimate to the GPA scores within each team-size bin. This lets us see where GPA scores concentrate for each team size.
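Here’s a sketch of how that plotting might look, again assuming ggplot2. Only the 10+ top bucket comes from the list above; the other bin edges are illustrative guesses, and the 50-author cap matches the filtering described earlier:

library(ggplot2)

observations <- read.csv("observations.csv")          # placeholder file name
filtered <- subset(observations, AuthorCount <= 50)   # cap the outliers, as above

# Bin team sizes; only the 10+ top bucket is from the post, the
# other edges are illustrative guesses.
filtered$TeamSize <- cut(filtered$AuthorCount,
                         breaks = c(0, 1, 3, 5, 9, Inf),
                         labels = c("1", "2-3", "4-5", "6-9", "10+"))

ggplot(filtered, aes(x = GPA, colour = TeamSize)) +
  geom_density() +                                    # GPA concentration per bin
  labs(x = "Code Climate GPA", y = "Density")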
Now that the team sizes are broken into smaller segments, we can see some patterns emerge. It appears easier to achieve a 4.0 GPA if you are a solo developer, and the density for teams of more than 10 concentrates under the 3.0 GPA mark. This is a much richer way to look at the data than the scatterplot above, and I learned a lot comparing the two approaches.
Nice first impressions, but now that we’ve seen the graphs, what can we learn?
Conclusions
How to interpret the results of this query is quite tricky. On one hand, there does appear to be a correlation supporting the notion that smaller teams can produce more organized code. On the other hand, the correlation is not strong, and it is worth noting that teams of 10, 20, even 30 members are capable of maintaining very respectable GPAs above 3.0 (recall from the summary above that the mean GPA is 2.898 and that only the top 25% score above 3.8).
The scatterplot shows a weak fit line suggesting a negative correlation, while the density plot (which could also be rendered with bars) shows that teams of up to 10 members have an easier time maintaining a 3.0+ GPA than teams with more than 10. Very intriguing.
It is worth considering what other factors could be brought into this model. Some obvious contenders include the size of the code base, the age of the code base, and the number of commits in a given period. It’s also worth looking into how this data would change if we pulled the full history, summing the total number of authors over the life of a project.
One conclusion to draw might be that recent arguments for breaking large teams down into smaller, service-oriented teams have some statistical backing. If we wanted to be as scientific about it as possible, we could look at how productive teams tend to be in certain size ranges and derive an ideal team size as a function of productivity and 75th-percentile quality scores. Teams could be optimized to produce a certain amount of code of a certain quality for a certain price - but now we’re sounding a little creepy.
My take is that this data supports what most developers already know: large projects with large teams are hard to keep clean; that’s why we work as hard as we do. If you’re nearing that 10-person mark, know that you’ll have to work extra hard to beat the odds and keep your GPA above 3.0.
On a parting note, we are sure that there are more sophisticated statistical analyses that could be brought to bear on this data, which is why we have decided to publish the data set and the code for this blog post here. Check it out and let us know if you do anything cool with it!
Special thanks to Allen Goodman (@goodmanio), JD Maturen (@jdmaturen), and Leif Walsh (@leifw) for their code and help and to Julia Evans (@b0rk) for the inspiration.