Does GitHub Copilot improve code quality?
I was surfing YouTube to learn something new in tech and came across a video about how GitHub lied about its Copilot stats. After watching a few more videos and reading blogs on the topic, I became a little sceptical of GitHub Copilot: a technology that was supposedly ready to replace software engineers is now lying about its stats, and it doesn't even look like it makes contributions significant enough to replace anyone.
This blog is written by Akshat Virmani at KushoAI. We're building the fastest way to test your APIs. It's completely free and you can sign up here.
Here is what I found.
GitHub Copilot Study Claims
For people who might not be familiar with GitHub Copilot: it is a coding assistant from GitHub, which is owned by Microsoft (a major backer of OpenAI, the company behind ChatGPT). It helps you write code and gives suggestions and feedback to make your code better, along with useful coding knowledge as you go.
In the GitHub Copilot study, GitHub ran an experiment with 243 developers, split into two groups: one using Copilot and one not. Here is what they allegedly found:
1. Developers using Copilot were 56% more likely to pass all 10 unit tests in the study.
2. They wrote 13.6% more lines of code on average without readability problems or errors.
3. They produced code that was 3.62% more readable, 2.94% more reliable, 2.47% more maintainable, and 4.16% more concise, and had a 5% higher chance of getting their PRs approved.
But these results are relative percentages, which don't give the full picture: if the baseline code was bad to begin with, a relative improvement over it doesn't mean much. And 3-4% doesn't sound like a lot; to me it sounds like saying, "The pizza I ate today was 3% tastier than yesterday's."
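To see why a relative figure like "56% more likely to pass" says little on its own, here's a quick sketch with hypothetical baseline pass rates (none of these numbers come from the study):

```python
# Illustration (hypothetical baselines, NOT from GitHub's study) of why
# "56% more likely to pass" is meaningless without the underlying pass rate.
relative_lift = 0.56

for control_pass_rate in (0.10, 0.40, 0.60):
    copilot_pass_rate = control_pass_rate * (1 + relative_lift)
    print(f"control {control_pass_rate:.0%} -> copilot {copilot_pass_rate:.1%}")

# control 10% -> copilot 15.6%   (tiny absolute gain)
# control 40% -> copilot 62.4%
# control 60% -> copilot 93.6%   (huge absolute gain)
```

The same "56%" headline covers all three scenarios, which is exactly why the missing baseline matters.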
To make matters more suspicious, the graphs that GitHub provided for the case study don't make a lot of sense.
Breaking down the Graph
Here are the graphs GitHub provided in the study:
The problem with this diagram is that the numbers don't add up to a hundred, which is misleading. They only add up to 100 when broken into subgroups of those who passed and those who failed the tests. Presenting them this way confuses readers and brings to mind the saying "lies, damned lies, and statistics". On top of that, the task given to the developers was a fairly simple, common API exercise with only 10 unit tests. Real APIs can require hundreds of test cases (which, by the way, you can generate easily using KushoAI), so 10 unit tests is not much. Keep in mind, too, that only 243 developers took part in this study of a product used by hundreds of thousands of developers worldwide.
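Here's a minimal sketch of how that works, with made-up headcounts (none of these numbers come from GitHub's chart): percentages computed within each subgroup always total 100, but the bars shown together total 200, which is easy to misread as a single distribution.

```python
# Made-up headcounts, NOT GitHub's data, just to show the grouping trick.
passed = {"Copilot": 60, "No Copilot": 39}   # developers who passed the tests
failed = {"Copilot": 61, "No Copilot": 83}   # developers who failed the tests

def as_percent(group: dict) -> dict:
    """Convert raw counts to percentages within a single subgroup."""
    total = sum(group.values())
    return {k: round(100 * v / total, 1) for k, v in group.items()}

print(as_percent(passed))  # {'Copilot': 60.6, 'No Copilot': 39.4} -> sums to 100
print(as_percent(failed))  # {'Copilot': 42.4, 'No Copilot': 57.6} -> sums to 100
# Plotted side by side, the four bars total 200, not 100, so the chart
# only "adds up" once you know each subgroup was normalised separately.
```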
Now, let's look at what the diagram above tells us about GitHub Copilot's code quality in terms of lines of code written.
The study states that Copilot users could write up to 13.6% more code before introducing a code smell (a sign in source code that indicates a potential deeper issue). In absolute terms, however, this translates to only about two lines of code, which may be statistically significant but is practically negligible.
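As a sanity check, we can back-solve the baseline that the "two extra lines" figure implies. The roughly 15-line baseline below is my own inference from those two numbers, not a figure reported in the study:

```python
# Back-solving the baseline implied by "13.6% more code = ~2 extra lines".
# NOTE: the resulting ~15-line baseline is inferred, NOT taken from the study.
improvement = 0.136      # 13.6% more lines before introducing a code smell
extra_lines = 2          # the absolute gain the chart implies

baseline = extra_lines / improvement
print(f"Implied baseline: {baseline:.1f} lines")                      # ~14.7
print(f"With Copilot:     {baseline * (1 + improvement):.1f} lines")  # ~16.7
```

If the typical smell-free stretch really is around 15 lines, the headline percentage is dressing up a two-line difference.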
The study also takes a questionable turn in how it defines quality and errors. It treats "code errors" as issues that reduce readability rather than functional errors that can actually break the code; the latter were excluded as irrelevant to software quality. Instead, the analysis counted things like inconsistent naming and excessive line length. These criteria lack clear definitions and depend heavily on context, technology, language familiarity, and developer experience.
It also looks like the developers who passed all the unit tests were the ones assigned to grade and review other participants' code style and quality. This creates its own problems, since every developer has their own biases and way of reviewing; independent third-party reviewers would have been a better choice.
Final Words
It's sad to see that we got baited into excitement over these small and misleading claims without doing proper research on this so-called study: one run without proper context, on a small, arbitrary group of developers, by a company that can fairly be called a superpower in the developer ecosystem.
But I am also relieved that AI tools like GitHub Copilot won't be replacing my developer job any time soon.
This blog is written by Akshat Virmani at KushoAI. We're building an AI agent that tests your APIs for you. Bring in API information and watch KushoAI turn it into fully functional and exhaustive test suites in minutes.