The sudden acceleration in digital transformation caused by the COVID-19 pandemic revealed how unprepared most businesses were. One of the biggest problems they still face is the “app gap,” the lack of applications that end users need to do their jobs effectively. Low-code and no-code tools go some of the way to filling the gap, with UI builders and robotic process automation, but there’s still a lot to do.
One option is to use machine learning to improve developer productivity. We’re already using basic rules-based tools to provide code completion and help expose methods, so why not go further and build on a massive data set of public code to share how common design patterns are applied, what algorithms are used in what contexts, and how developers take advantage of public APIs?
GitHub Copilot: AI coding assistant
That’s what GitHub has done, working with OpenAI’s Codex machine learning model (a code-focused language model like the familiar GPT-3) to build and train a service that works with your code editor to suggest next steps as you work. Calling it Copilot, GitHub describes it as an “AI pair programmer.” That’s an interesting way of looking at it, suggesting that Copilot is a collaborative tool rather than a prescriptive one.
Copilot has been trained on the millions of lines of code in public repositories. Installed as a Visual Studio Code extension, Copilot works within the context of your current editor window, providing suggestions based on what you type and feeding back details on which suggestions you use. Your private code isn’t used to train the service with new code samples. The only signals sent back are the suggestions you choose to use.
You shouldn’t expect the code Copilot produces to be correct. For one thing, it’s still early days for this type of application, with little training beyond the initial data set. As more and more people use Copilot, and it draws on how they use its suggestions for reinforcement learning, its suggestions should improve. However, you’re still going to need to make decisions about the snippets you use and how you use them. For security reasons, you also need to be careful with the code that Copilot generates. It’s impossible for GitHub to audit all of the code it’s using to train Copilot. Even with tools like Dependabot and the CodeQL security scanner, there’s a lot of poor-quality code out there exhibiting bad patterns and common bugs.
Despite the risks, there are some interesting ideas in Copilot: how it takes your comments and turns them into code, or how it suggests the tests that can be used as part of a continuous integration/continuous deployment (CI/CD) process. Building AI into the dev and test parts of a CI/CD devops model makes a lot of sense, as it can help reduce the load on developers, letting them focus on code. But again, you still need to be sure that those tests are appropriate and that they provide the right level of code coverage. You’re not limited to one solution at a time, as you can page through results in your editor, seeing what works best for you before you accept it.
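To make the comment-to-code idea concrete, here is a hand-written sketch of that workflow. It is not actual Copilot output. The `slugify` function, its describing comment, and the two tests are all hypothetical; they only illustrate the kind of suggestion you would be reviewing before you accept it.

```python
import re

# What the developer types: a comment describing intent, plus a signature.
# Convert a title into a URL slug: lowercase, punctuation removed,
# words joined by hyphens.
def slugify(title: str) -> str:
    # A body of the kind an assistant might propose (hypothetical).
    # Review it before accepting: is ASCII-only matching what you want?
    words = re.findall(r"[a-z0-9]+", title.lower())
    return "-".join(words)


# Tests of the kind an assistant might suggest alongside the code.
# They are a starting point; you still decide whether coverage is adequate.
def test_slugify_basic():
    assert slugify("Hello, World!") == "hello-world"


def test_slugify_empty():
    assert slugify("") == ""
```

Even in a sketch this small, the review step matters: the suggested body silently drops non-ASCII characters, which neither suggested test would catch.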
GitHub Copilot is currently in preview and has a waitlist.
DeepDev: New AI models for developers
Microsoft is working on its own set of machine learning models to support application developers. Its prototype DeepDev service isn’t yet publicly available, but some documentation is visible. From what’s been published, it looks as though DeepDev uses similar techniques to GitHub’s Copilot, though possibly with a broader set of models.
Like Copilot, DeepDev has been trained on a mix of open source code and more general documentation, with a focus on understanding and working with source code. Some of its models are more general purpose, requiring additional training based on your source code libraries, while others are designed to handle specific common tasks.
You need an appropriate API key to access DeepDev, which includes a playground where you can experiment with the tools before building them into your own code. DeepDev appears to be a way of extending your own tools with Microsoft’s machine learning models, allowing you to build those models into a CI/CD pipeline to generate tests as code is checked in.
From IntelliSense to IntelliCode
Coding assisted by artificial intelligence is an interesting development that should make for better development tools. Technologies such as Visual Studio’s IntelliSense and IntelliCode already work to make development more efficient using code completion and real-time compilation tools to debug code as you write it. IntelliCode has been using GitHub public repositories to build code completion models, using GitHub star ratings as an indicator of code quality.
Context is key for any machine learning coding tool. If you’re using a set of APIs, the tool needs to respond to how you’re using those APIs, not to how everyone else uses them. Similarly, the tool needs to provide appropriate overloads for a method based on the code you’ve written. Having a sufficiently large set of training data and a responsive model is essential. What’s needed is a tool that helps you deliver what you want to deliver more quickly, not a way of repeating the same errors in a thousand other projects.
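One way to picture context-sensitive completion is a ranking that blends how often a method is called in your own project with how often it appears in a public corpus. The sketch below is an assumption-laden toy, not how IntelliCode or Copilot work internally; `rank_completions`, `local_weight`, and the sample method names are all invented for illustration.

```python
from collections import Counter


def rank_completions(prefix, local_calls, global_calls, local_weight=5.0):
    """Rank method names that start with `prefix`.

    local_calls:  method names observed in the current project.
    global_calls: method names observed in a (hypothetical) public corpus.
    local_weight: how much more one local use counts than one global use.
    """
    local = Counter(local_calls)
    global_ = Counter(global_calls)
    candidates = {n for n in (*local, *global_) if n.startswith(prefix)}

    def score(name):
        # Counter returns 0 for names it has not seen.
        return local_weight * local[name] + global_[name]

    # Highest score first; ties broken alphabetically for stable output.
    return sorted(candidates, key=lambda n: (-score(n), n))


# Hypothetical usage data: the public corpus favors read_csv,
# but this project has been calling read_parquet.
global_calls = ["read_csv"] * 8 + ["read_json"] * 3 + ["read_parquet"]
local_calls = ["read_parquet", "read_parquet"]

print(rank_completions("read_", local_calls, global_calls))
# → ['read_parquet', 'read_csv', 'read_json']
```

With no local history the same call falls back to the public ordering, which is the behavior you would want in a new project.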
Generating code for data transformations
Programming by example, where a tool infers code from sample inputs and the outputs you expect, is another useful way of adding AI assistance to your development process. Microsoft Research’s PROSE (Program Synthesis using Examples) is already in use in Excel and in many Azure and Power Platform tools, as well as in SQL Server. Visual Studio uses it as part of IntelliCode’s refactoring tools, looking for patterns in your code and suggesting where they can be reused. It’s also a useful way of extracting data and transforming it consistently, generating code that takes an input and delivers it in the expected output format.
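The idea behind programming by example can be shown with a toy synthesizer that searches a small set of string operations for a pipeline matching every input/output pair. This is a brute-force sketch, not PROSE's algorithm or API: the `PRIMITIVES` table, `run`, and `synthesize` are hypothetical names, and real systems use far richer languages and smarter search.

```python
from itertools import product

# A tiny, hypothetical language of string operations.
PRIMITIVES = {
    "strip": str.strip,
    "lower": str.lower,
    "upper": str.upper,
    "title": str.title,
    "first_word": lambda s: s.split()[0] if s.split() else "",
    "last_word": lambda s: s.split()[-1] if s.split() else "",
}


def run(names, text):
    """Apply the named primitives to `text`, left to right."""
    for name in names:
        text = PRIMITIVES[name](text)
    return text


def synthesize(examples, max_depth=3):
    """Return the shortest pipeline consistent with all (input, output)
    examples, or None if nothing within max_depth steps fits."""
    for depth in range(1, max_depth + 1):
        for names in product(PRIMITIVES, repeat=depth):
            if all(run(names, src) == dst for src, dst in examples):
                return names
    return None


# Two examples are enough to pin down "uppercase surname" in this language.
examples = [("  Ada Lovelace ", "LOVELACE"), ("grace hopper", "HOPPER")]
program = synthesize(examples)
print(program)                      # → ('upper', 'last_word')
print(run(program, "alan turing"))  # → TURING
```

The learned pipeline is ordinary code you can inspect and reuse on new inputs, which is what makes this approach attractive for consistent data transformations.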
AI-assisted development tools can best be thought of as a pair programmer built into your editor. They aren’t machines generating code for you (though they can be if you want). Instead, treat their suggestions as advice that can speed up your development process, reducing bugs and automating repetitive tasks. Having your editor suggest tests helps you adopt test-driven development, and where it can generate regular expressions and transformations based on expected outputs, it simplifies string and data manipulations.
If we’re to get over that app gap, we need to deliver code faster and more consistently. Adding machine learning to the development process lets you pick the brains of thousands of other developers, without breaking your flow or theirs. Tools such as Stack Overflow help by providing examples of how other developers solved the same or similar problems.
These new AI-based tools take things a step further, parsing and understanding all of those millions of lines of undocumented code out there and finding helpful snippets as you need them, without having to search for them. All you need to do is sit down and code and look for suggestions as they come up.
Copyright © 2021 IDG Communications, Inc.