Coursera engineering believes developer productivity is second only to site availability, so we consistently invest in tools that accelerate the pace of innovation. The faster our engineers can launch and iterate, the better we can make the Coursera learning experience. High-quality self-service deployment tools allow us to ship features quickly. Although we’ve worked hard on our server-side deployment technologies, we haven’t shown our front-end developers enough love. Until now.
With Rapidash, our new front-end deployment system, we give our developers the power to launch their changes to our worldwide learner community in under 3 minutes. Rollback is 10 seconds away. Even better, new front-end code can be safely and reliably tested in production without affecting our user experience at all.
Based on our previous work on backend deployment tooling, we have found that when developers deploy their own code one commit at a time, we spend less time debugging and more time developing. Because most problems can be quickly attributed to a single commit, there is less code to search through when breakages occur. Further, developers know what parts of the site have been affected by their own code changes and thus know which metrics to monitor.
Our backend deployment systems also taught us that if the deployment process takes less than 10 minutes, most developers will deploy their own changes. Unfortunately, before Rapidash, we deployed client-side code by baking all static assets (CSS, JS, etc.) into an AMI and deploying the frontend code together with our legacy backend code, a process that took around 30 minutes every time. Developers found this a relatively heavyweight process that interrupted their work for an unacceptable amount of time. As a result, changes from many developers would frequently languish on the master branch, waiting to be deployed.
The Rapidash System
Rapidash is Coursera’s rapid deployment system for our web frontend. After code is tested and reviewed, a developer lands their changes onto master. Immediately, Jenkins kicks off a build run to compile, minify, and persist the new static asset bundles (CSS, JS, etc.) in Amazon S3. Upon successful upload, the build process sends the developer a private message on Slack with courtesy links to the Rapidash console.

Developers navigate Rapidash just as they would navigate their local Git history: every commit, along with its description, author, and timestamp, is displayed in reverse-chronological order on the side. To deploy a new version of Coursera, developers simply click the version they want and slide the traffic slider from “old deploy” to “new deploy.” Within 10 seconds, edge (our custom Scala-based service that serves our HTML pages) will switch to serving new HTML that references the new JS and CSS bundles.

If an engineer needs to roll back, they simply select a version that’s lower in the list than the version currently serving traffic, and slide the traffic slider back. Including JS build time, production is less than 3 minutes away from a merge into master.
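To make the traffic slider concrete, here is a minimal sketch of one way a page-serving layer could split traffic between two deploys. It is an illustration under assumptions, not a description of how edge actually works: the hash-based bucketing, the names `Deploy`, `bucket_for`, `choose_deploy`, and `render_html`, the `ASSET_BASE` URL, and the `app.css`/`app.js` file names are all hypothetical. The sketch is in Python for brevity, although edge itself is written in Scala.

```python
import hashlib
from dataclasses import dataclass

# Hypothetical asset location; the real S3 layout is not described in this post.
ASSET_BASE = "https://assets.example.com"


@dataclass(frozen=True)
class Deploy:
    """One built frontend version, identified by its Git commit hash."""
    commit: str


def bucket_for(session_id: str) -> int:
    """Map a session to a stable bucket in [0, 100).

    Hashing (rather than picking randomly per request) keeps a given
    session on the same version while the slider stays in one position.
    """
    digest = hashlib.sha256(session_id.encode("utf-8")).hexdigest()
    return int(digest, 16) % 100


def choose_deploy(session_id: str, old: Deploy, new: Deploy, new_percent: int) -> Deploy:
    """Pick a deploy for this session given the slider position (0 to 100)."""
    if not 0 <= new_percent <= 100:
        raise ValueError("new_percent must be between 0 and 100")
    return new if bucket_for(session_id) < new_percent else old


def render_html(deploy: Deploy) -> str:
    """Render a page shell whose asset URLs are pinned to one commit."""
    base = f"{ASSET_BASE}/{deploy.commit}"
    return (
        "<!DOCTYPE html>\n<html><head>"
        f'<link rel="stylesheet" href="{base}/app.css">'
        "</head><body>"
        f'<script src="{base}/app.js"></script>'
        "</body></html>"
    )


if __name__ == "__main__":
    old, new = Deploy("abc123"), Deploy("def456")
    # Slider at 25: roughly a quarter of sessions see the new bundles.
    chosen = choose_deploy("session-42", old, new, new_percent=25)
    print(render_html(chosen))
```

In this sketch, a rollback is the same operation with the roles reversed: the slider is pointed back at an earlier commit. Because each bundle lives under its own commit-specific path, switching versions only changes which URLs the HTML references; nothing needs to be rebuilt.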
Testing In Production
Results, with numbers
Anecdotally, developers are much happier with the system. While we do not have direct quantitative measures of developer productivity, there are other metrics we look at as potential proxies. Immediately after we launched Rapidash, the number of frontend code pushes skyrocketed, and the majority of those pushes contained a single commit. Below is a graph showing the number of deploys over time for our legacy backend, new backends, and Rapidash. Interestingly enough, the number of backend code pushes also increased commensurately. Although the frequency of code deploys has many confounding variables, we believe that Rapidash helped accelerate Coursera’s engineering team in delivering major product features.