Continuous delivery isn't just about shipping features. It's about learning and behavior change. Learn how to apply what you learned in the discovery phase by deploying pilots for testing with users before you implement it across your entire platform.

Jon Kinney
Partner & CTO
Ryan Hatch
Head of Product Strategy & Innovation
Billy Sweetman
Head of Design
Andrew Verboncouer
Partner & CEO

As we get into the delivery loop, let's remember what Billy said. Delivery isn't just about shipping features. It's about learning and customer behavior change. What we're trying to do is measure whether or not a solution we've found and deployed will actually increase the number of customers viewing that first data visualization.

Did they have that aha moment? And to learn properly, we believe it requires in-product experimentation, which is so important but so often not done. So let's dig into the delivery loop and discuss how to ship and test features with a subset of your users in a sustainable way. At a high level, the delivery loop starts with a pilot, which is a limited release of new features to a certain number of customers.

How do you determine who gets in the pilot? Certain personas, like team accounts with more than 50 users who would benefit from seeing the new data visualization, might be a good example. Or users who have applied for and opted into receiving beta features. Otherwise, internal team members who also use your product in their day-to-day work can test out the edges of the system as well.
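As a rough illustration of those pilot criteria, here is a minimal Python sketch. The `Account` fields, the `in_pilot` helper, and the sample accounts are all hypothetical names made up for this example; only the "more than 50 users" threshold comes from the talk.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Account:
    """Hypothetical account record; field names are illustrative only."""
    name: str
    seat_count: int = 1
    beta_opt_in: bool = False
    is_internal: bool = False


# Assumed threshold, taken from the "more than 50 users" example above.
TEAM_SEAT_THRESHOLD = 50


def in_pilot(account: Account) -> bool:
    """Return True if the account matches any of the pilot criteria."""
    return (
        account.seat_count > TEAM_SEAT_THRESHOLD  # large team accounts
        or account.beta_opt_in                    # opted into beta features
        or account.is_internal                    # internal team members
    )


accounts = [
    Account("acme", seat_count=120),
    Account("solo", seat_count=1),
    Account("early-bird", seat_count=3, beta_opt_in=True),
    Account("staff", is_internal=True),
]
pilot = [a.name for a in accounts if in_pilot(a)]
print(pilot)  # ['acme', 'early-bird', 'staff']
```

Keeping the rule in one small function makes it easy to review who is in the pilot and to change the criteria without touching feature code.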

But a pilot shouldn't just be the next set of features being used by the internal team or some random testers. To make a pilot useful, it should be at least three things: a well-defined experiment, released to a limited pool of users, and collecting data and analytics for analysis and action.

So let's talk about experiment design. It's really important that we define upfront the success or failure criteria. Is it 10%? Is it 20%? Use whatever threshold for success is appropriate for your scenario to ensure everyone's on the same page about what it'll take to keep the feature. You also need to be willing to pull out the feature if it doesn't succeed. We don't want unused features bloating the code.
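One way to make the success criteria explicit is to write them down as data before the pilot starts. The sketch below is an assumption-laden example, not a prescribed method: `Experiment`, `conversion_rate`, and `met_success_criteria` are hypothetical names, and the sample numbers are invented. It treats the threshold as a minimum relative lift over the control group.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Experiment:
    """Hypothetical experiment definition, agreed on before the pilot ships."""
    name: str
    metric: str
    min_relative_lift: float  # 0.10 means "at least 10% better than control"


def conversion_rate(converted: int, total: int) -> float:
    """Share of users in a group who reached the target behavior."""
    if total <= 0:
        raise ValueError("total must be positive")
    return converted / total


def met_success_criteria(exp: Experiment, control_rate: float, pilot_rate: float) -> bool:
    """Compare the pilot group's lift over control with the agreed threshold."""
    if control_rate <= 0:
        raise ValueError("control_rate must be positive to compute a relative lift")
    lift = (pilot_rate - control_rate) / control_rate
    return lift >= exp.min_relative_lift


# Invented sample numbers for illustration only.
exp = Experiment("first-visualization-nudge", "viewed_first_visualization", 0.10)
control = conversion_rate(200, 1000)  # 20% of control users
pilot_group = conversion_rate(46, 200)  # 23% of pilot users
print(met_success_criteria(exp, control, pilot_group))  # True
```

This sketch ignores statistical significance. With small pilot groups, a lift like this can be noise, so treat the comparison as a starting point rather than a verdict.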

We'll do this through feature flags, allowing in-progress features to live side by side with the full production release at the same time, A/B testing these new features in that limited release alongside the current public features. After the experiment is designed, we'll move into a limited release. To whom? Well, the folks that we already talked about getting into that pilot. Again, this is done at runtime alongside the stable code base.

So whenever possible, ship all the experiments to production by using feature flags, like I mentioned before. There are some limitations or instances where it is impossible, like software as a medical device or infrastructure code for planes or trains, et cetera. But even the electric car manufacturer Tesla is doing beta testing of its full self-driving, or FSD beta, software through limited releases over the air to vehicles at customers' homes. Whether or not this is a good idea, or will lead to better software or safer roads, remains to be seen.

But if Tesla can do it with some forethought, it's highly likely that your product can too. After our limited release, we need to know how the experiment performed. This is where the data and analytics come into play, and we can analyze our experiment results.

If the success criteria were met, great, it's time to scale up. If not, then remove your bad feature, pull it right out of production. This prevents you from taking on all the code maintenance and support burden, and it simplifies your code base. So that brings us back to our center metric. Of these two experiments that we ran, only one of them was successful. So we'll eliminate that failed experiment, but for the successful experiment, it's time to double down on it and scale up. Scaling up is an exciting thing, because not only are we hitting our metric, but we're actually helping people. As we scale up, we want to move into the full feature build.

This is where we'll add missing pieces of functionality. We want to flesh out these features based on actual feedback from our pilot customers. This leads to a release and promotion. So now it's time for marketing to go tell the world about this through an official announcement. Maybe you have folks that applied to be alerted when the latest features are released; it's time to hit their inboxes. And then we need to support at scale.

Some customers will struggle with new features. Along with communicating the changes through marketing promotions, create guides, FAQs, and training videos to help people self-serve as they explore new features and functionality. Being there in the first 60 days to really ensure customers are happy with the result is critical. Just because the feature worked with 2% or 10% of your user base doesn't mean you won't have issues at scale. In order to stay on top of that, use in-product help or chat platforms on the web and in-app tools to collect user feedback and engage with your customers to ensure they feel supported and heard.

But for all of this to be possible, there are some technical foundations for continuous delivery that need to be in place. We truly believe if we don't do all of this well, none of the other things we've talked about are possible. We'll talk about four main areas: version control, build pipeline, deployment, and production. Version control encompasses all things committing code.

Think about your Git repository. We need to establish a branching strategy, and I recommend having main be the integration branch and having a stable branch for every other environment. So you might have a production branch and a staging branch, a release 0.47 branch, for example, or a QA branch.

Maybe you need a sandbox branch because your product has an API that developers need to hit. We also want to make sure that after we have our branching strategy established, we're making atomic commits and feature-focused branches that tell a story about why something changed, not just how.
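To make the branching strategy above concrete, here is a small Python sketch that maps long-lived branches to the environments they feed. The branch names and environment labels are assumptions chosen to mirror the examples in the talk, and `environment_for` is a hypothetical helper, not part of any tool.

```python
from typing import Optional

# Assumed mapping: main is the integration branch, and every other
# environment gets its own stable branch. Names are illustrative.
BRANCH_ENVIRONMENTS = {
    "main": "integration",
    "qa": "qa",
    "staging": "staging",
    "sandbox": "sandbox",
    "production": "production",
}


def environment_for(branch: str) -> Optional[str]:
    """Return the standing environment a branch deploys to, if it has one.

    Feature and bug fix branches are not in the mapping, so they return None.
    """
    return BRANCH_ENVIRONMENTS.get(branch)


print(environment_for("main"))              # integration
print(environment_for("feature/data-viz"))  # None
```

Writing the mapping down in one place keeps the team honest about which branches are allowed to deploy where.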

If your commits are moving lots of things around in the code base in addition to adding the new feature, consider a technique called pre-factoring that will put the code base in an ideal state to accept that new feature. We also need effective pull requests and reviews. At a high level, these show the code changes that a developer is proposing in order to support a given feature or bug fix.

If it's an involved feature or the code is confusing, we tend to favor pairing through a PR review rather than wasting a ton of time in asynchronous comments. You can actually set up your pull requests to run a lightweight version of the build pipeline, and optionally a deployment, which we'll talk about here in a minute.

So, the build pipeline: this covers things like your automated test suites. In your automated test suites you need unit tests. These are designed to execute quickly and fail fast. If something is fundamentally flawed about the business logic of the system, we won't continue. Then we'll move on to integration tests, which check whether the system as a whole complies with the functional requirements. That would include things like hitting an external API, making a database call, and generally going through the system as an end user would, checking to see if all of the parts of the system are working together properly.

These are designed for fast feedback. We need the build pipeline to be effective in our development phase, so we want our tests set up to execute quickly. And then during deployment, if there's an error, we want the ability to roll back that build pipeline, roll back that deployment. It's important to automate the build because then we have a repeatable process, which will help eliminate human error and optimize build times.
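The fail-fast ordering described above can be sketched in a few lines of Python. This is only an illustration: `run_pipeline` is a hypothetical helper, and the stage checks are stand-in functions rather than real test suites or a real CI system.

```python
from typing import Callable, List, Tuple

# A stage is a name plus a check that returns True on success.
Stage = Tuple[str, Callable[[], bool]]


def run_pipeline(stages: List[Stage]) -> List[str]:
    """Run stages in order and stop at the first failure.

    Returns a log of the stages that actually ran.
    """
    log = []
    for name, check in stages:
        passed = check()
        log.append(f"{name}: {'passed' if passed else 'failed'}")
        if not passed:
            break  # fail fast: later, slower stages never run
    return log


# Stand-in checks: fast unit tests first, slower integration tests next,
# and deployment only if everything before it succeeded.
stages = [
    ("unit tests", lambda: True),
    ("integration tests", lambda: False),
    ("deploy", lambda: True),
]
log = run_pipeline(stages)
print(log)  # ['unit tests: passed', 'integration tests: failed']
```

Because the cheap checks run first, a broken build is reported quickly and the deploy stage is never reached.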

After our build pipeline has succeeded, we're on to the deployment phase. We want to make sure we're doing automated deployment. This is often referred to as continuous delivery. It doesn't have to be in every environment, but it can help optimize testing when the latest is always available in a staging or a QA environment, for example.

You could also do it at the pull request level, per feature branch or bug fix branch, as I mentioned before. So where does this get deployed? To ephemeral environments or containers. What are these? Well, they're environments that are short-lived and easy to set up and tear down. They're scripted through code and pull requests.

So any changes to those environments are able to be reviewed by your team, and they allow for easier horizontal scaling. There are several technical considerations when configuring your deployment pipeline, but one of the larger ones is zero downtime. We want to make sure we have properly configured database migrations to help support zero-downtime deployment, because there are multiple connections to the database reading and writing at once.

The code needs to be able to be deployed in a way that allows for the new features to start working with the new database columns while the old code is still running against the same database for a while. Then to clean that up, we create a second migration to remove the old columns from the database once no more writes are happening to the applicable database tables.
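The two-migration approach above is sometimes called expand and contract. Here is a minimal sketch using Python's built-in `sqlite3` module and an in-memory database. The `users` table, the `name` and `display_name` columns, and the helper functions are all hypothetical, and a real migration tool would manage these steps for you.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('Ada')")

# Migration 1 (expand): add the new column as nullable, so the old code,
# which knows nothing about it, keeps working against the same table.
conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")


def old_code_insert(name: str) -> None:
    """Old release: writes only the old column."""
    conn.execute("INSERT INTO users (name) VALUES (?)", (name,))


def new_code_insert(name: str) -> None:
    """New release: writes both columns while the two releases overlap."""
    conn.execute(
        "INSERT INTO users (name, display_name) VALUES (?, ?)", (name, name)
    )


def rows_missing_new_column() -> int:
    """Count rows that have not been filled in for the new column yet."""
    row = conn.execute(
        "SELECT COUNT(*) FROM users WHERE display_name IS NULL"
    ).fetchone()
    return row[0]


old_code_insert("Grace")  # old and new code run side by side
new_code_insert("Linus")

missing_before = rows_missing_new_column()
# Backfill rows written by the old code.
conn.execute("UPDATE users SET display_name = name WHERE display_name IS NULL")
missing_after = rows_missing_new_column()
print(missing_before, missing_after)  # 2 0

# Migration 2 (contract) would drop the old column, but only after no
# running code reads or writes it. It is left out here on purpose.
```

The key point is the ordering: the schema grows first, both code paths coexist, and the cleanup migration only runs once the old column is no longer in use.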

In reality, these might live for quite some time. If you have a feature flag for a new feature turned on for 2% of your users, your database might be in a state of flux where the production code, the public code, is hitting one column of one of your tables, but the 2% behind the feature flag is hitting another.

And so there could be an additional migration at the end of that process to allow you to clean up that data once everybody gets that new feature. So now that we've deployed and we're in production, how do we limit the new code path? Again, this is through feature flags. So what are those? It's a fancy way of saying that we protect some new aspect of the code from executing with conditional logic, unless you're a user who's part of that limited release group.

So again, those could be folks that either are assigned as beta testers or have opted into beta features. These feature flags can also act as a safety valve, allowing us to fully turn off a code path if it's not working well, or if no one's using it, or if it has bugs itself or is causing bugs in the current public release.
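Here is a minimal Python sketch of a feature flag as described above: conditional logic around the new code path, a limited release group, and a switch to turn the path off entirely. `FeatureFlag`, `dashboard`, and the user IDs are hypothetical. The percentage rollout, which hashes the user ID so each user gets a stable answer, is an assumption about how a 2% release might be implemented, not something the talk specifies.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Set


@dataclass
class FeatureFlag:
    """Hypothetical in-memory flag; real systems usually load this from a service."""
    name: str
    enabled: bool = True       # safety valve: False turns the path off for everyone
    rollout_percent: int = 0   # 0 to 100, share of all users who get the feature
    allowed_users: Set[str] = field(default_factory=set)  # beta testers, opt-ins

    def is_on_for(self, user_id: str) -> bool:
        if not self.enabled:
            return False
        if user_id in self.allowed_users:
            return True
        return self._bucket(user_id) < self.rollout_percent

    def _bucket(self, user_id: str) -> int:
        """Map a user to a stable bucket from 0 to 99 for this flag."""
        digest = hashlib.sha256(f"{self.name}:{user_id}".encode()).hexdigest()
        return int(digest, 16) % 100


def dashboard(user_id: str, flag: FeatureFlag) -> str:
    """The new code path only executes for users inside the limited release."""
    if flag.is_on_for(user_id):
        return "new data visualization"
    return "current dashboard"


beta_flag = FeatureFlag("data-viz", allowed_users={"beta-1"})
print(dashboard("beta-1", beta_flag))        # new data visualization
print(dashboard("someone-else", beta_flag))  # current dashboard

beta_flag.enabled = False                    # pull the safety valve
print(dashboard("beta-1", beta_flag))        # current dashboard

# A 2% rollout across 10,000 synthetic users lands near 200 users.
two_percent = FeatureFlag("data-viz", rollout_percent=2)
enabled_count = sum(two_percent.is_on_for(f"user-{i}") for i in range(10_000))
print(enabled_count)
```

Hashing the flag name together with the user ID means a user stays in or out of the release across requests, and different flags select different slices of users.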

Well, how would we know if there are any issues in production? By monitoring, alerting, and analyzing our production code and environment. Our monitoring will alert us when errors are caught, surface slow queries for analysis and debugging, and provide server-load monitoring with alerts on memory usage or CPU overload as well.
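As a small illustration of threshold-based alerting, the Python sketch below checks one snapshot of production metrics. Everything here is hypothetical: the `Snapshot` fields, the threshold values, and `alerts_for` are invented for the example, and in practice a monitoring tool would collect these numbers and send the alerts.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class Snapshot:
    """Hypothetical point-in-time metrics for one server."""
    error_count: int
    slowest_query_ms: float
    memory_percent: float
    cpu_percent: float


# Assumed thresholds; tune these for your own system.
SLOW_QUERY_MS = 500.0
MEMORY_LIMIT_PERCENT = 85.0
CPU_LIMIT_PERCENT = 90.0


def alerts_for(s: Snapshot) -> List[str]:
    """Return a human-readable alert for every threshold that was crossed."""
    alerts = []
    if s.error_count > 0:
        alerts.append(f"errors: {s.error_count} caught")
    if s.slowest_query_ms > SLOW_QUERY_MS:
        alerts.append(f"slow query: {s.slowest_query_ms:.0f} ms")
    if s.memory_percent > MEMORY_LIMIT_PERCENT:
        alerts.append(f"memory: {s.memory_percent:.0f}% used")
    if s.cpu_percent > CPU_LIMIT_PERCENT:
        alerts.append(f"cpu: {s.cpu_percent:.0f}% used")
    return alerts


snapshot = Snapshot(error_count=3, slowest_query_ms=1250.0,
                    memory_percent=62.0, cpu_percent=97.0)
alerts = alerts_for(snapshot)
print(alerts)  # ['errors: 3 caught', 'slow query: 1250 ms', 'cpu: 97% used']
```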

There are lots of great tools that will show you bottlenecks in your system to help optimize your code and infrastructure. Speaking of infrastructure, we want auto-scaling infrastructure with a containerized and ephemeral set of environments. We can choose or build a deployment platform that will automatically increase the number of nodes in our environment's horizontal scaling pool on demand.
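To show what scaling the pool on demand might look like, here is a simplified Python sketch of a scaling decision. The `desired_nodes` function, its default capacity of 600 requests per minute per node, the 70% utilization target, and the node limits are all assumptions made for illustration. Real platforms implement their own rules.

```python
import math


def desired_nodes(
    current_nodes: int,
    rpm: float,
    cpu_percent: float,
    memory_percent: float,
    rpm_per_node: float = 600.0,        # assumed capacity of one node
    target_utilization: float = 70.0,   # assumed comfortable CPU/memory level
    min_nodes: int = 2,
    max_nodes: int = 20,
) -> int:
    """Pick a node count that satisfies both traffic and resource pressure."""
    # Nodes needed to serve the current request rate.
    by_rpm = math.ceil(rpm / rpm_per_node)
    # Nodes needed to bring the busiest resource back to the target level.
    busiest = max(cpu_percent, memory_percent)
    by_utilization = math.ceil(current_nodes * busiest / target_utilization)
    wanted = max(by_rpm, by_utilization)
    # Never scale below the floor or above the ceiling.
    return max(min_nodes, min(max_nodes, wanted))


print(desired_nodes(4, rpm=3000, cpu_percent=90, memory_percent=50))    # 6
print(desired_nodes(6, rpm=300, cpu_percent=10, memory_percent=20))     # 2
print(desired_nodes(10, rpm=30000, cpu_percent=95, memory_percent=80))  # 20
```

Taking the larger of the two estimates means a CPU-heavy workload still scales up even when the request rate looks modest, and the floor and ceiling keep a bad metric from emptying or exploding the pool.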

We want to be able to have it respond to things like requests per minute, or RPM, as well as memory and CPU utilization. A lot of teams only do one of the four loops. A lot of teams only do full-scale delivery. But having a solid foundation for continuous delivery is what makes in-product experimentation possible.

So now that we've shipped the feature, we're going to take a look at how it's impacting our metric. Sure enough, it's actually working. We proved it in the pilot that it had impact. We scaled it up and sure enough, it's having an amazing impact to the majority of our users. This is great because customers are seeing success.

A lot more customers are getting past this point, which moves them further into the journey where new areas for improvement can be identified. That's okay because our team's up for the challenge. We're continuing to deliver a lot of value for our customers. And once we identify the next challenge, we can start the process over and figure out how to serve them best next.

