Developers often fear database changes because a mistake by anyone on your team can lead to a major outage or even data loss. The stakes are higher when changes are not backwards compatible, cannot be rolled back, or impact system performance. This can erode confidence and slow your team's velocity. As a result, database changes are a common failure point in agile and DevOps.
Databases are often created manually and too often evolve through manual changes, informal processes, and even testing in production. Manual changes often lack documentation, and they are harder to review, test, and coordinate with software releases. This makes your system more fragile and raises the risk of failure.
The solution is to include your database in your source control and CI/CD pipeline. This lets your team document each change, follow the code review process, test it thoroughly before release, make rollbacks easier, and coordinate with software releases.
Let’s look at an example of how you can include your database in your CI/CD process and how to push a non-backwards-compatible database change successfully. We'll also look at testing your changes, progressive deployments, dealing with rollbacks, issues to look out for, and a few helpful tools.
CI/CD is a cornerstone of modern development and DevOps.
CI—or Continuous Integration—is the practice of merging all working developer code into a shared repository throughout the day. Its purpose is to prevent integration problems by integrating often and early. Commonly, this integration kicks off an automated build and test.
CD—or Continuous Delivery—is the practice of building, testing, and releasing software in short cycles, with the aim of ensuring that a working version of the software can be released at any time.
There are several key requirements to having your database ready for CI/CD. First, the database must be reproducible from scratch using one or more SQL scripts. This means that in addition to a script that creates the initial version of your database, you must also maintain scripts that make all required schema updates to your database.
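To make that concrete, here is a minimal sketch of how a release step could decide which of those migration scripts still need to run. The file names and the idea of a version-tracking table are assumptions for illustration; tools like Flyway or Liquibase implement this bookkeeping for you:

```javascript
// Migration scripts live in source control with an ordered version prefix.
const allScripts = [
  '001_create_users.sql',
  '002_add_enrolled_column.sql',
  '003_drop_enrolled_column.sql',
];

// appliedVersions would normally be read from a tracking table in the
// target database (e.g. a schema_migrations table).
function pendingMigrations(scripts, appliedVersions) {
  const applied = new Set(appliedVersions);
  return scripts
    .filter((name) => !applied.has(name.split('_')[0]))
    .sort(); // the numeric prefix makes sort order = execution order
}

console.log(pendingMigrations(allScripts, ['001']));
// Each returned script would then be run in order, e.g. with psql -f.
```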
When you create these scripts, you have two options:

- State-based: you maintain a script describing the complete desired schema, and a comparison tool generates the change script that brings each target database in line with that state.
- Migration-based: you maintain an ordered series of change (migration) scripts, and each target database is brought up to date by running the scripts it has not yet applied.
To learn more, check out this excellent article on state-based versus migration-based database updates.
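As a toy illustration of the state-based approach, a comparison tool conceptually diffs the desired schema against the current one and generates the ALTER statements that close the gap. The table and column names below are invented for this example, and real comparison tools are far more sophisticated:

```javascript
// Toy state-based diff: compare the current columns of a table to the
// desired ones and emit the ALTER statements that close the gap.
// Real tools also handle type changes, indexes, constraints, and more.
function diffColumns(current, desired) {
  const statements = [];
  for (const [name, type] of Object.entries(desired)) {
    if (!(name in current)) {
      statements.push(`ALTER TABLE users ADD COLUMN ${name} ${type};`);
    }
  }
  for (const name of Object.keys(current)) {
    if (!(name in desired)) {
      statements.push(`ALTER TABLE users DROP COLUMN ${name};`);
    }
  }
  return statements;
}

const current = { id: 'integer', enrolled: 'char(1)' };
const desired = { id: 'integer', created_at: 'date' };
console.log(diffColumns(current, desired));
```

A migration-based tool skips this comparison step entirely: the deltas are the scripts you wrote, applied in order.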
The second requirement for CI/CD is that the database schema (meaning, those scripts we just mentioned), just like your source code, must live in source control. You must treat your database schema changes as a controlled process just as you do with code.
Third, always back up before performing any database migrations. If you're working with a live production database, consider a Postgres follower database for your migration or upgrade.
Lastly, changes that involve removing a database object, such as deleting a column as shown below, can be more difficult to deal with due to the loss of data. Many organizations develop strategies to deal with this, such as only allowing additive changes (e.g. adding a column), or having a team of DBAs that deals with such changes.
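One way to enforce an additive-changes-only policy is a simple CI check that flags destructive statements for DBA review. The keyword list below is a deliberate simplification for illustration, not a real SQL parser:

```javascript
// Reject migration scripts containing destructive statements unless a DBA
// has explicitly approved them. This pattern list is intentionally crude.
const DESTRUCTIVE = [/\bDROP\s+TABLE\b/i, /\bDROP\s+COLUMN\b/i, /\bTRUNCATE\b/i];

function isAdditive(sql) {
  return !DESTRUCTIVE.some((pattern) => pattern.test(sql));
}

console.log(isAdditive('ALTER TABLE users ADD COLUMN middlename varchar(40);')); // true
console.log(isAdditive('ALTER TABLE users DROP COLUMN enrolled;')); // false
```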
Perhaps the best practice for database changes and database CI/CD is ensuring a collaborative effort between DevOps and DBAs. Make sure your DBAs are part of the code review cycle; they can help identify issues that only they may know about. DBAs have knowledge of the databases in each specific environment, including database-specific dependencies such as ETL load jobs, database maintenance tasks, and more.
Consult a database SME when setting up your database for CI/CD, and during any migration process, whenever possible. Also follow sensible DevOps practices: testing your changes in a test environment, performing backups, mitigating risks, being prepared for rollbacks, and so on.
When you create or update these scripts and push them to source control, your CI tool (such as Jenkins or Heroku CI) will pull the changes and then:

- build the application,
- run the automated test suite, and
- on success, create a new release and run your migration scripts against the target database as part of that release.
In many cases, when you're making a simple schema addition with bidirectionally compatible code, then you can push code and database changes at the same time. This shouldn't be an issue, as rollbacks in our case will be easy and predictable. This is often true when we are dealing with microservices with simple database components.
However, in many scenarios, serious problems can happen with this simplistic approach:

- A change that is not backwards compatible (such as dropping or renaming a column) can break the old version of the code, which is still running while the new release rolls out.
- Some changes cannot be cleanly rolled back; once a column is dropped, its data is gone unless you restore from a backup.
- Long-running migrations against large tables can hold locks and degrade production performance while they run.
There are several strategies for addressing the above issues. Some popular solutions include:

- Migration scripting: maintaining an ordered, versioned set of change scripts that are applied automatically as part of each release.
- Making changes in backwards-compatible stages (often called expand and contract): add the new structure first, migrate code and data over, and only then remove the old structure.
- Decoupling code and database deployments, for example with feature flags, so new code paths are enabled only after the schema change is complete.
Let’s run through an example based on the migration scripting option as explained above. Note that some frameworks (Rails, Django, ORM tools, and so on) abstract out or handle schema creation and migration for you. While the details may differ according to the framework you are using, the example below should still help you understand these core concepts. For example, you may have a schema configuration file to include in your CI/CD process.
For our example, we'll use Node.js, Postgres, and GitHub. We'll also use Heroku because it provides convenient tools including Heroku CI with deploy scripts for CI/CD, and easy Postgres rollbacks in case we make a mistake. If you need help deploying Node.js and Postgres on Heroku, here’s a quick walk-through.
Here's the pertinent code for our example. We're going to create a simple database with a single table, and a Node.js file that writes to that database table on load.
Database creation SQL (we have just one simple table):
CREATE TABLE users (
id integer PRIMARY KEY,
firstname varchar(40) NOT NULL,
lastname varchar(40) NOT NULL,
enrolled char(1) NOT NULL,
created_at date NOT NULL
);
Node.js
const result = await client.query(
  `INSERT INTO users (id, firstname, lastname, enrolled, created_at)
   VALUES ($1, $2, $3, $4, $5)`,
  [1, 'Becky', 'Smith', 'y', new Date()]
);
Once these files are checked into GitHub and our repository is attached to a Heroku app, we can enable the Heroku CI tool on the Heroku dashboard:
The real work is done by the Heroku Procfile and the Heroku release phase. Using those, we can tell Heroku to run a database migration SQL file any time a new release is created (in other words, after a successful build). Here is the release line we need to include in the Heroku Procfile:
release: bash ./release-tasks.sh
The content of the release-tasks file includes a list of SQL scripts to run. That list is updated with each release to include the needed schema modifications. For this very simple example, it will point to just one script:
psql -h <hostname> -d <database> -U <user> -w -f database/migrate.sql
(The database password can be supplied as a Heroku environment variable.)
Typically, as we are using the migration-based strategy, we would add additional migration scripts for each set of changes. For a more robust solution, we could use a tool such as Liquibase, Alembic, or Flyway. These tools add version control to your database, both generating the necessary change scripts between releases and giving you the ability to easily roll back changes. For example, Flyway creates scripts that allow you to migrate from any version of your database (including an empty database) to the latest version of the schema.
To kick off the CI tool, we make two changes: drop a required column, and change the JavaScript to no longer reference that column. First, we update the SQL code in Node.js, taking out the column:
const result = await client.query(
  `INSERT INTO users (id, firstname, lastname, created_at)
   VALUES ($1, $2, $3, $4)`,
  [2, 'Becky', 'Smith', new Date()]
);
Next, we create a migrate.sql file (referenced in the Procfile above) to alter the table and remove the column:
ALTER TABLE users DROP COLUMN enrolled;
Now, we commit the code change and SQL file, and watch the CI magic. First, the integration tests run. If you are using a common testing framework, the Heroku CI tool probably works with your test suite.
And now the CI tool creates a new release and deploys the app, which kicks off the migrate.sql file. (See the middle of the image below.)
We can check to see that the column was removed by inspecting the database through the Heroku CLI tool:
It worked! There is no longer a column named 'enrolled'. Our CI tool ran our script and deleted the column.
Some tools, like Liquibase, keep a detailed list of database changes. These tools allow you to easily see the last set of changes in cases like the above.
Now, any time that code or an updated migrate.sql is committed in the future, the CI tool will kick off the tests. If the tests pass, this creates a new release and pushes it to staging. When there is a new release, the migrate.sql file runs against the staging database.
We've taken a simple route here for demonstration purposes, but we could have made this process more robust. For instance, when moving a new release to staging, we could wipe out the old version of the database, create a new one from scratch by running the original creation script plus all migration scripts, and then populate the database with any reference data, all through the Procfile and release phase. Also note that, for simplicity's sake, we are not accounting for transactions in progress while this migration runs. In a real-world scenario, Heroku recommends using an advisory lock to prevent concurrent migrations.
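Here's a rough sketch of what that advisory-lock guard could look like. `pg_advisory_lock` and `pg_advisory_unlock` are real Postgres functions; the lock key is an arbitrary constant, and the client below is a stand-in stub so the example is self-contained (with a real pg client, each query() call would be awaited):

```javascript
// Guard the migration with a Postgres advisory lock so two release
// processes cannot run it concurrently. pg_advisory_lock blocks until
// no other session holds the same key.
const MIGRATION_LOCK_KEY = 4242;

function runMigrationExclusively(client, migrationSql) {
  client.query(`SELECT pg_advisory_lock(${MIGRATION_LOCK_KEY})`);
  try {
    client.query(migrationSql);
  } finally {
    // Always release the lock, even if the migration fails.
    client.query(`SELECT pg_advisory_unlock(${MIGRATION_LOCK_KEY})`);
  }
}

// Stub client that just records the statements it is asked to run.
const executed = [];
const fakeClient = { query: (sql) => executed.push(sql) };

runMigrationExclusively(fakeClient, 'ALTER TABLE users DROP COLUMN enrolled;');
console.log(executed);
```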
Even with the best planning and forethought, there will be times when you need to roll back your database. There are many approaches to rolling back failed deployments:

- Restoring the database from the most recent backup.
- Using a built-in rollback feature, such as Heroku Postgres Rollback, to return the database to its state at an earlier point in time.
- Writing and running a reverse ("down") migration script for each change.
- Redeploying the previous release together with its matching schema.
Be aware that all these solutions come with their own challenges, such as potential loss of new data (restoring a backup or redeploying) and introducing new bugs.
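As one illustration of the reverse-migration approach, each change can ship with a paired "down" script, and rolling back means running the down scripts newest-first. The statements below are invented for this example:

```javascript
// Paired up/down migrations: every change ships with a script that undoes
// it. Note the data-loss caveat still applies -- dropping a column in a
// down script discards whatever was written to it.
const migrations = [
  {
    up: 'ALTER TABLE users ADD COLUMN enrolled char(1);',
    down: 'ALTER TABLE users DROP COLUMN enrolled;',
  },
  {
    up: 'ALTER TABLE users ADD COLUMN middlename varchar(40);',
    down: 'ALTER TABLE users DROP COLUMN middlename;',
  },
];

// Roll back the most recent `count` migrations, newest first.
function rollbackPlan(applied, count) {
  return applied.slice(-count).reverse().map((m) => m.down);
}

console.log(rollbackPlan(migrations, 1));
```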
Database changes and migrations can be scary and can cause serious mistrust. However, if you place your database under CI/CD controls, you can not only confidently migrate your changes, but also move towards a better agile and DevOps experience. This can be as simple as using source control for your database schema, having a good process in place with your DevOps and DBA teams, and using your existing CI tools to test and migrate your databases. Once you establish the new process and train your team on it, future changes will be smoother and more automatic than your old manual process.