Automating Postgres Indexing with Dexter & HypoPG
A database index is a function of any database engine which allows it to find the records you are looking for without having to check every row. You manually build them, but once they’re there, your database will make the decision whether or not to use it.
Database indexes are important; they can make or break the performance of whatever you’re building. They’re also easy to overlook in development, as problems often don’t present themselves until you start loading up the database with data.
What is a database index in SQL?
A database index is a function of any database engine which allows it to find the records you are looking for without having to check every row. You manually build them, but once they’re there, your database will make the decision whether or not to use it.
When are database indexes useful?
Database indexes are a small concise lookup for your tables. They’re similar to the table of contents in a book; if you add an index, then instead of having to flip through the pages to find what you’re looking for, you get a page number and can go straight to the right place. Postgres takes this one step further and can use the index for all sorts of queries (joining tables, doing math, etc.)
Why I don't just index everything?
People have written books about the performance of indexes, and I’m not going to go into it here, beyond to say: each index does require maintenance. Any time you insert/update/delete a record it has to be updated. They take up disk space, and the database tries to keep them in memory, so they’ll enlarge and slow down your backups. Long story short, while you want to have the right indexes, you don’t want to index every column.
How can we improve database indexing?
While an experienced developer can often identify where indexes are needed, it’d be much better if we could automate it. Luckily, we don’t have to do a ton of work to pull this off; there’s a series of tools we can use to make this easy.
Automate indexing for an SQL database
While our usage of these tools actually just lives at the top layer, it’s important to start at the bottom to understand the pieces.
First up: Postgres
Whether this is the right database choice for you is a whole different blog post, but it’s essential for this post, as the tooling is Postgres specific.
Next: HypoPG
HypoPG lets you create hypothetical indexes and run queries against it, to determine whether they help your query. You don’t need to understand this, but you do have to have it installed into whatever Postgres instance you’re going to run this on.
Finally: Dexter
(Technically, ruby is required to install this, but that’s a dependency, so I don’t count it). Dexter is the star of this show and does all the hard work. You feed your database logs into it, and it finds slow queries and works to figure out what indexes might help. It uses HypoPG to test the query to ensure that it actually improves things.
See it in action
Here’s an example of what Dexter can tell me. I dropped all the indexes from an app just to get a better example; then I fire up the app, and click around. It recommends indexes which it thinks will improve things.
You can also add a --create flag, and it will create the indexes for you:
Of course, that doesn’t play the best with our rails app, so instead, I convert the recommendations to a migration:
To write this post, I dropped all the indexes from my app. This left a fairly simple page completing in 4,954ms. With the indexes added, it finished in 1,151ms.
Things to watch out for:
Installation challenges
Installing this was a bit of a challenge. Once it’s installed, you have to add the extension to the database you want to analyze. This impacts the schema, so if you don’t remove it when you’re done, it gets added to your schema files by Rails anytime you migrate. This causes issues if you don’t have HypoPG installed everywhere you’re running the app (including your CI server)
Need access to database and logs
In order for this to work, you need access to both the database to test against and logs of actual usage. If you’re using a hosted database, this can pose issues. The documentation does provide instructions on how to work around it, but your mileage may vary.
Non-optimal recommendations
Some of the indexes it recommends may not actually make sense. It also doesn’t necessarily have situational awareness so there may be a composite index that would work better than what it is recommending.
Takeaways
Dexter is a great tool to expose indexes you might be missing. I wouldn’t bother with it for daily use, but it’s certainly something that can find some performance gains if you’re not hyper-vigilant about adding indexes on your own. It would also be a great tool in the arsenal when taking over a project to make sure that things were running smoothly. Like so many tools, it should be used to make recommendations, but as indexes also have consequences, I’d be wary of using it in automated mode, rather than manually adding its suggestions.
Resources to learn more about database indexes