Ingesting 1M Inserts per Minute to Help Save Devs’ Resources

This is an installment of our “Community Member Spotlight” series, in which we invite our customers to share their work, spotlight their success, and inspire others with new ways to use technology to solve problems.

In this edition, Adam McCrea, founder and developer of Judoscale, explains how he helps other developers save on costs using Timescale. Handling a million inserts per minute with frequent, automatic data retention policies and data rollups, Timescale powers Judoscale’s dashboards to provide developers with real-time metrics on their server resources.

About Judoscale

Judoscale is a tool that helps engineering teams manage their server resources. More specifically, we automatically scale resources by adding and removing web and worker containers as needed based on traffic metrics and other factors. So, it's a way for teams to manage their costs to avoid paying for resources they don't need and also handle variable traffic so they don't have downtime with a traffic spike.

Adam McCrea, founder and developer of Judoscale

Since its founding in 2016, Judoscale has mostly been just me. My role is pretty much everything: I am the software developer, the marketer, customer support, and everything in between. In late April, the team doubled in size with the addition of a full-time developer, which was exciting.

Most of our customers are small to mid-sized development teams, anywhere from one-person shops like mine up to teams of 20 or more engineers, and they come from all industries. We have e-commerce customers, customers building developer tools like we do, and customers in all kinds of industries I don't know much about.

Judoscale was originally built as an add-on for Heroku, a web hosting platform. Heroku has a marketplace where you can publish add-ons, and it was a really easy way for Judoscale to get some initial exposure to an audience. Initially, it was tightly focused on a very specific audience: specific not in terms of business domain, but in terms of hosting technology.

Managing Server Resources in Heroku

Judoscale’s dashboard, with charts for server scale, throughput, and queue time

The majority of our customers are still on the Heroku hosting platform, and they access Judoscale through Heroku. On Heroku's portal, they can click a link to jump to Judoscale's dashboard, where they can see all of their applications using Judoscale, including how many servers they're running and charts with the metrics of those servers. 

We're using D3 for the data visualization. D3 is a low-level tool that has allowed us to build a completely custom user experience. That level of customization has come at the cost of maintenance and complexity. This area of our code is ripe for refactoring, and I’m not sure if we’ll choose D3 the next time around. 

Ninety-five percent of our application is Ruby on Rails. We ingest metrics through our Ruby on Rails API, and our autoscaling algorithm is also written in Ruby. 

We have another piece that consumes log metrics from our customers, who have the option to send all of their log data from Heroku to our application. We can parse some metrics out of that, which can mean a lot of data coming at us. We wrote that particular piece in a different language called Crystal, which reads like Ruby but is designed to be much more performant, because Ruby wasn't scaling well for that workload.

Choosing (and Using!) TimescaleDB

We collect metrics to scale our customers' applications. They run a piece of code that lives within their app, an open-source adapter library we've written, which connects to our API and sends metrics to us.

So, imagine hundreds of customers, each running many servers, constantly sending metrics to us. The metrics come into our API, we put them into Timescale, and then, every 10 seconds, we aggregate that data within Timescale so that we can do two things. First, the aggregates drive our autoscaling algorithm, which decides whether we need to scale customers up or down. Second, they drive our user interface, so our customers can see how their metrics are trending and what scaling has occurred.
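
As a rough, minimal sketch of how the ingestion side of a pipeline like this maps onto TimescaleDB (the service_metrics table and its columns here are hypothetical illustrations, not Judoscale's actual schema):

-- Hypothetical raw-metrics table; names and columns are illustrative.
CREATE TABLE service_metrics (
    time        TIMESTAMPTZ      NOT NULL,
    service_id  BIGINT           NOT NULL,
    queue_time  DOUBLE PRECISION,
    throughput  DOUBLE PRECISION
);

-- Turn it into a hypertable so TimescaleDB partitions it into
-- time-based chunks automatically as data is inserted.
SELECT create_hypertable('service_metrics', 'time');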

When we started using Timescale in 2022, we were only using it for those time-series metrics. At that time, we were running two PostgreSQL databases so we could migrate in pieces. Once everything worked great on Timescale, we moved the rest of our operational data over. So now everything lives in Timescale, our only relational data store. We still use Redis for some things.

“We started feeling some pain with that [homegrown] approach as our business was scaling.”

In fact, before using Timescale, we had kind of a homegrown time-series solution with Redis. We weren't using the Redis time-series module. Metrics would come in, we'd throw them into Redis, and then we'd have our own asynchronous jobs that would aggregate that data and then put the aggregate data into our PostgreSQL database. We had our own system for partitioning those tables in PostgreSQL and dropping old partitions and things like that—all the kinds of things that Timescale does for us, which is what was really appealing about Timescale.

We started feeling some pain with that approach as our business was scaling. We would run into issues where we would fill up Redis or the jobs doing the aggregation were taking too long. There were too many pieces to manage, and it just wasn't scaling well, so we started looking for time-series solutions that were designed to do the kind of thing we were doing instead of a homegrown solution.

“It just seemed like Timescale was going to fit in really well with the way we were already doing things, and it was going to simplify a lot of our architecture and let us remove a bunch of pieces—and it did. It did exactly what we were hoping for.”

And that's how I found Timescale: looking at what other time-series solutions were out there. I saw that it was built on top of PostgreSQL, doing some of the things we were already doing with table partitioning and dropping old partitions but doing that automatically for us, handling the aggregation for us. It just seemed like Timescale would fit in really well with the way we were already doing things, and it would simplify a lot of our architecture and let us remove a bunch of pieces—and it did. It did exactly what we were hoping for. 

For us, the move to Timescale was less about performance and more about stability and reliability. Before moving to Timescale, we had frequent issues with things not working and data stores filling up. As the sole developer on the project, I was getting alerts in the middle of the night, maybe once a week or at least a few times a month, and I would have to go in and fix things. With the move to Timescale, that's extremely rare. 

“The big things [with Timescale] were stability and reliability. The other big thing was just being able to move faster from a development standpoint.”

If something is not working, it's rarely related to the data pipeline that was causing us all those issues before. The big things were stability and reliability. The other big thing was just being able to move faster from a development standpoint. Before, we just had a lot more moving pieces. It was more complex, and the extra complexity just slowed us down. The whole data pipeline is a lot simpler now, and it lets us move a lot faster. 

As a one-developer company, the peace of mind of managed services is also crucial: I would never even consider self-hosting. Even before Timescale, we used managed PostgreSQL on Heroku, managed Redis, and so on. I definitely have zero interest in running my own database servers, or any servers at all.

That's one of the reasons we built on top of Heroku: Heroku is managed web hosting, the way that Timescale is managed database hosting. It removes a lot of that complexity and responsibility so that we can focus on building a product. The Timescale for AWS cloud product, in particular, was really appealing because it was all managed in a single package, making that decision a lot easier. 

Running a Data Retention Policy Every 10 Minutes

“Our API, which handles data ingestion, usually runs between 1,000 and 1,500 requests per second. That translates to close to a million inserts to Timescale every minute.”

The feature that is really huge for us is hypertables. In particular, how partitioning is automatic and how dropping old partitions can be automated through data retention policies. That’s really big. Then, continuous aggregates: having the aggregate data calculated automatically and querying it as real-time aggregates has simplified our architecture and code. It's really reduced a lot of what we have to do. 
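
As a minimal sketch of what a continuous aggregate with real-time aggregation can look like, building on the hypothetical service_metrics table from earlier (names and columns are assumptions, not Judoscale's schema):

-- Hypothetical 10-second rollup of the raw metrics.
CREATE MATERIALIZED VIEW service_metrics_10s
WITH (timescaledb.continuous) AS
SELECT time_bucket('10 seconds', time) AS bucket,
       service_id,
       avg(queue_time) AS avg_queue_time,
       sum(throughput) AS total_throughput
FROM service_metrics
GROUP BY bucket, service_id;

-- Depending on the TimescaleDB version, real-time aggregation may need to
-- be enabled explicitly. With it on, queries against the view merge the
-- materialized buckets with the newest, not-yet-materialized raw data.
ALTER MATERIALIZED VIEW service_metrics_10s
  SET (timescaledb.materialized_only = false);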

Our API, which handles data ingestion, usually runs between 1,000 and 1,500 requests per second. So, those are web requests coming into us. That translates to close to a million inserts to Timescale every minute. We have four hypertables, but I would say one of them, in particular, handles about 80-90% of that volume.

Editor’s Note: Learn what data retention policies are and how you can manage your data by using them.

To be honest, we haven't used many of the more advanced Timescale features, such as compression, because we're a bit of an unusual use case for Timescale. We really only care about the most recent data, as our autoscaling algorithm only looks at data within the past few minutes. And then, in terms of what our customers see in our dashboard, they see charts for the past 24 hours.

The automatic data retention policy runs every 10 minutes, and we only retain the most recent hour of data, dropping everything else. For the data rollups, we retain two days of aggregations, and we run that retention every day. Once it’s aggregated, we don’t really care about the raw data because we don’t query it anymore.
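
A hedged sketch of what those two policies could look like in SQL, again using the hypothetical table and view names from above (the schedule_interval argument to add_retention_policy requires a reasonably recent TimescaleDB version):

-- Keep only the most recent hour of raw data; the job checks every 10 minutes.
SELECT add_retention_policy('service_metrics',
       drop_after        => INTERVAL '1 hour',
       schedule_interval => INTERVAL '10 minutes');

-- Keep two days of the 10-second rollups; the job runs once a day.
SELECT add_retention_policy('service_metrics_10s',
       drop_after        => INTERVAL '2 days',
       schedule_interval => INTERVAL '1 day');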

This means we don't need to keep old data or use compression. We're actually refreshing our continuous aggregates every 10 seconds: all this raw data comes in, and we get 10-second aggregates out, which seems fairly unusual. So we had a few calls early on with some folks at Timescale, who helped us fix some things we were doing in non-optimal ways, which was really helpful. I'm not sure the migration would have been successful without that.
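
For reference, a refresh schedule like that could be expressed roughly as follows; a 10-second schedule_interval is aggressive, and the exact offsets here are illustrative assumptions rather than Judoscale's actual settings:

-- Re-materialize the 10-second rollup almost continuously.
-- start_offset and end_offset bound the window each refresh recomputes.
SELECT add_continuous_aggregate_policy('service_metrics_10s',
       start_offset      => INTERVAL '10 minutes',
       end_offset        => INTERVAL '10 seconds',
       schedule_interval => INTERVAL '10 seconds');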

“Timescale has been a crucial tool for us. It’s really simplified things for us, and it’s been a joy to use.”

Extending Beyond Heroku

Right now, we're in a phase of expanding beyond a Heroku add-on. So, at the end of last year, we launched an integration with AWS to provide the same autoscaling experience to customers hosting their applications on AWS. 

This year is mostly focused on getting that service off the ground and getting more customers onboarded into it. It's still in its infancy. We've got a few customers on it, but Heroku still very much dominates our customer base.

Advice & Resources

I would just like to restate that Timescale has been a crucial tool for us. It’s really simplified things for us, and it’s been a joy to use. And I really have been impressed with the documentation. Early on, when we were getting started, I watched a ton of the videos you all had made, with lots of visuals explaining how retention works, how continuous aggregates work, and how hypertables work. I really appreciated all of that because it helped me build a mental model of how Timescale works and how we could make the most of it. That made it really easy to get going.

We’d like to thank Adam McCrea for sharing Judoscale’s story and giving us a behind-the-scenes peek into how they use Timescale to help developers manage their server resources and reduce costs.

We’re always keen to feature new community projects and stories on our blog. If you have a story or project you’d like to share, reach out on Slack (@Ana Tavares), and we’ll go from there.
