Batching and Caching With Dataloader

by Leonardo Maldonado

December 31, 2021

In this article, we’re going to cover what Dataloader is and how it can help us with database requests and reduce our database costs.

Databases are a pain point in modern applications because fetching resources from a database can quickly become complex. Data is stored in a database to be consumed later, and storing and fetching it well requires a lot of knowledge and hard work.

Database optimization is something that developers often don’t pay attention to when they’re starting to build their applications. Especially when building an MVP, database optimization can go unnoticed and become a huge pain point in the future. Database requests cost money, so they can get expensive over time.

An application that wants to scale to millions of users needs to take care of database requests and the way the data is stored. There are plenty of alternatives and patterns that can be followed to minimize unnecessary costs related to the database and help save some money.

One of the areas that can be improved in modern databases is how the requests are being sent to the database. Reducing the number of requests can improve the performance of the application.

First, we’re going to understand the N+1 problem and how Dataloader solves it in an elegant way to help us reduce unnecessary requests.

What Is the N+1 Query Problem?

The N+1 query problem occurs when an application makes one query to fetch a list of N items and then N more queries, one per item, to fetch related data. N stands for the number of items.

This problem usually occurs when we want to fetch data from our database and we loop through the results. It causes a lot of unnecessary round-trips to our database because we’re making a new request every time.

At the end of the operation, we have made one request for each of the N items, plus the original query (+1).

This is how it works:

Imagine that you have a table called Posts with 100 items inside it.
You want to fetch all the posts in a single request.
After you fetch all the posts, for each post, you want to return the author.
You map over the results and for each post you make a new request to your database.
It results in 100 unnecessary requests to your database, plus the first request for fetching all the posts.
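
The steps above can be sketched in a few lines. This is only an illustration, assuming a hypothetical in-memory database object that counts every query it receives; `createFakeDb`, `findAllPosts` and `findAuthorById` are made-up names, not part of any real library.

```javascript
// Hypothetical in-memory "database" that counts every query it receives.
function createFakeDb() {
  return {
    queryCount: 0,
    posts: Array.from({ length: 100 }, (_, i) => ({ id: i + 1, authorId: (i % 10) + 1 })),
    authors: Array.from({ length: 10 }, (_, i) => ({ id: i + 1, name: `Author ${i + 1}` })),
    async findAllPosts() {
      this.queryCount += 1;
      return this.posts;
    },
    async findAuthorById(id) {
      this.queryCount += 1;
      return this.authors.find((author) => author.id === id);
    },
  };
}

// Naive approach: one query for the posts, then one more query per post.
async function getPostsWithAuthors(db) {
  const posts = await db.findAllPosts(); // the "+1" query
  return Promise.all(
    posts.map(async (post) => ({
      ...post,
      author: await db.findAuthorById(post.authorId), // one of the N queries
    }))
  );
}

const demoDb = createFakeDb();
getPostsWithAuthors(demoDb).then((posts) => {
  console.log(`${posts.length} posts loaded with ${demoDb.queryCount} queries`);
  // logs: 100 posts loaded with 101 queries
});
```

Even though there are only 10 distinct authors in this sketch, the loop still asks the database 100 times, which is exactly the waste that batching and caching aim to remove.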

Making a lot of unnecessary requests to our database can make our application slower. It is easy to write database queries naively and not even notice that we have this problem.

What Is Dataloader?

Dataloader is a generic utility library that can be used on our application’s data fetching layer to reduce requests to the database via batching and caching. It provides a simplified API to access remote data sources and reduce unnecessary round-trips.

Dataloader is not something particular to Node.js and JavaScript applications—it can be used with any other technology and in different situations. There are currently a ton of implementations in different languages.

One of the most common uses of Dataloader is in GraphQL services. It combines the batching and caching concepts with the core concepts of GraphQL and helps to create faster and more reliable modern APIs.

Batching With Dataloader

Batching is the primary job of Dataloader. We create a loader by providing a batch loading function.

import DataLoader from "dataloader";
const postLoader = new DataLoader(batchPostFn);

The batch loading function receives an array of keys and returns a promise that resolves to an array of values.

After that, we can load our values using the loader that we just created. Dataloader will coalesce all individual loads that occur within a single tick of the event loop and call our batch function with all requested keys.

const post = await postLoader.load(1);
// authorLoader is a hypothetical second loader, keyed by author ID
const postAuthor = await authorLoader.load(post.author);

The code that follows shows what we discussed: The batch function accepts an array of keys and returns a promise that resolves to an array of values. There are two points to pay attention to here. First, the array of values must be the same length as the array of keys. Second, each index in the array of values must correspond to the same index in the array of keys.

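import DataLoader from "dataloader";
// Batch function: receives every requested key at once and returns one value per key.
async function batchPostFn(keys) {
  // db.fetchAllKeys stands in for a data-access call that returns results indexed by key
  const results = await db.fetchAllKeys(keys);
  // Keep the output in the same order as keys; a missing entry becomes an Error
  return keys.map(key => results[key] || new Error(`No result for ${key}`));
}
const postLoader = new DataLoader(batchPostFn);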
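
The ordering rule matters in practice because a database may return rows in a different order than the keys, and may return nothing at all for a missing key. A minimal sketch of one way to line rows up with keys, assuming hypothetical rows that carry an `id` field (`rowsFromDb` and `alignRowsToKeys` are illustrative names only):

```javascript
// Hypothetical rows, as a database might return them for the keys [3, 1, 2, 99]:
// in arbitrary order, and with no row at all for the missing key 99.
const rowsFromDb = [
  { id: 2, title: "Second post" },
  { id: 3, title: "Third post" },
  { id: 1, title: "First post" },
];

// Reorder rows so that values[i] belongs to keys[i], filling gaps with an Error.
function alignRowsToKeys(keys, rows) {
  const rowsById = new Map(rows.map((row) => [row.id, row]));
  return keys.map((key) => rowsById.get(key) ?? new Error(`No result for ${key}`));
}

const values = alignRowsToKeys([3, 1, 2, 99], rowsFromDb);
console.log(values.map((v) => (v instanceof Error ? v.message : v.title)).join(" | "));
// logs: Third post | First post | Second post | No result for 99
```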

With this simple configuration, we can cut unnecessary round-trips and make our database requests more efficient. Without the loader, we would have ended up making a lot of requests to our database; with a few lines of code, we reduced them to only two.
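
To make the coalescing idea concrete, here is a deliberately simplified sketch of a batching loader. It is not Dataloader's actual implementation; `TinyBatchLoader` and `batchAuthors` are hypothetical names, and the sketch assumes that deferring the dispatch to a microtask is enough to collect every key requested in the same synchronous run.

```javascript
// Simplified illustration of batching; NOT the real Dataloader implementation.
class TinyBatchLoader {
  constructor(batchFn) {
    this.batchFn = batchFn;
    this.queue = []; // pending { key, resolve, reject } entries
  }

  load(key) {
    return new Promise((resolve, reject) => {
      this.queue.push({ key, resolve, reject });
      // Schedule a single dispatch when the first key of a batch is queued.
      if (this.queue.length === 1) {
        queueMicrotask(() => this.dispatch());
      }
    });
  }

  async dispatch() {
    const batch = this.queue;
    this.queue = [];
    try {
      // One call to the batch function for every key collected so far.
      const values = await this.batchFn(batch.map((entry) => entry.key));
      batch.forEach((entry, i) => {
        const value = values[i];
        if (value instanceof Error) entry.reject(value);
        else entry.resolve(value);
      });
    } catch (error) {
      batch.forEach((entry) => entry.reject(error));
    }
  }
}

// Hypothetical batch function that records how often it is called.
const batchCalls = [];
async function batchAuthors(ids) {
  batchCalls.push(ids);
  return ids.map((id) => ({ id, name: `Author ${id}` }));
}

const authorLoader = new TinyBatchLoader(batchAuthors);
Promise.all([authorLoader.load(1), authorLoader.load(2), authorLoader.load(3)]).then((authors) => {
  console.log(`${authors.map((a) => a.name).join(", ")} | batch calls: ${batchCalls.length}`);
  // logs: Author 1, Author 2, Author 3 | batch calls: 1
});
```

Three separate `load` calls end up as a single call to the batch function, which is the behavior the real library provides with far more care around edge cases.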

Caching With Dataloader

Dataloader provides a memoization cache for all loads that occur in a single request to your application.

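The first time the load function is called with a given key, Dataloader caches the result in memory. When load is called again with the same key, it returns the cached value instead of fetching it again, which reduces redundant loads. The cache lives as long as the loader instance does, so the data is released only when the loader is garbage-collected.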
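
The memoization idea can be sketched with a plain Map that stores one promise per key. This is an illustration only, not how Dataloader is implemented; `createCachedLoader` and `fetchPost` are hypothetical names.

```javascript
// Simplified illustration of per-key memoization; NOT the real Dataloader implementation.
function createCachedLoader(fetchOne) {
  const cache = new Map(); // key -> promise of the value
  return {
    load(key) {
      // Only the first request for a key reaches the data source.
      if (!cache.has(key)) {
        cache.set(key, fetchOne(key));
      }
      return cache.get(key);
    },
  };
}

// Hypothetical fetch function that counts how often it is called.
let fetchCount = 0;
async function fetchPost(id) {
  fetchCount += 1;
  return { id, title: `Post ${id}` };
}

const cachedLoader = createCachedLoader(fetchPost);
Promise.all([cachedLoader.load(1), cachedLoader.load(1), cachedLoader.load(2)]).then((posts) => {
  console.log(`${fetchCount} fetches for ${posts.length} loads`);
  // logs: 2 fetches for 3 loads
});
```

Caching the promise rather than the resolved value means that a second `load` for the same key is deduplicated even while the first one is still in flight.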

Some developers might think that Dataloader can replace a shared application-level cache such as Redis. But the Dataloader README on GitHub clarifies:

Dataloader is first and foremost a data loading mechanism, and its cache only serves the purpose of not repeatedly loading the same data in the context of a single request to your Application.

The fact is that Dataloader does not replace Redis or any other application-level cache. Redis is a key-value store that’s used for caching and many other situations.

Getting Started With Dataloader

To get started with Dataloader, we need to install the package:

yarn add dataloader

Now, let’s imagine that we have a simple GraphQL schema, like the following:

type Post {
  id: ID!
  name: String!
  description: String!
  body: String!
  author: Author!
  comments: [Comment!]
}

type Author {
  id: ID!
  firstName: String!
  lastName: String!
  posts: [Post!]
}

type Comment {
  id: ID!
  text: String!
  user: User!
}

type User {
  id: ID!
  firstName: String!
  lastName: String!
}

Now, we need to create our Dataloader instance. We’re going to create a Dataloader instance for our Post type.

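import DataLoader from "dataloader";

// Receives all requested post keys at once; db.fetchAllKeys stands in for our data-access call
async function batchPostFn(keys) {
  const results = await db.fetchAllKeys(keys);
  // One value (or Error) per key, in the same order as keys
  return keys.map(key => results[key] || new Error(`No result for ${key}`));
}

const postLoader = new DataLoader(batchPostFn);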

A good way to use our loader without having to import it every time is to put it in our GraphQL context, like this:

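const graphql = async (req: Request, res: Response) => {
  return {
    schema,
    context: {
      user,
      req,
      // Create a fresh loader for each request so its cache is never shared across requests
      postLoader: new DataLoader(batchPostFn),
    },
  };
};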

Now, we can use it in our GraphQL resolvers.

const resolvers = {
  Query: {
    post: (parent, args, context, info) => context.postLoader.load(args.id),
    ...
  }
};
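
Because the Dataloader cache is meant to live for a single request, one common arrangement is to build the loaders inside a function that runs once per request. The sketch below illustrates that shape with a stand-in loader so it can run on its own; `createPostLoader` and `createContext` are hypothetical names, and in a real service the stand-in would be replaced by a Dataloader instance.

```javascript
// Stand-in for a real loader: it memoizes one promise per key, nothing more.
function createPostLoader() {
  const cache = new Map();
  return {
    load(id) {
      if (!cache.has(id)) {
        cache.set(id, Promise.resolve({ id, name: `Post ${id}` }));
      }
      return cache.get(id);
    },
  };
}

// Build a fresh context, and therefore fresh loaders, for every incoming request.
function createContext(req) {
  return { req, postLoader: createPostLoader() };
}

// Resolvers reach the loader through the context instead of importing it.
const resolvers = {
  Query: {
    post: (parent, args, context) => context.postLoader.load(args.id),
  },
};

const contextA = createContext({ id: "request-a" });
const contextB = createContext({ id: "request-b" });
console.log(contextA.postLoader !== contextB.postLoader); // logs: true
resolvers.Query.post(null, { id: 1 }, contextA).then((post) => console.log(post.name)); // logs: Post 1
```

Two requests never share a loader here, so data cached while serving one user cannot leak into the response for another.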

Thinking about performance early makes an application faster and more reliable as it grows. It’s easy to get started with Dataloader and create loaders for every type in your GraphQL API, which can help you reduce costs and make the API more performant.

Conclusion

A naive approach for fetching resources from the database might be expensive over time. Dataloader helps us to reduce our costs and save unnecessary round-trips to our database by batching and caching.

About the Author

Leonardo Maldonado

Leonardo is a full-stack developer, working with everything React-related, and loves to write about React and GraphQL to help developers. He also created the 33 JavaScript Concepts.
