DEV Community

Ryan
Ryan

Posted on • Updated on • Originally published at ryanc.co

Faster Eloquent Chunking

As your applications scale, processing large amounts of database records with Laravel Eloquent can become increasingly difficult. Resulting in out of memory exceptions and overall slowing down your application. Why is that?

When fetching results from the database you are in turn pulling that data into memory. Take this snippet of code for instance

Post::all()->each(function ($post) {
    // ...
});
Enter fullscreen mode Exit fullscreen mode

Which results in the following query, loading all records in the posts table into memory

select * from posts;
Enter fullscreen mode Exit fullscreen mode

Typically for tables with a small number of records, this is absolutely acceptable. However, as you accumulate tens of thousands of posts you will eventually begin hitting memory resource constraints of your webserver.

Chunking

A common approach in Laravel is to use Eloquent's (via BuildsQuery) chunk() method which fetches a fixed amount of records breaking a larger set into more consumable chunks.

Post::chunk(1000, function ($post) {
    // ...
});
Enter fullscreen mode Exit fullscreen mode

While this might seem fine there are both improvements and gotchas to be aware of.

First, imagine the following scenario: you are fetching Post records from the database to update an attribute that you are also using in a where clause

Post::where('published_at', '<', now())->chunk(1000, function ($post) {
    $post->update('published_at', now());
});
Enter fullscreen mode Exit fullscreen mode

Although contrived, this exemplifies a very real problem where such a query would result in an endless loop as the published_at attribute will always be less than now() at the time of the queries next execution (assuming accuracy to the second as with MySQL's timestamp column type or similar).

Second, there is the matter of the queries performance and its impact on the database server. The above code would result in a query similar to the following

select * from posts order by posts.id asc limit 1000 offset 9000
Enter fullscreen mode Exit fullscreen mode

MySQL is unable to go directly to the offset due to deleted records and additional query constraints and, therefore, this query has to effectively select the first 10,000 records to return only the last 1,000 selected. As you can imagine this will not scale well into many-thousands or even millions of rows. This will cause your database server to use unnecessary resources slowing down all other queries from your application.

Chunking... but better!

In order to both prevent the unforeseen gotcha and improve database server performance we can use Eloquent's chunkById method

Post::where('published_at', '<', now())->chunkById(1000, function ($post) {
    $post->update('published_at', now());
});
Enter fullscreen mode Exit fullscreen mode

The above code snippet will result in a query similar to the following

select * from posts where published_at < '2019-09-11 12:00:00' and id > 9000 order by id asc limit 1000
Enter fullscreen mode Exit fullscreen mode

Why is this approach considered "better"?

a) it allows MySQL to completely skip the first 9000 (assumingly sequential) records
b) we will no longer be re-selecting records which we have already updated due to the id constraint in our where clause

Bonus - How?! 🤔

Diving into the chunkyById method of the BuildsQueries trait we see that the id of the last record fetched (remember, we're ordering by id in ascending order) is stored and used as a parameter in the next query to be run.

Top comments (5)

Collapse
 
kidipridi profile image
Kidi Pridi

Sure interesting post, but I'm against using eloquent when mass updating records as it updates one by one. I prefer to use update() on query builder, something like :

DB::table('posts')->where('published_at', '<', now())->update(['published_at' => now()]);
Enter fullscreen mode Exit fullscreen mode

It will result query like this :

update `posts` set `published_at` = 2019-09-25 07:09:55 where `published_at` < 2019-09-25 07:09:55
Enter fullscreen mode Exit fullscreen mode
Collapse
 
mennowolvers profile image
Menno Wolvers • Edited

This is indeed faster but be careful that your Observers wont be called with this method.

Collapse
 
ajmaske profile image
Anthony Maske

Very nice Ryan! Hope things are going well for you.

Collapse
 
cbloss profile image
cbloss

The chunkById saved me! Was having issues you mentioned by just using chunk. Thanks!

Collapse
 
mhfereydouni profile image
Mohammadhossein Fereydouni

That was very useful. Tanks alot.