This blog was originally published in March 2021 and was updated in July 2023.

Good intentions can sometimes end up with bad results.  Adding indexes boosts performance until it doesn’t. Avoid over-indexing.

Whether your application is fast, responsive, and scales properly depends largely on how you use indexes in the database. MongoDB is no different; its performance (and the overall performance of your application) heavily depends on having the right number of indexes on the right fields. A simple index or two can speed up data retrieval from MongoDB by orders of magnitude on collections with millions of documents. But at the same time, having too many indexes on a large collection can lead to massive slowdowns in overall performance. You need to get your indexes just right.

In this blog, we will first provide an introduction to MongoDB indexes and then dive deeper into the problem of having too many indexes and how to find both duplicate and unused indexes. If you want to find out if you need additional indexes or if your query is using an index, I would suggest reading previous Percona articles on query tuning (Part 1 & Part 2 of that series).

So, indexes are very good for getting faster queries. How many indexes do I need to create on a collection? What are the best practices for the indexes? How do I find which indexes are being used or not? What if I have duplicated indexes? All of this and more will be covered in the blog post below.

An Introduction to MongoDB Indexes

MongoDB indexes are crucial in optimizing database performance, as they improve the speed of data retrieval operations in a MongoDB database. 

They create a reference to the location of data within a collection based on the indexed field(s). When a query involves the indexed field, MongoDB can use the index to rapidly pinpoint the relevant documents, resulting in faster query execution times.

By expediting data retrieval through efficient query execution, MongoDB indexes enhance the performance of your database.

Why are MongoDB Indexes Important?

MongoDB indexes play a pivotal role in enhancing query performance and improving overall database efficiency. They enable MongoDB to retrieve data by providing a roadmap for the database to locate specific data quickly.

When a query involves the indexed fields, MongoDB can use the index to navigate directly to the relevant documents, bypassing the need for a full collection scan. This process is akin to using an index in a book to find specific information instead of reading the entire book.
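The difference between a full collection scan and an index lookup can be sketched in plain JavaScript. This is only a toy model: the in-memory `collection`, the `userId` field, and the helper names are assumptions made up for illustration, and MongoDB uses B-tree structures rather than a hash map. It only shows how keeping references to document locations reduces the number of documents examined.

```javascript
// Hypothetical in-memory "collection": an array of documents.
const collection = [];
for (let i = 0; i < 100000; i++) {
  collection.push({ _id: i, userId: `user${i}`, score: i % 100 });
}

// Full collection scan: every document is examined.
function findByScan(docs, field, value) {
  let examined = 0;
  const matches = [];
  for (const doc of docs) {
    examined++;
    if (doc[field] === value) matches.push(doc);
  }
  return { matches, examined };
}

// Toy index: maps each field value to the positions of matching documents.
function buildIndex(docs, field) {
  const index = new Map();
  docs.forEach((doc, position) => {
    if (!index.has(doc[field])) index.set(doc[field], []);
    index.get(doc[field]).push(position);
  });
  return index;
}

// Index lookup: only the referenced documents are touched.
function findByIndex(docs, index, value) {
  const positions = index.get(value) || [];
  return { matches: positions.map((p) => docs[p]), examined: positions.length };
}

const userIdIndex = buildIndex(collection, "userId");
const scanResult = findByScan(collection, "userId", "user73512");
const indexResult = findByIndex(collection, userIdIndex, "user73512");
console.log(scanResult.examined, indexResult.examined); // prints: 100000 1
```

Both calls return the same single document, but the scan examines every document while the lookup examines one.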

Especially helpful in collections with a large number of documents, indexes are instrumental in maintaining acceptable query performance. Without indexes, querying such collections could become prohibitively slow due to the need to scan through a massive amount of data.

Indexes not only speed up queries but also minimize resource usage. With indexes, MongoDB performs fewer disk I/O operations and consumes less CPU, leading to improved server performance and reduced response times.

By creating organized data structures that enable rapid data retrieval and efficient query execution, indexes empower MongoDB to locate and retrieve data that matches specific query conditions quickly. This capability is essential for achieving responsive applications, optimizing user experiences, and supporting applications dealing with significant data loads.

When Should You Use Indexes in MongoDB?

Indexes prove invaluable in many MongoDB scenarios and use cases where optimizing query performance and data retrieval is paramount. They are particularly beneficial for:

Accelerating Queries: Indexes dramatically enhance query execution for frequently accessed fields, leading to quicker data retrieval and overall application responsiveness.

Sorting and Aggregation: Indexes expedite sorting and aggregation tasks, enabling efficient data manipulation for reports, analytics, and visualization.

Range-based Queries: Indexes bolster range-based queries, essential for retrieving data within specific value intervals, such as time ranges or numeric sequences.

Join Operations: When data is linked across multiple collections (for example, with the $lookup aggregation stage), indexes on the joined fields improve join efficiency, consolidating data from various sources promptly.

Text Searches: For applications requiring full-text search capabilities, text indexes accelerate searching within text fields, facilitating efficient content searches.

Geospatial Data: Indexes are vital for geospatial applications, where rapid location-based queries and analyses depend on efficient spatial indexes.

Data Deduplication: Indexes expedite processes involving duplicate detection and data cleanup, enhancing data quality and consistency.

The Different Types of MongoDB Indexes

MongoDB offers various indexing options, each designed to optimize specific data access patterns and query scenarios. Understanding the different types of indexes is essential for harnessing MongoDB’s full potential in terms of query performance, data retrieval, and overall database efficiency. From simple single-field indexes to compound, text, and geospatial indexes, each type plays a crucial role in enhancing the speed and accuracy of data access.

Single Field Indexes

Single field indexes in MongoDB are a fundamental indexing strategy that significantly impacts query performance by expediting data retrieval based on a specific field. These indexes efficiently handle queries involving the indexed field, notably minimizing the necessity for full collection scans. They are especially advantageous for tasks such as filtering, sorting, or searching data based on a single attribute. For example, when queries frequently focus on unique user identifications or timestamps, a single field index can profoundly enhance query execution, ensuring prompt and responsive data access.

Compound Indexes

Compound indexes in MongoDB play a crucial role in optimizing queries that involve multiple fields. These indexes encompass more than one field and allow for enhanced query performance by facilitating the retrieval of data based on various attributes simultaneously. The order of fields in a compound index holds immense importance as it directly affects its efficiency in addressing specific query patterns. This order influences how the index is used; queries that match the order of fields in the index benefit the most. Careful consideration of the order in which fields are placed within a compound index can significantly impact query efficiency and overall database performance.
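To make the effect of field order concrete, here is a minimal sketch in plain JavaScript. The function name `usablePrefixLength` and the field names are hypothetical, and the model is deliberately simplified: it only considers equality filters and ignores sorts and range conditions, which the real query planner also takes into account.

```javascript
// Returns how many leading fields of a compound index an equality filter
// can use. 0 means the query cannot use the index prefix at all.
function usablePrefixLength(indexFields, queryFields) {
  const wanted = new Set(queryFields);
  let used = 0;
  for (const field of indexFields) {
    if (!wanted.has(field)) break; // stop at the first index field the query lacks
    used++;
  }
  return used;
}

// Hypothetical compound index on (a, b, c).
const indexFields = ["a", "b", "c"];

console.log(usablePrefixLength(indexFields, ["a"]));           // 1
console.log(usablePrefixLength(indexFields, ["a", "b"]));      // 2
console.log(usablePrefixLength(indexFields, ["c", "b", "a"])); // 3 (order inside the filter does not matter)
console.log(usablePrefixLength(indexFields, ["a", "c"]));      // 1 (only the leading field a)
console.log(usablePrefixLength(indexFields, ["b", "c"]));      // 0 (leading field a is missing)
```

What matters is the order of fields in the index definition, not the order in which the query lists them.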

Multikey Indexes

Multikey indexes in MongoDB are the indexes built on fields that hold arrays, including arrays of nested documents. A multikey index generates a separate index entry for each element of the array, enabling efficient querying and filtering based on these values. While multikey indexes offer benefits like enhanced query performance for array-based searches, it’s crucial to consider potential downsides, such as increased index size and a heavier impact on write operations.
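The "one entry per array element" behavior, and why it inflates index size, can be illustrated with a small simulation. The documents, the `tags` field, and the `multikeyEntries` helper are assumptions for illustration only; this is a sketch of the idea, not of MongoDB’s internal storage format.

```javascript
// Builds toy index entries for `field`: one { key, docId } entry per array
// element, or a single entry when the field holds a scalar.
function multikeyEntries(docs, field) {
  const entries = [];
  for (const doc of docs) {
    const value = doc[field];
    const values = Array.isArray(value) ? value : [value];
    for (const v of values) {
      entries.push({ key: v, docId: doc._id });
    }
  }
  // Keep entries ordered by key, as an ordered index would.
  entries.sort((x, y) => (x.key < y.key ? -1 : x.key > y.key ? 1 : 0));
  return entries;
}

// Hypothetical documents with an array field.
const docs = [
  { _id: 1, tags: ["red", "blue"] },
  { _id: 2, tags: ["blue"] },
  { _id: 3, tags: ["green", "red", "blue"] },
];

const entries = multikeyEntries(docs, "tags");
console.log(entries.length); // prints: 6 (three documents, six index entries)

// A lookup on one element returns every document whose array contains it.
const blueDocs = entries.filter((e) => e.key === "blue").map((e) => e.docId);
console.log(blueDocs); // prints: [ 1, 2, 3 ]
```

Three documents produce six entries here, and every write to the array has to maintain all of them, which is the write and size overhead mentioned above.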

Text Indexes

Text indexes in MongoDB enable full-text search functionality within the database, making it easier to search for and retrieve relevant information from large collections of text data. Text indexes are particularly beneficial when natural language processing and text-based analysis are essential. Uses include content management systems, social media platforms, and e-commerce sites, where users need to locate specific textual content within datasets.

Geospatial Indexes

Geospatial indexes in MongoDB are designed to facilitate the efficient retrieval of location-based information, like proximity searches, polygon intersections, and distance-based analyses. They are particularly valuable in applications that involve geographic data, such as mapping and navigation systems, location-based services, and real estate platforms, providing responsive and accurate information and enhancing user experiences.

Hashed Indexes

Hashed indexes in MongoDB index the hash of a field’s value rather than the value itself. Their main use is hashed sharding: because the hashing algorithm spreads values evenly, a hashed shard key distributes documents, and therefore writes, more uniformly across shards. This is particularly beneficial for write-intensive workloads, where it mitigates hotspots that would otherwise appear when the key grows monotonically (timestamps or ObjectIds, for example). The trade-off is that hashed indexes support equality matches but not range queries.

Want to learn more about MongoDB indexing types? Check out this blog from Percona.

Impact of Indexes on Performance

MongoDB indexes have a profound impact on query performance and execution by enabling efficient data retrieval, allowing MongoDB to quickly locate and extract data that aligns with specific query conditions. However, the misuse of indexes can lead to performance degradation. Over-indexing, where excessive indexes are created, can burden write operations and increase storage requirements, while under-indexing can result in slow query execution.

To balance read and write performance when using indexes, consistent evaluation and periodic adjustment of indexes are necessary.

Common MongoDB Indexing Performance Problems

After analyzing many different MongoDB environments, I can summarize the typical errors I have seen in the following list:

  • Not creating indexes at all, other than the primary key _id created by design.
    • I’m not joking – I have seen databases without any user-created indexes, which had owners surprised the server was overloaded and/or the queries were very slow.
  • Over-indexing the collection.
    • Some developers create a lot of indexes without a specific reason or just for testing a query. Then they forget to drop them.
    • In some cases, the size of all the indexes was larger than the data. This is not good; indexes should be as small as possible to be really effective.

I’m not considering the first case. I’m going to discuss the second one instead.

How Many Indexes Do You Need in a Collection?

It depends – that’s the right answer. Basically, it depends on your application workload. You should consider the following rules when indexing a collection:

  • Create as many indexes as possible for your application.
  • Don’t create a lot of indexes.

What? These rules contradict each other! Well, we can summarize them in just one simple rule:

  • You need to create all the indexes your application really needs for solving the most frequent queries. Not one more, not one less.

That’s it.

Pros and Cons of MongoDB Indexing

The big advantage of indexes is that, when they are used, they let queries, updates, and deletes run as fast as possible. (Every update or delete also needs to do a lookup step first.) More indexes in a collection can benefit several queries.

Unfortunately, indexes require some extra work from MongoDB. Any time you run a write, all the indexes must be updated. The new values are stored in or removed from the B-Tree structure, some splitting or merging is needed, and this requires some time.

The main problem is that the more indexes you have in a collection, the slower all the writes will be.

A very large collection with just 10 or 15 indexes can suffer a significant performance loss for writes. Also, remember that indexes have to be loaded into the WiredTiger cache. More indexes also mean more pressure on the memory cache, which can lead to more cache evictions and slowness.

A good example: a few weeks ago, while working with a customer, we found 12 extra indexes on a collection that they did not need. The collection was around 80GB, and the total index size was larger than the data size. They had a significant write load, with frequent inserts and updates all the time. Dropping these indexes reduced the execution time of their write queries by 25-30 percent on average. The improvement observed in this real case won’t be quantitatively the same in other cases, but the fewer indexes you have, the faster all the writes will be.
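A quick way to spot a collection in this situation is to compare its index footprint with its data size. The sketch below is plain JavaScript working on an object shaped like the collection statistics MongoDB reports (the `size`, `totalIndexSize`, and `indexSizes` fields). The numbers and index names are hypothetical, not the customer’s real figures, and the `indexFootprint` helper is made up for illustration.

```javascript
const GB = 1024 ** 3;

// Hypothetical statistics, shaped like collStats output:
// size = uncompressed data size in bytes, totalIndexSize = sum of all indexes.
const sampleCollStats = {
  size: 80 * GB,
  totalIndexSize: 95 * GB,
  nindexes: 3,
  indexSizes: { _id_: 5 * GB, a_1: 30 * GB, a_1_b_1: 60 * GB },
};

// Summarizes how heavy the indexes are relative to the data.
function indexFootprint(stats) {
  const ratio = stats.size > 0 ? stats.totalIndexSize / stats.size : null;
  const largest = Object.entries(stats.indexSizes || {})
    .sort(([, x], [, y]) => y - x) // biggest index first
    .map(([name, bytes]) => ({ name, bytes }));
  return {
    ratio,
    indexesLargerThanData: ratio !== null && ratio > 1,
    largest,
  };
}

const footprint = indexFootprint(sampleCollStats);
console.log(footprint.ratio);                 // prints: 1.1875
console.log(footprint.indexesLargerThanData); // prints: true
console.log(footprint.largest[0].name);       // prints: a_1_b_1
```

In mongosh, the input could come from something like `db.getCollection("mycoll").stats()`, where `mycoll` is a placeholder name. A ratio above 1 is not proof of over-indexing, but it is a good reason to review the largest indexes first.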

We need to find a balance: create enough indexes, but not too many. Let’s take a look at some of the pros and cons of MongoDB indexing.

Pros

Improved Query Performance:

MongoDB indexing significantly improves query performance by accelerating the data retrieval process. It achieves this by organizing data structures that map indexed fields to their respective data locations, enabling MongoDB to quickly locate and retrieve precise data that aligns with query criteria.

Faster Sorting and Aggregation:

MongoDB indexing also contributes to faster sorting and aggregation operations because the indexes allow MongoDB to access and process data more efficiently.

Reduced I/O Operations:

Because indexes provide a structured way to access data, MongoDB can find the requested data without scanning the entire collection. This minimizes the amount of data read from storage, resulting in fewer I/O operations.

Support for Unique Constraints:

MongoDB indexing supports unique constraints, which ensures the uniqueness of values within indexed fields. By creating these unique indexes, no duplicate values can be inserted into the indexed field. This is especially valuable for maintaining data integrity and preventing data inconsistencies.

Cons

Increased Storage Overhead:

Because indexes require additional storage space for their data structures, they increase the storage requirements of a MongoDB database. It’s important to weigh the improved query performance that comes from indexing against the additional storage costs.

Write Performance Impact:

While indexes benefit read operations, they can also impact write operations. When you create, update, or delete documents in a collection with indexes, MongoDB must maintain the indexes to reflect the changes. This additional work introduces overhead during write operations. In addition, indexes can become fragmented due to frequent updates, inserts, and deletes, and fragmented indexes may not be as efficient for query performance.

Memory Usage:

For efficient query processing, MongoDB keeps index data in memory (in the WiredTiger cache), so indexes work best when they fit in RAM. As a dataset grows, its indexes use more memory. Efficient memory management is crucial to prevent excessive usage from affecting database performance.

How to Reduce Over-Indexing

Very easy to say: drop all the indexes you don’t need.

There are two things you can do to identify the indexes to drop:

  • Check for the duplicates.
  • Check for the unused indexes.

To drop an index, call dropIndex() on the collection, passing either the index name or its key specification.
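As a hedged illustration of the drop step, the sketch below wraps the mongosh call `db.getCollection(name).dropIndex(indexName)` in a small helper. The helper name `dropIndexByName`, the collection name, and the index name are hypothetical. The `db` handle is passed in explicitly so the function can be tried with a stub before pointing it at a real database.

```javascript
// `db` is the database handle that mongosh provides (or a stub in tests).
// The helper refuses to touch the mandatory _id index and otherwise
// delegates to the collection's dropIndex() method.
function dropIndexByName(db, collectionName, indexName) {
  if (indexName === "_id_") {
    throw new Error("the _id index cannot be dropped");
  }
  return db.getCollection(collectionName).dropIndex(indexName);
}

// Usage inside mongosh (names are placeholders):
//   dropIndexByName(db, "restaurants", "cuisine_1_borough_1");
// which is equivalent to:
//   db.restaurants.dropIndex("cuisine_1_borough_1");
```

Dropping by name avoids ambiguity when several indexes share similar key patterns.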

Find Duplicate Indexes

A duplicate index could be an index with exactly the same definition as another index that already exists in the collection. Fortunately, MongoDB checks for this and does not permit the creation of such an index.

Let’s do a test using a simple collection with no indexes.

MongoDB is then clever enough to avoid the creation of duplicate indexes. But what about the creation of an index that is the left-prefix of an existing index? Let’s test it.

We consider a leftmost-prefix index as a duplicate as well.

To take advantage of a compound index, MongoDB doesn’t need to use all the fields of that index; the leftmost prefix is enough. For example, an index on (A,B,C) can be used to satisfy the combinations (A), (A,B), (A,B,C) but not (B) or (B,C). As a consequence, if I have two different indexes, one on (A, B, C) and another one on (A, B), the second is a duplicate because the first can be used the same way for solving queries with the combinations (A, B) and (A).
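The leftmost-prefix rule can be turned into a small check. What follows is a minimal sketch in plain JavaScript, not the script referred to in this article: the function names are made up, the input is assumed to have the shape returned by `getIndexes()` (objects with `key` and `name`), and the check is deliberately conservative. It only compares plain indexes, skipping any index with options such as unique, sparse, partial, or TTL, because a longer index cannot stand in for those.

```javascript
// True when `shortKey` is a strict leftmost prefix of `longKey`:
// same fields, same order, same direction or type.
function isLeftPrefix(shortKey, longKey) {
  const s = Object.entries(shortKey);
  const l = Object.entries(longKey);
  if (s.length >= l.length) return false;
  return s.every(([field, dir], i) => l[i][0] === field && l[i][1] === dir);
}

// Options that change how an index behaves; indexes carrying any of them
// are left out of the comparison (a conservative assumption of this sketch).
const SPECIAL_OPTIONS = [
  "unique", "sparse", "partialFilterExpression",
  "expireAfterSeconds", "collation", "hidden",
];
const isPlain = (index) => SPECIAL_OPTIONS.every((opt) => index[opt] === undefined);

// Returns [{ redundant, coveredBy }] for every plain index that is a
// leftmost prefix of another plain index in the same collection.
function findPrefixDuplicates(indexes) {
  const duplicates = [];
  for (const candidate of indexes) {
    if (candidate.name === "_id_" || !isPlain(candidate)) continue;
    const cover = indexes.find(
      (other) => other !== candidate && isPlain(other) &&
        isLeftPrefix(candidate.key, other.key)
    );
    if (cover) duplicates.push({ redundant: candidate.name, coveredBy: cover.name });
  }
  return duplicates;
}

// Hypothetical index list for one collection.
const sampleIndexes = [
  { v: 2, key: { _id: 1 }, name: "_id_" },
  { v: 2, key: { a: 1, b: 1, c: 1 }, name: "a_1_b_1_c_1" },
  { v: 2, key: { a: 1, b: 1 }, name: "a_1_b_1" },
  { v: 2, key: { a: 1 }, name: "a_1" },
  { v: 2, key: { b: 1, c: 1 }, name: "b_1_c_1" },
  { v: 2, key: { b: 1 }, name: "b_1", unique: true },
];

const prefixDuplicates = findPrefixDuplicates(sampleIndexes);
console.log(prefixDuplicates);
// a_1_b_1 and a_1 are both reported as covered by a_1_b_1_c_1;
// b_1 is skipped because it is unique.
```

In mongosh, the input could be `db.getCollection("mycoll").getIndexes()`, with `mycoll` as a placeholder. Treat the result as a list of candidates to review, not as a list to drop blindly.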

Then, find all duplicate indexes and drop them since they’re useless. Just be aware and check that your application doesn’t use hint() on the indexes you’re going to drop.

To avoid manually checking every collection for duplicates, here is some JavaScript code that does it:

Note: this script is just an initial test and could be improved, but it should work in most cases.

Find Unused Indexes

MongoDB maintains internal statistics about index usage. Any time an index is used for solving a query, a specific counter is incremented. After running MongoDB for a significant amount of time (days or weeks), the statistics are reliable and we can find out which indexes have been used and which have not.

For looking at the index stats, MongoDB provides a stage in the aggregation pipeline: $indexStats

Here you can see an example:

The accesses.ops is the number of times the index has been used. In the example you can see the { borough:1 } has been used 312 times, the index { _id } 12 times, and the index { cuisine:1, borough: 1} 0 times. The last one could be dropped.

If the database has been running for a long time with millions of queries executed and an index has not been used, it most probably won’t be used in the future either.

Then you should consider dropping the unused indexes in order to improve the writes, reduce the cache pressure, and save disk space as well.

Using the following script you can find out the index statistics for all the collections:

Look for the indexes having “ops”: NumberLong(0)
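Picking out the zero-ops entries can be automated. The sketch below is plain JavaScript that filters documents shaped like `$indexStats` output (`name`, `key`, `accesses.ops`, `accesses.since`). The usage counts mirror the example discussed above, while the `since` dates, the collection name, and the `findUnusedIndexes` helper are assumptions for illustration.

```javascript
// Returns the names of indexes whose usage counter is still zero.
// `stats` is an array of $indexStats-shaped documents.
function findUnusedIndexes(stats) {
  return stats
    .filter((s) => s.name !== "_id_")            // the _id index cannot be dropped
    .filter((s) => Number(s.accesses.ops) === 0) // ops may be a Long in the shell
    .map((s) => s.name);
}

// Hypothetical $indexStats output for one collection.
const sampleIndexStats = [
  { name: "borough_1", key: { borough: 1 },
    accesses: { ops: 312, since: new Date("2021-01-01T00:00:00Z") } },
  { name: "_id_", key: { _id: 1 },
    accesses: { ops: 12, since: new Date("2021-01-01T00:00:00Z") } },
  { name: "cuisine_1_borough_1", key: { cuisine: 1, borough: 1 },
    accesses: { ops: 0, since: new Date("2021-01-01T00:00:00Z") } },
];

const unusedIndexes = findUnusedIndexes(sampleIndexStats);
console.log(unusedIndexes); // prints: [ 'cuisine_1_borough_1' ]

// In mongosh the input could come from (collection name is a placeholder):
//   db.restaurants.aggregate([{ $indexStats: {} }]).toArray()
```

Two caveats are worth keeping in mind when reading the counters: they start again from zero when the mongod process restarts, and each node keeps its own counters, so in a replica set or sharded cluster an index that looks unused on one node may be serving reads on another. Checking `accesses.since` shows how long the counter has been accumulating.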

Limitations and Challenges of MongoDB Indexes

While MongoDB indexes offer significant benefits, they also come with certain limitations and challenges:

Index Size: Many indexes can lead to large storage requirements.

Write Operations: Additional indexes slow down write operations.

Query Planning: Poorly chosen indexes might lead to inefficient query execution plans.

Index Updates: Frequent updates can lead to index fragmentation, impacting performance.

Sharding: In sharded environments, where data is distributed across multiple nodes or clusters, index distribution becomes important. Poorly distributed indexes can result in uneven query loads across shards, leading to suboptimal performance. Improper shard key selection can lead to uneven data distribution.

Index Complexity: Complex queries might require multiple indexes, complicating management.

Trade-off Between Indexes and Collection Design: Adding indexes to improve query performance often involves trade-offs. While indexes can accelerate queries, they can also increase storage usage and maintenance overhead.

Index Selection: Choosing appropriate indexes requires understanding query patterns.

Dynamic Queries: Queries with dynamic conditions might not effectively use indexes.

Balancing the benefits of faster query performance with any potential drawbacks is key to effective index design and management in MongoDB.

Best Practices for Effective Indexing

Indexes play a crucial role in enhancing the performance and efficiency of MongoDB databases, enabling faster data retrieval and improved query performance by ensuring efficient data access. Here are some best practices for designing, implementing, and maintaining indexes in MongoDB:

Identify Query Patterns: Understand the types of queries your application frequently executes so you can determine which fields are frequently used in filtering, sorting, and joining data. 

Compound Indexes: Combine multiple fields into a single compound index to cater to queries that involve multiple filtering or sorting conditions. Compound indexes can be more efficient than single-field indexes for specific query patterns.

Avoid Over-Indexing: While indexes enhance query performance, over-indexing can lead to increased storage requirements and slower write operations. 

Watch Index Size: Indexes should fit comfortably in memory to ensure optimal query performance. Monitor the size of indexes and consider scaling your hardware or optimizing queries if index size becomes an issue.

Indexing Large Collections: Prioritize fields that are frequently queried or are involved in sorting and filtering operations. Remember that maintaining indexes requires additional resources, so balance query performance and resource utilization.

Review and Optimize: Periodically review query performance using MongoDB’s profiling tools and monitoring utilities. Identify slow queries and analyze their execution plans to determine if indexes can improve performance or if unused indexes can be removed.

Test Before Deployment: Before applying new indexes in production, thoroughly test their impact on query performance.

By adhering to these practices, you can ensure that your MongoDB indexes are finely tuned for maximum query performance, storage efficiency, and database responsiveness.

Looking to enhance MongoDB performance? Dive into our on-demand video: “Top 10 Tips For MongoDB Performance“.

Elevating MongoDB Database Performance with Percona

Creating indexes for solving queries is a good habit, but be careful not to overuse indexing. Excessive indexing can lead to slower writes, excessive pressure on the memory cache, and more evictions.

You should consider maintaining your indexes from time to time, dropping all the duplicates and the unused indexes. The scripts provided in this article may help your index analysis.

Percona Distribution for MongoDB is a freely available MongoDB database alternative, giving you a single solution that combines the best and most important enterprise components from the open source community, designed and tested to work together.

 

3 Comments
Kay Agahd

Well written article Corrado, thank you!
As for me, I use the following open-sourced tool to detect redundant or even unused indexes for one or even several databases and collections at once with just 2 clicks. https://github.com/idealo/mongodb-slow-operations-profiler

Corrado Pandiani

Thanks Kay.
I know that tool and used once in the past. It’s cool.
The intention of my article was to provide a script anyone can use at ease in case there’s not an installed monitoring tool.

Kay Agahd

Sure, I understood your intention, well done indeed!