The infamous Repository Pattern in PHP

The repository pattern is easily one of the most talked about patterns. Some love it; others... not so much. In this post we're going to look at the repository pattern; what it is and what it is not. We'll also look at the differences between read and write repositories and some conventions.

What is the Repository Pattern?

The repository pattern is a data source abstraction that behaves like a collection; containing domain (value) objects. This abstraction is called a Repository and is often introduced as an interface. The repository is considered to be the single source of truth for retrieving and storing domain objects or entities.

Let's look at a quick example repository for retrieving Book entities.

interface BookRepository {
    public function get(BookId $id): Book;
    public function save(Book $book): void;
    public function delete(Book $book): void;
}

This is pretty much all that is needed for a functional repository; a way to get, save and delete a book from the collection. You would use this Repository like this:

class BookController {
    // Inject your repository.
    public function __construct(private BookRepository $repository) {}
 
    public function someAction(BookId $book_id): void {
        // Retrieve the entity.
        $book = $this->repository->get($book_id);
 
        // Manipulate the entity.
        $book->setValue(...);
 
        // Store the entity through the repository.
        $this->repository->save($book);
    }
}

Notice the absence of an add and update method. These methods do not belong on a repository, because they would communicate knowledge about the underlying data source (e.g. whether the Book is already stored). The save method will guide the process of adding or updating the Book in the data source.
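To make that a bit more concrete, here is a minimal sketch of how a save method could hide the add-or-update decision. Everything in it is an assumption for illustration: the ArrayBookTable class is a stand-in for a database table, the title property on Book is made up, and only the save method of the repository is shown.

```php
<?php

// Hypothetical minimal domain objects, assumed for this sketch.
final class BookId {
    public function __construct(public int $value) {}
}

final class Book {
    public function __construct(public BookId $id, public string $title) {}
}

// Stand-in for a database table; a real implementation would run INSERT/UPDATE queries.
final class ArrayBookTable {
    /** @var array<int, array{title: string}> */
    public array $rows = [];
    /** @var string[] Log of performed operations, only here to make the behaviour visible. */
    public array $log = [];

    public function exists(int $id): bool {
        return isset($this->rows[$id]);
    }

    public function insert(int $id, array $row): void {
        $this->rows[$id] = $row;
        $this->log[] = 'insert';
    }

    public function update(int $id, array $row): void {
        $this->rows[$id] = $row;
        $this->log[] = 'update';
    }
}

final class TableBookRepository {
    public function __construct(private ArrayBookTable $table) {}

    // The client only calls save(); whether that means adding or updating stays hidden in here.
    public function save(Book $book): void {
        $row = ['title' => $book->title];

        if ($this->table->exists($book->id->value)) {
            $this->table->update($book->id->value, $row);
            return;
        }

        $this->table->insert($book->id->value, $row);
    }
}

$table = new ArrayBookTable();
$repository = new TableBookRepository($table);

$book = new Book(new BookId(1), 'Draft title');
$repository->save($book);   // Not stored yet, so this inserts.

$book->title = 'Final title';
$repository->save($book);   // Already stored, so this updates.
```

The calling code is identical in both cases; only the repository knows which operation the data source needed.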

Note: Since a Repository is the single source of truth, it should be the only service to persist an entity. This makes the repository pattern somewhat incompatible with the active record pattern, because then the entities could persist themselves. To use active record, make sure you only call the entity's save method inside the repository's save method.

A repository is not a factory

As you might have noticed; the repository does not have a create method. This is because the purpose of a repository is to contain objects. Just as a library (mostly) does not write and publish books, a repository does not create the objects it contains. If the creation of the objects is a complex matter (e.g. it takes many dependencies to create the object) this job is probably more suitable for a factory.

This doesn't mean you cannot combine a repository and a factory in a single service. You absolutely can; but the question is if you should. This is a question I cannot answer for your situation. Personally, I like to keep my services separate, so they can evolve independently.
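As a rough sketch of what keeping them separate could look like: the factory builds the Book, the repository only contains it. The BookFactory class, the title property and the random identifier strategy are assumptions made up for this example.

```php
<?php

// Hypothetical minimal domain objects, assumed for this sketch.
final class BookId {
    public function __construct(public string $value) {}
}

final class Book {
    public function __construct(public BookId $id, public string $title) {}
}

// The factory knows how to build a Book (here: generating an identifier).
final class BookFactory {
    public function create(string $title): Book {
        // random_bytes() is only a placeholder id strategy for this sketch.
        return new Book(new BookId(bin2hex(random_bytes(8))), $title);
    }
}

// The repository only contains Books; it has no idea how they are built.
final class InMemoryBookRepository {
    /** @var Book[] */
    private array $collection = [];

    public function save(Book $book): void {
        $this->collection[$book->id->value] = $book;
    }

    public function get(BookId $id): Book {
        return $this->collection[$id->value] ?? throw new RuntimeException('Book not found');
    }
}

$factory = new BookFactory();
$repository = new InMemoryBookRepository();

$book = $factory->create('A new book');   // Creation is the factory's job.
$repository->save($book);                 // Storage is the repository's job.
```

If creating a Book later needs more dependencies, only the factory changes; the repository stays as small as it was.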

A repository is not an ORM

Object-Relational Mapping (ORM) is an abstraction layer on top of a (relational) database. It uses a Database Abstraction Layer (DBAL) to retrieve a row from the database, and maps its columns to the properties of an entity object. A DBAL is often equipped to handle multiple database types, like MySQL, PostgreSQL or SQLite. The goal of the ORM is to make interaction with the database object-oriented. You update the columns as properties on the entity, and the ORM translates those changes to and from the database (through the DBAL).

Often, an ORM has a Query Builder; which is another object-oriented approach for retrieving data as entity objects from the database. You construct your WHERE clauses with simple ->where() methods, and the ORM will again translate these query objects to real database queries and return you the results as mapped entities.

A repository however is not concerned with specifics on how the entity came to be, where it came from, or how it was located. The underlying data might be located and stored using an ORM, or read and written into a file or even retrieved via a search engine. But those are only the technical implementations of the repository. The repository only presents these entities as if it were a collection.

Caution: When creating a repository in conjunction with an ORM, it is tempting to use Query Builder specific arguments, or even to return the Query Builder to retrieve the information. Do not do this. In that case you are coupling the Query Builder to your repository, defeating the purpose of the repository.

When it comes to the advantages of a repository, the ability to "easily replace the database layer" is often dismissed as a non-argument: it is very unlikely that you will ever replace your database implementation, and (as we have seen) the ORM already takes care of that.

However, a repository abstracts away the complicated queries (even if you use a Query Builder) or services needed behind the scenes to retrieve the required domain objects. This means that you can start off with an implementation of the repository that fully relies on a database like MySQL (even through an ORM), while being able to replace certain logic in the future with a service like a search engine. This is especially useful when using Read Repositories (more on those later).

Note: The repository pattern can be mixed with the Decorator Pattern pretty well. For example: you can decorate your repository with a cacheable implementation which could reduce the number of round trips to the underlying data source.
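Such a caching decorator could be sketched like this. It is only an illustration under a few assumptions: the CountingBookRepository is a made-up stand-in for a slow data source, the cache is a plain array that lives for one request, and a RuntimeException is thrown where a dedicated not-found exception would normally go.

```php
<?php

// Hypothetical minimal domain objects, assumed for this sketch.
final class BookId {
    public function __construct(public int $value) {}
}

final class Book {
    public function __construct(public BookId $id, public string $title) {}
}

interface BookRepository {
    public function get(BookId $id): Book;
    public function save(Book $book): void;
    public function delete(Book $book): void;
}

// Made-up "slow" repository that counts how often the data source is read.
final class CountingBookRepository implements BookRepository {
    public int $reads = 0;
    /** @var Book[] */
    private array $books = [];

    public function get(BookId $id): Book {
        $this->reads++;
        return $this->books[$id->value] ?? throw new RuntimeException('Book not found');
    }

    public function save(Book $book): void {
        $this->books[$book->id->value] = $book;
    }

    public function delete(Book $book): void {
        unset($this->books[$book->id->value]);
    }
}

// Decorator: same interface, wraps another repository and remembers what it already fetched.
final class CachedBookRepository implements BookRepository {
    /** @var Book[] */
    private array $cache = [];

    public function __construct(private BookRepository $inner) {}

    public function get(BookId $id): Book {
        // Only ask the inner repository when the Book is not cached yet.
        return $this->cache[$id->value] ??= $this->inner->get($id);
    }

    public function save(Book $book): void {
        $this->inner->save($book);
        $this->cache[$book->id->value] = $book;
    }

    public function delete(Book $book): void {
        $this->inner->delete($book);
        unset($this->cache[$book->id->value]);
    }
}

$inner = new CountingBookRepository();
$inner->save(new Book(new BookId(1), 'Example title'));

$repository = new CachedBookRepository($inner);
$first = $repository->get(new BookId(1));    // Reads from the inner repository.
$second = $repository->get(new BookId(1));   // Served from the cache.
```

Because the decorator implements the same interface, the controller from earlier would not notice whether it received the cached or the plain repository.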

Another advantage of a repository is you can easily replace it with an in-memory variant like this, which makes for easy (and maybe even more important: fast) unit testing.

class InMemoryBookRepository implements BookRepository
{
    /** @var Book[] */
    private array $collection = [];
 
    public function get(BookId $id): Book
    {
        return $this->collection[$id->value] ?? throw new BookNotFound();
    }
 
    public function save(Book $book): void
    {
        $this->collection[$book->id->value] = $book;
    }
 
    public function delete(Book $book): void
    {
        unset($this->collection[$book->id->value]);
    }
}

To test the BookController::someAction method mentioned earlier, we would only need to inject this InMemoryBookRepository. We can then assert the changes through the repository.

public function testSomeAction(): void {
    // Create the book and store it in the repository (Arrange).
    $repository = new InMemoryBookRepository();
    $book_id = new BookId(123);
    $entity = $this->createBook($book_id, ...);
    $repository->save($entity);
 
    // Set up the controller, and call the method with the book id (Act).
    $controller = new BookController($repository);
    $controller->someAction($book_id);
 
    // Test if the controller performed its task (Assert).
    self::assertSame('expected value', $repository->get($book_id)->method());
}

Note: Some developers are against using an in-memory repository and encourage the use of a mock; because an in-memory class adds the need for another test for this implementation. And while this might be true (I would not test this class) let's be honest; it is not that much work, and you get the benefit of using it in multiple places. It also has the exact same interaction, without the need to add assertions and instructions on what to return on which method calls. I personally would advise against using a mock.

Naming conventions (find vs. get)

You'll often come across different names for methods on a repository. delete might very well be called remove, and save could be called store. But their underlying purpose does not change. There is however a small nuance when it comes to the naming of query methods (prefixed with) get and find.

In general, the use of get-methods implies the client expects the object to exist and be returned. In the case of find-methods, there is an expectation that the object might not exist. Therefore, in most cases get will throw a NotFound exception when the object cannot be located, while find will return null or an empty collection.

get is also predominantly used to retrieve a single object, while find is used to locate multiple objects based on certain criteria. This is why read repositories will often have methods like findBySomeValue($value) for retrieving multiple objects, and findOneBySomeValue($value) to indicate that at most a single object is returned.

Note: It is customary for find methods that return multiple objects to return an empty collection when there are no matches, instead of null.

// Implies the existence of a Book.
public function get(BookId $book_id): Book;
 
// Implies the possibility of not finding the Book.
public function findOneById(BookId $book_id): ?Book;
 
// Expects to find multiple Books from a given Author.
public function findByAuthor(Author $author): BookCollection;

This being said; it is a convention and not a hard rule. If you are more comfortable using get-methods without an exception, or even find-methods with; just go for it. There is no repository-police (that I know of).
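To show how the convention could play out in an implementation, here is a small in-memory sketch. It simplifies a few things on purpose: the author is a plain string instead of an Author object, a plain array is returned instead of a BookCollection, and the Book and BookNotFound classes are minimal assumptions.

```php
<?php

// Hypothetical minimal domain objects, assumed for this sketch.
final class BookId {
    public function __construct(public int $value) {}
}

final class Book {
    public function __construct(public BookId $id, public string $author) {}
}

final class BookNotFound extends RuntimeException {}

final class InMemoryBookRepository {
    /** @var Book[] */
    private array $collection = [];

    public function save(Book $book): void {
        $this->collection[$book->id->value] = $book;
    }

    // find (single): a missing Book is a normal outcome, so return null.
    public function findOneById(BookId $id): ?Book {
        return $this->collection[$id->value] ?? null;
    }

    // get: the caller expects the Book to exist, so a miss is exceptional.
    public function get(BookId $id): Book {
        return $this->findOneById($id) ?? throw new BookNotFound();
    }

    /**
     * find (multiple): no matches yields an empty list, never null.
     *
     * @return Book[]
     */
    public function findByAuthor(string $author): array {
        return array_values(array_filter(
            $this->collection,
            fn (Book $book): bool => $book->author === $author
        ));
    }
}

$repository = new InMemoryBookRepository();
$repository->save(new Book(new BookId(1), 'Author X'));

$missing = $repository->findOneById(new BookId(2));   // null, no exception.
$none = $repository->findByAuthor('Nobody');          // Empty array.
```

Here get is built on top of findOneById, which keeps the lookup logic in one place and only differs in how a miss is reported.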

Read vs. Write repositories

There is a famous line by a statistician named George Box, who wrote: "All models are wrong, but some are useful." When it comes to domain value objects, the information they need to contain depends heavily on the context.

Let's continue with our Book example. A book might be viewed within different contexts. When searching for a book, it is useful to know the title, author, ISBN, category and an excerpt of what the book is about. When it comes to the context of shipping (of physical books); this is useless information. With shipping we are more interested in the dimensions and the weight to be able to figure out the size of the packaging and the cost of the shipping.

Both of these models are "wrong" (they do not portray the entirety of the object), but they are useful in their context.

Write repositories

Within your application there might be a database table that contains all the information for the books. This information needs to be written (at least once) and maybe updated once in a while. For these cases we use a Write Repository. This is the repository that stores, updates and deletes the underlying data.

Because it is very likely you are writing to only one entity (at a time), a write repository is often a very small interface, like the BookRepository at the beginning of this post. You probably already have the BookId available, so there is no need for additional find methods. After you get your entity, you either store the (updated) entity, or remove it.

Read repositories

In a lot of applications it is more common to read than it is to write data. Again continuing with our Book example; it is likely you would need a page that displays the details about a book. It will show: the cover, the title, author, long excerpt, and a lot of specifics like page count, book binding, etc.

It's also very likely you have a page that lists books. This list will probably only contain an image of the cover, the title, author and a tiny excerpt, and a link to the detail page. This list can likely also be filtered by category, author or a search phrase.

Just looking at this example, we already see two different contexts: Listing and Details (there might be better terms; but this is just an example). And we also see that the listing has a few specific filter options, while the details page does not. We can also determine that the listing page needs far less information than the detail page.

Retrieving entities

Since the details page contains just about every piece of information we have about a book (in this example) it might make sense to use the write repository to retrieve the Book entity. We can then use it to show all the required details.

However, note that this will probably require you to add a bunch of helper methods to return certain information in a specific format. Instead, we might want to create a specific "View Model" and let it contain all the information and helper methods.

Retrieving context specific domain objects (View Models)

In the case of the listing, it makes sense to create a specific object that represents a book within the listing context. Such a domain object is often called a "View Model". It is a context specific representation of the actual book. For this context we can create a specific Read Repository like this:

namespace App\Books\Listing;
 
interface BookRepository {
    public function findByAuthor(Author $author): BookCollection;
    public function findByCategory(Category $category): BookCollection;
    public function findBySearchPhrase(string $search_phrase): BookCollection;
}

The BookCollection here is also a specific collection in the Listing context, and it contains only App\Books\Listing\Book objects (or view models).

As you can see, a read repository (almost) only contains find methods. And the methods are only catered to the actual needs of the context.

Criteria Pattern / Specification Pattern: As an alternative to having a bunch of findByX methods, you can also incorporate the Criteria or Specification pattern. In this pattern you create a single findBy(Criteria $criteria) method which retrieves all objects that satisfy the criteria.

While this may look like a great one-size-fits-all solution, you have to be careful not to be creating a Query Builder. Because for every Criteria you create, you need an implementation that is compatible with your repository's underlying data source. This can get cumbersome real quick. My advice would be to keep read repositories tiny and only catered to the actual requirements; not to possible needs in the future.
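A deliberately tiny version of that idea could look like the sketch below. The Criteria class with its two optional filters, the Book view model and the in-memory repository are all assumptions for this example; a database-backed repository would need its own translation of the same Criteria.

```php
<?php

// Hypothetical view model for the listing context.
final class Book {
    public function __construct(
        public string $title,
        public string $author,
        public string $category,
    ) {}
}

// A small criteria object: only the filters the listing actually needs.
final class Criteria {
    public function __construct(
        public ?string $author = null,
        public ?string $category = null,
    ) {}
}

final class InMemoryBookRepository {
    /** @param Book[] $books */
    public function __construct(private array $books) {}

    /** @return Book[] All books that satisfy every filter that is set. */
    public function findBy(Criteria $criteria): array {
        return array_values(array_filter(
            $this->books,
            fn (Book $book): bool =>
                ($criteria->author === null || $book->author === $criteria->author)
                && ($criteria->category === null || $book->category === $criteria->category)
        ));
    }
}

$repository = new InMemoryBookRepository([
    new Book('Book A', 'Author X', 'php'),
    new Book('Book B', 'Author X', 'testing'),
    new Book('Book C', 'Author Y', 'php'),
]);

// Both filters set: only books by Author X in the php category match.
$matches = $repository->findBy(new Criteria(author: 'Author X', category: 'php'));
```

Every extra field on Criteria is one more thing each implementation has to support, which is exactly how such an object slowly grows into a Query Builder.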

Retrieving from projections

When your application gets bigger, and the amount of data gets larger, certain queries might become very slow. In our example we might be retrieving the name of the author of a book and other information through a relation to the authors table or other join- or pivot-tables. The number of relations can make these queries very slow.

Since we only need certain information we can create a specific table containing all the information we need. This table can be filled by running the slow query once, and then updating the table every time a book gets updated. Such a table is called a "projection". Queries on these projections are very fast, since they no longer require the joins.

This is a big advantage of the repository pattern. The only place where you need to change this logic is in the repository. The repository will still return the same objects with the same information, only the technical implementation has changed.
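The swap could be sketched like this, with plain arrays standing in for the database tables. All names (BookListItem, BookListingRepository, BookListingProjection and the two implementations) are assumptions for this example; the point is only that both implementations return the same objects.

```php
<?php

// Hypothetical listing view model; what the repository returns never changes.
final class BookListItem {
    public function __construct(public string $title, public string $authorName) {}
}

interface BookListingRepository {
    /** @return BookListItem[] */
    public function findByCategory(string $category): array;
}

// First implementation: looks up the author for every book (stands in for a query with joins).
final class JoiningBookRepository implements BookListingRepository {
    public function __construct(private array $books, private array $authors) {}

    public function findByCategory(string $category): array {
        $items = [];
        foreach ($this->books as $book) {
            if ($book['category'] === $category) {
                $author = $this->authors[$book['author_id']];
                $items[] = new BookListItem($book['title'], $author['name']);
            }
        }
        return $items;
    }
}

// The projection: one flat row per book, with the author name already copied in.
final class BookListingProjection {
    /** @var array<int, array{title: string, author_name: string, category: string}> */
    public array $rows = [];

    // Call this whenever a book is created or updated to keep the projection current.
    public function project(int $bookId, string $title, string $authorName, string $category): void {
        $this->rows[$bookId] = ['title' => $title, 'author_name' => $authorName, 'category' => $category];
    }
}

// Later implementation: reads the projection, no author lookup needed.
final class ProjectedBookRepository implements BookListingRepository {
    public function __construct(private BookListingProjection $projection) {}

    public function findByCategory(string $category): array {
        $items = [];
        foreach ($this->projection->rows as $row) {
            if ($row['category'] === $category) {
                $items[] = new BookListItem($row['title'], $row['author_name']);
            }
        }
        return $items;
    }
}

$books = [
    1 => ['title' => 'Book A', 'author_id' => 10, 'category' => 'php'],
    2 => ['title' => 'Book B', 'author_id' => 11, 'category' => 'testing'],
];
$authors = [10 => ['name' => 'Author X'], 11 => ['name' => 'Author Y']];

// Fill the projection once from the "slow" source data.
$projection = new BookListingProjection();
foreach ($books as $id => $book) {
    $projection->project($id, $book['title'], $authors[$book['author_id']]['name'], $book['category']);
}

$before = (new JoiningBookRepository($books, $authors))->findByCategory('php');
$after = (new ProjectedBookRepository($projection))->findByCategory('php');
```

Code that depends on BookListingRepository receives equal BookListItem objects from either implementation, so switching is a matter of wiring a different class.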

Retrieving from external services

Another example of a technical implementation is the use of search engines. The findBySearchPhrase(string $search_phrase) method can initially be implemented by some simple ORM or database logic. But these abilities are limited to what the database can do. A search engine is better suited to indexing information, and can even locate data based on mistyped words.

So when your application matures; you can replace the implementation of these methods to use a proper external service, and map the returning data onto the View models. Again without changing the usage of the repository.
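As a sketch of what such an implementation might look like: the repository below delegates to a search client and maps the raw hits onto view models. The SearchClient interface, its search method and the 'books' index name are hypothetical and do not refer to any real library; the FakeSearchClient only returns a canned hit so the example can run without a search engine.

```php
<?php

// Hypothetical listing view model.
final class BookListItem {
    public function __construct(public string $title, public string $author) {}
}

interface BookListingRepository {
    /** @return BookListItem[] */
    public function findBySearchPhrase(string $search_phrase): array;
}

// Hypothetical client for an external search engine; a real one would do a network call.
interface SearchClient {
    /** @return array<int, array{title: string, author: string}> */
    public function search(string $index, string $phrase): array;
}

final class SearchEngineBookRepository implements BookListingRepository {
    public function __construct(private SearchClient $client) {}

    public function findBySearchPhrase(string $search_phrase): array {
        $hits = $this->client->search('books', $search_phrase);

        // Map the raw hits onto the same view models the database version returned.
        return array_map(
            fn (array $hit): BookListItem => new BookListItem($hit['title'], $hit['author']),
            $hits
        );
    }
}

// Fake client for this sketch: ignores the phrase and returns one canned hit.
final class FakeSearchClient implements SearchClient {
    public function search(string $index, string $phrase): array {
        return [['title' => 'Example title', 'author' => 'Example author']];
    }
}

$repository = new SearchEngineBookRepository(new FakeSearchClient());
$results = $repository->findBySearchPhrase('exmaple');
```

The raw response format of the search engine never leaves the repository; callers keep working with BookListItem objects.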

When not to use the Repository Pattern in PHP

As with most patterns; the repository pattern isn't a "one-size-fits-all" solution. So let's look at a few situations in which the repository pattern probably isn't a good fit.

Small CRUD applications

If you have a small app that is primarily CRUD, using the repository pattern is probably not necessary. When your entities are small, and you only use them with an ORM, the repository pattern will only be a wrapper around the ORM; which will not provide many benefits. Of course there is the testability aspect; but if your framework already supports testing for the ORM, it won't add much.

Large reports

When your application creates big reports using complex custom queries, or if your result sets are so large you need to paginate the results for displaying purposes; it would make more sense to use an approach like CQRS (Command Query Responsibility Segregation). In that case you would create a context specific query object to retrieve the domain objects according to your criteria.
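A very rough sketch of such a query object with pagination is shown below. The BooksPerCategoryQuery and BooksPerCategoryHandler names are made up, the data source is a plain array, and the handler returns titles only to keep the example short.

```php
<?php

// Hypothetical query object: describes what the report needs, including the page.
final class BooksPerCategoryQuery {
    public function __construct(
        public string $category,
        public int $page = 1,
        public int $perPage = 2,
    ) {}

    public function offset(): int {
        return ($this->page - 1) * $this->perPage;
    }
}

// Hypothetical handler: executes the query against a data source (an array here;
// a real handler would run a custom SQL query with LIMIT and OFFSET).
final class BooksPerCategoryHandler {
    /** @param array<int, array{title: string, category: string}> $rows */
    public function __construct(private array $rows) {}

    /** @return string[] Titles on the requested page. */
    public function handle(BooksPerCategoryQuery $query): array {
        $matching = array_values(array_filter(
            $this->rows,
            fn (array $row): bool => $row['category'] === $query->category
        ));

        return array_column(
            array_slice($matching, $query->offset(), $query->perPage),
            'title'
        );
    }
}

$handler = new BooksPerCategoryHandler([
    ['title' => 'Book A', 'category' => 'php'],
    ['title' => 'Book B', 'category' => 'php'],
    ['title' => 'Book C', 'category' => 'testing'],
    ['title' => 'Book D', 'category' => 'php'],
]);

// Three books match 'php'; with two per page, the second page holds the last one.
$secondPage = $handler->handle(new BooksPerCategoryQuery('php', page: 2));
```

Unlike a repository, the query object makes no attempt to look like a collection; it is free to expose paging, sorting or whatever the report needs.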

Summary & Links

In this (rather lengthy) post we've explored what the Repository pattern is and what it is not, and also when you might (not) want to use it. This post, like the rest of the Patterns series, is meant to be informative only, even if my personal preferences might shine through.

To summarise some things we explored:

A repository is a data source abstraction that acts like a collection to retrieve and update domain objects.
It does not know about any technical implementation, be it a database / ORM or in-memory variant, and is therefore completely decoupled from them.
It is not meant to replace your ORM, or meant to be able to replace the database layer; but it can work together with an ORM.
Write repositories are used to get, store or delete domain objects like entities.
Read repositories are used in different contexts to find context specific models of your domain objects for reading purposes.

Thank you for reading! If you found a mistake, or have questions I might be able to answer; please let me know in the comments below. I'm always out to learn new things myself and improve any (false) beliefs I might hold.

You might also consider following me on Twitter / X as that is the social media I'm most active on.

Here is a short list of the links that are mentioned in this post, or might be interesting to you.

Decorator vs. Proxy Pattern in the Patterns for the Rest of Us-series
Repositories and their true purpose by Muhammed Sari
CQRS as explained by Martin Fowler
Stop Mocking About