Proper Repository Pattern Design in PHP

I thought I'd take a crack at answering my own question. What follows is just one way of solving the issues 1-3 in my original question.

Disclaimer: I may not always use the right terms when describing patterns or techniques. Sorry for that.

The Goals:

Create a complete example of a basic controller for viewing and editing Users.
All code must be fully testable and mockable.
The controller should have no idea where the data is stored (meaning it can be changed).
Example to show a SQL implementation (most common).
For maximum performance, controllers should only receive the data they need—no extra fields.
Implementation should leverage some type of data mapper for ease of development.
Implementation should have the ability to perform complex data lookups.

The Solution

I'm splitting my persistent storage (database) interaction into two categories: R (Read) and CUD (Create, Update, Delete). My experience has been that reads are what really cause an application to slow down. While an individual data manipulation (CUD) is actually slower, it happens much less frequently and is therefore much less of a concern.

CUD (Create, Update, Delete) is easy. This will involve working with actual models, which are then passed to my Repositories for persistence. Note, my repositories will still provide a Read method, but simply for object creation, not display. More on that later.

R (Read) is not so easy. No models here, just value objects. Use arrays if you prefer. These objects may represent a single model or a blend of many models, anything really. These are not very interesting on their own, but how they are generated is. I'm using what I'm calling Query Objects.

The Code:

User Model

Let's start simple with our basic user model. Note that there is no ORM extending or database stuff at all. Just pure model glory. Add your getters, setters, validation, whatever.

class User
{
    public $id;
    public $first_name;
    public $last_name;
    public $gender;
    public $email;
    public $password;
}

Repository Interface

Before I create my user repository, I want to create my repository interface. This will define the "contract" that repositories must follow in order to be used by my controller. Remember, my controller will not know where the data is actually stored.

Note that my repositories will only ever contain these three methods. The save() method is responsible for both creating and updating users, depending simply on whether or not the user object has an id set.

interface UserRepositoryInterface
{
    public function find($id);
    public function save(User $user);
    public function remove(User $user);
}

SQL Repository Implementation

Now to create my implementation of the interface. As mentioned, my example was going to be with an SQL database. Note the use of a data mapper to prevent having to write repetitive SQL queries.

class SQLUserRepository implements UserRepositoryInterface
{
    protected $db;

    public function __construct(Database $db)
    {
        $this->db = $db;
    }

    public function find($id)
    {
        // Find a record with the id = $id
        // from the 'users' table
        // and return it as a User object
        return $this->db->find($id, 'users', 'User');
    }

    public function save(User $user)
    {
        // Insert or update the $user
        // in the 'users' table
        $this->db->save($user, 'users');
    }

    public function remove(User $user)
    {
        // Remove the $user
        // from the 'users' table
        $this->db->remove($user, 'users');
    }
}
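
Since the controller only depends on the interface, the storage can be swapped out. As a rough sketch of what that looks like in a test, here is an array-backed implementation. InMemoryUserRepository and its $nextId counter are names made up for this illustration; the User class and the interface are repeated only so the snippet runs on its own.

```php
<?php

// Same shape as the User model and interface above, repeated so this runs standalone
class User
{
    public $id;
    public $first_name;
    public $last_name;
    public $gender;
    public $email;
    public $password;
}

interface UserRepositoryInterface
{
    public function find($id);
    public function save(User $user);
    public function remove(User $user);
}

// Hypothetical array-backed implementation, intended for tests
class InMemoryUserRepository implements UserRepositoryInterface
{
    protected $users = [];
    protected $nextId = 1;

    public function find($id)
    {
        // Return a copy so changes only persist once save() is called
        return isset($this->users[$id]) ? clone $this->users[$id] : null;
    }

    public function save(User $user)
    {
        // No id yet means "create": assign one, as a database would
        if ($user->id === null) {
            $user->id = $this->nextId++;
        }
        $this->users[$user->id] = clone $user;
    }

    public function remove(User $user)
    {
        unset($this->users[$user->id]);
    }
}

$repository = new InMemoryUserRepository();
$user = new User();
$user->first_name = 'Ada';
$repository->save($user);      // create: an id is assigned
$user->first_name = 'Grace';
$repository->save($user);      // update: same id, new data
```

Anything type-hinted against UserRepositoryInterface accepts this class in place of SQLUserRepository, which is the point of defining the contract first.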

Query Object Interface

Now with CUD (Create, Update, Delete) taken care of by our repository, we can focus on the R (Read). Query objects are simply an encapsulation of some type of data lookup logic. They are not query builders. By abstracting a query the same way we abstracted the repository, we can change its implementation and test it more easily. An example of a Query Object might be an AllUsersQuery or AllActiveUsersQuery, or even MostCommonUserFirstNames.

You may be thinking "can't I just create methods in my repositories for those queries?" Yes, but here is why I'm not doing this:

My repositories are meant for working with model objects. In a real world app, why would I ever need to get the password field if I'm looking to list all my users?
Repositories are often model specific, yet queries often involve more than one model. So what repository do you put your method in?
This keeps my repositories very simple, not a bloated class of methods.
All queries are now organized into their own classes.
Really, at this point, repositories exist simply to abstract my database layer.

For my example I'll create a query object to lookup "AllUsers". Here is the interface:

interface AllUsersQueryInterface
{
    public function fetch($fields);
}

Query Object Implementation

This is where we can use a data mapper again to help speed up development. Notice that I am allowing one tweak to the returned dataset—the fields. This is about as far as I want to go with manipulating the performed query. Remember, my query objects are not query builders. They simply perform a specific query. However, since I know that I'll probably be using this one a lot, in a number of different situations, I'm giving myself the ability to specify the fields. I never want to return fields I don't need!

class AllUsersQuery implements AllUsersQueryInterface
{
    protected $db;

    public function __construct(Database $db)
    {
        $this->db = $db;
    }

    public function fetch($fields)
    {
        return $this->db->select($fields)->from('users')->orderBy('last_name, first_name')->rows();
    }
}
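
The Database class above is a stand-in for whatever data mapper you use. If you wanted the same query object on plain PDO, one possible sketch is below. Column names cannot be bound as parameters, so this version checks $fields against an allow-list before building the SQL. PdoAllUsersQuery, ALLOWED_FIELDS and the static sql() helper are names I'm assuming for the example, not part of the original design.

```php
<?php

interface AllUsersQueryInterface
{
    public function fetch($fields);
}

// Hypothetical PDO-backed variant of AllUsersQuery
class PdoAllUsersQuery implements AllUsersQueryInterface
{
    // Columns a caller may request; 'password' is deliberately left out
    const ALLOWED_FIELDS = ['id', 'first_name', 'last_name', 'gender', 'email'];

    protected $pdo;

    public function __construct(PDO $pdo)
    {
        $this->pdo = $pdo;
    }

    public function fetch($fields)
    {
        // Run the fixed query and return plain associative arrays
        return $this->pdo->query(self::sql($fields))->fetchAll(PDO::FETCH_ASSOC);
    }

    // Build the SELECT statement, rejecting any column not on the allow-list
    public static function sql(array $fields)
    {
        $unknown = array_diff($fields, self::ALLOWED_FIELDS);
        if ($fields === [] || $unknown !== []) {
            throw new InvalidArgumentException('Unsupported fields: ' . implode(', ', $unknown));
        }

        return 'SELECT ' . implode(', ', $fields) . ' FROM users ORDER BY last_name, first_name';
    }
}
```

The query itself stays fixed; the field list is still the only thing a caller can vary, which keeps this a query object rather than a query builder.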

Before moving on to the controller, I want to show another example to illustrate how powerful this is. Maybe I have a reporting engine and need to create a report for AllOverdueAccounts. This could be tricky with my data mapper, and I may want to write some actual SQL in this situation. No problem, here is what this query object could look like:

class AllOverdueAccountsQuery implements AllOverdueAccountsQueryInterface
{
    protected $db;

    public function __construct(Database $db)
    {
        $this->db = $db;
    }

    public function fetch()
    {
        return $this->db->query($this->sql())->rows();
    }

    public function sql()
    {
        return "SELECT...";
    }
}

This nicely keeps all my logic for this report in one class, and it's easy to test. I can mock it to my heart's content, or even use a different implementation entirely.

The Controller

Now the fun part—bringing all the pieces together. Note that I am using dependency injection. Typically dependencies are injected into the constructor, but I actually prefer to inject them right into my controller methods (routes). This minimizes the controller's object graph, and I actually find it more legible. Note, if you don't like this approach, just use the traditional constructor method.

class UsersController
{
    public function index(AllUsersQueryInterface $query)
    {
        // Fetch user data
        $users = $query->fetch(['first_name', 'last_name', 'email']);

        // Return view
        return Response::view('all_users.php', ['users' => $users]);
    }

    public function add()
    {
        return Response::view('add_user.php');
    }

    public function insert(UserRepositoryInterface $repository)
    {
        // Create new user model
        $user = new User;
        $user->first_name = $_POST['first_name'];
        $user->last_name = $_POST['last_name'];
        $user->gender = $_POST['gender'];
        $user->email = $_POST['email'];

        // Save the new user
        $repository->save($user);

        // Return the id
        return Response::json(['id' => $user->id]);
    }

    public function view(SpecificUserQueryInterface $query, $id)
    {
        // Load user data
        if (!$user = $query->fetch($id, ['first_name', 'last_name', 'gender', 'email'])) {
            return Response::notFound();
        }

        // Return view
        return Response::view('view_user.php', ['user' => $user]);
    }

    public function edit(SpecificUserQueryInterface $query, $id)
    {
        // Load user data
        if (!$user = $query->fetch($id, ['first_name', 'last_name', 'gender', 'email'])) {
            return Response::notFound();
        }

        // Return view
        return Response::view('edit_user.php', ['user' => $user]);
    }

    public function update(UserRepositoryInterface $repository, $id)
    {
        // Load user model
        if (!$user = $repository->find($id)) {
            return Response::notFound();
        }

        // Update the user
        $user->first_name = $_POST['first_name'];
        $user->last_name = $_POST['last_name'];
        $user->gender = $_POST['gender'];
        $user->email = $_POST['email'];

        // Save the user
        $repository->save($user);

        // Return success
        return true;
    }

    public function delete(UserRepositoryInterface $repository, $id)
    {
        // Load user model
        if (!$user = $repository->find($id)) {
            return Response::notFound();
        }

        // Delete the user
        $repository->remove($user);

        // Return success
        return true;
    }
}
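
To show what "fully testable and mockable" means in practice, here is a rough sketch of exercising index() without any database. The Response class below is only a stand-in I'm assuming for whatever your framework provides, FakeAllUsersQuery is a hand-rolled test double, and the controller is trimmed down to the one method under test.

```php
<?php

interface AllUsersQueryInterface
{
    public function fetch($fields);
}

// Hypothetical stand-in for the Response helper used by the controller
class Response
{
    public static function view($template, array $data = [])
    {
        return ['template' => $template, 'data' => $data];
    }
}

// Trimmed copy of the controller: only index() is needed here
class UsersController
{
    public function index(AllUsersQueryInterface $query)
    {
        $users = $query->fetch(['first_name', 'last_name', 'email']);

        return Response::view('all_users.php', ['users' => $users]);
    }
}

// Fake query: returns canned rows and records which fields were requested
class FakeAllUsersQuery implements AllUsersQueryInterface
{
    public $requestedFields;

    public function fetch($fields)
    {
        $this->requestedFields = $fields;

        return [['first_name' => 'Ada', 'last_name' => 'Lovelace', 'email' => 'ada@example.com']];
    }
}

$query = new FakeAllUsersQuery();
$response = (new UsersController())->index($query);
```

A test can then check both sides: that the controller asked only for the fields it needs ($query->requestedFields) and that the rows were handed to the right view ($response).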

Final Thoughts:

The important thing to note here is that when I'm modifying (creating, updating or deleting) entities, I'm working with real model objects and performing the persistence through my repositories.

However, when I'm displaying (selecting data and sending it to the views) I'm not working with model objects, but rather plain old value objects. I only select the fields I need, and it's designed so I can maximize my data lookup performance.

My repositories stay very clean, and instead this "mess" is organized into my model queries.

I use a data mapper to help with development, as it's just ridiculous to write repetitive SQL for common tasks. However, you absolutely can write SQL where needed (complicated queries, reporting, etc.). And when you do, it's nicely tucked away into a properly named class.

I'd love to hear your take on my approach!

July 2015 Update:

I've been asked in the comments where I ended up with all this. Well, not that far off actually. Truthfully, I still don't really like repositories. I find them overkill for basic lookups (especially if you're already using an ORM), and messy when working with more complicated queries.

I generally work with an ActiveRecord style ORM, so most often I'll just reference those models directly throughout my application. However, in situations where I have more complex queries, I'll use query objects to make these more reusable. I should also note that I always inject my models into my methods, making them easier to mock in my tests.

Setting up a repository pattern in MVC

Your Repository looks much more like a TableDataGateway to me. The idea of a Repository is to be another layer on top of the mapping layer that mediates between the domain objects and the database. It also serves as an in-memory storage of domain objects (something that is missing from your example) and may encapsulate a Factory for creating new Entities. Repositories often also allow querying by means of the Specification pattern:

[Figure: Repository sequence diagram from PoEAA]
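
To make the "in-memory collection plus Specification" idea a bit more concrete, here is a deliberately small sketch. It is not taken from Fowler or Evans; the names UserSpecification, HasGender and matching() are assumptions for illustration, and a real Repository would fall back to the mapping layer when an object is not already held in memory.

```php
<?php

class User
{
    public $id;
    public $first_name;
    public $gender;

    public function __construct($id, $first_name, $gender)
    {
        $this->id = $id;
        $this->first_name = $first_name;
        $this->gender = $gender;
    }
}

// Specification: a named, reusable selection rule
interface UserSpecification
{
    public function isSatisfiedBy(User $user);
}

class HasGender implements UserSpecification
{
    private $gender;

    public function __construct($gender)
    {
        $this->gender = $gender;
    }

    public function isSatisfiedBy(User $user)
    {
        return $user->gender === $this->gender;
    }
}

// Collection-like repository; here everything lives in memory
class UserRepository
{
    private $users = []; // objects held in memory, keyed by id

    public function add(User $user)
    {
        $this->users[$user->id] = $user;
    }

    public function find($id)
    {
        // The same id always yields the same object instance
        return isset($this->users[$id]) ? $this->users[$id] : null;
    }

    public function matching(UserSpecification $specification)
    {
        return array_values(array_filter($this->users, function (User $user) use ($specification) {
            return $specification->isSatisfiedBy($user);
        }));
    }
}

$repository = new UserRepository();
$repository->add(new User(1, 'Ada', 'female'));
$repository->add(new User(2, 'Alan', 'male'));
$women = $repository->matching(new HasGender('female'));
```

The calling code treats the repository like a collection of domain objects and never sees tables or SQL, which is the difference from a TableDataGateway.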

It's a rather complex pattern. You can find good write-ups about Repository in

Fowler: Patterns of Enterprise Application Architecture
Evans: Domain Driven Design
http://thinkddd.com/assets/2/Domain_Driven_Design_-_Step_by_Step.pdf

Also check Good Domain Driven Design samples

Traditional Classes when using the Repository Pattern

The idea with the design pattern is to separate the persistence logic (interfacing with the database and/or cache) from the domain model.

Basically any changes being made to the object "in memory" should be kept in the "traditional" class. The repository will grab or change the data from the database, but if there is anything involving business logic that manipulates the data that does not involve changes to the database (whether it be formatting the data or performing some mathematical operation on it), you can keep that in the domain class.

Best practices (specifically the Single-Responsibility Principle) encourage separating repository or data-mapping logic from the business logic that can be encapsulated with the object data.
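
As a rough illustration of that split, the "traditional" class below only contains behaviour that works on the object in memory. The methods fullName() and changeEmail() are invented for this example; loading and saving would live in a separate repository class that this class knows nothing about.

```php
<?php

// Domain class: in-memory behaviour only, no persistence calls
class User
{
    public $first_name;
    public $last_name;
    public $email;

    // Formatting logic belongs with the data it formats
    public function fullName()
    {
        return trim($this->first_name . ' ' . $this->last_name);
    }

    // Validation and normalisation are business rules, not storage concerns
    public function changeEmail($email)
    {
        if (filter_var($email, FILTER_VALIDATE_EMAIL) === false) {
            throw new InvalidArgumentException('Invalid email address');
        }
        $this->email = strtolower($email);
    }
}

$user = new User();
$user->first_name = 'Ada';
$user->last_name = 'Lovelace';
$user->changeEmail('Ada@Example.com');
```

Nothing here touches a database, so the class can be unit tested on its own, and the repository can change storage engines without this code noticing.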
