Repository Pattern Step by Step Explanation

Repository Pattern - How to understand it and how does it work with complex entities?

You can read my "repository for dummies" post to understand the simple principle of the repository. I think your problem is that you're working with DTOs and in that scenario, you don't really use the repository pattern, you're using a DAO.

The main difference between a repository and a DAO is that a repository returns only objects that are understood by the calling layer. Most of the time the repository is used by the business layer, and thus it returns business objects. A DAO returns data which might or might not be a whole business object, i.e. the data isn't necessarily a valid business concept.

If your business objects are just data structures, it might be a hint that you have a modeling problem, i.e. bad design. A repository makes more sense with 'rich' or at least properly encapsulated objects. If you're just loading and saving data structures, you probably don't need a repository; the ORM is enough.

If you're dealing with business objects that are composed from other objects (an aggregate) and that object needs all its parts in order to be consistent (an aggregate root) then the repository pattern is the best solution because it will abstract all persistence details. Your app will just ask for a 'Product' and the repository will return it as a whole, regardless of how many tables or queries are required to restore the object.
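
As a rough illustration of that last point, here is a minimal Python sketch. Everything in it is made up for the example: the two dict-based "tables", the Product and Variant names, and the sample data. It only shows the shape of the idea: the caller asks for a Product and gets the whole aggregate back, without knowing how many lookups were needed.

```python
from dataclasses import dataclass, field
from typing import Optional

# Hypothetical storage: two "tables" kept as plain Python structures.
PRODUCT_ROWS = {1: {"id": 1, "name": "Desk lamp", "price_cents": 2500}}
VARIANT_ROWS = [
    {"product_id": 1, "sku": "LAMP-BLK", "stock": 3},
    {"product_id": 1, "sku": "LAMP-WHT", "stock": 0},
]


@dataclass
class Variant:
    sku: str
    stock: int


@dataclass
class Product:
    """Aggregate root: only consistent when loaded with all its variants."""
    id: int
    name: str
    price_cents: int
    variants: list[Variant] = field(default_factory=list)

    def is_available(self) -> bool:
        # A business rule that needs the whole aggregate to answer.
        return any(v.stock > 0 for v in self.variants)


class ProductRepository:
    """Hides how many tables or lookups are needed to restore a Product."""

    def __init__(self, product_rows, variant_rows) -> None:
        self._product_rows = product_rows
        self._variant_rows = variant_rows

    def get(self, product_id: int) -> Optional[Product]:
        row = self._product_rows.get(product_id)
        if row is None:
            return None
        variants = [
            Variant(sku=v["sku"], stock=v["stock"])
            for v in self._variant_rows
            if v["product_id"] == product_id
        ]
        return Product(row["id"], row["name"], row["price_cents"], variants)


repo = ProductRepository(PRODUCT_ROWS, VARIANT_ROWS)
product = repo.get(1)  # the caller never sees the two "tables"
```

The business layer only ever talks to `repo.get(...)`; swapping the dicts for real tables would not change the caller.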

Based on your code sample, you don't have 'real' business objects. You have data structures used by Hibernate. A business object is designed based on business concepts and use cases. The repository makes it possible for the BL not to care about how that object is persisted. In a way, a repository acts as a 'converter/mapper' between the object and the model that will be persisted. Basically, the repo 'reduces' the object to the data required for persistence.

A business object is not an ORM entity. It might be from a technical point of view, but from a design point of view, one models business stuff and the other models persistence stuff. In many cases these are not directly compatible.

The biggest mistake is to design your business objects according to storage needs and mindset. And contrary to what many devs believe, an ORM's purpose is not to persist business objects. Its purpose is to simulate an 'OOP' database on top of an RDBMS. The ORM mapping is between your db objects and tables, not between app objects (even less when dealing with business objects) and tables.

How to implement the repository pattern the right way?

Your questions are absolutely normal, but don't expect to find an absolute answer. Welcome to the software industry!

Here is my opinion:

Is it good OOP to have an application that relies on an architecture that, next to their repositories, only has models/classes
that hold values with no behaviour?

I think you are trying to implement a repository pattern but are missing a higher-level view of the architecture. Most apps are decoupled into at least 3 layers: View (Presentation), Business and DataAccess.
The repository pattern lives in the DataAccess layer; this is where you find pure data objects.
But this data access layer is used by a business layer, where you will find a domain model: classes with business behavior AND data.
The unit testing effort should focus on the domain model in the business layer.
These tests should not care about how the data is stored.

Where do I call repository methods in the architecture of the application?

Again, no absolute answer, but usually it makes sense to use something like a business service. These services can arrange the flow between different domain objects, and load and save them through repositories.
This is basically what you do in your AddFriend class, and it belongs in the business layer.

Regarding unit testing, with this approach we need some dependency
injection, which tells me that it is not an independent class

Business services are usually dependent on repositories; this is a really common case for unit testing. The Tram class can hold business behavior and still be independent. The AddTram business service needs a repository, and dependency injection lets you test it anyway.
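
To make this concrete, here is a small Python sketch under a few assumptions of mine: the `put_in_service` rule, the `find`/`save` method names and the in-memory fake are all invented for illustration, not taken from your code. It shows Tram staying storage-free while AddTram receives its repository from outside.

```python
from dataclasses import dataclass
from typing import Optional, Protocol


@dataclass
class Tram:
    """Business object: data plus behaviour, no knowledge of storage."""
    number: int
    in_service: bool = False

    def put_in_service(self) -> None:
        if self.in_service:
            raise ValueError(f"Tram {self.number} is already in service")
        self.in_service = True


class TramRepository(Protocol):
    """What the business layer needs from the data access layer."""
    def find(self, number: int) -> Optional[Tram]: ...
    def save(self, tram: Tram) -> None: ...


class AddTram:
    """Business service: arranges the flow and depends on a repository."""

    def __init__(self, trams: TramRepository) -> None:
        self._trams = trams  # injected, so a test can pass a fake

    def execute(self, number: int) -> Tram:
        if self._trams.find(number) is not None:
            raise ValueError(f"Tram {number} already exists")
        tram = Tram(number)
        tram.put_in_service()
        self._trams.save(tram)
        return tram


class InMemoryTramRepository:
    """Test double: same methods as a real repository, no database."""

    def __init__(self) -> None:
        self._by_number: dict[int, Tram] = {}

    def find(self, number: int) -> Optional[Tram]:
        return self._by_number.get(number)

    def save(self, tram: Tram) -> None:
        self._by_number[tram.number] = tram


fake = InMemoryTramRepository()
service = AddTram(fake)
added = service.execute(12)
```

Tram can be tested with no repository at all, and AddTram can be tested by handing it the in-memory fake.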

Methods that insert, update and delete new entities in a database, are
they supposed to be in a class itself?

For this one I can be loud and clear: please don't do that, and yes, CRUD operations belong in the TramRepository. Persistence is certainly not business logic. That's why in your example you need two classes in two different layers:

Tram could be the business object in the business layer (no CRUD operations here)
TramRepository is the object that stores the data for a Tram (where you will find the CRUD operations)

"Because i have seen some other students making use of static methods
in a domain class which provides these functionality"

Using static methods is clearly not a good idea for that; it means anybody could store data through your object, even though it's supposed to handle a business case. And again, data storage is not a business case, it's a technical necessity.

Now to be fair, all these concepts need to be discussed in context to make sense, and need to be adapted to each new project. This is why our job is both hard and exciting: context is king.

Also I wrote a blog article centered on MVVM, but I think it can help to understand my answer.

Why use Repository Pattern or please explain it to me?

One reason is to increase testability and keep the coupling to the underlying persistence technology loose. But you will also have one repository per aggregate root object (e.g. an order can be an aggregate root that has order lines, which are not aggregate roots), which makes domain object persistence more generic.

It also makes it much easier to manage objects, because when you save an order, it will also save its child items (such as order lines).
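
A minimal Python sketch of that idea, using the standard-library sqlite3 module with an in-memory database. The table names, columns and the Order/OrderLine classes are assumptions made up for this example. The point is that there is a repository for the order only, and saving the order stores its lines in the same transaction.

```python
import sqlite3
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class OrderLine:
    sku: str
    quantity: int


@dataclass
class Order:
    """Aggregate root; order lines are only reachable through it."""
    id: int
    customer: str
    lines: list[OrderLine] = field(default_factory=list)


class OrderRepository:
    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS orders (id INTEGER PRIMARY KEY, customer TEXT)"
        )
        conn.execute(
            "CREATE TABLE IF NOT EXISTS order_lines "
            "(order_id INTEGER, sku TEXT, quantity INTEGER)"
        )

    def save(self, order: Order) -> None:
        # One transaction: the order and its lines are stored together or not at all.
        with self._conn:
            self._conn.execute(
                "INSERT OR REPLACE INTO orders (id, customer) VALUES (?, ?)",
                (order.id, order.customer),
            )
            self._conn.execute(
                "DELETE FROM order_lines WHERE order_id = ?", (order.id,)
            )
            self._conn.executemany(
                "INSERT INTO order_lines (order_id, sku, quantity) VALUES (?, ?, ?)",
                [(order.id, line.sku, line.quantity) for line in order.lines],
            )

    def get(self, order_id: int) -> Optional[Order]:
        row = self._conn.execute(
            "SELECT id, customer FROM orders WHERE id = ?", (order_id,)
        ).fetchone()
        if row is None:
            return None
        lines = [
            OrderLine(sku, quantity)
            for sku, quantity in self._conn.execute(
                "SELECT sku, quantity FROM order_lines "
                "WHERE order_id = ? ORDER BY rowid",
                (order_id,),
            )
        ]
        return Order(row[0], row[1], lines)


orders = OrderRepository(sqlite3.connect(":memory:"))
orders.save(Order(1, "Ada", [OrderLine("BOOK-1", 2), OrderLine("PEN-7", 10)]))
restored = orders.get(1)
```

There is deliberately no OrderLineRepository here: lines are persisted and restored only as part of their order.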

Easy implementation of Repository Pattern

In well-designed code you depend on interfaces, not on implementations. This has benefits. Imagine you have a class with a code fragment:

IBookRepository bookRepository;

public Book GetInterestingBook() {
  var book = bookRepository.getBooks().FirstOrDefault(x => x.IsInteresting);
  return book;
}

Now I'll show you some benefits:

Using an interface allows you to create bookRepository instances implicitly via Dependency Injection (Ninject, Unity, etc.; there are many such containers). If you decide to change the repository implementation from Entity Framework to NHibernate, you don't need to make changes in the calling code. Just change the mapping in the mapping file so that IBookRepository resolves to NHibernateRepository instead of EFBookRepository. Of course, NHibernateRepository has to be developed too.
Using an interface allows you to implement great unit testing via mock objects. You just need to implement MockBookRepository and use it on injection. There are many mock frameworks that can help you with it, Moq for example.
You can switch repositories dynamically without changing your code. For example, suppose your database is temporarily down, but you have another store that can handle new orders because they are critical (bad example, I know). In this case you detect that the DB is down and do something like:
currentRepository = temporaryOrdersOnlyRepository;

Now your code continues functioning, except that your get-data and delete methods throw exceptions, while the CreateNewOrder() method saves orders to a file.
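
Here is a rough Python sketch of that fallback. The class names, the DatabaseDown exception and the detection logic are all invented for illustration, and an in-memory StringIO stands in for the file. It only demonstrates that the calling code keeps using the same method while the implementation behind it is swapped.

```python
import io


class DatabaseDown(Exception):
    """Hypothetical error raised when the primary store is unreachable."""


class PrimaryOrderRepository:
    def __init__(self) -> None:
        self.available = True
        self.orders: list[str] = []

    def create_new_order(self, description: str) -> None:
        if not self.available:
            raise DatabaseDown("primary database is not reachable")
        self.orders.append(description)


class TemporaryOrdersOnlyRepository:
    """Fallback that only accepts new orders and appends them to a file-like object."""

    def __init__(self, sink) -> None:
        self._sink = sink

    def create_new_order(self, description: str) -> None:
        self._sink.write(description + "\n")


class OrderService:
    def __init__(self, primary, fallback) -> None:
        self._fallback = fallback
        self.current_repository = primary

    def place_order(self, description: str) -> None:
        try:
            self.current_repository.create_new_order(description)
        except DatabaseDown:
            # Swap the implementation; the calling code stays the same.
            self.current_repository = self._fallback
            self.current_repository.create_new_order(description)


primary = PrimaryOrderRepository()
sink = io.StringIO()  # stands in for a file on disk
service = OrderService(primary, TemporaryOrdersOnlyRepository(sink))
service.place_order("order #1")
primary.available = False  # simulate the database going down
service.place_order("order #2")
```

A real version would also need a way to switch back and replay the saved orders; that is left out here.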

Good luck!

Repository Pattern and Dataset

Answer: No, they serve different purposes.

Dataset represents an in-memory cache of data and doesn't provide methods to modify data.

In contrast, Repository provides methods to operate with data: objects can be added to and removed from the Repository, as they can from a simple collection of objects, and the mapping code encapsulated by the Repository will carry out the appropriate operations behind the scenes.
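
A small Python sketch of what "like a simple collection" can look like. The Book class, the table layout and the placeholder ISBN values are assumptions for this example; it uses the standard-library sqlite3 module with an in-memory database. Callers add, remove, count and iterate, while the SQL stays inside the repository.

```python
import sqlite3
from dataclasses import dataclass


@dataclass(frozen=True)
class Book:
    isbn: str
    title: str


class BookRepository:
    """Looks like a collection to callers; the mapping code stays inside."""

    def __init__(self, conn: sqlite3.Connection) -> None:
        self._conn = conn
        conn.execute(
            "CREATE TABLE IF NOT EXISTS books (isbn TEXT PRIMARY KEY, title TEXT)"
        )

    def add(self, book: Book) -> None:
        with self._conn:
            self._conn.execute(
                "INSERT OR REPLACE INTO books (isbn, title) VALUES (?, ?)",
                (book.isbn, book.title),
            )

    def remove(self, book: Book) -> None:
        with self._conn:
            self._conn.execute("DELETE FROM books WHERE isbn = ?", (book.isbn,))

    def __len__(self) -> int:
        return self._conn.execute("SELECT COUNT(*) FROM books").fetchone()[0]

    def __iter__(self):
        rows = self._conn.execute(
            "SELECT isbn, title FROM books ORDER BY isbn"
        ).fetchall()
        return iter([Book(isbn, title) for isbn, title in rows])


books = BookRepository(sqlite3.connect(":memory:"))
first = Book("isbn-001", "First title")    # placeholder data
second = Book("isbn-002", "Second title")  # placeholder data
books.add(first)
books.add(second)
books.remove(first)
remaining = list(books)
```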

ps: you may find my answer to be not language-agnostic, in this case please provide more details along with your question

Repository Pattern in Mvc 4.0

Check out the question "Unit Of Work & Generic Repository with Entity Framework 5"; I think it is described well there.

Here is a complete package you can use: http://www.nuget.org/packages/Repository.EntityFramework/

And one more link: http://www.asp.net/mvc/tutorials/getting-started-with-ef-using-mvc/implementing-the-repository-and-unit-of-work-patterns-in-an-asp-net-mvc-application.

BUT before implementing the repository pattern, I would suggest you ask yourself: do you really need it?

Proper Repository Pattern Design in PHP?

I thought I'd take a crack at answering my own question. What follows is just one way of solving the issues 1-3 in my original question.

Disclaimer: I may not always use the right terms when describing patterns or techniques. Sorry for that.

The Goals:

Create a complete example of a basic controller for viewing and editing Users.
All code must be fully testable and mockable.
The controller should have no idea where the data is stored (meaning it can be changed).
Example to show a SQL implementation (most common).
For maximum performance, controllers should only receive the data they need—no extra fields.
Implementation should leverage some type of data mapper for ease of development.
Implementation should have the ability to perform complex data lookups.

The Solution

I'm splitting my persistent storage (database) interaction into two categories: R (Read) and CUD (Create, Update, Delete). My experience has been that reads are really what causes an application to slow down. And while data manipulation (CUD) is actually slower, it happens much less frequently, and is therefore much less of a concern.

CUD (Create, Update, Delete) is easy. This will involve working with actual models, which are then passed to my Repositories for persistence. Note, my repositories will still provide a Read method, but simply for object creation, not display. More on that later.

R (Read) is not so easy. No models here, just value objects. Use arrays if you prefer. These objects may represent a single model or a blend of many models, anything really. These are not very interesting on their own, but how they are generated is. I'm using what I'm calling Query Objects.

The Code:

User Model

Let's start simple with our basic user model. Note that there is no ORM extending or database stuff at all. Just pure model glory. Add your getters, setters, validation, whatever.

class User
{
    public $id;
    public $first_name;
    public $last_name;
    public $gender;
    public $email;
    public $password;
}

Repository Interface

Before I create my user repository, I want to create my repository interface. This will define the "contract" that repositories must follow in order to be used by my controller. Remember, my controller will not know where the data is actually stored.

Note that my repositories will only ever contain these three methods. The save() method is responsible for both creating and updating users, simply depending on whether or not the user object has an id set.

interface UserRepositoryInterface
{
    public function find($id);
    public function save(User $user);
    public function remove(User $user);
}

SQL Repository Implementation

Now to create my implementation of the interface. As mentioned, my example was going to be with an SQL database. Note the use of a data mapper to prevent having to write repetitive SQL queries.

class SQLUserRepository implements UserRepositoryInterface
{
    protected $db;

    public function __construct(Database $db)
    {
        $this->db = $db;
    }

    public function find($id)
    {
        // Find a record with the id = $id
        // from the 'users' table
        // and return it as a User object
        return $this->db->find($id, 'users', 'User');
    }

    public function save(User $user)
    {
        // Insert or update the $user
        // in the 'users' table
        $this->db->save($user, 'users');
    }

    public function remove(User $user)
    {
        // Remove the $user
        // from the 'users' table
        $this->db->remove($user, 'users');
    }
}
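
The PHP above delegates the insert-or-update decision to the data mapper. For readers who want to see that decision spelled out, here is a minimal sketch in Python (chosen only for brevity) with an in-memory dict standing in for the 'users' table. The id counter and the trimmed-down User are assumptions for illustration, not how any particular data mapper does it.

```python
from dataclasses import dataclass
from itertools import count
from typing import Optional


@dataclass
class User:
    first_name: str
    last_name: str
    id: Optional[int] = None  # None until the repository has stored the user


class InMemoryUserRepository:
    """Stand-in for the SQL repository; a dict plays the role of the 'users' table."""

    def __init__(self) -> None:
        self._rows: dict[int, dict] = {}
        self._next_id = count(1)

    def find(self, user_id: int) -> Optional[User]:
        row = self._rows.get(user_id)
        return User(**row) if row else None

    def save(self, user: User) -> None:
        if user.id is None:
            # No id yet: treat as an insert and hand the new id back on the object.
            user.id = next(self._next_id)
        # With an id set, this overwrites the stored row (an update).
        self._rows[user.id] = {
            "id": user.id,
            "first_name": user.first_name,
            "last_name": user.last_name,
        }

    def remove(self, user: User) -> None:
        self._rows.pop(user.id, None)


users = InMemoryUserRepository()
ada = User("Ada", "Lovelace")
users.save(ada)          # insert: ada.id is assigned
ada.last_name = "King"
users.save(ada)          # update: same id, new data
```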

Query Object Interface

Now with CUD (Create, Update, Delete) taken care of by our repository, we can focus on the R (Read). Query objects are simply an encapsulation of some type of data lookup logic. They are not query builders. By abstracting it like our repository, we can change its implementation and test it more easily. An example of a Query Object might be an AllUsersQuery or AllActiveUsersQuery, or even MostCommonUserFirstNames.

You may be thinking "can't I just create methods in my repositories for those queries?" Yes, but here is why I'm not doing this:

My repositories are meant for working with model objects. In a real world app, why would I ever need to get the password field if I'm looking to list all my users?
Repositories are often model specific, yet queries often involve more than one model. So what repository do you put your method in?
This keeps my repositories very simple, not a bloated class of methods.
All queries are now organized into their own classes.
Really, at this point, repositories exist simply to abstract my database layer.

For my example I'll create a query object to lookup "AllUsers". Here is the interface:

interface AllUsersQueryInterface
{
    public function fetch($fields);
}

Query Object Implementation

This is where we can use a data mapper again to help speed up development. Notice that I am allowing one tweak to the returned dataset—the fields. This is about as far as I want to go with manipulating the performed query. Remember, my query objects are not query builders. They simply perform a specific query. However, since I know that I'll probably be using this one a lot, in a number of different situations, I'm giving myself the ability to specify the fields. I never want to return fields I don't need!

class AllUsersQuery implements AllUsersQueryInterface
{
    protected $db;

    public function __construct(Database $db)
    {
        $this->db = $db;
    }

    public function fetch($fields)
    {
        return $this->db->select($fields)->from('users')->orderBy('last_name, first_name')->rows();
    }
}

Before moving on to the controller, I want to show another example to illustrate how powerful this is. Maybe I have a reporting engine and need to create a report for AllOverdueAccounts. This could be tricky with my data mapper, and I may want to write some actual SQL in this situation. No problem, here is what this query object could look like:

class AllOverdueAccountsQuery implements AllOverdueAccountsQueryInterface
{
    protected $db;

    public function __construct(Database $db)
    {
        $this->db = $db;
    }

    public function fetch()
    {
        return $this->db->query($this->sql())->rows();
    }

    public function sql()
    {
        return "SELECT...";
    }
}

This nicely keeps all my logic for this report in one class, and it's easy to test. I can mock it to my heart's content, or even use a different implementation entirely.
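
To give a feel for the mocking part, here is a rough sketch in Python (used only to keep it short). The stub class, the `list_users` function and the sample row are hypothetical; they mirror the shape of the AllUsersQuery interface rather than any real framework. The stub returns canned rows and records which fields were asked for, so the calling code can be tested with no database.

```python
from typing import Protocol, Sequence


class AllUsersQuery(Protocol):
    def fetch(self, fields: Sequence[str]) -> list[dict]: ...


class StubAllUsersQuery:
    """Test double: returns canned rows and records which fields were requested."""

    def __init__(self, rows: list[dict]) -> None:
        self._rows = rows
        self.requested_fields: list[str] = []

    def fetch(self, fields: Sequence[str]) -> list[dict]:
        self.requested_fields = list(fields)
        # Only hand back the requested fields, like the real query would.
        return [{name: row[name] for name in fields} for row in self._rows]


def list_users(query: AllUsersQuery) -> dict:
    """Hypothetical controller action; returns the data a view would receive."""
    users = query.fetch(["first_name", "last_name", "email"])
    return {"template": "all_users", "users": users}


stub = StubAllUsersQuery([
    {
        "first_name": "Ada",
        "last_name": "Lovelace",
        "email": "ada@example.com",
        "password": "not-needed-for-a-listing",
    },
])
result = list_users(stub)
```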

The Controller

Now the fun part—bringing all the pieces together. Note that I am using dependency injection. Typically dependencies are injected into the constructor, but I actually prefer to inject them right into my controller methods (routes). This minimizes the controller's object graph, and I actually find it more legible. Note, if you don't like this approach, just use the traditional constructor method.

class UsersController
{
    public function index(AllUsersQueryInterface $query)
    {
        // Fetch user data
        $users = $query->fetch(['first_name', 'last_name', 'email']);

        // Return view
        return Response::view('all_users.php', ['users' => $users]);
    }

    public function add()
    {
        return Response::view('add_user.php');
    }

    public function insert(UserRepositoryInterface $repository)
    {
        // Create new user model
        $user = new User;
        $user->first_name = $_POST['first_name'];
        $user->last_name = $_POST['last_name'];
        $user->gender = $_POST['gender'];
        $user->email = $_POST['email'];

        // Save the new user
        $repository->save($user);

        // Return the id
        return Response::json(['id' => $user->id]);
    }

    public function view(SpecificUserQueryInterface $query, $id)
    {
        // Load user data
        if (!$user = $query->fetch($id, ['first_name', 'last_name', 'gender', 'email'])) {
            return Response::notFound();
        }

        // Return view
        return Response::view('view_user.php', ['user' => $user]);
    }

    public function edit(SpecificUserQueryInterface $query, $id)
    {
        // Load user data
        if (!$user = $query->fetch($id, ['first_name', 'last_name', 'gender', 'email'])) {
            return Response::notFound();
        }

        // Return view
        return Response::view('edit_user.php', ['user' => $user]);
    }

    public function update(UserRepositoryInterface $repository, $id)
    {
        // Load user model
        if (!$user = $repository->find($id)) {
            return Response::notFound();
        }

        // Update the user
        $user->first_name = $_POST['first_name'];
        $user->last_name = $_POST['last_name'];
        $user->gender = $_POST['gender'];
        $user->email = $_POST['email'];

        // Save the user
        $repository->save($user);

        // Return success
        return true;
    }

    public function delete(UserRepositoryInterface $repository, $id)
    {
        // Load user model
        if (!$user = $repository->find($id)) {
            return Response::notFound();
        }

        // Delete the user
        $repository->remove($user);

        // Return success
        return true;
    }
}

Final Thoughts:

The important things to note here are that when I'm modifying (creating, updating or deleting) entities, I'm working with real model objects, and performing the persistence through my repositories.

However, when I'm displaying (selecting data and sending it to the views) I'm not working with model objects, but rather plain old value objects. I only select the fields I need, and it's designed so I can maximize my data lookup performance.

My repositories stay very clean, and instead this "mess" is organized into my model queries.

I use a data mapper to help with development, as it's just ridiculous to write repetitive SQL for common tasks. However, you absolutely can write SQL where needed (complicated queries, reporting, etc.). And when you do, it's nicely tucked away into a properly named class.

I'd love to hear your take on my approach!

July 2015 Update:

I've been asked in the comments where I ended up with all this. Well, not that far off actually. Truthfully, I still don't really like repositories. I find them overkill for basic lookups (especially if you're already using an ORM), and messy when working with more complicated queries.

I generally work with an ActiveRecord style ORM, so most often I'll just reference those models directly throughout my application. However, in situations where I have more complex queries, I'll use query objects to make these more reusable. I should also note that I always inject my models into my methods, making them easier to mock in my tests.

Why repository pattern is extensively used in entity framework as though it is complex?

To provide a code example of why Approach 1 is better, consider a test scenario. With Approach 2, when you call the Insert method in a unit test you will actually be calling:

Controller.Insert > MyBal.Insert > Database execution.

This is not what we want in unit testing. In unit testing we just want to test the unit (code block), not the object graph of its entire call stack.

This is because you have hardcoded a dependency on MyBal and there is no way to switch it out. If we used Approach 1 however we might have:

public class HomeController : Controller
{
    private readonly IBalService _balService;

    // The dependency is injected, so a test can pass in a mock implementation.
    public HomeController(IBalService balService)
    {
        _balService = balService;
    }

    public ActionResult Insert(Model M)
    {
        _balService.Insert();
        return View();
    }
}

Now you can have your MyBal class implement the IBalService interface:

public class MyBal : IBalService
{
    public void Insert()
    {
        using (MyEntities context = new MyEntities())
        {
            // _MyTable is the entity to insert; it is not defined in this snippet.
            context.MyTable.Add(_MyTable);
            context.SaveChanges();
        }
    }
}

And under a production scenario everything would work as it currently does. BUT under test you can now also make a BalMockService.

public class BalMockService : IBalService
{
    public void Insert() {}
}

And you can pass that into your HomeController when testing. Notice how we removed all the database calls? They are not needed for testing Controller.Insert. Alternatively we could have made BalMockService return some test data, but Insert returns void, so it's not needed.

The point is we've decoupled HomeController from MyBal, and we've also advertised the fact that HomeController requires an IBalService of some kind to run.
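
If you would rather not hand-write a mock class, a mocking library can generate one. Here is a rough equivalent sketched in Python with the standard-library unittest.mock module. The Python names and the dict passed to `insert` are assumptions for this sketch, not part of the C# code above. It checks that the controller calls the service without touching a database.

```python
from typing import Protocol
from unittest.mock import Mock


class BalService(Protocol):
    def insert(self, model: dict) -> None: ...


class HomeController:
    def __init__(self, bal_service: BalService) -> None:
        # The constructor advertises the dependency instead of creating it internally.
        self._bal_service = bal_service

    def insert(self, model: dict) -> str:
        self._bal_service.insert(model)
        return "inserted"


# A generated mock that only exposes an `insert` attribute.
mock_service = Mock(spec=["insert"])
controller = HomeController(mock_service)
outcome = controller.insert({"name": "example"})

# Verify the interaction: the controller delegated exactly once, with our model.
mock_service.insert.assert_called_once_with({"name": "example"})
```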