How Is Std::Function Implemented

How is std::function implemented?

The implementation of std::function can differ from one implementation to another, but the core idea is that it uses type-erasure. While there are multiple ways of doing it, you can imagine a trivial (not optimal) solution could be like this (simplified for the specific case of std::function<int (double)> for the sake of simplicity):

struct callable_base {
   virtual int operator()(double d) = 0;
   virtual ~callable_base() {}
};
template <typename F>
struct callable : callable_base {
   F functor;
   callable(F functor) : functor(functor) {}
   virtual int operator()(double d) { return functor(d); }
};
class function_int_double {
   std::unique_ptr<callable_base> c;
public:
   template <typename F>
   function(F f) {
      c.reset(new callable<F>(f));
   }
   int operator()(double d) { return c(d); }
// ...
};

In this simple approach the function object would store just a unique_ptr to a base type. For each different functor used with the function, a new type derived from the base is created and an object of that type instantiated dynamically. The std::function object is always of the same size and will allocate space as needed for the different functors in the heap.

In real life there are different optimizations that provide performance advantages but would complicate the answer. The type could use small object optimizations, the dynamic dispatch can be replaced by a free-function pointer that takes the functor as argument to avoid one level of indirection... but the idea is basically the same.

Regarding the issue of how copies of the std::function behave, a quick test indicates that copies of the internal callable object are done, rather than sharing the state.

// g++4.8
int main() {
   int value = 5;
   typedef std::function<void()> fun;
   fun f1 = [=]() mutable { std::cout << value++ << '\n' };
   fun f2 = f1;
   f1();                    // prints 5
   fun f3 = f1;
   f2();                    // prints 5
   f3();                    // prints 6 (copy after first increment)
}

The test indicates that f2 gets a copy of the callable entity, rather than a reference. If the callable entity was shared by the different std::function<> objects, the output of the program would have been 5, 6, 7.

how is std::is_function implemented

It exploits this sentence from https://eel.is/c++draft/basic.type.qualifier#1

A function or reference type is always cv-unqualified.

So, given a type T, it tries to make a const T. If the result is not a const-qualified type, then T must be a function or reference type. Then it eliminates reference types, and done.

(not to be confused with member functions that have const in the end: that is, in standardese, "a function type with a cv-qualifier-seq", not the same as a "cv-qualified function type")

How is std::is_function implemented?

Let's go over the conditions as they appear:

If const T isn't const (const doesn't really apply to function types since functions aren't objects), and T isn't a reference (const doesn't apply to references either for the same reason), it's a function type. int (or any other non-function-non-reference type) wouldn't fit in because is_const<const int>::value is true.

According to C++17 Standard §11.3.5 Functions / section 7: (Emphasis mine)

The effect of a cv-qualifier-seq in a function declarator is not the
same as adding cv-qualification on top of the function type. In the
latter case, the cv-qualifiers are ignored. [ Note: A function type
that has a cv-qualifier-seq is not a cv-qualified type; there are no
cv-qualified function types. — end note ] [...]

Is there a standalone implementation of std::function?

The 60k came from exception handling being added by the compiler, because exceptions were required for std::function. std::function only throws one exception, "bad_function_call". So I removed the code that threw the exception, now it seg faults if an empty function is called, and I saved myself 60k.

How to implement a std::function with operator= that can check if its rhs has same signature

Cppreference says about operator=:

Sets the target of *this to the callable f, as if by executing function(std::forward(f)).swap(*this);. This operator does not participate in overload resolution unless f is Callable for argument types Args... and return type R.

This can be easily checked with std::is_invocable_r helper:

#include <type_traits>

template <class>
class function {};

template <class R, class... Args>
class function<R(Args...)> {
   public:
    template <typename F, typename x = std::enable_if_t<
                              std::is_invocable_r<R, F, Args...>::value>>
    function& operator=(F&& f) {
        // logic...
        return *this;
    }
};

#include <functional>
#include <string>

void bar1(std::string) {}
void bar2(int) {}
void bar3(float) {}
int bar4(int) { return 0; }

int main(int argc, char** argv) {
    //std::function<void(int)> f;
    function<void(int)> f;

    // f = bar1; ERROR
    f = bar2;
    f = bar3;
    f = bar4;
    return 0;
}

I am using the SFINAE version with an extra template argument as I find it most clear but return-value-based can be used too.
Same as for std::function, this is really forgiving - extra return values in case of R=void are ignored, implicit conversions are enabled. This matches what std::is_invocable_r allows.

Simple std::any implementation based on function pointers

You can work around this non-conformance by using a variable template (or a static local variable—in which case you call the function in the constructor—or a static data member of a class template):

template<class T>
constexpr decltype(nullptr) tag{};

class any {
  void *p;
  const decltype(nullptr) *t;

public:
  template<class T>
  any(…) : p(…),t(&tag<T>) {}

  // contains_type much as before
};

This is of course basically std::type_info, but might be preferable if the other capabilities of that facility are unneeded.

Why does the implementation of std::any use a function pointer + function op codes, instead of a pointer to a virtual table + virtual calls?

Consider a typical use case of a std::any: You pass it around in your code, move it dozens of times, store it in a data structure and fetch it again later. In particular, you'll likely return it from functions a lot.

As it is now, the pointer to the single "do everything" function is stored right next to the data in the any. Given that it's a fairly small type (16 bytes on GCC x86-64), any fits into a pair of registers. Now, if you return an any from a function, the pointer to the "do everything" function of the any is already in a register or on the stack! You can just jump directly to it without having to fetch anything from memory. Most likely, you didn't even have to touch memory at all: You know what type is in the any at the point you construct it, so the function pointer value is just a constant that's loaded into the appropriate register. Later, you use the value of that register as your jump target. This means there's no chance for misprediction of the jump because there is nothing to predict, the value is right there for the CPU to consume.

In other words: The reason that you get the jump target for free with this implementation is that the CPU must have already touched the any in some way to obtain it in the first place, meaning that it already knows the jump target and can jump to it with no additional delay.

That means there really is no indirection to speak of with the current implementation if the any is already "hot", which it will be most of the time, especially if it's used as a return value.

On the other hand, if you use a table of function pointers somewhere in a read-only section (and let the any instance point to that instead), you'll have to go to memory (or cache) every single time you want to move or access it. The size of an any is still 16 bytes in this case but fetching values from memory is much, much slower than accessing a value in a register, especially if it's not in a cache. In a lot of cases, moving an any is as simple as copying its 16 bytes from one location to another, followed by zeroing out the original instance. This is pretty much free on any modern CPU. However, if you go the pointer table route, you'll have to fetch from memory every time, wait for the reads to complete, and then do the indirect call. Now consider that you'll often have to do a sequence of calls on the any (i.e. move, then destruct) and this will quickly add up. The problem is that you don't just get the address of the function you want to jump to for free every time you touch the any, the CPU has to fetch it explicitly. Indirect jumps to a value read from memory are quite expensive since the CPU can only retire the jump operation once the entire memory operation has finished. That doesn't just include fetching a value (which is potentially quite fast because of caches) but also address generation, store forwarding buffer lookup, TLB lookup, access validation, and potentially even page table walks. So even if the jump address is computed quickly, the jump won't retire for quite a long while. In general, "indirect-jump-to-address-from-memory" operations are among the worst things that can happen to a CPU's pipeline.

TL;DR: As it is now, returning an any doesn't stall the CPU's pipeline (the jump target is already available in a register so the jump can retire pretty much immediately). With a table-based solution, returning an any will stall the pipeline twice: Once to fetch the address of the move function, then another time to fetch the destructor. This delays retirement of the jump quite a bit since it'll have to wait not only for the memory value but also for the TLB and access permission checks.

Code memory accesses, on the other hand, aren't affected by this since the code is kept in microcode form anyway (in the µOp cache). Fetching and executing a few conditional branches in that switch statement is therefore quite fast (and even more so when the branch predictor gets things right, which it almost always does).

How std::function works

It uses some type erasure technique.

One possibility is to use mix subtype polymorphism with templates. Here's a simplified version, just to give a feel for the overall structure:

template <typename T>
struct function;

template <typename Result, typename... Args>
struct function<Result(Args...)> {
private:
    // this is the bit that will erase the actual type
    struct concept {
        virtual Result operator()(Args...) const = 0;
    };

    // this template provides us derived classes from `concept`
    // that can store and invoke op() for any type
    template <typename T>
    struct model : concept {
        template <typename U>
        model(U&& u) : t(std::forward<U>(u)) {}

        Result operator()(Args... a) const override {
            t(std::forward<Args>(a)...);
        }

        T t;
    };

    // this is the actual storage
    // note how the `model<?>` type is not used here    
    std::unique_ptr<concept> fn;

public:
    // construct a `model<T>`, but store it as a pointer to `concept`
    // this is where the erasure "happens"
    template <typename T,
        typename=typename std::enable_if<
            std::is_convertible<
                decltype( t(std::declval<Args>()...) ),
                Result
            >::value
        >::type>
    function(T&& t)
    : fn(new model<typename std::decay<T>::type>(std::forward<T>(t))) {}

    // do the virtual call    
    Result operator()(Args... args) const {
        return (*fn)(std::forward<Args>(args)...);
    }
};

(Note that I overlooked several things for the sake of simplicity: it cannot be copied, and maybe other problems; don't use this code in real code)

Replacing (or re-implementing?) std::function for some statistics and testing

You might do something like:

template <typename Sig>
class CustomWrapper
{
private:
    std::function<Sig> f;
    mutable std::size_t nbCall = 0;
    // ...
public:
    CustomWrapper() = default;

    CustomWrapper(const CustomWrapper&) = default;
    CustomWrapper(CustomWrapper&&) = default;
    CustomWrapper& operator=(const CustomWrapper&) = default;
    CustomWrapper& operator=(CustomWrapper&&) = default;

    template <typename T,
              std::enable_if_t<!std::is_same<CustomWrapper, std::decay_t<T>>::value
                  && std::is_constructible<std::function<Sig>, T&&>::value, bool> = false>
    CustomWrapper(T&& arg) : f(std::forward<T>(arg)){}

    template <typename ... Ts,
              std::enable_if_t<(sizeof...(Ts) >= 2)
                  && std::is_constructible<std::function<Sig>, Ts&&...>::value, bool> = false>
    CustomWrapper(Ts&& ... args) : f(std::forward<Ts>(args)...){}

    template <typename ... Ts>
    auto operator ()(Ts&&... args) const -> decltype(f(std::forward<Ts>(args)...))
    {
        ++nbCall; // Statistics you want
        // ...
        return f(std::forward<Ts>(args)...);
    }

    std::size_t getNbCall() const { return nbCall; }
    // ...
};

Demo

How Is Std::Function Implemented