What Is a "Regular Type" in the Context of Move Semantics

Summary:

For C++11 I would include:

  • move-ctor (noexcept)
  • move-assign (noexcept)
  • total ordering (operator<() for natural total order and std::less<> if a natural
    total order does not exist).
  • hash<>

And would remove:

  • swap() (non-throwing) - replaced by move operations.

Commentary

Alex revisits the concept of a regular type in Elements of Programming. In fact, much of the book is devoted to regular types.

There is a set of procedures whose inclusion in the computational
basis of a type lets us place objects in data structures and use
algorithms to copy objects from one data structure to another. We call
types having such a basis regular, since their use guarantees
regularity of behavior and, therefore, interoperability. -- Section 1.5 of EoP

In EoP, Alex introduces the notion of an underlying_type which gives us a non-throwing swap algorithm that can be used to move. An underlying_type template isn't implementable in C++ in any particularly useful manner, but you can use non-throwing (noexcept) move-ctor and move-assign as reasonable approximations (an underlying type allows moving to/from a temporary without an additional destruction for the temporary). In C++03, providing a non-throwing swap() was the recommended way to approximate a move operation. If you provide move-ctor and move-assign, then the default std::swap() will suffice (though you could still implement a more efficient one).

[ I'm on record as recommending that you use a single assignment operator, passing by value, to cover both move-assign and copy-assign. Unfortunately the current language rules for when a type gets a default move-ctor cause this to break with composite types. Until that is fixed in the language you will need to write two assignment operators. However, you can still use pass by value for other sink arguments to avoid combinatorics in handling move/copy for all arguments. ]
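The pass-by-value advice for sink arguments might look like the sketch below. The Employee type and its members are hypothetical names chosen for illustration; the point is only that one constructor covers every lvalue/rvalue combination of its arguments.

```cpp
#include <cassert>
#include <string>
#include <utility>

// Hypothetical type used only for illustration.
class Employee {
    std::string name_;
    std::string title_;

public:
    // Both sink arguments are taken by value and then moved into place.
    // An lvalue argument costs one copy plus one move; an rvalue argument
    // costs two moves. No overload set per argument is needed.
    Employee(std::string name, std::string title)
        : name_(std::move(name)), title_(std::move(title)) {}

    const std::string& name() const { return name_; }
    const std::string& title() const { return title_; }
};

// Mixes an lvalue argument (copied) with an rvalue argument (moved).
inline bool sink_demo() {
    std::string n = "Ada";
    Employee e(n, std::string("Engineer"));
    return n == "Ada" && e.name() == "Ada" && e.title() == "Engineer";
}
```

With two sink arguments, overloading on const& and && would need four constructors; pass by value needs one, at the cost of an extra move per argument.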

Alex also adds the requirement of total ordering (though there may not be a natural total order and the ordering may be purely representational). operator<() should be reserved for the natural total ordering. My suggestion is to specialize std::less<> if a natural total ordering is not available (there is some precedent for that in the standard).
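As a rough illustration of that suggestion, the sketch below specializes std::less for a type that has no natural order. Point is a hypothetical type, and the lexicographic order chosen here is an assumption; any consistent representational order would do.

```cpp
#include <cassert>
#include <functional>
#include <set>

// Hypothetical type with no natural total order.
struct Point {
    int x;
    int y;
};

inline bool operator==(const Point& a, const Point& b) {
    return a.x == b.x && a.y == b.y;
}

namespace std {
// Specializing std::less for a user-defined type is permitted.
// The order is purely representational: lexicographic on (x, y).
template <>
struct less<Point> {
    bool operator()(const Point& a, const Point& b) const {
        return a.x < b.x || (a.x == b.x && a.y < b.y);
    }
};
}  // namespace std

// std::set uses std::less<Point> by default, so it picks up the specialization.
inline bool less_demo() {
    std::set<Point> s;
    s.insert(Point{2, 1});
    s.insert(Point{1, 5});
    s.insert(Point{1, 5});  // equivalent to an existing element, not inserted
    return s.size() == 2 && s.begin()->x == 1 && s.begin()->y == 5;
}
```

Point deliberately has no operator<(), which keeps that operator reserved for types with a natural order.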

In EoP, Alex relaxes the requirements on equality to allow for representational-equality as being sufficient. A useful refinement.

A regular type should also be equationally complete (that is, operator==() should be implementable as a non-friend, non-member, function). A type that is equationally complete is also serializable (though without a canonical serialization format, implementing the stream operators is of little use except for debugging). A type that is equationally complete can also be hashed. In C++11 (or with TR1) you should provide a specialization of std::hash.
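Putting the pieces together, a regular type in C++11 might be sketched as below. Label and its single std::string member are hypothetical, and this is one possible arrangement rather than a canonical one.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <string>
#include <type_traits>
#include <unordered_set>
#include <utility>

// Hypothetical regular type; the single std::string member is an assumption.
class Label {
    std::string text_;

public:
    Label() = default;
    explicit Label(std::string t) : text_(std::move(t)) {}

    Label(const Label&) = default;
    Label& operator=(const Label&) = default;

    // Non-throwing move operations.
    Label(Label&& other) noexcept : text_(std::move(other.text_)) {}
    Label& operator=(Label&& other) noexcept {
        text_ = std::move(other.text_);
        return *this;
    }

    const std::string& text() const { return text_; }
};

// Equationally complete: equality and ordering need only the public interface.
inline bool operator==(const Label& a, const Label& b) { return a.text() == b.text(); }
inline bool operator!=(const Label& a, const Label& b) { return !(a == b); }
inline bool operator<(const Label& a, const Label& b) { return a.text() < b.text(); }

namespace std {
template <>
struct hash<Label> {
    std::size_t operator()(const Label& l) const {
        return std::hash<std::string>()(l.text());
    }
};
}  // namespace std

static_assert(std::is_nothrow_move_constructible<Label>::value, "move-ctor should be noexcept");
static_assert(std::is_nothrow_move_assignable<Label>::value, "move-assign should be noexcept");

inline bool regular_type_demo() {
    std::unordered_set<Label> seen;
    seen.insert(Label("a"));
    seen.insert(Label("b"));
    seen.insert(Label("a"));  // equal to an existing element, not inserted
    return seen.size() == 2 && seen.count(Label("b")) == 1 && Label("a") < Label("b");
}
```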

Another property of regular types is area() for which there is not yet any standard syntax - and likely little reason to actually implement except for testing. It is a useful concept for specifying complexity - and I frequently implement it (or an approximation) for testing complexity. For example, we define the complexity of copy as bounded by the time to copy the area of the object.

The concept of a regular type is not language-specific. One of the first things I do when presented with a new language is work out how regular types manifest in that language.

Move semantics - what it's all about?

Forget about C++0x for the moment. Move semantics are something that is language independent -- C++0x merely provides a standard way to perform operations with move semantics.

Definition

Move semantics define the behaviour of certain operations. Most of the time they are contrasted with copy semantics, so it would be useful to define them first.

Assignment with copy semantics has the following behaviour:

// Copy semantics
assert(b == c);
a = b;
assert(a == b && b == c);

i.e. a ends up equal to b, and we leave b unchanged.

Assignment with move semantics has weaker post conditions:

// Move semantics
assert(b == c);
move(a, b); // not C++0x
assert(a == c);

Note that there is no longer any guarantee that b remains unchanged after the assignment with move semantics. This is the crucial difference.
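The two sets of post-conditions can be checked with a standard container. This is a minimal sketch using std::vector and std::move in place of the hypothetical move(a, b) above; the function names are made up for illustration.

```cpp
#include <cassert>
#include <utility>
#include <vector>

// Copy semantics: a ends up equal to b, and b is unchanged.
inline bool copy_semantics_demo() {
    std::vector<int> b = {1, 2, 3};
    std::vector<int> c = b;
    std::vector<int> a;
    a = b;
    return a == b && b == c;
}

// Move semantics: a ends up equal to the old value of b (saved in c).
// Nothing is asserted about b, which is left valid but unspecified.
inline bool move_semantics_demo() {
    std::vector<int> b = {1, 2, 3};
    std::vector<int> c = b;
    std::vector<int> a;
    a = std::move(b);
    return a == c;
}
```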

Uses

One benefit of move semantics is that it allows optimisations in certain situations. Consider the following regular value type:

struct A { T* x; };

Assume also that we define two objects of type A to be equal iff their x members point to equal values.

bool operator==(const A& lhs, const A& rhs) { return *lhs.x == *rhs.x; }

Finally, assume that we define an object of type A to have sole ownership of the pointee of its x member.

A::~A() { delete x; }
A::A(const A& rhs) : x(new T(*rhs.x)) {} // copy the pointee, not the pointer
A& A::operator=(const A& rhs) { if (this != &rhs) *x = *rhs.x; return *this; }

Now suppose we want to define a function to swap two A objects.

We could do it the normal way with copy semantics.

void swap(A& a, A& b)
{
    A t = a;
    a = b;
    b = t;
}

However, this is unnecessarily inefficient. What are we doing?

  • We create a copy of a into t.
  • We then copy b into a.
  • Then copy t into b.
  • Finally, destroy t.

If T objects are expensive to copy then this is wasteful. If I asked you to swap two files on your computer, you wouldn't create a third file then copy and paste the file contents around before destroying your temporary file, would you? No, you'd move one file away, move the second into the first position, then finally move the first file back into the second. No need to copy data.

In our case, it's easy to move around objects of type A:

// Not C++0x
void move(A& lhs, A& rhs)
{
    // Assumes lhs owns nothing yet; otherwise its old pointee would leak.
    lhs.x = rhs.x;
    rhs.x = nullptr;
}

We simply move rhs's pointer into lhs and then relinquish rhs ownership of that pointer (by setting it to null). This should illuminate why the weaker post condition of move semantics allows optimisations.

With this new move operation defined, we can define an optimised swap:

void swap(A& a, A& b)
{
    A t;
    move(t, a);
    move(a, b);
    move(b, t);
}

Another advantage of move semantics is that it allows you to move around objects that are unable to be copied. A prime example of this is std::auto_ptr (superseded in C++11 by std::unique_ptr).
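In C++11 the standard move-only owner is std::unique_ptr. The sketch below checks at compile time that it cannot be copied, and at run time that ownership transfers on a move. The function name is hypothetical.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>

static_assert(!std::is_copy_constructible<std::unique_ptr<int>>::value,
              "unique_ptr cannot be copied");
static_assert(std::is_move_constructible<std::unique_ptr<int>>::value,
              "unique_ptr can be moved");

inline bool unique_ownership_demo() {
    std::unique_ptr<int> p(new int(42));
    int* raw = p.get();
    std::unique_ptr<int> q = std::move(p);  // ownership moves from p to q
    // A moved-from unique_ptr is guaranteed to be null.
    return p == nullptr && q.get() == raw && *q == 42;
}
```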

C++0x

C++0x allows move semantics through its rvalue reference feature. Specifically, operations of the kind:

a = b;

Have move semantics when b is an rvalue (which binds to an rvalue reference, spelt T&&), otherwise they have copy semantics. You can force move semantics by using the std::move function (different from the move I defined earlier) when b is not an rvalue:

a = std::move(b);

std::move is a simple function that essentially casts its argument to an rvalue reference. Note that temporaries produced by expressions (such as the result of a function call that returns by value) are automatically rvalues, so you can exploit move semantics in those cases without changing your code.

To define move optimisations, you need to define a move constructor and move assignment operator:

T::T(T&&);
T& T::operator=(T&&);

As these operations have move semantics, you are free to modify the arguments passed in (provided you leave the object in a destructible state).
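Applied to the earlier type A, the two operations might look like the sketch below. T is fixed to int here purely for illustration, and the copy operations are written to tolerate a moved-from source; both are assumptions, not part of the original example.

```cpp
#include <cassert>
#include <utility>

// Sketch of the earlier A with T fixed to int (an assumption for illustration).
struct A {
    int* x;

    explicit A(int v) : x(new int(v)) {}
    ~A() { delete x; }

    // Copy: deep-copies the pointee (tolerates a moved-from source).
    A(const A& rhs) : x(rhs.x ? new int(*rhs.x) : nullptr) {}
    A& operator=(const A& rhs) {
        A tmp(rhs);
        std::swap(x, tmp.x);
        return *this;
    }

    // Move: steals the pointer and leaves rhs in a destructible state.
    A(A&& rhs) noexcept : x(rhs.x) { rhs.x = nullptr; }
    A& operator=(A&& rhs) noexcept {
        if (this != &rhs) {
            delete x;
            x = rhs.x;
            rhs.x = nullptr;
        }
        return *this;
    }
};

inline bool move_members_demo() {
    A a(1);
    A b(2);
    int* stolen = b.x;
    a = std::move(b);   // move assignment: no new allocation
    A c(std::move(a));  // move construction
    return c.x == stolen && *c.x == 2 && a.x == nullptr && b.x == nullptr;
}
```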

Conclusion

That's essentially all there is to it. Note that rvalue references are also used to allow perfect forwarding in C++0x (due to the specifically crafted type system interactions between rvalue references and other types), but this isn't really related to move semantics, so I haven't discussed it here.

Understanding Move Semantics of C++ Standard

When we move an object with std::move

No. std::move doesn't move things. It gives us an rvalue expression referring to some object. An rvalue expression results in overload resolution picking functions that take rvalue references, which is convenient because now we will pick those instead of the functions that take values, or lvalue references, which traditionally do copy-like things. Such functions tend to be constructors (so-called "move constructors") or assignment operators (so-called "move assignment operators") because it is during construction and assignment that moving is useful.
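A small sketch makes the distinction visible: on its own, std::move only produces an rvalue expression, and the source changes only when a move constructor or move assignment operator consumes it. The function names are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <type_traits>
#include <utility>

// std::move only changes the value category of the expression.
static_assert(
    std::is_same<decltype(std::move(std::declval<std::string&>())), std::string&&>::value,
    "std::move yields an rvalue reference");

inline bool cast_only_demo() {
    std::string s = "hello";
    std::string&& r = std::move(s);  // binds a reference; nothing is moved
    return s == "hello" && &r == &s;
}

inline bool actual_move_demo() {
    std::string s = "hello";
    std::string t = std::move(s);  // the move constructor does the work
    return t == "hello";           // s is valid but unspecified here
}
```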

it makes the object nullptr

That depends entirely on what said function does. Typically, an object that's worth moving has indirect state, like a pointer to some dynamically-allocated resource. And if said object is moveable, its move constructor is best off doing a swap on those pointers. If it didn't leave the source object's pointer as nullptr then you have two objects owning the resource. That wasn't a move; it was a [shallow] copy.

A full discussion of the hows and the whys is out of scope of a Q&A, but any good book should have one.

Also, if this all sounds like a hack, that's because it is one. C++ is a hodge-podge of hacks built on hacks to provide additional functionality over time. In this case, we needed (read: wanted) a way to create a different kind of constructor (and assignment operator), a kind that would take a non-const reference and could modify the source object (to "steal" its resources), and to achieve that we had to introduce a new kind of reference that would only bind to rvalues (because we were already using everything else for copies), and then we had to introduce a utility function that would turn the name of an object into an rvalue … hence, std::move was born, the utility function that doesn't move anything.

Why have move semantics?

Your example gives it away: your code is not exception-safe, and it makes use of the free-store (twice), which can be nontrivial. To use pointers, in many/most situations you have to allocate stuff on the free store, which is much slower than automatic storage, and does not allow for RAII.

They also let you more efficiently represent non-copyable resources, like sockets.

Move semantics aren't strictly necessary, as you can see that C++ has existed for a while without them. They are simply a better way to represent certain concepts, and an optimization.

What's the connection between value semantics and move semantics in C++?

From the original move proposal:

Copy vs Move


C and C++ are built on copy semantics. This is a Good Thing. Move
semantics is not an attempt to supplant copy semantics, nor undermine
it in any way. Rather this proposal seeks to augment copy semantics. A
general user defined class might be both copyable and movable, one or
the other, or neither.

The difference between a copy and a move is that a copy leaves the
source unchanged. A move on the other hand leaves the source in a
state defined differently for each type. The state of the source may
be unchanged, or it may be radically different. The only requirement
is that the object remain in a self consistent state (all internal
invariants are still intact). From a client code point of view,
choosing move instead of copy means that you don't care what happens
to the state of the source.

For PODs, move and copy are identical operations (right down to the
machine instruction level).

I guess one could add to this and say:

Move semantics allows us to keep value semantics, but at the same time gain the performance of reference semantics in those cases where the value of the original (copied-from) object is unimportant to program logic.

What is std::move(), and when should it be used?

Wikipedia Page on C++11 R-value references and move constructors

  1. In C++11, in addition to copy constructors, objects can have move constructors.

    (And in addition to copy assignment operators, they have move assignment operators.)
  2. The move constructor is used instead of the copy constructor if the source is an rvalue, which binds to a parameter of type "rvalue-reference" (Type &&).
  3. std::move() is a cast that produces an rvalue-reference to an object, to enable moving from it.

It's a new C++ way to avoid copies. For example, using a move constructor, a std::vector could just copy its internal pointer to data to the new object, leaving the source object in a moved-from state, and therefore not copying all the data. This would be valid C++.
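The std::vector case can be observed directly. In the sketch below (the function name is hypothetical), the destination ends up holding the very buffer the source had allocated, so no element is copied; mainstream implementations behave this way for move construction.

```cpp
#include <cassert>
#include <utility>
#include <vector>

inline bool vector_move_demo() {
    std::vector<int> v(1000, 7);
    const int* buffer = v.data();       // remember where the elements live
    std::vector<int> w = std::move(v);  // move constructor: hands over the buffer
    return w.data() == buffer && w.size() == 1000 && w[999] == 7;
}
```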

Try googling for move semantics, rvalue, perfect forwarding.

What is move semantics?

I find it easiest to understand move semantics with example code. Let's start with a very simple string class which only holds a pointer to a heap-allocated block of memory:

#include <cstring>
#include <algorithm>

class string
{
    char* data;

public:

    string(const char* p)
    {
        size_t size = std::strlen(p) + 1;
        data = new char[size];
        std::memcpy(data, p, size);
    }

Since we chose to manage the memory ourselves, we need to follow the rule of three. I am going to defer writing the assignment operator and only implement the destructor and the copy constructor for now:

    ~string()
    {
        delete[] data;
    }

    string(const string& that)
    {
        size_t size = std::strlen(that.data) + 1;
        data = new char[size];
        std::memcpy(data, that.data, size);
    }

The copy constructor defines what it means to copy string objects. The parameter const string& that binds to all expressions of type string which allows you to make copies in the following examples:

string a(x);                                    // Line 1
string b(x + y);                                // Line 2
string c(some_function_returning_a_string());   // Line 3

Now comes the key insight into move semantics. Note that only in the first line where we copy x is this deep copy really necessary, because we might want to inspect x later and would be very surprised if x had changed somehow. Did you notice how I just said x three times (four times if you include this sentence) and meant the exact same object every time? We call expressions such as x "lvalues".

The arguments in lines 2 and 3 are not lvalues, but rvalues, because the underlying string objects have no names, so the client has no way to inspect them again at a later point in time.
rvalues denote temporary objects which are destroyed at the next semicolon (to be more precise: at the end of the full-expression that lexically contains the rvalue). This is important because during the initialization of b and c, we could do whatever we wanted with the source string, and the client couldn't tell a difference!

C++0x introduces a new mechanism called "rvalue reference" which, among other things,
allows us to detect rvalue arguments via function overloading. All we have to do is write a constructor with an rvalue reference parameter. Inside that constructor we can do anything we want with the source, as long as we leave it in some valid state:

    string(string&& that)   // string&& is an rvalue reference to a string
    {
        data = that.data;
        that.data = nullptr;
    }

What have we done here? Instead of deeply copying the heap data, we have just copied the pointer and then set the original pointer to null (to prevent 'delete[]' from source object's destructor from releasing our 'just stolen data'). In effect, we have "stolen" the data that originally belonged to the source string. Again, the key insight is that under no circumstance could the client detect that the source had been modified. Since we don't really do a copy here, we call this constructor a "move constructor". Its job is to move resources from one object to another instead of copying them.

Congratulations, you now understand the basics of move semantics! Let's continue by implementing the assignment operator. If you're unfamiliar with the copy and swap idiom, learn it and come back, because it's an awesome C++ idiom related to exception safety.

    string& operator=(string that)
    {
        std::swap(data, that.data);
        return *this;
    }
};

Huh, that's it? "Where's the rvalue reference?" you might ask. "We don't need it here!" is my answer :)

Note that we pass the parameter that by value, so that has to be initialized just like any other string object. Exactly how is that going to be initialized? In the olden days of C++98, the answer would have been "by the copy constructor". In C++0x, the compiler chooses between the copy constructor and the move constructor based on whether the argument to the assignment operator is an lvalue or an rvalue.

So if you say a = b, the copy constructor will initialize that (because the expression b is an lvalue), and the assignment operator swaps the contents with a freshly created, deep copy. That is the very definition of the copy and swap idiom -- make a copy, swap the contents with the copy, and then get rid of the copy by leaving the scope. Nothing new here.

But if you say a = x + y, the move constructor will initialize that (because the expression x + y is an rvalue), so there is no deep copy involved, only an efficient move.
that is still an independent object from the argument, but its construction was trivial,
since the heap data didn't have to be copied, just moved. It wasn't necessary to copy it because x + y is an rvalue, and again, it is okay to move from string objects denoted by rvalues.

To summarize, the copy constructor makes a deep copy, because the source must remain untouched.
The move constructor, on the other hand, can just copy the pointer and then set the pointer in the source to null. It is okay to "nullify" the source object in this manner, because the client has no way of inspecting the object again.
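For reference, the fragments above can be assembled into one compilable sketch. The class is renamed simple_string (a hypothetical name, to avoid confusion with std::string), and the c_str() accessor and the two demo functions are additions made only so the behaviour can be observed.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstring>
#include <utility>

class simple_string {
    char* data;

public:
    simple_string(const char* p) {
        std::size_t size = std::strlen(p) + 1;
        data = new char[size];
        std::memcpy(data, p, size);
    }

    ~simple_string() { delete[] data; }

    // Copy constructor: deep copy, the source stays untouched.
    simple_string(const simple_string& that) {
        std::size_t size = std::strlen(that.data) + 1;
        data = new char[size];
        std::memcpy(data, that.data, size);
    }

    // Move constructor: steal the buffer and null the source pointer.
    // A moved-from object may only be destroyed or assigned to here.
    simple_string(simple_string&& that) noexcept : data(that.data) {
        that.data = nullptr;
    }

    // Copy and swap: `that` is built by the copy or the move constructor.
    simple_string& operator=(simple_string that) {
        std::swap(data, that.data);
        return *this;
    }

    const char* c_str() const { return data; }  // accessor added for the demo
};

inline bool string_copy_demo() {
    simple_string a("left");
    simple_string b("right");
    a = b;  // lvalue argument: `that` is copy-constructed
    return std::strcmp(a.c_str(), "right") == 0 && a.c_str() != b.c_str();
}

inline bool string_move_demo() {
    simple_string a("left");
    simple_string b("right");
    const char* buffer = b.c_str();
    a = std::move(b);  // rvalue argument: `that` is move-constructed
    return a.c_str() == buffer && b.c_str() == nullptr;
}
```

In the move case, a ends up owning the exact buffer b allocated, which shows that no character data was copied.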

I hope this example got the main point across. There is a lot more to rvalue references and move semantics which I intentionally left out to keep it simple. If you want more details please see my supplementary answer.

C++ Move Semantics vs Copy Constructor and Assignment Operator in relation to Smart Pointers

As the name indicates, use unique_ptr when there must exist exactly one owner of a resource. The copy constructor of unique_ptr is deleted, which means two instances can never own the same resource. However, it is movable, which is fine, since that allows transfer of ownership.

Also as the name indicates, shared_ptr represents shared ownership of a resource. However, there is also another difference between the two smart pointers: The Deleter of a unique_ptr is part of its type signature, but it is not part of the type signature of shared_ptr. That is because shared_ptr uses "type erasure" to "erase the type" of the deleter. Also note that shared_ptr can also be moved to transfer ownership (like unique_ptr.)

When should I use move semantics?

Although shared_ptr can be copied, you may want to move them anyways when you are making a transfer of ownership (as opposed to creating a new reference). You're obligated to use move semantics for unique_ptr, since ownership must be unique.

When should I use copy semantics?

In the case of smart pointers, you should use copying to increase the reference count of shared_ptrs. (If you're unfamiliar with the concept of a reference count, research reference counted garbage collection.)

Should I ever use both?

Yes. As mentioned above, shared_ptr can be both copied and moved. Copying denotes incrementing the reference count, while moving only indicates a transfer of ownership (the reference count stays the same.)
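The difference between copying and moving a shared_ptr shows up in use_count(). This is a minimal sketch with a hypothetical function name:

```cpp
#include <cassert>
#include <memory>
#include <utility>

inline bool shared_copy_vs_move_demo() {
    std::shared_ptr<int> a = std::make_shared<int>(5);

    std::shared_ptr<int> b = a;  // copy: a second owner, count becomes 2
    bool copied = (a.use_count() == 2);

    std::shared_ptr<int> c = std::move(a);  // move: ownership transfers, count stays 2
    // A moved-from shared_ptr is guaranteed to be empty.
    return copied && c.use_count() == 2 && a == nullptr && *b == 5;
}
```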

Should I ever use none and rely on the default copy constructor and assignment operator?

When you want to make a member-by-member copy of an object.

C++ vector move semantics

From what I understand, if I wanted to use Move Semantics efficiently, I would need to have a pointer to my vector of children to std::move that pointer effortlessly, am I correct in saying that if my class has no pointers, it is useless to std::swap?

No, that's not true in general. The standard library container classes have move constructors that do what you want as long as you know how to invoke them, which brings us to the second part of your question:

Does std::swap do this in my case?

I know it can be really frustrating to get the answer "it depends", but as with most things in C++, it really depends.

To answer your question in the most general way possible, yes std::swap() will probably do what you want most of the time, especially if you're just working with standard library container classes. Where things get weird (and where I don't have enough information to give you a complete answer) is that you've defined your own class, and only part of it is shown here. The devil is in the details, so the actual behavior of the program will depend on what's in those ellipses.

In general, when you're trying to understand what to expect with copy/move behavior, you really need to think in terms of constructors. Assuming you're using a "modern" version of C++ (i.e. C++11 or later), the std::swap() function is going to look something roughly like this:

template<typename T> void swap(T& t1, T& t2) {
    T temp = std::move(t1); // or T temp(std::move(t1));
    t1 = std::move(t2);
    t2 = std::move(temp);
}

See also this related post. More concretely for your example, the template instantiation will look something like this:

void swap(JSON& t1, JSON& t2) {
    JSON temp = std::move(t1); // or JSON temp(std::move(t1));
    t1 = std::move(t2);
    t2 = std::move(temp);
}

Keep in mind that std::move() is really just a fancy way to cast an lvalue reference to an rvalue reference with some corner case handling. The function itself doesn't do anything, it's a means to tell the compiler how to perform overload resolution.
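To see which special members std::swap() ends up calling, one can count them. Tracker below is a hypothetical instrumented type; the expectation of one move construction plus two move assignments matches the sketch of swap above and mainstream standard library implementations.

```cpp
#include <cassert>
#include <utility>

// Hypothetical type that counts which special members run.
struct Tracker {
    static int copies;
    static int moves;
    int value;

    explicit Tracker(int v) : value(v) {}
    Tracker(const Tracker& o) : value(o.value) { ++copies; }
    Tracker& operator=(const Tracker& o) { value = o.value; ++copies; return *this; }
    Tracker(Tracker&& o) noexcept : value(o.value) { ++moves; }
    Tracker& operator=(Tracker&& o) noexcept { value = o.value; ++moves; return *this; }
};
int Tracker::copies = 0;
int Tracker::moves = 0;

inline bool swap_uses_moves_demo() {
    Tracker::copies = 0;
    Tracker::moves = 0;
    Tracker a(1);
    Tracker b(2);
    std::swap(a, b);  // one move construction, two move assignments
    return a.value == 2 && b.value == 1 &&
           Tracker::copies == 0 && Tracker::moves == 3;
}
```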

So now the question becomes: what happens when the compiler needs to construct a JSON object from an rvalue reference to an object type JSON? The answer to this question depends on what constructors are available on the class, some of which may be implicitly generated by the compiler. See also this post.

The compiler will pick the best fitting constructor for the operation, which could be an implicit one, and depending on what you've declared on the class, may not actually be a move constructor as explained in this example. To stay away from falling into that trap, you need to know that an rvalue can bind to a const lvalue reference, so a copy constructor with the following signature:

    JSON(const JSON &);

Is a valid overload candidate for the result of a std::move() call in some cases. This is probably why you sometimes hear people saying that std::move() "isn't actually moving anything", or it's "still just copying".
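The fallback is easy to reproduce. CopyOnly below is a hypothetical type with a user-declared copy constructor and therefore no implicitly generated move constructor, so constructing from std::move() silently copies.

```cpp
#include <cassert>
#include <utility>

// Hypothetical type: declaring the copy constructor suppresses the
// implicit move constructor.
struct CopyOnly {
    static int copies;
    int value;

    explicit CopyOnly(int v) : value(v) {}
    CopyOnly(const CopyOnly& o) : value(o.value) { ++copies; }
};
int CopyOnly::copies = 0;

inline bool move_falls_back_to_copy_demo() {
    CopyOnly::copies = 0;
    CopyOnly a(7);
    CopyOnly b(std::move(a));  // binds to const CopyOnly&, so this copies
    return CopyOnly::copies == 1 && a.value == 7 && b.value == 7;
}
```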

So where does all of this leave your code? Basically if you have no user-declared constructors, and you're letting the compiler do it for you, then std::swap is probably going to move memory on all of your members the way you want. As soon as you start declaring your own constructors, the story gets more complicated and we have to talk specifics.

As a small postscript here, do you really need to use swap() at all? It looks like you're just trying to construct a shared_ptr to an object that's been initialized with the contents of another object. This would probably be a slightly simpler approach:

  std::shared_ptr<const JSON> outPtr = std::make_shared<JSON>(std::move(json["data"]));

This will construct an object of type JSON using a move constructor (assuming it's the best overload candidate given the caveats I mentioned) and return a shared_ptr to it.


