In PHP Can Someone Explain Cloning VS Pointer Reference

In PHP can someone explain cloning vs pointer reference?

Basically, there are two ways variables work in PHP...

For everything except objects:

  1. Assignment is by value (meaning a copy occurs if you do $a = $b).
  2. Reference can be achieved by doing $a = &$b (Note the reference operator operates upon the variable, not the assignment operator, since you can use it in other places)...
  3. Copies use a copy-on-write technique. So if you do $a = $b, there is no memory copy of the variable. But if you then do $a = 5;, the memory is copied at that point and then overwritten.
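
A minimal sketch of the three points above for non-object values. The variable names are illustrative only, and copy-on-write itself is not directly observable from userland; only its effect (independent values after a write) is shown.

```php
<?php
// Assignment by value: $b behaves as its own copy of $a.
$a = [1, 2, 3];
$b = $a;        // no memory copy yet (copy-on-write)
$b[] = 4;       // the write triggers the actual copy; $a is untouched

// Variable reference: $c and $d are two names for the same variable.
$c = 10;
$d = &$c;
$d = 20;        // also changes $c

echo count($a), ' ', count($b), ' ', $c, PHP_EOL; // prints "3 4 20"
```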

For objects:

  1. Assignment is by object reference. It's not really the same as normal variable by reference (I'll explain why later).
  2. Copy by value can be achieved by doing $a = clone $b.
  3. Reference can be achieved by doing $a = &$b, but beware that this has nothing to do with the object. You're binding the $a variable to the $b variable. It doesn't matter if it's an object or not.

So, why is assignment for objects not really reference? What happens if you do:

$a = new stdclass();
$b = $a;
$a = 4;

What's $b? Well, it's stdclass... That's because it's not writing a reference to the variable, but to the object...

$a = new stdclass();
$a->foo = 'bar';
$b = $a;
$b->foo = 'baz';

What's $a->foo? It's baz. That's because when you did $b = $a, you are telling PHP to use the same object instance (hence the object reference). Note that $a and $b are not the same variable, but they do both reference the same object.

One way of thinking about it is to think of all variables which store an object as storing a pointer to that object. So the object lives somewhere else. When you assign $a = $b where $b is an object, all you're doing is copying that pointer. The actual variables are still disjoint. But when you do $a = &$b, you're storing a pointer to $b inside of $a. Now, when you manipulate $a it follows the pointer chain to the base object. When you use the clone operator, you're telling PHP to copy the existing object and create a new one with the same state... So clone really just does a by-value copy of the object (a shallow one: nested objects are still shared)...
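
The pointer analogy can be checked with a short sketch. The variable names are illustrative; === on two objects is true only when both operands are the same instance.

```php
<?php
$a = new stdClass();
$a->foo = 'bar';

$b = $a;        // copies the "pointer": same object
$c = clone $a;  // new object with the same state

$b->foo = 'baz';   // visible through $a, not through $c

var_dump($a === $b); // bool(true)
var_dump($a === $c); // bool(false)
echo $a->foo, ' ', $c->foo, PHP_EOL; // prints "baz bar"
```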

So if you noticed, I said the object is not stored in an actual variable. It's stored somewhere else and nothing but a pointer is stored in the variable. So this means that you can have (and often do have) multiple variables pointing to the same instance. For this reason, the internal object representation contains a refcount (simply a count of the number of variables pointing to it). When an object's refcount drops to 0 (meaning that all the variables pointing to it either go out of scope, or are changed to something else) it is garbage collected (as it is no longer accessible)...
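
One way to watch the refcount reach zero is a destructor. This is a sketch with a hypothetical Tracked class. It relies on PHP destroying an object as soon as the last variable pointing to it goes away, which holds for simple cases like this one; objects caught in reference cycles are collected later.

```php
<?php
// Hypothetical class used only to observe when the object is destroyed.
class Tracked
{
    public static $log = [];

    public function __destruct()
    {
        self::$log[] = 'destroyed';
    }
}

$a = new Tracked();
$b = $a;          // two variables now point to the same instance

$a = null;        // one pointer gone; the object is still reachable via $b
$afterFirst = count(Tracked::$log);

$b = null;        // last pointer gone; the destructor runs
$afterSecond = count(Tracked::$log);

echo $afterFirst, ' ', $afterSecond, PHP_EOL; // prints "0 1"
```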

You can read more on references and PHP in the docs...

Disclaimer: Some of this may be oversimplification or blurring of certain concepts. I intended this only to be a guide to how they work, and not an exact breakdown of what goes on internally...

Edit: Oh, and as for this being "clunky", I don't think it is. I think it is really useful. Otherwise you'd have variable references being passed around all over the place. And that can yield some really interesting bugs when a variable in one part of an application affects another variable in another part of the app. And not because it's passed, but because a reference was made somewhere along the line.

In general, I don't use variable references that much. It's rare that I find an honest need for them. But I do use object references all the time. I use them so much, that I'm happy that they are the default. Otherwise I'd need to write some operator (since & denotes a variable reference, there'd need to be another to denote an object reference). And considering that I rarely use clone, I'd say that 99.9% of use cases should use object references (so make the operator be used for the lower frequency cases)...

JMHO

I've also created a video explaining these differences. Check it out on YouTube.

Help me understand PHP variable references and scope

If I pass a variable to a function (e.g. $var), is that supposed to be a copy of a reference to the actual variable (such that setting it null doesn't affect other copies)?

Depends on the function. And also how you call it. Look at this example:
http://www.ideone.com/LueFc

Or is it receiving a reference to what is a new copy of the actual variable (such that setting it to null destroys its copy only)?

Again depends on the function

If the latter, does this copy objects and arrays in memory? That seems like a good way to waste memory and CPU time, if so.

It's going to save memory to use a reference, certainly. Since PHP 5, objects are always passed by object reference (the handle is copied, not the object) unless you explicitly clone them.
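
The linked example is not reproduced here. As an independent sketch of what "depends on the function" means, the hypothetical functions below differ only in how they declare their parameter:

```php
<?php
// Hypothetical helper functions, named for illustration only.
function setNullByValue($var)
{
    $var = null;            // only the local copy changes
}

function setNullByReference(&$var)
{
    $var = null;            // the caller's variable changes
}

function mutate($obj)
{
    $obj->touched = true;   // same object as the caller's, no & needed
}

$x = [1, 2, 3];
setNullByValue($x);
$stillArray = is_array($x);

setNullByReference($x);
$nowNull = ($x === null);

$o = new stdClass();
mutate($o);

var_dump($stillArray, $nowNull, $o->touched); // bool(true) three times
```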

What's the deal with local scope? Am I right in observing that I can declare an array in one function and then use that array in other functions called within that function WITHOUT passing it to them as a parameter?

No, you can't.

Similarly, does declaring an array in a function called within a function allow it to be available in the caller?

No, it doesn't.

If not, does scoping work by a call stack or whatever like every bloody thing I've come to understand about programming tells me it should?

No, each function has its own local scope. If you want to use a global variable inside a function, you have to declare it there with global $outsidevar before using it.
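
A small sketch of these scoping rules, assuming the code runs at the top level of a script so that $outsidevar really is a global. All function names are hypothetical.

```php
<?php
$outsidevar = 'global value';

function withoutGlobal()
{
    // $outsidevar is not visible here; isset() avoids an undefined-variable warning.
    return isset($outsidevar);
}

function withGlobal()
{
    global $outsidevar;     // import the global variable into this scope
    return $outsidevar;
}

function inner()
{
    // $local from outer() is not visible here either.
    return isset($local);
}

function outer()
{
    $local = [1, 2, 3];
    return inner();
}

var_dump(withoutGlobal()); // bool(false)
var_dump(withGlobal());    // string(12) "global value"
var_dump(outer());         // bool(false)
```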

PHP Object Assignment vs Cloning

Objects are abstract data in memory. A variable always holds a reference to this data in memory. Imagine that $foo = new Bar creates an object instance of Bar somewhere in memory, assigns it some id #42, and $foo now holds this #42 as reference to this object. Assigning this reference to other variables by reference or normally works the same as with any other values. Many variables can hold a copy of this reference, but all point to the same object.

clone explicitly creates a copy of the object itself, not just of the reference that points to the object.

$foo = new Bar;   // $foo holds a reference to an instance of Bar
$bar = $foo; // $bar holds a copy of the reference to the instance of Bar
$baz =& $foo; // $baz references the same reference to the instance of Bar as $foo

Just don't confuse "reference" as in =& with "reference" as in object identifier.

$blarg = clone $foo;  // the instance of Bar that $foo referenced was copied
// into a new instance of Bar and $blarg now holds a reference
// to that new instance
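
To see the two kinds of "reference" behave differently, reassign $foo after setting up the variables. This sketch declares an empty placeholder class Bar so that it runs on its own.

```php
<?php
class Bar {}    // placeholder class matching the snippet above

$foo = new Bar;
$bar = $foo;    // copy of the object identifier
$baz = &$foo;   // alias of the variable $foo

$foo = null;    // reassigns the variable, does not touch the object

var_dump($bar instanceof Bar); // bool(true): $bar still holds its own identifier
var_dump($baz);                // NULL: $baz is the same variable as $foo
```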

Using & to get object's reference in PHP is redundant?

In PHP 5, yes, it is redundant and pointless.

Cloning objects that contain objects

I believe the accepted way is to serialize and unserialize the composite object:

$d = unserialize(serialize($a));
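
A sketch comparing that round trip with a plain clone on a nested object. The property names are illustrative, and this only works when everything inside the object is serializable (closures and resources, for example, are not).

```php
<?php
$a = new stdClass();
$a->child = new stdClass();
$a->child->name = 'original';

$shallow = clone $a;                      // nested object is still shared
$deep    = unserialize(serialize($a));    // nested object is duplicated

$a->child->name = 'changed';

echo $shallow->child->name, ' ', $deep->child->name, PHP_EOL; // prints "changed original"
```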

What is the difference between a deep copy and a shallow copy?

Shallow copies duplicate as little as possible. A shallow copy of a collection is a copy of the collection structure, not the elements. With a shallow copy, two collections now share the individual elements.

Deep copies duplicate everything. A deep copy of a collection is two collections with all of the elements in the original collection duplicated.
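
In PHP terms, a sketch with hypothetical Item and Collection classes: assigning an array copies the collection structure but shares the objects inside it (shallow), while a __clone method that clones each element gives a deep copy.

```php
<?php
// Hypothetical classes for illustration.
class Item
{
    public $label;

    public function __construct($label)
    {
        $this->label = $label;
    }
}

class Collection
{
    public $items = [];

    // Deep copy: clone every element when the collection is cloned.
    public function __clone()
    {
        foreach ($this->items as $key => $item) {
            $this->items[$key] = clone $item;
        }
    }
}

$original = new Collection();
$original->items[] = new Item('first');

$shallowItems = $original->items;   // new array, same Item objects
$deepCopy     = clone $original;    // new Collection with new Item objects

$original->items[0]->label = 'renamed';

echo $shallowItems[0]->label, ' ', $deepCopy->items[0]->label, PHP_EOL; // prints "renamed first"
```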

PHP object arguments behaviour

Because in PHP 5, references to objects are passed by value, as opposed to the objects themselves. That means your function argument $var and your calling-scope variable $obj are distinct references to the same object. This manual entry may help you.

To obtain a (shallow) copy of your object, use clone. In order to retrieve this copy, though, you need to return it:

function edit($var)
{
    $clone = clone $var;
    $clone->test = "foo";
    return $clone;
}

$obj = new stdClass;
$obj2 = edit($obj);

echo $obj2->test;

Or assign it to a reference argument, then call it like so:

function edit($var, &$clone)
{
    $clone = clone $var;
    $clone->test = "foo";
}

$obj = new stdClass;
edit($obj, $obj2);

echo $obj2->test;

Issue with cloning and pass-by-reference

TL;DR

This is a classic case of PHP SNAFU. I will explain how and why it happens, but unfortunately as far as I can tell there is no possible solution that is satisfactory.

A brittle solution exists if you can run code before PHP shallow clones the object (e.g. by writing your own cloning method), but not if the code runs afterwards, which is how __clone works. However, this solution can still fail for other reasons outside your control.

There is also another option that is safe which involves a well-known "cloning" trick, but it also has drawbacks: it only works on data that is serializable, and it doesn't allow you to keep any references inside that data around even if you want to.

At the end of the day if you want to keep your sanity you will have to move away from implementing the properties $this->varN as references.

The plight of the poor PHP developer

Normally you would have to deep clone everything that needs to be cloned inside __clone. Then you would also have to reassign any references that still point to the instances that were just deep cloned.

You would think these two steps should be enough, for example:

public function __construct()
{
    $this->data = new stdClass;
    $this->data->var1 = 'a';
    $this->data->var2 = 'b';
    $this->data->var3 = 'c';
    $this->assignReferences();
}

public function __clone()
{
    $this->data = clone $this->data;
    $this->assignReferences();
}

private function assignReferences()
{
    $this->var1 = &$this->data->var1;
    $this->var2 = &$this->data->var2;
    $this->var3 = &$this->data->var3;
}

However, this does not work. How can that be?

Zend engine references

If you var_dump($this->data) before and after assignReferences() in the constructor you will see that assigning those references causes the contents of $this->data to become references themselves.
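
The effect can be reproduced in isolation. In this sketch (variable names are illustrative), a property that is bound to a live reference stays a reference through clone, so the clone and the original end up sharing it:

```php
<?php
$data = new stdClass();
$data->var1 = 'a';

$alias = &$data->var1;   // $data->var1 is now a reference shared with $alias

$copy = clone $data;     // the shallow clone copies the reference, not the value
$copy->var1 = 'changed';

echo $data->var1, PHP_EOL; // prints "changed"
```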

This is an artifact of how references are internally implemented in PHP and there is nothing you can do about it directly. What you can do is convert them back to normal values first by losing all other references to them, after which cloning as above would work.

In code:

public function __construct()
{
    $this->data = new stdClass;
    $this->data->var1 = 'a';
    $this->data->var2 = 'b';
    $this->data->var3 = 'c';
    $this->assignReferences();
}

public function makeClone()
{
    unset($this->var1); // turns $this->data->var1 into non-reference
    unset($this->var2); // turns $this->data->var2 into non-reference
    unset($this->var3); // turns $this->data->var3 into non-reference

    $clone = clone $this;               // this code is the same
    $clone->data = clone $clone->data;  // as what would go into
    $clone->assignReferences();         // __clone() normally

    $this->assignReferences(); // undo the unset()s
    return $clone;
}

private function assignReferences()
{
    $this->var1 = &$this->data->var1;
    $this->var2 = &$this->data->var2;
    $this->var3 = &$this->data->var3;
}

This appears to work, but it is unsatisfactory from the start: you have to know that the way to clone this object is $obj->makeClone() instead of clone $obj, so the natural approach will fail.

However, there is also a more insidious bug here waiting to bite you: to un-reference the values inside $this->data you have to lose all references to them in the program. The code above does so for the references in $this->varN, but what about references other code might have created?

Compare this:

$original = new my_class;
$new = $original->makeClone();
$new->var3 = 'd';

echo $original->var3; // works, "c"

To this:

$original = new my_class;
$oops = &$original->var3; // did you think this might be a problem?
$new = $original->makeClone();
$new->var3 = 'd';

echo $original->var3; // doesn't work!

We are now back to square one. And worse, there is no way to prevent someone from doing this and messing up your program.

Kill the references with fire

There is a guaranteed way to make the references inside $this->data go away no matter what: serialization.

public function __construct()
{
    $this->data = new stdClass;
    $this->data->var1 = 'a';
    $this->data->var2 = 'b';
    $this->data->var3 = 'c';
    $this->assignReferences();
}

public function __clone()
{
    $this->data = unserialize(serialize($this->data)); // instead of clone
    $this->assignReferences();
}

This works with the values in question, but it also has drawbacks:

  1. You cannot have any values (recursively) inside $this->data that are not serializable.
  2. It will indiscriminately kill all references inside $this->data -- even those you might want to preserve on purpose.
  3. It's less performant (a theoretical point, to be fair).

So what to do?

After the obligatory bashing of PHP, follow the classic doctor's advice: if it hurts when you do something, then don't do it.

In this case this means that you just can't expose the contents of $this->data through public properties (references) on the object. Instead of this use getter functions or possibly implement the magic __get.
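
A minimal sketch of the __get route, assuming the properties only need to be read and written through the object. The class name MyClass and the added __set are illustrative choices, not part of the original code. With no references involved, a plain clone of $this->data inside __clone is enough.

```php
<?php
// Hypothetical rewrite of the class discussed above, without reference properties.
class MyClass
{
    private $data;

    public function __construct()
    {
        $this->data = new stdClass;
        $this->data->var1 = 'a';
        $this->data->var2 = 'b';
        $this->data->var3 = 'c';
    }

    public function __clone()
    {
        $this->data = clone $this->data;   // plain values, so a shallow clone suffices
    }

    // Read access: $obj->var3 is forwarded to $this->data->var3.
    public function __get($name)
    {
        return $this->data->$name;
    }

    // Write access: $obj->var3 = ... is forwarded as well.
    public function __set($name, $value)
    {
        $this->data->$name = $value;
    }
}

$original = new MyClass;
$new = clone $original;
$new->var3 = 'd';

echo $original->var3, ' ', $new->var3, PHP_EOL; // prints "c d"
```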


