Will Copy-On-Write Prevent Data Duplication on Arrays

Copy-on-write, as the name suggests, means that no variable is copied until something is written to it. As long as not a single byte of the variable being passed around changes, PHP avoids unnecessary duplicates automatically, without any need for explicit references.

This article explains in detail how this is implemented in the PHP source code. As the article suggests, you can use Xdebug's xdebug_debug_zval function to check that the variables are not being duplicated.

Additionally, this answer here on SO has more on copy-on-write.
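
A rough way to see this for arrays without Xdebug is to watch memory_get_usage() around an assignment and a later write. The following is a minimal sketch, assuming PHP 7 or later; the variable names and the array size are arbitrary choices for illustration, and the exact byte counts vary by PHP version and platform.

```php
<?php
// Build a reasonably large array so the difference is easy to see.
$data = range(1, 100000);

// Plain assignment: both variables share the same underlying array.
$before = memory_get_usage();
$copy = $data;
$assignDelta = memory_get_usage() - $before;

// First write to $copy: PHP separates it from $data and duplicates the array.
$before = memory_get_usage();
$copy[0] = -1;
$writeDelta = memory_get_usage() - $before;

echo "After assignment: {$assignDelta} bytes\n";
echo "After first write: {$writeDelta} bytes\n";

// The original is untouched by the write to the copy.
echo "Original first element: {$data[0]}\n";
```

The assignment should cost close to nothing, while the first write should cost roughly the size of the whole array.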

PHP return reference to array in other class

$tmpList = &$list->toArray();

You should still test the real performance impact, though, as PHP can behave unexpectedly when references are used.

Update: Actually, it seems you also have to add the & in front of the function declaration. I got this part wrong.

public function &toArray(){ ... }
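
Putting the two pieces together, a minimal sketch might look like the following. The ItemList class and its count() helper are hypothetical, made up only to show that the & is needed both on the method declaration and on the assignment.

```php
<?php
// Hypothetical container class, used only for illustration.
class ItemList
{
    private $items = ['a', 'b', 'c'];

    // The & before the method name makes it return a reference.
    public function &toArray()
    {
        return $this->items;
    }

    public function count()
    {
        return count($this->items);
    }
}

$list = new ItemList();

// The & on the assignment binds $tmpList to the internal array.
$tmpList = &$list->toArray();
$tmpList[] = 'd';

// Without the & on the assignment, $copy is an ordinary copy-on-write value,
// so appending to it leaves the internal array alone.
$copy = $list->toArray();
$copy[] = 'e';

echo $list->count(), "\n";
```

Only the write through $tmpList reaches the object; the write to $copy triggers a separate copy.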

PHP Array element assignment for readonly - is value copied?

PHP uses copy-on-write. It attempts to avoid physically copying data unless it needs to.

From PHP docs - Introduction to Variables:

PHP is a dynamic, loosely typed language, that uses copy-on-write and reference counting.

You can test this easily:

/* memory usage helpers */

$mem_initial = memory_get_usage();
$mem_last = $mem_initial;

// Prints the change in memory usage since the previous call.
$mem_debug = function () use ($mem_initial, &$mem_last) {
    $mem_current = memory_get_usage();
    $mem_change = $mem_current - $mem_last;
    // Negative numbers already carry their own minus sign.
    echo 'Memory usage change: ', $mem_change >= 0 ? '+' : '', $mem_change, " bytes\n";
    $mem_last = $mem_current;
};

/* test */

echo "Allocating 10kB string\n";
$string = str_repeat('x', 10000);
$mem_debug();
echo "\n";

echo "Copying string by direct assignment\n";
$string2 = $string;
$mem_debug();
echo "\n";

echo "Modifying copied string\n";
$string2 .= 'x';
$mem_debug();
echo "\n";

echo "Copying string with a (string) cast\n";
$string3 = (string) $string;
$mem_debug();

Output for PHP 5.x:

Allocating 10kB string
Memory usage change: +10816 bytes

Copying string by direct assignment
Memory usage change: +56 bytes

Modifying copied string
Memory usage change: +10048 bytes

Copying string with a (string) cast
Memory usage change: +10104 bytes

  • as expected, direct assignment does not copy the string in memory
  • modifying the copied string does duplicate the string in memory: copy-on-write has happened
  • assigning the string with an additional (string) cast seems to duplicate the string in memory even though it is unchanged

Output for PHP 7.0:

Allocating 10kB string
Memory usage change: +13040 bytes

Copying string by direct assignment
Memory usage change: +0 bytes

Modifying copied string
Memory usage change: +12288 bytes

Copying string with a (string) cast
Memory usage change: +0 bytes

  • copy-on-write behavior is the same as in the 5.x versions, but meaningless (string) casts no longer cause the string to be duplicated in memory

Passing arrays without overhead (preferably by reference), to avoid duplicating complex code blocks, in MATLAB?

As noted by Loren on her blog, MATLAB does support in-place operations on matrices, which essentially covers passing arrays by reference, modifying them in a function, and returning the result. You seem to know that, but you erroneously state that the function must use the same variable names as the calling script. Here is a code example that shows this is wrong. When testing, please copy it verbatim and save it as a function:

function inplace_test
y = zeros(1,1e8);
x = zeros(1,1e8);

tic; x = compute(x); toc
tic; y = compute(y); toc
tic; x = computeIP(x); toc
tic; y = computeIP(y); toc
tic; x = x+1; toc
end

function x=computeIP(x)
x = x+1;
end

function y=compute(x)
y = x+1;
end

Time results on my computer:

Elapsed time is 0.243335 seconds.
Elapsed time is 0.251495 seconds.
Elapsed time is 0.090949 seconds.
Elapsed time is 0.088894 seconds.
Elapsed time is 0.090638 seconds.

As you can see, the two calls that use the in-place function (the third and fourth timings) are equally fast for both input arrays x and y. They are also as fast as running x = x+1 without a function. The only important thing is that, inside the function, the input and output parameters have the same name. And there is one more thing...

If I had to guess what is wrong with your code, I'd say you wrote nested functions and expected them to operate in place. They do not. So the code below will not run in place:

function inplace_test
y = zeros(1,1e8);
x = zeros(1,1e8);

tic; x = compute(x); toc
tic; y = compute(y); toc
tic; x = computeIP(x); toc
tic; y = computeIP(y); toc
tic; x = x+1; toc

function x=computeIP(x)
x = x+1;
end

function y=compute(x)
y = x+1;
end
end

Elapsed time is 0.247798 seconds.
Elapsed time is 0.257521 seconds.
Elapsed time is 0.229774 seconds.
Elapsed time is 0.237215 seconds.
Elapsed time is 0.090446 seconds.

The bottom line: be careful with those nested functions.

Copy on write for array of records

Dynamic arrays do not support copy-on-write (CoW) semantics. It does not matter in your example, but it matters in other cases.

If you need to copy the contents of a dynamic array, use the Copy function. Here is an example demonstrating the difference between dynamic array assignment and copying:

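procedure TestCopy;
type
  recordA = record
    Y: Integer;
  end;
  arrayA = array of recordA;

var
  x, b, c: arrayA;
  item: recordA;

begin
  SetLength(x, 2);
  item.Y := 2;
  x[0] := item;
  item.Y := 5;
  x[1] := item;

  // Assignment only copies the reference: b and x share the same array.
  b := x;
  x[0].Y := 4;
  Writeln(b[0].Y, ' -- ', x[0].Y); // prints 4 -- 4

  // Copy creates an independent array, so later changes to x do not affect b.
  b := Copy(x);
  x[0].Y := 8;
  Writeln(b[0].Y, ' -- ', x[0].Y); // prints 4 -- 8
end;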

What is copy-on-write?

I was going to write up my own explanation, but this Wikipedia article pretty much sums it up.

Here is the basic concept:

Copy-on-write (sometimes referred to as "COW") is an optimization strategy used in computer programming. The fundamental idea is that if multiple callers ask for resources which are initially indistinguishable, you can give them pointers to the same resource. This function can be maintained until a caller tries to modify its "copy" of the resource, at which point a true private copy is created to prevent the changes becoming visible to everyone else. All of this happens transparently to the callers. The primary advantage is that if a caller never makes any modifications, no private copy need ever be created.

Here is also a common application of COW:

The COW concept is also used in maintenance of instant snapshot on database servers like Microsoft SQL Server 2005. Instant snapshots preserve a static view of a database by storing a pre-modification copy of data when underlaying data are updated. Instant snapshots are used for testing uses or moment-dependent reports and should not be used to replace backups.

PHP function creates new array (data) from input or references to it?

$result = array('a', 'b', 'c');
include 'functions.php'; // doSomething() resides here
doSomething($result);

and define the function so that it takes its parameter by reference:

function doSomething(&$result) {/* code */}

That way it does not use more memory.
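
To make the difference between the two calling conventions concrete, here is a small sketch. The function names appendByValue and appendByReference are hypothetical, chosen only for illustration; they are not part of the original functions.php.

```php
<?php
// Hypothetical helper: receives the array by value. Writing to $data
// separates it from the caller's array, so the caller sees no change.
function appendByValue(array $data)
{
    $data[] = 'x';
    return count($data);
}

// Hypothetical helper: receives the array by reference. The write goes
// straight to the caller's array.
function appendByReference(array &$data)
{
    $data[] = 'x';
    return count($data);
}

$result = array('a', 'b', 'c');

$countInside = appendByValue($result);
echo $countInside, ' inside, ', count($result), " outside\n";

appendByReference($result);
echo count($result), " after the by-reference call\n";
```

A function that only reads its by-value parameter never triggers the separation in the first place, which is the copy-on-write behavior described earlier on this page.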


