JSON VS. Serialized Array in Database

  1. json_encode() & json_decode()

    • PHP Version >= 5.2.0

      • Nesting Limit of 20.
    • PHP Version >= 5.2.3

      • Nesting Limit of 128.
    • PHP Version >= 5.3.0

      • Nesting Limit of 512 (the default depth).
    • Smaller footprint than PHP's serialized string.
  2. serialize() & unserialize()

    • PHP Version >= 4.0.0

      • Objects are restored as instances of their original class, so their methods are not lost.
      • The __wakeup() magic method is called on any object being unserialized. (VERY POWERFUL)
      • It is sometimes best to base64-encode serialized strings before putting them into the database and to base64-decode them when reading them back, because some whitespace characters are not always handled correctly.
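
A minimal sketch of three points from the list above: the size difference, the nesting limit, and __wakeup(). The Session class and the sample data are made up for illustration, and the depth argument of json_decode() assumes PHP >= 5.3.

```php
<?php

// Hypothetical class, used only to show the __sleep()/__wakeup() hooks.
class Session
{
    public $user;
    public $restored = false;

    public function __construct($user)
    {
        $this->user = $user;
    }

    // Called by serialize(): choose which properties are stored.
    public function __sleep()
    {
        return array('user');
    }

    // Called by unserialize(): rebuild state that was not stored.
    public function __wakeup()
    {
        $this->restored = true;
    }
}

// 1. Footprint: for plain arrays the JSON string is usually shorter.
$data = array('a' => 1, 'b' => array(1, 2, 3));
$jsonSize = strlen(json_encode($data));
$serializedSize = strlen(serialize($data));

// 2. Nesting limit: decoding fails once the depth argument is exceeded.
$tooDeep = json_decode('[[[[[1]]]]]', true, 2);
$depthError = json_last_error();

// 3. __wakeup(): runs on the object rebuilt by unserialize().
$copy = unserialize(serialize(new Session('alice')));

printf("json: %d bytes, serialize: %d bytes\n", $jsonSize, $serializedSize);
var_dump($tooDeep === null, $depthError === JSON_ERROR_DEPTH);
var_dump($copy instanceof Session, $copy->restored);
```

The constructor is not called during unserialize(), which is why __wakeup() is the place to rebuild anything that __sleep() left out.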

The choice is yours.

Preferred method to store PHP arrays (json_encode vs serialize)

Depends on your priorities.

If performance is your absolute driving characteristic, then by all means use the fastest one. Just make sure you have a full understanding of the differences before you make a choice:

  • Unlike with serialize(), you need to pass an extra flag to keep UTF-8 characters untouched: json_encode($array, JSON_UNESCAPED_UNICODE), available in PHP >= 5.4 (otherwise it converts non-ASCII characters to \uXXXX escape sequences).
  • JSON will have no memory of what the object's original class was (they are always restored as instances of stdClass).
  • You can't leverage __sleep() and __wakeup() with JSON
  • By default, only public properties are serialized with JSON (in PHP >= 5.4 you can implement JsonSerializable to change this behavior).
  • JSON is more portable
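
The differences listed above can be seen in a short sketch. The Point class is hypothetical, and the return type on jsonSerialize() assumes PHP >= 7; the non-ASCII string is written as raw UTF-8 bytes so the result does not depend on the file's encoding.

```php
<?php

// Hypothetical class for illustration only.
class Point implements JsonSerializable
{
    public $x = 1;
    private $secret = 'hidden';

    // Decide what json_encode() sees, including non-public properties.
    public function jsonSerialize(): array
    {
        return array('x' => $this->x, 'secret' => $this->secret);
    }
}

$point = new Point();

// serialize() remembers the class; JSON comes back as stdClass.
$fromSerialize = unserialize(serialize($point));
$fromJson = json_decode(json_encode($point));

// "café" as UTF-8 bytes. Without the flag, é becomes a \uXXXX escape.
$name = "caf\xC3\xA9";
$escaped = json_encode(array('name' => $name));
$raw = json_encode(array('name' => $name), JSON_UNESCAPED_UNICODE);

var_dump(get_class($fromSerialize)); // string(5) "Point"
var_dump(get_class($fromJson));      // string(8) "stdClass"
echo $escaped, "\n", $raw, "\n";
```

Both JSON strings decode to the same value; the flag only changes how the stored text looks and how many bytes it takes.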

And there are probably a few other differences I can't think of at the moment.

A simple speed test to compare the two

<?php

ini_set('display_errors', 1);
error_reporting(E_ALL);

// Make a big, honkin test array
// You may need to adjust this depth to avoid memory limit errors
$testArray = fillArray(0, 5);

// Time json encoding
$start = microtime(true);
json_encode($testArray);
$jsonTime = microtime(true) - $start;
echo "JSON encoded in $jsonTime seconds\n";

// Time serialization
$start = microtime(true);
serialize($testArray);
$serializeTime = microtime(true) - $start;
echo "PHP serialized in $serializeTime seconds\n";

// Compare them
if ($jsonTime < $serializeTime) {
    printf("json_encode() was roughly %01.2f%% faster than serialize()\n", ($serializeTime / $jsonTime - 1) * 100);
} elseif ($serializeTime < $jsonTime) {
    printf("serialize() was roughly %01.2f%% faster than json_encode()\n", ($jsonTime / $serializeTime - 1) * 100);
} else {
    echo "Impossible!\n";
}

// Recursively build a nested array: 10 keys per level, $max levels deep
function fillArray($depth, $max) {
    static $seed;
    if (is_null($seed)) {
        $seed = array('a', 2, 'c', 4, 'e', 6, 'g', 8, 'i', 10);
    }
    if ($depth < $max) {
        $node = array();
        foreach ($seed as $key) {
            $node[$key] = fillArray($depth + 1, $max);
        }
        return $node;
    }
    return 'empty';
}
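
The test above times a single encode call. A variation that averages several runs and also times decoding might look like the sketch below; the averageTime() helper and the payload are made up for illustration, and the numbers will differ from machine to machine.

```php
<?php

// Hypothetical helper: average wall-clock seconds per call over $runs runs.
function averageTime($callback, $runs = 20)
{
    $start = microtime(true);
    for ($i = 0; $i < $runs; $i++) {
        $callback();
    }
    return (microtime(true) - $start) / $runs;
}

// Smaller made-up payload so the loop finishes quickly.
$payload = array_fill(0, 1000, array('id' => 1, 'name' => 'test', 'tags' => array('a', 'b')));
$json = json_encode($payload);
$serialized = serialize($payload);

$results = array(
    'json_encode' => averageTime(function () use ($payload) { json_encode($payload); }),
    'json_decode' => averageTime(function () use ($json) { json_decode($json, true); }),
    'serialize'   => averageTime(function () use ($payload) { serialize($payload); }),
    'unserialize' => averageTime(function () use ($serialized) { unserialize($serialized); }),
);

foreach ($results as $name => $seconds) {
    printf("%-12s %.6f s per call\n", $name, $seconds);
}
```

Since stored data is usually read more often than it is written, the decode timings may matter more than the encode ones.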

PHP: json_encode vs serialize for storing in a MySQL database?

Found this in the PHP docs...

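function mb_unserialize($serial_str) {
    // Recompute each string's byte length (the /e modifier was removed in PHP 7, hence the callback)
    $fix = function ($m) { return 's:' . strlen($m[2]) . ':"' . $m[2] . '";'; };
    $out = preg_replace_callback('!s:(\d+):"(.*?)";!s', $fix, $serial_str);
    return unserialize($out);
}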

I don't quite understand it, but it worked to unserialize the data that I couldn't unserialize before. I've moved to JSON now, and I'll report in a couple of weeks whether this solved the problem of some records randomly getting "corrupted".
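
One plausible cause of such "corrupted" records (an assumption, the post does not say what went wrong) is that serialize() stores each string's length in bytes, so any character-set conversion between PHP and the database makes the stored length wrong. The sketch below simulates that with str_replace() and shows why base64-encoding avoids it.

```php
<?php

// "café" as UTF-8 bytes: the é takes two bytes, so the length prefix is 5.
$original = "caf\xC3\xA9";
$serialized = serialize($original); // s:5:"caf\xC3\xA9";

// Simulate a charset conversion (UTF-8 to Latin-1) on the way to the
// database: é shrinks to one byte, but the prefix still says 5.
$damaged = str_replace("\xC3\xA9", "\xE9", $serialized);
$broken = @unserialize($damaged); // declared length no longer matches

// Base64 output is plain ASCII, so such conversions cannot alter it.
$stored = base64_encode($serialized);
$restored = unserialize(base64_decode($stored));

var_dump($broken);                 // bool(false)
var_dump($restored === $original); // bool(true)
```

A function like mb_unserialize() repairs the damage after the fact by rewriting every length prefix with strlen() of the string that follows it.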

Storing arrays in database : JSON vs. serialized array

You can store Arrays and Hashes using ActiveRecord's serialize declaration:

class Comment < ActiveRecord::Base
  serialize :stuff
end

comment = Comment.new # stuff: nil
comment.stuff = ['some', 'stuff', 'as array']
comment.save
comment.stuff # => ['some', 'stuff', 'as array']

You can specify the class that the stored object must be an instance of (in this case Array). This is more explicit and a bit safer. You also won't have to create the array when you assign the first value, since you'll be able to append to the existing (empty) array.

class Comment < ActiveRecord::Base
  # Rails 7.1 and later expect a keyword instead: serialize :stuff, type: Array
  serialize :stuff, Array
end

comment = Comment.new # stuff: []
comment.stuff << 'some' << 'stuff' << 'as array'

You can even use a neater version called store: http://api.rubyonrails.org/classes/ActiveRecord/Store.html

This should handle your use case using a built in method.

Serialize or json in PHP?

Main advantage of serialize: it's specific to PHP, which means it can represent PHP types, including instances of your own classes -- and you'll get your objects back, still instances of your classes, when unserializing your data.


Main advantage of json_encode: JSON is not specific to PHP. There are libraries to read/write it in many languages, which means it's better if you want something that can be manipulated from a language other than PHP.

A JSON string is also easier to read/write/modify by hand than a serialized one.

On the other hand, as JSON is not specific to PHP, it's not aware of things that are specific to PHP, like data types.


A couple of side notes:

  • Even if there is a small difference in speed between those two, it shouldn't matter much: you will probably not serialize/unserialize a lot of data.
  • Are you sure this is the best way to store data in a database?

    • You won't be able to run many queries on serialized strings in a DB: you will not be able to use your data in WHERE clauses, nor update it without the intervention of PHP...

JSON or XML or serialized array to save in MySQL database

For performance reasons, we store large hashes as serialized Ruby objects in Marshal format. You need a column type of Blob. This works really well. JSON would be fine, but we found it a little slower to marshal/unmarshal. I'd stay away from XML unless you really need interoperability with a third party.

What's the point of serializing arrays to store them in the DB?

I've not seen this a whole lot, but it's clearly done for ease of implementation. Serializing lets you store quasi-binary data.

Your second example is a CSV scheme. This is workable for storing confined string lists. While it's easier to query or even modify within the database, it takes more effort to marshal and unmarshal on the way to and from the database API. Also, there is really only limited list support anyway: http://dev.mysql.com/doc/refman/5.1/en/string-functions.html#function_find-in-set

However, true, the serialized data is unneeded in your example. It's only a requirement if you need to store complex or nested array structures. And in such cases the data blob is seldom accessed or queried within the database.
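
As a rough sketch of the two schemes being compared: a flat list fits a CSV column and can be matched with FIND_IN_SET(), while nested data needs serialize() or JSON. The tags, table name and column name below are hypothetical.

```php
<?php

// Made-up list of tags for one row.
$tags = array('php', 'json', 'mysql');

// CSV scheme: human-readable and searchable with FIND_IN_SET(),
// but only safe for flat lists whose values contain no commas.
$csv = implode(',', $tags);
$fromCsv = explode(',', $csv);

// Hypothetical table and column names; the value would be bound as a parameter.
$sql = 'SELECT id FROM posts WHERE FIND_IN_SET(?, tags) > 0';

// Nested data does not fit the CSV scheme, so it needs serialize() or JSON.
$nested = array('tags' => $tags, 'meta' => array('views' => 10));
$blob = serialize($nested);

var_dump($fromCsv === $tags);             // bool(true)
var_dump(unserialize($blob) === $nested); // bool(true)
```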

Using HTTP Session VS serialized JSON array to store an SQL result

Traditionally people use Memcache. It's an in-memory cache system that stores key-value pairs.
http://php.net/manual/en/book.memcache.php

ORM vs serialized array?

My experience is that the ORM's storage layer expects the stored data to be dynamic and therefore holds no preconceptions about its format, so it is better equipped to deal with exceptions to the norm (not actual error exceptions, but cases where your object doesn't match your database schema).

When you're dealing with dynamic objects, the rigidity enforced by classic storage generally forces you either to handle the exceptions to the norm or to create a database schema so loose that using it defeats the optimisations generally granted by the database engine; think computed stats and various indices.

However, ultimately I think you've hit the nail on the head in your second paragraph: If it won't normalise then you will have trouble representing it into a schema that's good enough for your database to work with efficiently.

Sure you could serialise the entire object to an array and store that, but you lose the potency of good indexing, full text search and being able to cross-reference objects without having to do multiple reads.

As an example of the above, say your DB models ecommerce orders and you want to find the purchase orders that follow an initial one. The database would need to know how to read each serialised item to find its parentId property and then rescan the table for matches.

Long story short, ORMs are an answer to a problem that has existed since Object Orientated programming was first dreamed up. Don't worry about it and use them, unless you're sure that your data structures/schema are rigid and sensible to SQL.
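
A small sketch of that cost, assuming the orders sit in a serialized "data" column. The rows and the findChildOrders() helper are made up, and a plain array stands in for the table.

```php
<?php

// Made-up rows standing in for a table whose "data" column holds serialized orders.
$rows = array(
    array('id' => 1, 'data' => serialize(array('parentId' => null, 'total' => 40))),
    array('id' => 2, 'data' => serialize(array('parentId' => 1, 'total' => 15))),
    array('id' => 3, 'data' => serialize(array('parentId' => 1, 'total' => 22))),
    array('id' => 4, 'data' => serialize(array('parentId' => 2, 'total' => 9))),
);

// Hypothetical helper: the database cannot see inside the blob, so every
// row has to be fetched and unserialized in PHP before it can be filtered.
function findChildOrders(array $rows, $parentId)
{
    $matches = array();
    foreach ($rows as $row) {
        $order = unserialize($row['data']);
        if ($order['parentId'] === $parentId) {
            $matches[] = $row['id'];
        }
    }
    return $matches;
}

print_r(findChildOrders($rows, 1));
```

With parentId in its own indexed column, the same lookup would be a single WHERE clause and the database would never have to touch the other rows.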


