In PHP, Which Is Faster: preg_split or explode

In PHP, which is faster: preg_split or explode?

explode() is faster, per the PHP manual (PHP.net):

Tip: If you don't need the power of regular expressions, you can choose faster (albeit simpler) alternatives like explode() or str_split().
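As a rough illustration of the tip, the three functions can be compared on input where a regular expression adds nothing. The sample strings below are made up for this sketch.

```php
<?php
// Hypothetical sample input, for illustration only.
$csv = 'red,green,blue';

// A fixed, literal delimiter: explode() needs no regex engine.
$viaExplode = explode(',', $csv);

// The same split through PCRE: identical result, extra regex overhead.
$viaPregSplit = preg_split('/,/', $csv);

// str_split() cuts by length rather than by delimiter.
$chunks = str_split('abcdef', 2);

var_dump($viaExplode === $viaPregSplit); // bool(true)
print_r($chunks);                        // three chunks: ab, cd, ef
```

When the delimiter is a plain string, both calls return the same array, so the regex version only pays off once a real pattern is needed.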

split vs. explode in PHP

explode splits strings.

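split() (deprecated in PHP 5.3 and removed in PHP 7.0) also does this, except that it supports splitting by POSIX regular expressions. The manual suggests preg_split() as its replacement; mb_split() is a separate, multibyte-aware regex split rather than a general successor.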

preg_split() also does this. It is reportedly 25-50% faster than split() and supports the much more powerful Perl-compatible regular expressions.

What is the difference between split() and explode()?

split() was deprecated (and later removed in PHP 7.0) because

  • explode() is substantially faster because it doesn't split based on a regular expression, so the string doesn't have to be analyzed by the regex parser
  • preg_split() is faster than split() and uses PCRE regular expressions for regex splits

join() and implode() are aliases of each other and therefore don't have any differences.

Using explode, split, or preg_split to store and get multiple database entries

If I understand your question correctly and you're dealing with comma-delimited strings of ID numbers, it would probably be simplest to keep them in this format, because you can then use the string (once sanitized) in your SQL statement when querying the database.

I'm assuming that you want to run a SELECT query to grab the users whose IDs have been entered, correct? You'd want to use a SELECT ... WHERE IN ... type of statement, like this:

// Get the ids the user submitted
$ids = $_POST['ids'];
// perform some sanitizing of $ids here to make sure
// you're not vulnerable to an SQL injection
$sql = "SELECT * FROM users WHERE ID IN ($ids)";
// execute your SQL statement

Alternatively, you could use explode to create an array of each individual ID, and then loop through so you could do some checking on each value to make sure it's correct, before using implode to concatenate them back together into a string that you can use in your SELECT ... WHERE IN ... statement.
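The explode, check, implode route described above might look something like the sketch below. The input string is a made-up stand-in for $_POST['ids'], and the PDO calls in the trailing comments assume a connection object named $pdo that is not shown. Binding one placeholder per id is a swapped-in alternative to concatenating the checked values back into the SQL text.

```php
<?php
// Hypothetical input standing in for $_POST['ids'].
$rawIds = '1, 3, 5, abc, 7';

// Split on commas, trim whitespace, and keep only all-digit strings.
$ids = array_values(array_filter(
    array_map('trim', explode(',', $rawIds)),
    'ctype_digit'
));

// One "?" placeholder per id, so the values are bound rather than
// interpolated into the SQL text.
$placeholders = implode(',', array_fill(0, count($ids), '?'));
$sql = "SELECT * FROM users WHERE ID IN ($placeholders)";

echo $sql, "\n";

// With a PDO connection (assumed, not shown) this could then be run as:
// $stmt = $pdo->prepare($sql);
// $stmt->execute($ids);
```

If every submitted value is rejected, $ids is empty and the IN () clause would be invalid SQL, so a real script should check count($ids) before building the query.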

Edit: Sorry, forgot to add: in terms of storing the list of user ids in the database, you could store the comma-delimited list as a string against a message id, but that has drawbacks (it is difficult to JOIN against other tables if you need to). The better option would be to create a lookup table, which consists of two columns: messageid and userid. You could then store each individual userid against the messageid, e.g.

messageid | userid
1 | 1
1 | 3
1 | 5

The benefit of this approach is that you can then use this table to join other tables (maybe you have a separate message table that stores details of the message itself).

Under this method, you'd create a new entry in the message table, get the id back, then explode the userids string into its separate parts, and finally create your INSERT statement to insert the data using the individual ids and the message id. You'd need to work out other mechanisms to handle any editing of the list of userids for a message, and deletion as well.
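A minimal sketch of the "explode, then INSERT" step could look like this. The table name message_users and the sample values are assumptions made for illustration, and the snippet only builds the statement and its parameters; running it would need a database connection that is not shown.

```php
<?php
// Hypothetical values: the id returned after inserting the message,
// and the comma-delimited user ids submitted with it.
$messageId = 1;
$userIds = '1,3,5';

// Turn the string into one (messageid, userid) pair per user.
$rows = [];
foreach (explode(',', $userIds) as $userId) {
    $rows[] = [$messageId, (int) trim($userId)];
}

// Build a multi-row INSERT with bound placeholders.
// "message_users" is an assumed table name.
$sql = 'INSERT INTO message_users (messageid, userid) VALUES '
     . implode(', ', array_fill(0, count($rows), '(?, ?)'));

// Flatten the pairs into one parameter list for the placeholders.
$params = array_merge(...$rows);

echo $sql, "\n";
```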

Hope that made sense!

Fastest method of splitting string into an array in PHP

Are you required to use preg_split()? Because it's easier to use preg_match_all():

preg_match_all('/(?:^|:)([^*:]+(?:\*.[^*:]+)*)/', $subject, $result, PREG_PATTERN_ORDER);
$result = $result[1];

PHP split to preg_split()

preg_split() is similar to the old ereg-function split(). You only have to enclose the regex in /.../ like so:

preg_split('/www/', 'D:/Projects/job.com/www/www/path/source', 2);

The enclosing slashes / here are really part of the regular expression syntax, not searched for in the string. If the www delimiter is variable, you should additionally use preg_quote() for the inner part.
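For a variable delimiter, the preg_quote() step could be sketched as follows. The delimiter value is a made-up example chosen because it contains both a regex metacharacter (.) and the pattern delimiter (/).

```php
<?php
// Hypothetical variable delimiter containing regex metacharacters.
$delimiter = 'job.com/';
$path = 'D:/Projects/job.com/www/www/path/source';

// The second argument tells preg_quote() to also escape the "/" that
// is used as the pattern delimiter.
$pattern = '/' . preg_quote($delimiter, '/') . '/';

$parts = preg_split($pattern, $path, 2);
print_r($parts);
```

Without the second argument, the trailing slash in the delimiter would end the pattern early and preg_split() would fail with a warning.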

But note that you don't need regular expressions if you only look for static strings anyway. In such cases you can use explode() pretty much like you used split() before:

explode('www', 'D:/Projects/job.com/www/www/path/source', 2);

Converting comma separated list to an array - explode vs preg_split

I always like to try and point out that the correctness of a solution always takes priority over how fast it works. Something that doesn't work but is really fast is just as much of a problem as something that works, but is really slow.

So I'll address the correctness of the solution and its efficiency separately.

Correctness

A combination of explode() and trim(), in conjunction with array_map(), works nicely to achieve your desired goal here.

$cityNamesArray = array_map('trim', explode(',', $cityNames ));

You can also throw in array_filter() here to make sure zero-length strings don't pass through. So in a string like "Chicago, San Diego, El Paso,, New York," you wouldn't get an array with some empty values.

$cityNamesArray = array_filter(array_map('trim', explode(',', $cityNames )), 'strlen');
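One detail worth knowing about the line above: array_filter() preserves the original keys, so the result can have gaps in its numeric indexes. A small sketch, using the sample string from this answer, shows the effect and one way to re-index with array_values().

```php
<?php
$cityNames = "Chicago, San Diego, El Paso,, New York,";

$filtered = array_filter(array_map('trim', explode(',', $cityNames)), 'strlen');

// array_filter() preserves keys, so the keys are 0, 1, 2, 4 here.
print_r(array_keys($filtered));

// array_values() re-indexes the result to 0..n-1.
$cityNamesArray = array_values($filtered);
print_r($cityNamesArray);
```

The gap only matters if later code relies on consecutive indexes, for example a for loop over count() or JSON encoding as a list.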

This assumes the data can be inconsistent and that a malformed entry would have a detrimental effect on the desired end result, so the solution holds up in that respect.

The combination of function calls here causes the array to be iterated several times, so you have O(2n + k) time, where k is the number of characters in the string that must be scanned for delimiters and n is the number of elements in the resulting array passed through array_map() and array_filter().

Speed

Now, to make it faster, we need to get the big O down closer to O(k), the optimum, because you can't reduce k any further with a single-character needle/haystack substring search.

The preg_split('/\s*,\s*/', $cityNames, -1, PREG_SPLIT_NO_EMPTY) approach has roughly O(k) time complexity: it is unlikely to be more than O(k + 1), or in the worst case O(k + log k) if the PCRE VM needs more than a single pass.

It also works correctly on the aforementioned case where $cityNames = "Chicago, San Diego, El Paso,, New York," or some similar input.

This means that it meets both the criteria for correctness and efficiency. Thus I would say it is the optimal solution.
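One caveat worth checking before swapping one approach for the other: the pattern only consumes whitespace next to a comma, so padding at the very start or end of the whole string is kept. The input below is a made-up example for this sketch.

```php
<?php
// Hypothetical input with whitespace at both ends of the whole string.
$cityNames = "  Chicago , Boston  ";

$viaArrays = array_values(array_filter(
    array_map('trim', explode(',', $cityNames)),
    'strlen'
));

// The pattern only eats whitespace adjacent to a comma, so the outer
// padding survives unless the string is trimmed first.
$untrimmed = preg_split('/\s*,\s*/', $cityNames, -1, PREG_SPLIT_NO_EMPTY);
$trimmed   = preg_split('/\s*,\s*/', trim($cityNames), -1, PREG_SPLIT_NO_EMPTY);

var_dump($viaArrays === $untrimmed); // bool(false)
var_dump($viaArrays === $trimmed);   // bool(true)
```

Calling trim() on the input first is a cheap way to make the two approaches agree on this kind of data.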



Benchmarking

With that said, I think you'll find that the performance differences between the two approaches are fairly negligible.

Here's a rudimentary benchmark to demonstrate just how negligible the differences are on the average input.

$cityNames = "Chicago, San Diego,El Paso,,New York,  ,"; // sample data

$T = 0; // total time spent

for ($n = 0; $n < 10000; $n++) {
    $t = microtime(true); // start time
    preg_split('/\s*,\s*/', $cityNames, -1, PREG_SPLIT_NO_EMPTY);
    $t = microtime(true) - $t; // elapsed time for this call
    $T += $t; // aggregate time
}

// After the loop, $n equals the number of iterations (10000)
printf("preg_split took %.06f seconds on average\n", $T / $n);

$T = 0; // total time spent

for ($n = 0; $n < 10000; $n++) {
    $t = microtime(true); // start time
    array_filter(array_map('trim', explode(',', $cityNames)), 'strlen');
    $t = microtime(true) - $t; // elapsed time for this call
    $T += $t; // aggregate time
}

printf("array functions took %.06f seconds on average\n", $T / $n);

preg_split took 0.000003 seconds on average
array functions took 0.000005 seconds on average

This is an average difference of maybe 1 or 2 microseconds between them. When measuring such minute differences in speed you really shouldn't care too much as long as the solution yields correctness. The better way to account for performance problems is to measure in orders of magnitude. A solution that's 1 or 2 microseconds faster isn't worth exploring if it costs more time to get to than just using the existing solution that's almost as fast, but is equally correct. However, a solution that works 1 or 2 orders of magnitude faster, might be.
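For anyone who wants to repeat the measurement, a slightly tidier variant is sketched below. It assumes PHP 7.4 or later (for hrtime() and arrow functions), and averageSeconds() is a hypothetical helper written for this example, not a built-in. The closure call adds a small, equal overhead to both measurements, so only the relative difference is meaningful.

```php
<?php
// Hypothetical helper: average seconds per call over $iterations runs.
function averageSeconds(callable $fn, int $iterations = 10000): float
{
    $start = hrtime(true); // monotonic clock, in nanoseconds
    for ($i = 0; $i < $iterations; $i++) {
        $fn();
    }
    return (hrtime(true) - $start) / 1e9 / $iterations;
}

$cityNames = "Chicago, San Diego,El Paso,,New York,  ,"; // same sample data

$regex = averageSeconds(
    fn () => preg_split('/\s*,\s*/', $cityNames, -1, PREG_SPLIT_NO_EMPTY)
);
$arrays = averageSeconds(
    fn () => array_filter(array_map('trim', explode(',', $cityNames)), 'strlen')
);

printf("preg_split took %.9f seconds on average\n", $regex);
printf("array functions took %.9f seconds on average\n", $arrays);
```

The numbers will vary by machine and PHP version, so no expected output is given here.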

Explode string only once on first occurring substring

Simply set $limit to 2 to get an array of two parts. Thanks to @BenJames for mentioning:

preg_split("~\n~", $string, 2);

I tested and it works fine.

The limit argument:

If specified, then only substrings up to limit are returned with the rest of the string being placed in the last substring. A limit of -1, 0 or null means "no limit" and, as is standard across PHP, you can use null to skip to the flags parameter.
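A short sketch of how the limit behaves in practice, using a made-up three-line string. It passes -1 explicitly rather than null, on the assumption that the code should also run cleanly on recent PHP versions.

```php
<?php
$string = "first line\nsecond line\nthird line";

// limit = 2: split once, the remainder stays in the last element.
$two = preg_split("~\n~", $string, 2);

// limit = -1 (or 0): no limit, every line is returned.
$all = preg_split("~\n~", $string, -1);

// explode() accepts the same positive limit for a literal delimiter.
$twoExplode = explode("\n", $string, 2);

print_r($two);
```

Since the delimiter here is a literal newline, explode() gives the same result without the regex. Note that explode() treats zero and negative limits differently from preg_split(), so only positive limits are interchangeable between the two.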


