Urlencode VS Rawurlencode

urlencode vs rawurlencode?

It depends on your purpose. If interoperability with other systems is important, then rawurlencode seems to be the way to go. The one exception is legacy systems that expect the query string to follow the form-encoding style, with spaces encoded as + instead of %20 (in which case you need urlencode).

rawurlencode follows RFC 1738 prior to PHP 5.3.0 and RFC 3986 afterwards (see http://us2.php.net/manual/en/function.rawurlencode.php)

Returns a string in which all non-alphanumeric characters except -_.~ have been replaced with a percent (%) sign followed by two hex digits. This is the encoding described in RFC 3986 for protecting literal characters from being interpreted as special URL delimiters, and for protecting URLs from being mangled by transmission media with character conversions (like some email systems).

Note on RFC 3986 vs. RFC 1738: prior to PHP 5.3, rawurlencode encoded the tilde character (~) as %7E, following RFC 1738. As of PHP 5.3, however, rawurlencode follows RFC 3986, which treats ~ as an unreserved character that does not need to be encoded.
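A minimal check of the tilde behavior, assuming PHP 5.3 or later: the same string goes through both functions, and only urlencode() escapes ~, because ~ is not in its list of exceptions (-_.). The $path value is just an illustrative input.

```php
<?php
// Compare how each function treats the tilde (assumes PHP >= 5.3).
$path = '~user/notes';

$raw = rawurlencode($path); // RFC 3986: ~ is unreserved and left as-is
$enc = urlencode($path);    // form encoding: ~ is escaped as %7E

echo $raw, "\n"; // ~user%2Fnotes
echo $enc, "\n"; // %7Euser%2Fnotes
```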

urlencode encodes spaces as plus signs (not as %20, as rawurlencode does) (see http://us2.php.net/manual/en/function.urlencode.php):

Returns a string in which all non-alphanumeric characters except -_. have been replaced with a percent (%) sign followed by two hex digits and spaces encoded as plus (+) signs. It is encoded the same way that the posted data from a WWW form is encoded, that is the same way as in application/x-www-form-urlencoded media type. This differs from the RFC 3986 encoding (see rawurlencode()) in that for historical reasons, spaces are encoded as plus (+) signs.

This corresponds to the definition for application/x-www-form-urlencoded in RFC 1866.
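To see both descriptions side by side, this sketch runs one illustrative string through each function. The space is the only character handled differently; + and & are percent-encoded by both.

```php
<?php
// The same input through both encoders: only the space is treated differently.
$input = 'a b+c&d';

$form = urlencode($input);    // application/x-www-form-urlencoded style
$rfc  = rawurlencode($input); // RFC 3986 style

echo $form, "\n"; // a+b%2Bc%26d
echo $rfc, "\n";  // a%20b%2Bc%26d
```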

Additional Reading:

You may also want to see the discussion at http://bytes.com/groups/php/5624-urlencode-vs-rawurlencode.

Also, RFC 2396 is worth a look: it defines valid URI syntax. The part we're interested in is section 3.4, Query Component:

Within a query component, the characters ";", "/", "?", ":", "@", "&", "=", "+", ",", and "$" are reserved.

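As you can see, + is a reserved character in the query component, so using it to mean "space" is a form-encoding convention rather than general URI syntax. Under RFC 3986 a space is percent-encoded as %20 (as rawurlencode does), and a literal + is encoded as %2B.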

What is the difference between urlencode and rawurlencode?

It depends on what you are after. The main differences between them are the standard each one follows and how each one encodes spaces.

urlencode encodes the same way that form data is encoded (application/x-www-form-urlencoded).

urlencode encodes spaces as + symbols while rawurlencode encodes them as %20.

Therefore, when dealing with form data, urlencode is preferable (since forms encode spaces as + signs too). Otherwise, rawurlencode is a wiser choice in my opinion.

For example, if you want to mimic form data being submitted via a URL, you would use urlencode.
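For building a whole query string rather than encoding values one at a time, PHP's http_build_query() defaults to the same form-style encoding as urlencode(), and its fourth parameter accepts PHP_QUERY_RFC3986 for rawurlencode()-style output (PHP 5.4+). The build_form_query() helper below is a hypothetical hand-rolled equivalent, shown only for comparison; the field values are illustrative.

```php
<?php
// Hypothetical helper: build a form-style query string by hand with urlencode().
function build_form_query(array $fields): string
{
    $pairs = [];
    foreach ($fields as $name => $value) {
        $pairs[] = urlencode((string) $name) . '=' . urlencode((string) $value);
    }
    return implode('&', $pairs);
}

$fields = ['q' => 'red shoes', 'size' => '10+'];

$manual  = build_form_query($fields);
$builtin = http_build_query($fields, '', '&');                    // PHP_QUERY_RFC1738 (default)
$rfc3986 = http_build_query($fields, '', '&', PHP_QUERY_RFC3986); // rawurlencode() style

echo '/search?' . $manual, "\n";  // /search?q=red+shoes&size=10%2B
echo '/search?' . $rfc3986, "\n"; // /search?q=red%20shoes&size=10%2B
```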

urlencode/rawurlencode and automatic decoding

The + character is encoded as %2B by both functions, so no confusion is possible.

To safely decode any version, PHP only has to transform each %XX into its corresponding character and transform each + to a space. This is what urldecode does.
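The decoding rule just described can be sketched by hand. form_decode_sketch() is a hypothetical name, not a PHP function, and it only illustrates the idea; in real code, call urldecode(). The important detail is that both rules are applied in a single pass, so a %2B that decodes to + is not then turned into a space.

```php
<?php
// Hypothetical re-implementation of what urldecode() does, for illustration only.
// A single pass matters: decoding %2B to "+" must not then turn it into a space.
function form_decode_sketch(string $s): string
{
    return preg_replace_callback(
        '/%([0-9A-Fa-f]{2})|\+/',
        function (array $m): string {
            if ($m[0] === '+') {
                return ' ';            // form encoding: + means space
            }
            return chr(hexdec($m[1])); // %XX -> the byte 0xXX
        },
        $s
    );
}

foreach (['foo+bar', 'foo%20bar', 'a%2Bb', 'x%3dy'] as $encoded) {
    echo form_decode_sketch($encoded), "\n";
}
```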

rawurlencode output shouldn't cause issues either: it only ever emits %XX sequences where urlencode might emit + (for example, %20 for a space), and those sequences are decoded correctly by urldecode.
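A quick round-trip check supports this: encoding some illustrative strings with either function and decoding them with urldecode() should give back the original every time. The sample strings are arbitrary.

```php
<?php
// Round-trip check: urldecode() recovers the original whichever encoder was used.
$samples = ['foo bar', 'a+b=c&d', '~tilde', 'café 100%'];

foreach ($samples as $original) {
    $viaRaw  = urldecode(rawurlencode($original));
    $viaForm = urldecode(urlencode($original));
    printf("%-12s raw:%s form:%s\n", $original,
        $viaRaw === $original ? 'ok' : 'MISMATCH',
        $viaForm === $original ? 'ok' : 'MISMATCH');
}
```

Every line should report ok for both encoders. The reverse is not true: rawurldecode() leaves the + from urlencode() in place.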

Substituting whitespace with %20 in PHP: urlencode and rawurlencode do not work

rawurlencode() is what you're looking for. However, if your Content-Type is set to text/html (which is the default), then you will see the space character instead of the encoded entity.

<?php
header('Content-Type: text/plain'); // send plain text so the browser shows the output as-is
$str = "my string";
echo rawurlencode($str); // => my%20string

Note: I'm not suggesting that you should change the Content-Type header in your original script. It's just to show that your rawurlencode() call is working and to explain why you're not seeing it.

PHP - auto detect (raw)urlencode

rawurlencode() takes every byte matching the regular expression [^0-9A-Za-z_.~-] and converts it to a percent sign followed by two hexadecimal digits for that byte, so a multibyte UTF-8 character becomes several %XX sequences. urlencode() does the same for bytes matching [^0-9A-Za-z_.-] (so it also encodes ~ as %7E), except that it writes a space as + instead of %20.
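To make the character class concrete, here is a hypothetical rfc3986_encode_sketch() that mimics rawurlencode() (PHP 5.3+ behavior) with a byte-oriented regex. It is a sketch for illustration, not a replacement for the built-in; the input is written with \x escapes so the UTF-8 bytes of "é" are explicit.

```php
<?php
// Hypothetical re-implementation of rawurlencode() (PHP >= 5.3 behavior).
// Without the /u flag the regex works byte by byte, so a multibyte
// UTF-8 character becomes several %XX sequences.
function rfc3986_encode_sketch(string $s): string
{
    return preg_replace_callback(
        '/[^0-9A-Za-z_.~-]/',
        function (array $m): string {
            return sprintf('%%%02X', ord($m[0])); // uppercase hex, as PHP emits
        },
        $s
    );
}

echo rfc3986_encode_sketch("\xC3\xA9 b~"), "\n"; // %C3%A9%20b~
```

Comparing its output with rawurlencode() on a few inputs is an easy way to confirm the character class described above.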

For decoding, this means that any sequence matching the regular expression %[0-9A-Fa-f]{2} will be decoded correctly by either function. That leaves only + to worry about: rawurldecode() leaves it as a literal plus sign, while urldecode() turns it into a space. Because neither encoder ever emits a literal + (a real plus sign always becomes %2B), you can use urldecode() on the server side without having to detect which encoder was used.

<?php
$str = "foo bar baz";
$raw = rawurlencode($str); // foo%20bar%20baz
$enc = urlencode($str);    // foo+bar+baz

echo rawurldecode($raw), "\n";
echo rawurldecode($enc), "\n";
echo urldecode($raw), "\n";
echo urldecode($enc), "\n";
?>

Output:

foo bar baz
foo+bar+baz
foo bar baz
foo bar baz

Urlencode everything but slashes?

  1. Split by /
  2. rawurlencode() each part (urlencode() would turn spaces into +, which is a literal plus sign in a path)
  3. Join with /
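The three steps above can be sketched as follows. encode_path() is a hypothetical helper name; it calls rawurlencode() for step 2 so that spaces in a path segment come out as %20. The sample path is illustrative.

```php
<?php
// Hypothetical helper: percent-encode each path segment but keep the slashes.
function encode_path(string $path): string
{
    $segments = explode('/', $path);                  // 1. split by /
    $encoded  = array_map('rawurlencode', $segments); // 2. encode each part
    return implode('/', $encoded);                    // 3. join with /
}

echo encode_path('docs/annual report/Q1 & Q2.pdf'), "\n";
// docs/annual%20report/Q1%20%26%20Q2.pdf
```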

rawurlencode() and urlencode() not working in CodeIgniter

I know this is an old question, but I was dealing with the same issue. Here is what I did:

Encode

<?php echo urlencode(base64_encode('http://kchason.com')); ?>

Decode

<?php /* 'aHR0cDovL2tjaGFzb24uY29t' is the Encode output above; prints http://kchason.com */ echo base64_decode(urldecode('aHR0cDovL2tjaGFzb24uY29t')); ?>

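You use base64_encode to get rid of any URL parts that would cause problems with CodeIgniter, and then urlencode to escape the characters base64 output can contain that are special in URLs: the = padding at the end, as well as + and /. To decode, reverse the order: urldecode first, then base64_decode.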
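A small round-trip sketch of this approach, using hypothetical helper names. The two input bytes are chosen only because their base64 form, "+/8=", contains all three problem characters.

```php
<?php
// Hypothetical helpers following the answer's approach: base64 first, then urlencode.
function encode_for_segment(string $value): string
{
    return urlencode(base64_encode($value)); // escapes the +, / and = that base64 can emit
}

function decode_from_segment(string $segment)
{
    // Undo the steps in reverse order: urldecode first, then base64_decode.
    // base64_decode() returns false on failure, so there is no string return type here.
    return base64_decode(urldecode($segment));
}

$token = encode_for_segment("\xFB\xFF");         // base64 of these bytes is "+/8="
echo $token, "\n";                               // %2B%2F8%3D
echo bin2hex(decode_from_segment($token)), "\n"; // fbff
```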
