Unicode Characters in URLs

Unicode characters in URLs

Use percent encoding. Modern browsers take care of display and paste issues and show the URL in human-readable form, e.g. http://ko.wikipedia.org/wiki/위키백과:대문

Edit: when you copy such a URL in Firefox, the clipboard holds the percent-encoded form (which is usually a good thing), but if you copy only part of it, it remains unencoded.
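To make the two forms concrete, here is a minimal Python sketch of what percent encoding does to the path of the example URL above. It assumes UTF-8 as the character encoding (the default for `urllib.parse.quote`); the variable names are only illustrative.

```python
from urllib.parse import quote, unquote

# Path portion of the example URL from the answer above.
path = "/wiki/위키백과:대문"

# Percent-encode every non-ASCII character as its UTF-8 bytes.
# "/" and ":" are listed as safe so the path structure stays readable.
encoded = quote(path, safe="/:")

# Decoding restores the human-readable form a modern browser displays.
decoded = unquote(encoded)

print(encoded)  # pure ASCII, this is what goes over the wire
print(decoded)  # identical to the original path
```

The encoded string is what is actually sent in the request; the decoded string is what the address bar shows.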

Encode Unicode characters in URL using PHP?

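This will percent-encode every byte outside the range 0x20-0x7F in the URL string, which covers all non-ASCII characters (and ASCII control characters), while leaving the rest of the URL untouched:

// No /u modifier: the pattern matches byte by byte, so each byte of a
// multi-byte UTF-8 character is encoded separately as %XX.
$url = preg_replace_callback('/[^\x20-\x7f]/', function ($match) {
    return urlencode($match[0]);
}, $url);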

UTF-8 characters in URLs

Unicode characters in the URL (I'm not talking about the domain name) are safe to use. There is no security risk if you use them on your own site. (There are some risks to end users who visit a fraudulent site that uses Unicode on the page, as Oded said.)

The only real problem is how older browsers (and operating systems) display them. Browsers that don't support them will show the ugly percent-encoded characters in the URL. You should probably also percent-encode the URLs inside your HTML, in case an older browser doesn't encode them for you and the user can't follow the link (which is bad). Modern browsers show the decoded URL in the address bar but send the encoded version in the request, so the user always sees the pretty Unicode characters.
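As a rough illustration of percent-encoding URLs before writing them into HTML, the following Python sketch encodes the path, query and fragment and leaves the scheme and host alone (domain names are a separate topic, as noted above). The function name `to_ascii_href` is hypothetical, and the sets of "safe" characters are an assumption chosen to keep already-valid URLs unchanged; adjust them to your needs.

```python
from urllib.parse import quote, urlsplit, urlunsplit

# Characters left as-is. "%" is included so that input which is
# already percent-encoded is not encoded a second time.
_SAFE_PATH = "/:@!$&'()*+,;=%"
_SAFE_QUERY = _SAFE_PATH + "?"


def to_ascii_href(url: str) -> str:
    """Return url with non-ASCII path/query/fragment characters percent-encoded."""
    parts = urlsplit(url)
    path = quote(parts.path, safe=_SAFE_PATH)
    query = quote(parts.query, safe=_SAFE_QUERY)
    fragment = quote(parts.fragment, safe=_SAFE_QUERY)
    # Scheme and host are passed through unchanged.
    return urlunsplit((parts.scheme, parts.netloc, path, query, fragment))


href = to_ascii_href("http://ko.wikipedia.org/wiki/위키백과:대문")
print(f'<a href="{href}">위키백과:대문</a>')
```

The link text can stay in Unicode; only the `href` value needs the encoded form, so older browsers can follow the link while modern ones still display the readable version.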

URL Unicode characters encoding

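You can use the HttpUtility.UrlPathEncode method from the System.Web namespace (System.Web.dll, which requires the full .NET Framework 4 profile rather than the Client Profile):

using System.Web;  // reference System.Web.dll

var encoded = HttpUtility.UrlPathEncode("http://www.wikipedia.com/wiki/example");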

