An HTML Space Is Showing as %2520 Instead of %20

A bit of explanation as to what that %2520 is:

The common space character is encoded as %20 as you noted yourself.
The % character is encoded as %25.

You get %2520 when your URL already has a %20 in it and gets URL-encoded again: the % of %20 is itself encoded as %25, which turns %20 into %2520.

Are you (or any framework you might be using) double encoding characters?
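
As a quick illustration of the double encoding described above, here is a minimal sketch using JavaScript's built-in encodeURIComponent. The file name is made up for the example; any string containing a space behaves the same way.

```javascript
// Hypothetical file name used only for illustration.
const original = "my file.html";

// First pass: the space becomes %20.
const encodedOnce = encodeURIComponent(original);

// Second pass: the "%" of "%20" is itself encoded as "%25", giving "%2520".
const encodedTwice = encodeURIComponent(encodedOnce);

// Decoding once only undoes the second pass, so the %20 is still there.
const decodedOnce = decodeURIComponent(encodedTwice);

console.log(encodedOnce);  // my%20file.html
console.log(encodedTwice); // my%2520file.html
console.log(decodedOnce);  // my%20file.html
```

If a value already arrives encoded, encoding it again in your own code (or letting a framework do so) is the usual way to end up with %2520.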

Edit:
Expanding a bit on this, especially for LOCAL links. Assuming you want to link to the resource C:\my path\my file.html:

  • If you provide a local file path only, the browser is expected to encode and protect every character given when converting it to a proper URL (see next point). In the example above, you should give the path with spaces as shown: % is a valid filename character, so a literal % in the path will itself be encoded.
  • If you provide a URL with the file:// protocol, you are stating that you have already encoded everything that needs encoding, and that the remaining characters should be treated as special characters. In the example above, you should thus provide file:///c:/my%20path/my%20file.html. Aside from fixing slashes, clients should not encode characters here.

NOTES:

  • Slash direction - forward slashes / are used in URLs, backslashes \ in Windows paths, but most clients will work with both by converting backslashes to the proper forward slash.
  • In addition, there are 3 slashes after the protocol name, since you are silently referring to the current machine instead of a remote host (the full unabbreviated URL would be file://localhost/c:/my%20path/my%20file.html), but again most clients will work without the host part (i.e. two slashes only) by assuming you mean the local machine and adding the third slash.
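
The path-to-URL conversion described above can be sketched roughly as follows. The function name windowsPathToFileUrl is hypothetical and the logic is deliberately simplified: it assumes an absolute Windows path with a drive letter and ignores UNC paths and other edge cases.

```javascript
// Hypothetical helper: converts an absolute Windows path to a file:// URL.
// Assumes a leading drive letter such as "C:"; UNC paths are not handled.
function windowsPathToFileUrl(winPath) {
  // Backslashes become forward slashes, then each segment is encoded on its own.
  const segments = winPath.replace(/\\/g, "/").split("/");
  const encoded = segments.map((segment, index) =>
    // Keep the drive letter as-is so its ":" is not turned into %3A.
    index === 0 && /^[A-Za-z]:$/.test(segment)
      ? segment
      : encodeURIComponent(segment)
  );
  // Three slashes: the host part is left empty, meaning the local machine.
  return "file:///" + encoded.join("/");
}

console.log(windowsPathToFileUrl("C:\\my path\\my file.html"));
// file:///C:/my%20path/my%20file.html

console.log(windowsPathToFileUrl("C:\\reports\\100%.txt"));
// file:///C:/reports/100%25.txt
```

The second call shows why a plain path must not be pre-encoded: a literal % in a file name is turned into %25.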

URL encoding the space character: + or %20?

From Wikipedia (emphasis and link added):

When data that has been entered into HTML forms is submitted, the form field names and values are encoded and sent to the server in an HTTP request message using method GET or POST, or, historically, via email. The encoding used by default is based on a very early version of the general URI percent-encoding rules, with a number of modifications such as newline normalization and replacing spaces with "+" instead of "%20". The MIME type of data encoded this way is application/x-www-form-urlencoded, and it is currently defined (still in a very outdated manner) in the HTML and XForms specifications.

So, real percent-encoding uses %20, while form data in URLs is in a modified form that uses +. You are therefore most likely to see + only in the query string of a URL, after the ?.

When should space be encoded to plus (+) or %20?

+ means a space only in application/x-www-form-urlencoded content, such as the query part of a URL:

http://www.example.com/path/foo+bar/path?query+name=query+value

In this URL, the parameter name is query name with a space and the value is query value with a space, but the folder name in the path is literally foo+bar, not foo bar.
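
To see this difference in practice, here is a small sketch that parses the example URL above with the standard URL class (available in browsers and Node.js). Only the variable names are invented.

```javascript
const url = new URL("http://www.example.com/path/foo+bar/path?query+name=query+value");

// In the path, "+" has no special meaning and stays a literal plus sign.
const folder = decodeURIComponent(url.pathname.split("/")[2]);

// searchParams applies form decoding, so "+" is read as a space in the query.
const paramNames = Array.from(url.searchParams.keys());
const paramValue = url.searchParams.get("query name");

console.log(folder);     // foo+bar
console.log(paramNames); // [ 'query name' ]
console.log(paramValue); // query value
```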

%20 is a valid way to encode a space in either of these contexts. So if you need to URL-encode a string for inclusion in part of a URL, it is always safe to replace spaces with %20 and pluses with %2B. This is what, e.g., encodeURIComponent() does in JavaScript. Unfortunately it's not what urlencode does in PHP (rawurlencode is safer).
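
On the encoding side, the two conventions can be compared with the sketch below. The sample string and the parameter name q are made up. URLSearchParams is used here as a stand-in for form-style encoding.

```javascript
// Sample text containing both a space and a literal plus sign.
const text = "foo bar+baz";

// Percent-encoding: space -> %20, plus -> %2B. Usable in a path or a query.
const percentStyle = encodeURIComponent(text);

// Form-style encoding: space -> +, plus -> %2B. Only meaningful in form data.
const formStyle = new URLSearchParams({ q: text }).toString();

console.log(percentStyle); // foo%20bar%2Bbaz
console.log(formStyle);    // q=foo+bar%2Bbaz
```

Both styles escape a literal plus as %2B, so the two only differ in how the space is written.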

See Also

HTML 4.01 Specification application/x-www-form-urlencoded

jsp unable to replace %20 with space?

There's no reason to replace the %20 with a space in the action attribute; they mean the same thing, but %20 is the normalized form.

I suspect you're seeing this because of the way you're looking at it.

Your replaceAll works (example). But literal spaces in URLs are generally a bad idea (I think with http URLs they're invalid, in fact, but I'd have to check the RFC). %20 is what they're replaced with in URL-encoding. So my suspicion is that although you're successfully replacing the %20 with a space, when you use the form, the browser is showing you the normalized form (with the %20 instead).
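
The normalization suspected above can be reproduced outside JSP with the standard URL class. This is only an illustration of the general behaviour, not of what your page does, and the host and path are invented.

```javascript
// A URL typed with literal spaces in the path (hypothetical address).
const typed = "http://example.com/my folder/my file.html";

// Parsing normalizes the path: each space is written back out as %20.
const normalized = new URL(typed).href;

// Decoding the path recovers the spaces, showing both forms mean the same thing.
const decodedPath = decodeURIComponent(new URL(typed).pathname);

console.log(normalized);  // http://example.com/my%20folder/my%20file.html
console.log(decodedPath); // /my folder/my file.html
```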

a tag with php adds %20 even without spaces

You can try something like this to remove whitespace:

<a class="links" href="buying.php?link=<?php echo preg_replace('/\s+/', '', $urlname) ?>">Gekauft</a>

Use %20 instead of + to encode spaces when submitting a form using GET method

+ is the standard way to encode spaces in application/x-www-form-urlencoded. The backend should be handling them properly.

The only solution I can imagine is to listen onSubmit event, use encodeURIComponent to encode all the inputs by myself and use location.href to redirect to the result URL.

If you cannot get the backend updated to handle them properly, sadly, your only option is indeed that. Of course, any clients with JavaScript disabled will send the form in the usual fashion; to prevent that, you might want to have the form unavailable by default and only made available by JavaScript on the page.

This section of the specification addresses how to serialize the form controls; some notes (but double-check with the spec):

  • You'll need to use encodeURIComponent on field names as well, not just values.
  • disabled fields are not included.
  • Fields without names are not included.
  • Fields are included in document order (the order you'll see them in a querySelectorAll on the form).
  • A checkbox is omitted entirely if not checked; if it's checked, its name and the encoded version of its value (or "on" if it has none) are included.
  • Repeated fields with the same name are just included as repeats, e.g. &field=value1&field=value2.
  • type=button and type=reset buttons are not included.
  • type=submit buttons are only included if that button was used to submit the form. You'll need to watch for the onclick on the button in order to know that you should include the button's value, since forms can be submitted other ways.
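
The rules listed above could be sketched roughly as below. To keep the example runnable without a browser, it works on plain objects rather than real form controls: the field shape { name, value, type, disabled, checked } and the function name serializeFields are assumptions made for this sketch. It covers only the rules in the list, so check the specification for the rest (radio buttons, selects, file inputs and so on).

```javascript
// Hypothetical stand-in for form controls: { name, value, type, disabled, checked }.
// "submitter" is the submit button that triggered the submission, if any.
function serializeFields(fields, submitter = null) {
  const pairs = [];
  for (const field of fields) {
    // Unnamed and disabled fields are skipped.
    if (!field.name || field.disabled) continue;
    // Plain buttons and reset buttons never contribute a value.
    if (field.type === "button" || field.type === "reset") continue;
    // A submit button counts only if it was the one used to submit.
    if (field.type === "submit" && field !== submitter) continue;

    let value = field.value;
    if (field.type === "checkbox") {
      if (!field.checked) continue;          // unchecked boxes are omitted
      if (value === undefined) value = "on"; // default value when none is set
    }
    // Encode names as well as values; spaces come out as %20 rather than +.
    pairs.push(encodeURIComponent(field.name) + "=" + encodeURIComponent(value ?? ""));
  }
  return pairs.join("&");
}

// Fields are listed in document order; repeated names are simply repeated.
const fields = [
  { name: "full name", type: "text", value: "Ada Lovelace" },
  { name: "hidden note", type: "text", value: "x", disabled: true },
  { type: "text", value: "no name" },
  { name: "news", type: "checkbox", checked: true },
  { name: "tag", type: "text", value: "a+b" },
  { name: "tag", type: "text", value: "c d" },
  { name: "go", type: "submit", value: "Send" },
];

const query = serializeFields(fields);
const queryViaButton = serializeFields(fields, fields[6]);

console.log(query);
// full%20name=Ada%20Lovelace&news=on&tag=a%2Bb&tag=c%20d
console.log(queryViaButton);
// full%20name=Ada%20Lovelace&news=on&tag=a%2Bb&tag=c%20d&go=Send
```

In a real page you would collect the fields from the form's elements in a submit handler, cancel the default submission, and assign the form's action plus "?" plus the resulting string to location.href.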

Can I replace %20 with &nbsp; in URLs that have spaces?

The short answer is, they are both used to represent "spaces", but they represent different spaces.

%20 is the URL escaping for byte 32, which corresponds to plain old space in pretty much any encoding you're likely to use in a URL.

&nbsp; is an HTML character reference which actually refers to character 160 of Unicode (and also ISO-8859-1 aka Latin-1). It's a different space character entirely -- the "non-breaking space". Even though they look pretty much the same, they're different characters and it's unlikely that your server will treat them the same way.
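
The difference between the two characters is easy to check in JavaScript. This short sketch compares their character codes and their URL-encoded forms; the variable names are arbitrary.

```javascript
const plainSpace = " ";            // what %20 stands for
const nonBreakingSpace = "\u00A0"; // the non-breaking space, character 160

const plainCode = plainSpace.charCodeAt(0);
const nbspCode = nonBreakingSpace.charCodeAt(0);

// The non-breaking space is not ASCII, so it is encoded as its two UTF-8 bytes.
const plainEncoded = encodeURIComponent(plainSpace);
const nbspEncoded = encodeURIComponent(nonBreakingSpace);

console.log(plainCode, plainEncoded); // 32 %20
console.log(nbspCode, nbspEncoded);   // 160 %C2%A0
```

So a non-breaking space placed in a URL reaches the server as %C2%A0, not as %20.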

The origin on why '%20' is used as a space in URLs

It's called percent-encoding. Some characters can't appear literally in a URI (for example #, as it denotes the URL fragment), so they are represented with characters that can (# becomes %23).

Here's an excerpt from that same article:

When a character from the reserved set (a "reserved character") has
special meaning (a "reserved purpose") in a certain context, and a URI
scheme says that it is necessary to use that character for some other
purpose, then the character must be percent-encoded.
Percent-encoding a reserved character involves converting the
character to its corresponding byte value in ASCII and then
representing that value as a pair of hexadecimal digits.
The digits,
preceded by a percent sign ("%") which is used as an escape character,
are then used in the URI in place of the reserved character. (For a
non-ASCII character, it is typically converted to its byte sequence in
UTF-8, and then each byte value is represented as above.)

The space character's character code is 32:

> ' '.charCodeAt(0)
32

Which is 20 in base-16:

> ' '.charCodeAt(0).toString(16)
"20"

Tack a percent sign in front of it and you get %20.
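
The same steps can be generalized into a small function. This is a sketch for illustration only: the name percentEncodeAll is made up, and unlike real encoders such as encodeURIComponent it encodes every character, including letters and digits that never need escaping.

```javascript
// Hypothetical helper: percent-encodes every byte of the UTF-8 form of a string.
function percentEncodeAll(text) {
  // Non-ASCII characters become several bytes; each byte is encoded separately.
  const bytes = new TextEncoder().encode(text);
  return Array.from(
    bytes,
    (byte) => "%" + byte.toString(16).toUpperCase().padStart(2, "0")
  ).join("");
}

console.log(percentEncodeAll(" ")); // %20
console.log(percentEncodeAll("#")); // %23
console.log(percentEncodeAll("%")); // %25
console.log(percentEncodeAll("é")); // %C3%A9
```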


