Is There Any Benefit to Adding accept-charset="UTF-8" to HTML Forms, If the Page Is Already in UTF-8?

Is there any benefit to adding accept-charset="UTF-8" to HTML forms, if the page is already in UTF-8?

If the page is already interpreted by the browser as being UTF-8, setting accept-charset="utf-8" does nothing.

If you set the encoding of the page to UTF-8 in a <meta> and/or HTTP header, it will be interpreted as UTF-8, unless the user deliberately goes to the View->Encoding menu and selects a different encoding, overriding the one you specified.

In that case, accept-charset would have the effect of setting the submission encoding back to UTF-8 in the face of the user messing about with the page encoding. However, this still won't work in IE, due to the problems with accept-charset in that browser.

So it's IMO doubtful whether it's worth including accept-charset to fix the case where a non-IE user has deliberately sabotaged the page encoding (possibly messing up more on your page than just the form).

Personally, I don't bother.

accept-charset="UTF-8" parameter doesn't do anything when used in a form

The question, as asked, is self-contradictory: the heading says that the accept-charset parameter does not do anything, whereas the question body says that when the accept-charset attribute (this is the correct term) is used, “the headers have different accept charset option in the request header”. I suppose a negation is missing from the latter statement.

Browsers send Accept-Charset parameters in HTTP request headers according to their own principles and settings. For example, my Chrome sends Accept-Charset:windows-1252,utf-8;q=0.7,*;q=0.3. Such a header is typically ignored by server-side software, but it could be used (and it was designed to be used) to determine which encoding is to be used in the server response, in case the server-side software (a form handler, in this case) is capable of using different encodings in the response.
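To illustrate how server-side software could use such a header, here is a minimal Python sketch that parses the Accept-Charset value quoted above and picks an encoding for the response. The function names parse_accept_charset and choose_charset are hypothetical, and the wildcard handling is simplified (it just falls back to the first supported encoding).

```python
from typing import List, Optional, Tuple


def parse_accept_charset(header: str) -> List[Tuple[str, float]]:
    """Return (charset, q) pairs sorted by descending preference."""
    prefs = []
    for item in header.split(","):
        parts = [p.strip() for p in item.split(";")]
        charset = parts[0].lower()
        if not charset:
            continue
        q = 1.0  # q defaults to 1 when the parameter is omitted
        for param in parts[1:]:
            name, _, value = param.partition("=")
            if name.strip().lower() == "q":
                try:
                    q = float(value)
                except ValueError:
                    q = 0.0  # treat a malformed q as "not acceptable"
        prefs.append((charset, q))
    # sorted() is stable, so entries with equal q keep their header order
    return sorted(prefs, key=lambda pair: pair[1], reverse=True)


def choose_charset(header: str, supported: List[str]) -> Optional[str]:
    """Pick the most preferred charset that the server can produce."""
    for charset, q in parse_accept_charset(header):
        if q <= 0:
            continue
        if charset == "*":
            # Simplification: the wildcard accepts whatever we offer first
            return supported[0] if supported else None
        if charset in supported:
            return charset
    return None


header = "windows-1252,utf-8;q=0.7,*;q=0.3"
print(parse_accept_charset(header))
print(choose_charset(header, ["utf-8"]))
```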

The accept-charset attribute in a form element is not expected to affect HTTP request headers, and it does not. It is meant to specify the character encoding to be used for the form data in the request, and this is what it actually does. The HTML 4.01 spec is obscure about this, but the W3C HTML5 draft puts it much better, though for some odd reason it uses the plural: “gives the character encodings that are to be used for the submission”. I suppose the reason is that you could specify alternate encodings, to prepare for situations where a browser is unable to use your preferred encoding. And what actually happens in Chrome, for example, is that if you use accept-charset="foobar utf-8", then UTF-8 is used.

In practice, the attribute is used to make the encoding of data submission different from the encoding of the page containing the form. Suppose your page is ISO-8859-1 encoded and someone types Greek or Hebrew letters into your form. Browsers will have to do some error recovery, since those characters cannot be represented in ISO-8859-1. (In practice they turn the characters into numeric character references, which is logically all wrong but pragmatically perhaps the best they can do.) Using <form accept-charset=utf-8> helps here: no matter what the encoding of the page is, the form data will be sent UTF-8 encoded, which can handle any character.
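The difference between the two submission encodings can be simulated in Python. This is only a sketch of what ends up on the wire, not of any browser's actual code; the field name q is made up, and the ISO-8859-1 branch imitates the numeric-character-reference fallback described above.

```python
from urllib.parse import quote_from_bytes, urlencode

text = "αβγ"  # Greek letters typed into a form field

# Submission as UTF-8: every character is representable.
utf8_body = urlencode({"q": text}, encoding="utf-8")

# ISO-8859-1 cannot represent Greek letters at all.
try:
    text.encode("iso-8859-1")
    representable = True
except UnicodeEncodeError:
    representable = False

# Imitate the fallback: numeric character references, then percent-encoding.
fallback_bytes = text.encode("iso-8859-1", errors="xmlcharrefreplace")
latin1_body = "q=" + quote_from_bytes(fallback_bytes)

print(utf8_body)       # q=%CE%B1%CE%B2%CE%B3
print(representable)   # False
print(fallback_bytes)  # b'&#945;&#946;&#947;'
print(latin1_body)     # q=%26%23945%3B%26%23946%3B%26%23947%3B
```

With the fallback, the form handler receives literal text such as &#945; and cannot tell it apart from a user who actually typed those seven characters, which is why sending UTF-8 is the cleaner option.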

If you wish to tell the form handler which encoding it should use in its response, then you can add a hidden (or non-hidden) field into the form for that.

When should one use accept-charset="UTF-8" in an HTML form?

When the page is already UTF-8, the attribute is largely redundant, though not entirely.
If the user overrides the encoding of the page, then having it on the form is an extra safety measure.

Is form charset required?

Almost every decent browser ignores the accept-charset attribute in favour of the encoding of the page containing the form, as defined in the charset parameter of the Content-Type response header. So far the attribute works only in MSIE, and even there it is handled wrongly: in MSIE running on Windows, any value other than UTF-8 is interpreted as CP-1252.

Don't use this attribute. It's useless.

How to pass a HTTP parameter from a CP1251 page to a UTF-8 handler?

I can either convert the value of param() from CP1251 to UTF-8 or add the accept-charset='utf-8' attribute to the <form> element.
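The first option, converting the value on the handler side, amounts to decoding the received bytes as CP1251 and re-encoding them as UTF-8. The answer's param() belongs to the asker's own handler code; the Python function below (cp1251_param_to_utf8, a hypothetical name) is only a sketch of the same conversion and assumes the raw, still percent-encoded value is available.

```python
from urllib.parse import quote, unquote_to_bytes


def cp1251_param_to_utf8(raw_value: str) -> bytes:
    """Turn a percent-encoded CP1251 parameter value into UTF-8 bytes."""
    # Form bodies encode spaces as '+', so undo that before unquoting
    raw_bytes = unquote_to_bytes(raw_value.replace("+", " "))
    text = raw_bytes.decode("cp1251")  # bytes -> Unicode text
    return text.encode("utf-8")        # Unicode text -> UTF-8 bytes


# Simulate what a CP1251 page would submit for the word "Привет"
submitted = quote("Привет", encoding="cp1251")
print(submitted)
print(cp1251_param_to_utf8(submitted).decode("utf-8"))
```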

What is <meta http-equiv="Content-Type" content="text/html; charset=utf-8" />?

According to HTML Dog:

The charset attribute can be used as a shorthand method to define an HTML document's character set, which is always a good thing to do. <meta charset="utf-8"> is the same as <meta http-equiv="content-type" content="text/html; charset=utf-8">.

So it's basically used to define the charset of your HTML document.

The reason Visual Studio 2017 adds both meta tags may be that this makes your HTML as compatible as possible with older browsers.

<meta http-equiv="content-type" content="text/html; charset=utf-8"> is the old way to define the charset.

<meta charset="utf-8"> is the new and shorter way to do the same thing.

Changing encoding and charset to UTF-8

Is UTF-8 backwards compatible with ISO-8859-1?

Unicode is a superset of the code points contained in ISO-8859-1, so all of its "characters" can be represented in UTF-8, but how they map to byte values is different. The encoded values overlap only for the ASCII range (bytes 0x00 to 0x7F); characters above that take one byte in ISO-8859-1 and two bytes in UTF-8.
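A short Python check makes the partial overlap concrete. The sample strings are arbitrary.

```python
# ASCII text: identical bytes in both encodings.
ascii_same = "naive".encode("iso-8859-1") == "naive".encode("utf-8")

# A character above U+007F: one byte in ISO-8859-1, two bytes in UTF-8.
latin1_bytes = "é".encode("iso-8859-1")
utf8_bytes = "é".encode("utf-8")

# Every ISO-8859-1 code point (0 to 255) is representable in UTF-8 ...
all_representable = all(
    chr(cp).encode("utf-8").decode("utf-8") == chr(cp) for cp in range(256)
)

# ... but ISO-8859-1 bytes are not, in general, valid UTF-8.
try:
    latin1_bytes.decode("utf-8")
    valid_utf8 = True
except UnicodeDecodeError:
    valid_utf8 = False

print(ascii_same, latin1_bytes, utf8_bytes, all_representable, valid_utf8)
# True b'\xe9' b'\xc3\xa9' True False
```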

In terms of serving content or processing form submissions you are unlikely to have many issues.

It may mean a breaking change for URL handling. For example, for a parameter value naïve there would be two incompatible forms:

  • http://example.com/foo?p=na%EFve
  • http://example.com/foo?p=na%C3%AFve

This is only likely to be an issue if there are external applications relying on the old form.
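The two URL forms above can be reproduced with Python's urllib.parse. The decode_param helper is a hypothetical sketch of one way a handler might keep accepting the old form during a migration: it tries UTF-8 first and falls back to ISO-8859-1. That heuristic is an assumption, not something the answer prescribes, and it can misread ISO-8859-1 input that happens to be valid UTF-8.

```python
from urllib.parse import quote, unquote_to_bytes

old_form = quote("naïve", encoding="iso-8859-1")
new_form = quote("naïve", encoding="utf-8")
print(old_form)  # na%EFve
print(new_form)  # na%C3%AFve


def decode_param(value: str) -> str:
    """Decode a percent-encoded value, preferring UTF-8 over ISO-8859-1."""
    raw = unquote_to_bytes(value)
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        # ISO-8859-1 maps every byte to a character, so this never fails
        return raw.decode("iso-8859-1")


print(decode_param(old_form))  # naïve
print(decode_param(new_form))  # naïve
```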

Display Chinese characters WITHOUT using UTF-8 encoding?

I'm not sure that you can. iso-8859-1 is commonly called "Latin 1". There's no support for any Asian kanji-type languages at all.

http://en.wikipedia.org/wiki/ISO/IEC_8859-1

ISO 8859-1 encodes what it refers to as "Latin alphabet no. 1," consisting of 191 characters from the Latin script. This character-encoding scheme is used throughout the Americas, Western Europe, Oceania, and much of Africa. It is also commonly used in most standard romanizations of East-Asian languages.
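A quick Python check confirms that Chinese text has no byte representation in ISO-8859-1. As a hedged aside, it also shows HTML numeric character references, which are pure ASCII and so survive in a page served as ISO-8859-1; whether that suits a given page is an assumption to verify, since it only helps for HTML content and not for raw text or form data.

```python
import html

text = "中文"

# There is no ISO-8859-1 byte sequence for these characters.
try:
    text.encode("iso-8859-1")
    encodable = True
except UnicodeEncodeError:
    encodable = False

# Numeric character references use only ASCII characters.
ncr = text.encode("ascii", errors="xmlcharrefreplace").decode("ascii")

print(encodable)                   # False
print(ncr)                         # &#20013;&#25991;
print(html.unescape(ncr) == text)  # True: an HTML parser restores the text
```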


