fwrite() and UTF-8

The only thing I had to do was add a UTF-8 BOM to the CSV. The data was correct, but the file reader (an external application) couldn't read the file properly without the BOM.

How can I specify the encoding in fwrite() when exporting a CSV file in R?

You should post a reproducible example, but I would guess you could do this by making sure the data in DT is in UTF-8 within R, then setting the encoding of each column to "unknown". R will then assume the data is encoded in the native encoding when you write it out.

For example,

DF <- data.frame(text = "á", stringsAsFactors = FALSE)
DF$text <- enc2utf8(DF$text) # Only necessary if Encoding(DF$text) isn't "UTF-8"
Encoding(DF$text) <- "unknown"
data.table::fwrite(DF, "DF.csv", bom = TRUE)

If the columns of DF are factors, you'll need to convert them to character vectors before this will work.

Write to UTF-8 file in Python

I believe the problem is that codecs.BOM_UTF8 is a byte string, not a Unicode string. I suspect the file handler is trying to guess what you really mean based on "I'm meant to be writing Unicode as UTF-8-encoded text, but you've given me a byte string!"

Try writing the Unicode string for the byte order mark (i.e. Unicode U+FEFF) directly, so that the file just encodes that as UTF-8:

import codecs

file = codecs.open("lol", "w", "utf-8")
file.write(u'\ufeff')
file.close()

(That seems to give the right answer - a file with bytes EF BB BF.)

EDIT: S. Lott's suggestion of using "utf-8-sig" as the encoding is a better one than explicitly writing the BOM yourself, but I'll leave this answer here as it explains what was going wrong before.
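For reference, the "utf-8-sig" approach could look like the following in Python 3. This is a minimal sketch: the helper name write_with_bom and the file name example.txt are made up for illustration, and a temporary directory is used so nothing is left behind.

```python
import os
import tempfile


def write_with_bom(path, text):
    """Write text as UTF-8 and let the codec emit the BOM itself."""
    # "utf-8-sig" writes EF BB BF once, before the first character.
    with open(path, "w", encoding="utf-8-sig") as handle:
        handle.write(text)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "example.txt")  # hypothetical file name
    write_with_bom(path, "öäü")

    # Read the raw bytes back to see what actually landed on disk.
    with open(path, "rb") as handle:
        raw = handle.read()

    # Reading with "utf-8-sig" drops the BOM again.
    with open(path, "r", encoding="utf-8-sig") as handle:
        decoded = handle.read()

print(raw[:3].hex())  # efbbbf
print(decoded)        # öäü
```

Reading with plain "utf-8" instead would keep U+FEFF as the first character of the string, which is why the "-sig" variant is usually the more convenient choice on both sides.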

UTF-8 characters in fwrite

Try adding a BOM (byte order mark) to your file:

$output_string = "\xEF\xBB\xBF";
$output_string .= "string with characters like ã or ì";
$fileHandle = // ...

php fwrite does not write öäü / utf8

On the desktop:

Try a program that handles UTF-8 correctly when opening the file. UTF-8 without a BOM and ASCII are identical for the lowest code points, and some programs guess the encoding from a sample that does not necessarily contain any characters from the higher code points. (Note: Windows' notepad.exe is not the best choice for checking the file.)

The first 128 characters of Unicode, which correspond one-to-one with ASCII, are encoded using a single octet with the same binary value as ASCII, making valid ASCII text valid UTF-8-encoded Unicode as well. (http://en.wikipedia.org/wiki/UTF-8)
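The overlap between ASCII and UTF-8 is easy to check directly. The sketch below is in Python and is only an illustration: it shows that pure ASCII text produces identical bytes under both encodings, while öäü takes two bytes per character and turns into mojibake when a reader guesses Latin-1.

```python
ascii_text = "plain RSS title"
accented = "öäü"

# Pure ASCII text is byte-for-byte identical in ASCII and UTF-8,
# so a reader sampling only this part cannot tell the two apart.
same_bytes = ascii_text.encode("ascii") == ascii_text.encode("utf-8")

# Characters above U+007F take more than one byte in UTF-8.
utf8_bytes = accented.encode("utf-8")

# A program that wrongly assumes Latin-1 decodes each byte separately.
misread = utf8_bytes.decode("latin-1")

print(same_bytes)       # True
print(utf8_bytes)       # b'\xc3\xb6\xc3\xa4\xc3\xbc'
print(misread)          # Ã¶Ã¤Ã¼
```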

Another way is to explicitly set the encoding to UTF-8 in the program and check the file with that setting.

Based on your last sentence (My Firefox browser shows the öäü correctly, but I cannot get a valid RSS 2.0 feed, and so the feed entries don't show.), the encoding is fine; only your program and the server's headers are incorrect.

On the server side:

After you have confirmed that the file is in UTF-8 by opening it in a program that correctly handles UTF-8 without a BOM, check your web server's configuration (or at least the configuration of your subdomain).

You have to set the encoding for *.xml (or for the specific XML file) in the headers. If you are using pregenerated files, you have to do this in the domain's or the server's config.

W3C's "Setting charset information in .htaccess" article could help.

Specifying by extension

Use the AddCharset directive to associate the character encoding with
all files having a particular extension in the current directory and
its subdirectories. For example, to serve all files with the extension
.html as UTF-8, open the .htaccess file in a plain text editor and
type the following line:

AddCharset UTF-8 .html

The extension can be specified with or without
a leading dot. You can add multiple extensions to the same line. This
will still work if you have file names such as example.en.html or
example.html.en.

The example will cause all files with the extension .html to be served
as UTF-8. The HTTP Content-Type header will contain a line that ends
with the 'charset' information as shown in the example that follows.

Content-Type: text/html; charset=UTF-8

Note: All files with this
extension in all subdirectories of the current location will also be
served as UTF-8. If, for some reason, you need to serve the odd file
with a different encoding you will need to override this using
additional directives.

Note: You can associate the character encoding with any extension
attached to your file. For example, suppose you do language
negotiation and you have pages in two languages that follow the model
example.en.html and example.ja.html. Let's also suppose that you are
happy to serve English pages using your server's ISO-8859-1 default,
but want to serve Japanese files in UTF-8.
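To check which charset a given Content-Type header value declares, the header can be parsed with Python's standard library. This is a minimal sketch: the function name declared_charset is made up for illustration, and the header strings are examples, not output from any real server.

```python
from email.message import Message


def declared_charset(content_type_value):
    """Return the charset parameter of a Content-Type value, or None."""
    message = Message()
    message["Content-Type"] = content_type_value
    # get_content_charset() returns the parameter in lower case,
    # or None when the header carries no charset at all.
    return message.get_content_charset()


with_charset = declared_charset("text/html; charset=UTF-8")
without_charset = declared_charset("application/rss+xml")

print(with_charset)     # utf-8
print(without_charset)  # None
```

A None result for the feed's header would suggest the AddCharset (or equivalent) configuration is not being applied to that file.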

Summarizing the comments

If you are using output escaping (htmlentities, htmlspecialchars, strip_tags, etc), please check that these functions are not interfering or called multiple times.

Using htmlentities() multiple times can lead to undesired results:

htmlentities('Ö') = &Ouml; (Ö in the browser)
htmlentities(htmlentities('Ö')) = &amp;Ouml; (&Ouml; in the browser)
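The same double-escaping effect can be reproduced outside PHP. The sketch below uses Python's html module purely as an illustration: html.escape is closer to htmlspecialchars than to htmlentities (it does not turn Ö into an entity by itself), so the already-escaped string &Ouml; is used as the starting point.

```python
import html

# What a single htmlentities('Ö') call yields in PHP.
escaped_once = "&Ouml;"

# Escaping text that is already escaped also escapes the ampersand.
escaped_twice = html.escape(escaped_once)

# html.unescape() stands in for the single decode a browser performs.
shown_once = html.unescape(escaped_once)
shown_twice = html.unescape(escaped_twice)

print(escaped_twice)  # &amp;Ouml;
print(shown_once)     # Ö
print(shown_twice)    # &Ouml;
```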

PHP fwrite function to write txt file in utf-8 encoding

If the text displays fine in one program but not another, that just means one program interprets the file correctly while the other doesn't. Most likely Notepad sets a UTF-8 BOM on the file when you save it again, so Eclipse now automatically recognizes that it's UTF-8 encoded. Without that, Eclipse assumes latin-1 or some other encoding as the default.

Two options:

  • change your Eclipse preferences to open files as UTF-8 by default
  • set a BOM on the file when writing it, see Encoding a string as UTF-8 with BOM in PHP

A BOM can be helpful for making programs recognize UTF-8 but can also cause problems in other programs that don't expect or want BOMs. Whether to use a BOM or not depends on your intended use and target audience.
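For consumers that don't want a BOM, it can be detected and removed when reading. The following Python sketch shows one way to do that; the helper name strip_utf8_bom is an assumption made for this example.

```python
import codecs


def strip_utf8_bom(data: bytes) -> bytes:
    """Return data without a leading UTF-8 BOM, if one is present."""
    if data.startswith(codecs.BOM_UTF8):
        return data[len(codecs.BOM_UTF8):]
    return data


with_bom = codecs.BOM_UTF8 + "öäü".encode("utf-8")
without_bom = "öäü".encode("utf-8")

# Both inputs end up as the same bytes after stripping.
print(strip_utf8_bom(with_bom) == without_bom)     # True
print(strip_utf8_bom(without_bom) == without_bom)  # True

# Decoding with "utf-8-sig" has the same effect in one step,
# while plain "utf-8" keeps U+FEFF as the first character.
print(with_bom.decode("utf-8-sig"))                # öäü
print(with_bom.decode("utf-8")[0] == "\ufeff")     # True
```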

php 5.6.7 fwrite vs fputs utf-8 encoding

This is the only solution I could come up with. With the code below, which also logs the request headers via getallheaders(), the fputs() of $_REQUEST['data'] worked. I assume this is because the file starts with the UTF-8 BOM, so readers interpret everything after it as UTF-8.

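$file = "datafile_fputs";
// A BOM only belongs at the very start of the file, so write it once,
// when the file is new or still empty (the file is opened in append mode).
$needsBom = !file_exists($file) || filesize($file) === 0;
$datafile_fputs = fopen($file, "a");
if ($needsBom) {
    fputs($datafile_fputs, "\xEF\xBB\xBF"); // UTF-8 BOM
}
// Log each request header on its own line.
foreach (getallheaders() as $name => $value) {
    fputs($datafile_fputs, "$name: $value\n");
}
fputs($datafile_fputs, $_REQUEST['data']);
fclose($datafile_fputs);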
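A BOM is only meaningful as the first bytes of a file, so when a file is opened for appending it should be written at most once. The Python sketch below illustrates that idea; the function name append_utf8 and the sample lines are made up for the example and are not part of the original PHP.

```python
import codecs
import os
import tempfile


def append_utf8(path, text):
    """Append text as UTF-8, writing a BOM only if the file is new or empty."""
    needs_bom = not os.path.exists(path) or os.path.getsize(path) == 0
    with open(path, "ab") as handle:
        if needs_bom:
            handle.write(codecs.BOM_UTF8)
        handle.write(text.encode("utf-8"))


with tempfile.TemporaryDirectory() as tmp:
    log_path = os.path.join(tmp, "datafile")  # hypothetical file name
    append_utf8(log_path, "Host: example.org\n")
    append_utf8(log_path, "öäü\n")

    with open(log_path, "rb") as handle:
        content = handle.read()

# The BOM appears exactly once, at the start of the file.
print(content.startswith(codecs.BOM_UTF8))  # True
print(content.count(codecs.BOM_UTF8))       # 1
```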

