How to Write a File in UTF-8 Format

Write to UTF-8 file in Python

I believe the problem is that codecs.BOM_UTF8 is a byte string, not a Unicode string. The file object returned by codecs.open expects Unicode text, which it then encodes as UTF-8; handed a byte string instead, it has to guess what you meant, and I suspect that guess is what goes wrong.

Try writing the Unicode string for the byte order mark (i.e. Unicode U+FEFF) directly, so that the file just encodes that as UTF-8:

import codecs

# codecs.open() returns a writer that encodes Unicode text as UTF-8.
with codecs.open("lol", "w", "utf-8") as f:
    f.write(u'\ufeff')  # U+FEFF is encoded as the bytes EF BB BF

(That seems to give the right answer - a file with bytes EF BB BF.)

EDIT: S. Lott's suggestion of using "utf-8-sig" as the encoding is a better one than explicitly writing the BOM yourself, but I'll leave this answer here as it explains what was going wrong before.
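
As a sketch of the "utf-8-sig" approach mentioned in the edit above, assuming Python 3 (where the built-in open() accepts an encoding argument). The temp-file path and the sample text are purely illustrative:

```python
import os
import tempfile

# Hypothetical file location; any writable path works.
path = os.path.join(tempfile.mkdtemp(), "with_bom.txt")

# "utf-8-sig" makes the encoder emit the BOM once, at the start of the file.
with open(path, "w", encoding="utf-8-sig") as f:
    f.write("hello")

# Read the raw bytes back to see what actually landed on disk.
with open(path, "rb") as f:
    raw = f.read()

# Reading with "utf-8-sig" strips the BOM again.
with open(path, "r", encoding="utf-8-sig") as f:
    text = f.read()

print(raw[:3].hex())  # efbbbf
print(text)           # hello
```

Reading the same file with plain "utf-8" instead would leave U+FEFF as the first character of the string, so it is worth using "utf-8-sig" on both sides.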

How can I write a file in UTF-8 format?

file_get_contents() and file_put_contents() will not magically convert encoding.

You have to convert the string explicitly; for example with iconv() or mb_convert_encoding().

Try this:

$data = file_get_contents($npath);
$data = mb_convert_encoding($data, 'UTF-8', 'OLD-ENCODING');
file_put_contents('tempfolder/' . $a, $data);

Or alternatively, with PHP's stream filters:

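$in = fopen($file, 'r');
$out = fopen($output, 'w');
// The filter name is convert.iconv.<from>/<to>, so the old encoding comes first.
stream_filter_append($in, 'convert.iconv.OLD-ENCODING/UTF-8');
stream_copy_to_stream($in, $out);
fclose($in); fclose($out);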
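
The same explicit-conversion idea, sketched in Python for comparison with the PHP above. The source encoding (cp1252), the file names, and the sample text are all assumptions made for the illustration; substitute the real legacy encoding of your data.

```python
import os
import tempfile

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "legacy.txt")  # hypothetical input file
dst = os.path.join(workdir, "utf8.txt")    # hypothetical output file

# Create a sample input in the assumed legacy encoding.
# newline="" keeps line endings exactly as written on every platform.
with open(src, "w", encoding="cp1252", newline="") as f:
    f.write("Çirçös\n")

# Decode with the old encoding while reading, encode as UTF-8 while writing.
with open(src, "r", encoding="cp1252", newline="") as fin, \
     open(dst, "w", encoding="utf-8", newline="") as fout:
    for line in fin:
        fout.write(line)

with open(dst, "rb") as f:
    converted = f.read()
```

Nothing is converted implicitly here either: the bytes change only because the two open() calls name different encodings.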

Write file in UTF-8 mode using Perl

You want

use utf8;                       # Source code is encoded using UTF-8.

open(my $FH, ">:encoding(utf-8)", "test11.txt")
    or die $!;

print $FH "something Çirçös";

or

use utf8;                       # Source code is encoded using UTF-8.
use open ':encoding(utf-8)'; # Sets the default encoding for handles opened in scope.

open(my $FH, ">", "test11.txt")
    or die $!;

print $FH "something Çirçös";

Notes:

  • The encoding you want is utf-8 (case-insensitive), not utf8 (a Perl-specific encoding).
  • Don't use global vars; use lexical (my) vars.
  • If you leave off the instruction to encode, you might get lucky and get the right output (along with a "wide character" warning). Don't count on this. You won't always be lucky.

    # Unlucky.
    $ perl -we'use utf8; print "é"' | od -t x1
    0000000 e9
    0000001

    # Lucky.
    $ perl -we'use utf8; print "é♡"' | od -t x1
    Wide character in print at -e line 1.
    0000000 c3 a9 e2 99 a1
    0000005

    # Correct.
    $ perl -we'use utf8; binmode STDOUT, ":encoding(utf-8)"; print "é♡"' | od -t x1
    0000000 c3 a9 e2 99 a1
    0000005

Write a file with encoding UTF-8 in php

If your 3rd party program "do not support files in ANSI but UTF-8" as you mentioned in a comment then most likely it's expecting a BOM.

While the Unicode Standard does allow a BOM in UTF-8,[2] it does not require or recommend it.[3] Byte order has no meaning in UTF-8[4] so a BOM serves only to identify a text stream or file as UTF-8.

The reason the BOM is recommended against is that it defeats the ASCII back-compatibility that is part of UTF-8's design.

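So, strictly speaking, your 3rd party program isn't completely compliant with the standard, because the BOM should be optional. Plain ASCII is 100% valid UTF-8, which is one of the main drivers of its design, so anything that understands UTF-8 according to the standard also understands ASCII. This holds only for the ASCII range, though: an "ANSI" file (typically Windows-1252) that contains accented or other non-ASCII characters is not valid UTF-8 and still has to be converted.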

Try writing "\xEF\xBB\xBF" to the front of the file and see if that solves your problem.
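
A rough sketch of that experiment in Python, offered as one way to prepend the BOM only when it is missing. The helper name ensure_utf8_bom and the sample file are hypothetical, and the whole file is read into memory, so this suits small files only.

```python
import codecs
import os
import tempfile


def ensure_utf8_bom(path):
    """Prepend the UTF-8 BOM to the file at `path` unless it is already there.

    Returns True if the file was modified, False otherwise.
    """
    with open(path, "rb") as f:
        data = f.read()
    if data.startswith(codecs.BOM_UTF8):
        return False  # BOM already present, nothing to do
    with open(path, "wb") as f:
        f.write(codecs.BOM_UTF8 + data)
    return True


# Hypothetical sample file holding UTF-8 text without a BOM.
sample = os.path.join(tempfile.mkdtemp(), "sample.txt")
with open(sample, "wb") as f:
    f.write("Rübe\n".encode("utf-8"))

first = ensure_utf8_bom(sample)   # adds the BOM
second = ensure_utf8_bom(sample)  # finds the BOM and leaves the file alone
```

Working on bytes rather than text keeps the existing content untouched; only the three BOM bytes are added in front of it.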

How can I create a file with utf-8 in Python?

An empty file has no encoding to detect, so file always reports it as binary.

$ touch /tmp/foo
$ file -i /tmp/foo
/tmp/foo: inode/x-empty; charset=binary

Put something in it and everything is fine.

$ cat > /tmp/foo 
Rübe
Möhre
Mähne
$ file -i /tmp/foo
/tmp/foo: text/plain; charset=utf-8

Python will do the same as cat.

# Pass the encoding explicitly rather than relying on the locale default.
with open("/tmp/foo", "w", encoding="utf-8") as f:
    f.write("Rübe\n")

Check it:

$ cat /tmp/foo
Rübe
$ file -i /tmp/foo
/tmp/foo: text/plain; charset=utf-8

Edit:

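With Python 2.7, you must encode the Unicode string yourself before writing it.

# -*- coding: utf-8 -*-
# (Python 2 needs the declaration above for non-ASCII literals in the source.)
with open("/tmp/foo", "w") as f:
    f.write(u"Rübe\n".encode("UTF-8"))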
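
If the same script has to run under both Python 2.7 and Python 3, io.open is one option: it accepts an encoding argument on both and does the encoding for you. A minimal sketch, with a temp-file path chosen only for illustration:

```python
# -*- coding: utf-8 -*-
import io
import os
import tempfile

# Hypothetical output path; replace with the file you actually want to write.
path = os.path.join(tempfile.mkdtemp(), "foo.txt")

# io.open takes Unicode text and encodes it with the given encoding.
with io.open(path, "w", encoding="utf-8") as f:
    f.write(u"Rübe\n")

# Inspect the raw bytes: "ü" should appear as the two bytes C3 BC.
with io.open(path, "rb") as f:
    raw = f.read()
```

On Python 3, io.open is the same function as the built-in open, so this form costs nothing there.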

How to write file as UTF-8 with AppleScript

The others are right: just add as «class utf8» to the write command.

write this_data to the open_target_file starting at eof as «class utf8»

Of course, you also have to append as «class utf8» when you read the file back, if the file is UTF-8 encoded.

You can also delete these two redundant lines:

set the target_file to the target_file as string
set appleScriptfilePath to target_file as string

