Convert UTF8 to UTF16 using iconv
UTF-16LE tells iconv to generate little-endian UTF-16 without a BOM (Byte Order Mark). Apparently it assumes that since you specified LE, the BOM isn't necessary.
UTF-16 tells it to generate UTF-16 text (in the local machine's byte order) with a BOM.
If you're on a little-endian machine, I don't see a way to tell iconv to generate big-endian UTF-16 with a BOM, but I might just be missing something.
I find that the file command doesn't recognize UTF-16 text without a BOM, and your editor might not either. But if you run iconv -f UTF-16LE -t UTF-8 b-16 strings, you should get a valid UTF-8 version of the original file.
Try running od -c on the files to see their actual contents.
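A minimal sketch of that inspection, using a hex dump instead of od -c so the BOM bytes are easy to spot. The hexbytes helper is a hypothetical name introduced here, not part of iconv or od. The second command is only illustrative: whether a BOM appears, and which byte order follows it, depends on the iconv implementation and the platform.

```shell
# Hypothetical helper: print stdin as space-separated hex bytes on one line.
hexbytes() {
    # od emits whitespace-separated hex tokens; word splitting collapses them.
    set -- $(od -An -v -tx1)
    echo "$*"
}

# Explicit little-endian target: no BOM is written.
printf 'hi' | iconv -f UTF-8 -t UTF-16LE | hexbytes

# Generic UTF-16 target: typically a BOM comes first, and the byte order
# is chosen by the iconv implementation (assumption: this varies by system).
printf 'hi' | iconv -f UTF-8 -t UTF-16 | hexbytes
```

On a system where -t UTF-16 writes a BOM, the second dump is two bytes longer than the first, and those two bytes are either ff fe or fe ff.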
UPDATE:
It looks like you're on a big-endian machine (x86 is little-endian), and you're trying to generate a little-endian UTF-16 file with a BOM. Is that correct? As far as I can tell, iconv won't do that directly. But this should work:
( printf "\xff\xfe" ; iconv -f utf-8 -t utf-16le UTF-8-FILE ) > UTF-16-FILE
The behavior of the printf might depend on your locale settings; I have LANG=en_US.UTF-8.
(Can anyone suggest a more elegant solution?)
Another workaround, if you know the endianness of the output produced by -t utf-16:
iconv -f utf-8 -t utf-16 UTF-8-FILE | dd conv=swab 2>/dev/null
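One possible variation on the printf approach, as a sketch: write the BOM with octal escapes, which POSIX specifies for the printf format string (the \x hex form is an extension some shells lack), and pick the byte order explicitly so the result doesn't depend on the machine. The function name utf8_to_utf16_bom is hypothetical, introduced only for this example.

```shell
# Hypothetical helper: utf8_to_utf16_bom le|be FILE
# Writes an explicit BOM, then the BOM-less output of iconv, to stdout.
utf8_to_utf16_bom() {
    case "$1" in
        le) printf '\377\376'; iconv -f UTF-8 -t UTF-16LE "$2" ;;  # FF FE
        be) printf '\376\377'; iconv -f UTF-8 -t UTF-16BE "$2" ;;  # FE FF
        *)  echo "usage: utf8_to_utf16_bom le|be FILE" >&2; return 2 ;;
    esac
}
```

Usage would look like utf8_to_utf16_bom be UTF-8-FILE > UTF-16-FILE. The be branch also covers the big-endian-with-BOM case from a little-endian machine.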
UTF-8 to UTF-16, different results using iconv vs mbstring
iconv adds a BOM at the beginning of the output string. So for converting strings, you probably want to use mb_convert_encoding. iconv can be more useful for files.
Convert UTF-8 to UTF-16 in iconv
Your input isn't UTF-8, but UCS-2. Try:
cat test | iconv -f UCS-2 -t UTF-16
Converting UTF-16 to UTF-8 using libiconv
The input data for iconv is always an opaque byte stream. When reading UTF-16, iconv expects the input data to consist of two-byte code units. Therefore, if you want to provide hard-coded input data, you need to use a two-byte wide integral type.
In C++11 and C11 this should be char16_t, but you can also use uint16_t:
uint16_t data[] = { 0x68, 0x69, 0 };
char const * p = (char const *)data;
To be pedantic, there's nothing in general that says that uint16_t has two bytes. However, iconv is a POSIX library, and POSIX mandates that CHAR_BIT == 8, so it is true on POSIX.
(Also note that the way you spell a literal value has nothing to do with the width of the type which you initialize with that value, so there's no difference between 0x68, 0x0068, or 0x00068. What's much more interesting are the new Unicode character literals \u and \U, but that's a whole different story.)
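The byte-stream point can also be checked from the command line. This is a sketch rather than a replacement for the C code above. It feeds the code units 0x0068 and 0x0069 to the iconv utility as raw bytes, written with octal escapes (150 and 151 are 0x68 and 0x69), in both byte orders.

```shell
# 'h' 'i' as little-endian code units: bytes 68 00 69 00.
printf '\150\000\151\000' | iconv -f UTF-16LE -t UTF-8
echo

# The same code units in big-endian byte order: bytes 00 68 00 69.
printf '\000\150\000\151' | iconv -f UTF-16BE -t UTF-8
echo
```

Both commands print hi. Only the order of the bytes within each two-byte unit differs, which is why a uint16_t array in memory reads as UTF-16LE on a little-endian machine and UTF-16BE on a big-endian one.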
how to convert utf-8 to utf-16 with ndash?
I think the problem is that your code isn't outputting a UTF-16LE BOM (byte order mark) at the beginning of the file, so the programs reading it don't know what encoding it's in and are (apparently) guessing poorly.
A UTF-16LE BOM is the byte sequence 0xFF 0xFE (in that order) right at the beginning of the file. Make that the first thing you write to your output. More about BOMs in this Unicode FAQ.
To test my theory, I wrote the byte sequence for a UTF-16LE file containing only the characters 0–0:
FF FE 30 00 13 20 30 00
The FF FE is the BOM, the 30 00 is the digit zero, the 13 20 is the N-dash, and the final 30 00 is the final digit zero. (The zeros are just there so I can easily find the dash, though in such a short file it wouldn't really be difficult.)
I was able to open that with Office 365 on Windows just fine.
Then I wrote a file without the BOM:
30 00 13 20 30 00
Office 365 did indeed misinterpret the N-dash and show it as a character that looks like a pair of brackets.
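For anyone who wants to reproduce the two test files, here is a rough sketch using printf with octal escapes (377 376 is FF FE, 060 is 30, 023 040 is 13 20). The function names with_bom and without_bom are made up for this example; redirect their output to a file to open it in another program.

```shell
# FF FE 30 00 13 20 30 00 : BOM, '0', N-dash (U+2013), '0'
with_bom() { printf '\377\376\060\000\023\040\060\000'; }

# 30 00 13 20 30 00 : the same text, no BOM
without_bom() { printf '\060\000\023\040\060\000'; }

# With a BOM, a generic UTF-16 source encoding is enough for iconv.
with_bom | iconv -f UTF-16 -t UTF-8
echo

# Without a BOM, the byte order has to be named explicitly.
without_bom | iconv -f UTF-16LE -t UTF-8
echo
```

Both conversions print 0–0. That matches the observation above: the bytes are valid UTF-16LE either way, and the only difference is whether the reader is told the byte order.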
Converting file using `iconv` from UTF-16LE to UTF-8 yields UTF-16LE file
This question has been online for a long time and has received literally no views and no answer. Here's how I finally solved the problem.
I wrote a Node.js script which performs the conversion:
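const fs = require('fs');
const schemaFileName = 'data/schema.graphql';
// Read the file as UTF-16LE; a leading BOM decodes to U+FEFF.
const readContent = fs.readFileSync(schemaFileName, {
  encoding: 'utf16le',
});
// Drop the BOM, if present, so it does not end up in the UTF-8 output.
const writeContent = (readContent.charAt(0) === '\ufeff')
  ? readContent.substring(1)
  : readContent;
// Overwrite the file in place as UTF-8.
fs.writeFileSync(schemaFileName, writeContent, 'utf8');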