How to convert UTF8 string to byte array?
The logic of encoding Unicode in UTF-8 is basically:
- Up to 4 bytes per character can be used. The fewest number of bytes possible is used.
- Characters up to U+007F are encoded with a single byte.
- For multibyte sequences, the number of leading 1 bits in the first byte gives the number of bytes for the character. The rest of the bits of the first byte can be used to encode bits of the character.
- The continuation bytes begin with 10, and the other 6 bits encode bits of the character.
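As a rough illustration of the rules above, here is a minimal sketch of two helpers. The names utf8Length and sequenceLength are hypothetical, made up for this example. The first returns how many bytes a code point needs; the second reads the sequence length back from the leading 1 bits of a first byte.

```javascript
// Hypothetical helper: number of UTF-8 bytes needed for one Unicode code point.
function utf8Length(codePoint) {
  if (codePoint < 0x80) return 1;    // 0xxxxxxx
  if (codePoint < 0x800) return 2;   // 110xxxxx 10xxxxxx
  if (codePoint < 0x10000) return 3; // 1110xxxx 10xxxxxx 10xxxxxx
  return 4;                          // 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
}

// Hypothetical helper: sequence length implied by the first byte.
// Returns 0 for a continuation byte (10xxxxxx) or an invalid lead byte.
function sequenceLength(leadByte) {
  if ((leadByte & 0x80) === 0x00) return 1; // no leading 1 bit: single byte
  if ((leadByte & 0xe0) === 0xc0) return 2; // two leading 1 bits
  if ((leadByte & 0xf0) === 0xe0) return 3; // three leading 1 bits
  if ((leadByte & 0xf8) === 0xf0) return 4; // four leading 1 bits
  return 0;
}
```

For example, U+20AC needs three bytes, and its first byte 0xE2 (binary 11100010) starts with three 1 bits.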
Here's a function I wrote a while back for encoding a JavaScript UTF-16 string in UTF-8:
function toUTF8Array(str) {
    var utf8 = [];
    for (var i=0; i < str.length; i++) {
        var charcode = str.charCodeAt(i);
        if (charcode < 0x80) utf8.push(charcode);
        else if (charcode < 0x800) {
            utf8.push(0xc0 | (charcode >> 6),
                      0x80 | (charcode & 0x3f));
        }
        else if (charcode < 0xd800 || charcode >= 0xe000) {
            utf8.push(0xe0 | (charcode >> 12),
                      0x80 | ((charcode>>6) & 0x3f),
                      0x80 | (charcode & 0x3f));
        }
        // surrogate pair
        else {
            i++;
            // UTF-16 encodes 0x10000-0x10FFFF by
            // subtracting 0x10000 and splitting the
            // 20 bits of 0x0-0xFFFFF into two halves
            charcode = 0x10000 + (((charcode & 0x3ff)<<10)
                      | (str.charCodeAt(i) & 0x3ff));
            utf8.push(0xf0 | (charcode >>18),
                      0x80 | ((charcode>>12) & 0x3f),
                      0x80 | ((charcode>>6) & 0x3f),
                      0x80 | (charcode & 0x3f));
        }
    }
    return utf8;
}
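In environments that provide the standard TextEncoder API (current browsers and recent Node.js versions, which is an assumption about your runtime), the built-in encoder can serve as a cross-check. The sketch below wraps it in a hypothetical helper named utf8Bytes that returns a plain array, like the hand-written function above.

```javascript
// Hypothetical helper: UTF-8 bytes of a JavaScript string as a plain array.
// Assumes a global TextEncoder is available in the runtime.
function utf8Bytes(str) {
  // TextEncoder always encodes to UTF-8 and returns a Uint8Array.
  var encoded = new TextEncoder().encode(str);
  return Array.from(encoded);
}

// "\u00e9" is U+00E9, "\u20ac" is U+20AC, "\ud83d\ude00" is the
// surrogate pair for U+1F600.
console.log(utf8Bytes("\u00e9\u20ac\ud83d\ude00"));
```

One difference to keep in mind: TextEncoder replaces an unpaired surrogate with U+FFFD, whereas the hand-written function assumes every surrogate is part of a well-formed pair.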
How to convert Strings to and from UTF8 byte arrays in Java
Convert from String to byte[]:
String s = "some text here";
byte[] b = s.getBytes(StandardCharsets.UTF_8);
Convert from byte[] to String:
byte[] b = {(byte) 99, (byte)97, (byte)116};
String s = new String(b, StandardCharsets.US_ASCII);
You should, of course, use the correct encoding for your data. My examples use US-ASCII and UTF-8, two commonly used encodings. The StandardCharsets class is in the java.nio.charset package.
How to convert utf8 string to []byte?
This question is a possible duplicate of "How to assign string to bytes array", but I'm answering it anyway because there is a better alternative.
Converting from string to []byte is allowed by the spec, using a simple conversion:
Conversions to and from a string type
[...]
- Converting a value of a string type to a slice of bytes type yields a slice whose successive elements are the bytes of the string.
So you can simply do:
s := "some text"
b := []byte(s) // b is of type []byte
However, the string => []byte conversion makes a copy of the string content (it has to, as strings are immutable while []byte values are not), and for large strings that is not efficient. Instead, you can create an io.Reader using strings.NewReader(), which reads from the passed string without making a copy of it. You can then pass this io.Reader to json.NewDecoder() and unmarshal using the Decoder.Decode() method:
s := `{"somekey":"somevalue"}`
var result interface{}
err := json.NewDecoder(strings.NewReader(s)).Decode(&result)
fmt.Println(result, err)
Output:
map[somekey:somevalue] <nil>
Note: calling strings.NewReader() and json.NewDecoder() does have some overhead, so if you're working with small JSON texts, you can safely convert them to []byte and use json.Unmarshal(); it won't be slower:
s := `{"somekey":"somevalue"}`
var result interface{}
err := json.Unmarshal([]byte(s), &result)
fmt.Println(result, err)
The output is the same.
Note: if you're getting your JSON input string by reading some io.Reader (e.g. a file or a network connection), you can pass that io.Reader directly to json.NewDecoder(), without having to read the content from it first.
How to convert UTF-8 byte[] to string
string result = System.Text.Encoding.UTF8.GetString(byteArray);
How to convert utf8 string to utf8 byte array?
Another option is Encoding.UTF8.GetBytes:
string value = "\u00C4 \uD802\u0033 \u00AE";
byte[] bytes= System.Text.Encoding.UTF8.GetBytes(value);
For more information, see the documentation for the Encoding.UTF8 property.
How can I encode a string to UTF-8 into a pre-existing byte array?
GetBytes has another overload that writes to an existing array:
byte[] bytes = new byte[1000]; // sample, make sure it has enough space
var specificIndex = 0;
var actualByteCount = Encoding.UTF8.GetBytes(
myString, 0, myString.Length, bytes, specificIndex);
Don't forget to use the return value (actualByteCount) to know how many bytes in the array actually represent the string.
Note that you may need to call GetByteCount to get the correct array size, or reduce the number of characters to convert so that the result fits into your buffer.
Java: convert UTF8 String to byte array in another encoding
There is no such thing as a "UTF-8 encoded String" in Java. Java Strings use UTF-16 internally, but should be seen as an abstraction without a specific encoding. If you have a String, it's already decoded. If you want to encode it, use string.getBytes(encoding). If your original data is UTF-8, you have to take that into account when you convert that data from bytes to a String.
String to byte array in UTF-8?
A function like this will do what you need:
function UTF8Bytes(const s: UTF8String): TBytes;
begin
  Assert(StringElementSize(s)=1);
  SetLength(Result, Length(s));
  if Length(Result)>0 then
    Move(s[1], Result[0], Length(s));
end;
You can call it with any type of string and the RTL will convert from the encoding of the string that is passed to UTF-8. So don't be tricked into thinking you must convert to UTF-8 before calling, just pass in any string and let the RTL do the work.
After that it's a fairly standard array copy. Note the assertion that explicitly calls out the assumption on string element size for a UTF-8 encoded string.
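If you also want the zero terminator, you could write it like this:
function UTF8Bytes(const s: UTF8String): TBytes;
begin
  Assert(StringElementSize(s)=1);
  SetLength(Result, Length(s)+1);
  // Test the source length here: Result always has at least one element,
  // so testing Length(Result) would index s[1] on an empty string.
  if Length(s)>0 then
    Move(s[1], Result[0], Length(s));
  Result[high(Result)] := 0;
end;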
C# - Convert UTF8 String into bits, modify bits, and convert back into UTF8 String
If you're trying to read a file into a byte[] array, modify those bytes, and convert that back to a string, you could do something like this:
// read the file into a byte array
var bytes = File.ReadAllBytes(inputFileName);
// modify the bytes
// now convert back to a string, decoding the bytes as UTF-8
var stringFromByteArray = Encoding.UTF8.GetString(bytes, 0, bytes.Length);