Encode String to UTF-8
String objects in Java use the UTF-16 encoding that can't be modified*. The only thing that can have a different encoding is a byte[]. So if you need UTF-8 data, then you need a byte[]. If you have a String that contains unexpected data, then the problem is at some earlier place that incorrectly converted some binary data to a String (i.e. it was using the wrong encoding).
* As a matter of implementation, String can internally use an ISO-8859-1 encoded byte[] when the range of characters fits it, but that is an implementation-specific optimization that isn't visible to users of String (i.e. you'll never notice unless you dig into the source code or use reflection to dig into a String object).
Encode a String to UTF-8 in Perl
You need utf8::encode, not utf8::decode. Both of them change the argument in place: utf8::encode returns nothing and utf8::decode returns only a success flag, so there's no point in assigning the return value to the variable.
#!/usr/bin/perl
use strict;
use warnings;
use utf8;
use JSON;
my $data = qq({"cat":"Büster"});
utf8::encode($data);
$data = JSON::decode_json($data);
binmode *STDOUT, ':encoding(UTF-8)';
print $data->{cat};
Moreover, the output filehandle needs to know what encoding it should use; that's what the binmode does.
Also, make sure you save the source in the UTF-8 encoding.
String encoding (UTF-8) JAVA
According to the javadoc of String#getBytes(String charsetName):
Encodes this String into a sequence of bytes using the named charset,
storing the result into a new byte array.
And the documentation of String(byte[] bytes, Charset charset):
Constructs a new String by decoding the specified array of bytes using
the specified charset.
Thus getBytes() is the opposite operation of String(byte[]). getBytes() encodes the string to bytes, and String(byte[]) decodes the byte array and converts it to a string. You have to use the same charset for both methods to preserve the actual string value. I.e. your second example is wrong:
// This is wrong because you are calling getBytes() with the default charset
// but converting those bytes to a string using UTF-8. This will
// mostly work because the default encoding is usually UTF-8, but it can fail,
// so it is wrong.
new String(string1.getBytes(), "UTF-8");
encoding string to utf-8 leaves non english characters as byte strings
print(tweet.content.encode('utf-8'))
writes a byte string (data, not text) in a human-readable, ASCII-compatible form (leading b to represent a byte string, non-ASCII byte values >127 represented as hexadecimal escape codes \xNN) and is not what you want.
If using output redirection, Python can be told what encoding to use to convert text to a byte stream suitable for a file using an environment variable:
set PYTHONIOENCODING=utf8
python main.py > output.txt
You can also write the data directly to a file specifying the encoding instead of using redirection:
with open('tweet.txt','w',encoding='utf8') as f:
    f.write(tweet.content)
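To make the difference concrete, here is a minimal sketch contrasting the bytes representation that print() would show with what actually lands in a file opened with an explicit encoding. The sample text and the temporary file path are assumptions standing in for tweet.content and tweet.txt.

```python
import os
import tempfile

# Hypothetical stand-in for tweet.content
text = "Büster café"

# Encoding yields a bytes object; its repr is what print() would display
encoded = text.encode("utf-8")
shown = repr(encoded)

# Writing the text itself and letting open() handle the encoding
path = os.path.join(tempfile.mkdtemp(), "tweet.txt")
with open(path, "w", encoding="utf8") as f:
    f.write(text)

# The file holds the same UTF-8 bytes, without the b'...' wrapper or escapes
with open(path, "rb") as f:
    raw = f.read()

# Reading it back with the same encoding recovers the original text
with open(path, encoding="utf8") as f:
    round_trip = f.read()
```

Here shown is the escaped form starting with b', while the file contains the real UTF-8 bytes that any UTF-8-aware editor displays as the original text.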
Decoding string to UTF-8 if string is already defined as r'string' instead of b'string'
Try this:
import re
re.sub(rb'\\([0-7]{3})', lambda match: bytes([int(match[1], 8)]), stringa.encode('ascii')).decode('utf-8')
Explanation:
We use the regular expression rb'\\([0-7]{3})' (which matches a literal backslash \ followed by exactly 3 octal digits) and replace each occurrence by taking the three digit code (match[1]), interpreting that as a number written in octal (int(_, 8)), and then replacing the original escape sequence with a single byte (bytes([_])).
We need to operate on bytes because the escape codes represent raw bytes, not Unicode characters. Only after we have "unescaped" those sequences can we decode the UTF-8 to a string.
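As a worked example, the same substitution can be wrapped in a small function and run on a sample input. The function name unescape_octal_utf8 and the sample string are assumptions made for illustration. The sketch assumes the input is pure ASCII and that every escape is at most \377.

```python
import re

def unescape_octal_utf8(escaped: str) -> str:
    """Turn literal \\NNN octal escapes into bytes, then decode as UTF-8."""
    # Work on bytes: each escape stands for one raw byte, not one character
    as_bytes = escaped.encode("ascii")
    raw = re.sub(
        rb'\\([0-7]{3})',
        lambda match: bytes([int(match[1], 8)]),  # e.g. b'303' -> 0xC3
        as_bytes,
    )
    return raw.decode("utf-8")

# Raw string: the backslashes are literal characters, as in the question
sample = r"caf\303\251"
result = unescape_octal_utf8(sample)
```

Octal 303 and 251 are the bytes 0xC3 and 0xA9, which together form the UTF-8 encoding of é, so result is the text café.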
Encode a string in UTF-8
Try this instead:
$enc = [System.Text.Encoding]::UTF8
$consumerkey ="xvz1evFS4wEEPTGEFPHBog"
$encconsumerkey= $enc.GetBytes($consumerkey)
Encode String to UTF-8 in Kotlin
Kotlin has an overload of ByteArray.toString accepting a Charset. All you need to do is use it: array.toString(charset).
I cannot find a section in the documentation specifying that the no-argument ByteArray.toString() does the right thing. It doesn't in Java, and that behavior is probably preserved in Kotlin, so I would guess it does the wrong thing. I recommend using toString(charset) explicitly.
Preventing Python requests.post to encode strings to UTF-8
If for some reason the target API is not fully JSON-compliant, you can build the JSON request body manually and encode it in whatever encoding you like. ensure_ascii=False will disable the translation of non-ASCII characters to escape codes, and you can specify the encoding if it is non-standard. The Wireshark screenshot shows the data is actually UTF-8-encoded, so that is what I've done below:
import requests
import json
payload = {
    "type": "send-message",
    "username": "myuser",
    "password": "mypass",
    "to": "456",
    "msg": "here are accents: é ç"
}
headers = {'Content-Type': 'application/json'}
data = json.dumps(payload, ensure_ascii=False).encode('utf8')
resp = requests.post("http://192.168.1.10/send_message.html", data=data, headers=headers)
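The effect of ensure_ascii can be checked without sending anything over the network. The following is a small sketch using only the msg field from the payload above; the variable names escaped, literal and body are assumptions for illustration.

```python
import json

payload = {"msg": "here are accents: é ç"}

# Default behaviour: non-ASCII characters become \uXXXX escape codes
escaped = json.dumps(payload)

# ensure_ascii=False keeps the characters as they are in the resulting str
literal = json.dumps(payload, ensure_ascii=False)

# Encoding that str produces the bytes that would go on the wire
body = literal.encode("utf8")
```

Both forms are valid JSON that decode to the same data. They differ only in whether the accents travel as six-character escape codes or as raw UTF-8 bytes, which matters when the receiving side does not understand the escapes.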