Can the conversion of a String to Data with UTF-8 encoding ever fail?
UTF-8 can represent all valid Unicode code points, therefore a conversion
of a Swift string to UTF-8 data cannot fail.
The forced unwrap in
let string = "some string .."
let data = string.data(using: .utf8)!
is safe.
The same would be true for .utf16 or .utf32, but not for encodings which represent only a restricted character set, such as .ascii or .isoLatin1.
You can alternatively use the .utf8 view of a string to create UTF-8 data, avoiding the forced unwrap:
let string = "some string .."
let data = Data(string.utf8)
What is the foolproof way to convert some string (UTF-8 or else) to a simple ASCII string in Python
If you want an ASCII string that unambiguously represents what you have got, without losing any information, the answer is simple:
Don't muck about with encode/decode; use the repr() function (Python 2.x) or the ascii() function (Python 3.x).
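As a minimal sketch of the Python 3 case (the sample string is only an illustration, and the round trip through ast.literal_eval is an addition of this sketch, not part of the original answer), ascii() yields an ASCII-only literal from which the original text can be recovered:

```python
import ast

# Hypothetical sample text containing non-ASCII characters.
text = "Träume groß"

# ascii() returns a printable, ASCII-only representation of the string;
# non-ASCII characters are replaced by escape sequences, so nothing is lost.
escaped = ascii(text)
print(escaped)

# The escaped form is a valid Python string literal, so it can be
# evaluated back into the original string.
restored = ast.literal_eval(escaped)
print(restored == text)
```

Because the result includes the surrounding quotes, it reads as a literal rather than as plain text, which is what makes it unambiguous.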
String from NSData fails using UTF8 but succeeds using ASCII
The problem is that not every sequence of bytes is valid when interpreted as UTF-8. For example, a single byte with the value 0xff (255) is never valid in UTF-8. The ASCII decoding, on the other hand, may accept every byte value, even though this is not really correct, since ASCII only defines the values 0x00 through 0x7f.
You had better take a good look at the data and find out what encoding it actually uses. And if it is just random bytes, then please do NOT convert them to a string.
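The validity rule can be checked quickly in Python. This is only a sketch: try_decode is a hypothetical helper, and Python's own "ascii" codec is strict, so "latin-1" stands in here for a decoder that accepts every byte value:

```python
def try_decode(raw: bytes, encoding: str):
    """Return the decoded text, or None if raw is not valid in encoding."""
    try:
        return raw.decode(encoding)
    except UnicodeDecodeError:
        return None


# One ASCII letter followed by 0xff, which can never occur in UTF-8.
raw = bytes([0x61, 0xFF])

utf8_result = try_decode(raw, "utf-8")      # rejected: 0xff is invalid
ascii_result = try_decode(raw, "ascii")     # rejected: 0xff is above 0x7f
latin1_result = try_decode(raw, "latin-1")  # accepted: every byte maps to a character

print(utf8_result, ascii_result, latin1_result)
```

A decoder that accepts everything tells you nothing about whether the bytes were text in the first place.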
Convert UTF-8 encoded NSData to NSString
If the data is not null-terminated, you should use -initWithData:encoding:
NSString* newStr = [[NSString alloc] initWithData:theData encoding:NSUTF8StringEncoding];
If the data is null-terminated, you should instead use the class method +stringWithUTF8String: to avoid the extra \0 at the end.
NSString* newStr = [NSString stringWithUTF8String:[theData bytes]];
(Note that if the input is not properly UTF-8-encoded, you will get nil.)
Swift variant:
let newStr = String(data: data, encoding: .utf8)
// note that `newStr` is a `String?`, not a `String`.
If the data is null-terminated, you can either go the safe way, which is to remove that null character, or the unsafe way, similar to the Objective-C version above.
// safe way, provided data is \0-terminated
let newStr1 = String(data: data.subdata(in: 0 ..< data.count - 1), encoding: .utf8)
// unsafe way, provided data is \0-terminated
let newStr2 = data.withUnsafeBytes(String.init(utf8String:))
PHP: Convert any string to UTF-8 without knowing the original character set, or at least try
What you're asking for is extremely hard. If possible, getting the user to specify the encoding is best. Preventing an attack shouldn't be much easier or harder that way.
However, you could try doing this:
iconv(mb_detect_encoding($text, mb_detect_order(), true), "UTF-8", $text);
Passing true as the third argument to mb_detect_encoding() (strict mode) might help you get a better result.
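The same "try candidates in order, strictly" idea can be sketched in Python. This is an illustration only, not a port of the PHP call: to_utf8 and its default candidate list are assumptions, and the order of candidates decides which guess wins.

```python
def to_utf8(raw: bytes, candidates=("utf-8", "cp1252", "latin-1")) -> bytes:
    """Re-encode raw as UTF-8 using the first candidate that decodes cleanly.

    Hypothetical helper: the candidate list is a guess, not a detection result.
    """
    for encoding in candidates:
        try:
            # bytes.decode() is strict by default and raises on invalid input.
            text = raw.decode(encoding)
        except UnicodeDecodeError:
            continue
        return text.encode("utf-8")
    raise ValueError("none of the candidate encodings matched")


from_cp1252 = to_utf8("groß".encode("cp1252"))
from_utf8 = to_utf8("groß".encode("utf-8"))
print(from_cp1252 == from_utf8)
```

Because single-byte encodings such as latin-1 accept any input, they only make sense as the last fallback, and a clean decode still does not prove the guess was right.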
Convert string of unknown encoding to UTF-8
"TrÃ¤ume groÃŸ"
is a hint that you got something originally encoded as UTF-8, but your process read it as cp1252.
A possible way is to encode your string back to cp1252 and then correctly decode it as UTF-8:
print('"TrÃ¤ume groÃŸ"'.encode('cp1252').decode('utf8'))
gives as expected:
"Träume groß"
But this is only a workaround. The correct solution is to understand where you have read the original bytes as cp1252 and directly use the utf8 conversion there.
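To see how such text arises in the first place, here is a small round-trip sketch. The sample string is an assumption chosen for illustration; the first step simulates the faulty read, the second applies the workaround:

```python
original = "Träume groß"

# Simulate the bug: the text is stored as UTF-8 bytes,
# but a reader wrongly decodes those bytes as cp1252.
garbled = original.encode("utf-8").decode("cp1252")
print(garbled)

# Undo the damage: recover the original bytes via cp1252,
# then decode them with the encoding they were really written in.
repaired = garbled.encode("cp1252").decode("utf-8")
print(repaired == original)
```

The repair only works while every byte survived the wrong decoding, which is one more reason to fix the read at its source.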