Convert Array of UnicodeScalar into String in Swift

The simplest case is when you already have a String.UnicodeScalarView (as returned by the unicodeScalars property) rather than an array, because String can be initialized from the view directly:

let array2 = "bar".unicodeScalars

let str2 = String(array2)
print(str2) // bar

If you have an array (or any sequence) of Unicode scalars then you can start with an empty string
and append the elements to its unicodeScalars view:

let array = [UnicodeScalar("f")!, UnicodeScalar("o")!, UnicodeScalar("o")!]
// Or: let array: [UnicodeScalar] = ["f", "o", "o"]

var str1 = ""
str1.unicodeScalars.append(contentsOf: array)
print(str1) // foo

Of course you can define a custom extension for that purpose:

extension String {
    init<S: Sequence>(unicodeScalars ucs: S)
        where S.Iterator.Element == UnicodeScalar
    {
        var s = ""
        s.unicodeScalars.append(contentsOf: ucs)
        self = s
    }
}

let str3 = String(unicodeScalars: array) // foo
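In Swift 4 and later the extension is not strictly necessary: String.UnicodeScalarView is a RangeReplaceableCollection, so it can be initialized directly from any sequence of scalars (the same pattern used in the next answer). A minimal sketch, assuming Swift 4+:

let scalars = [UnicodeScalar("f")!, UnicodeScalar("o")!, UnicodeScalar("o")!]
let str4 = String(String.UnicodeScalarView(scalars)) // foo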

Convert array of Unicode code points to string

You can convert each of your hex strings to UInt32, initialize a Unicode.Scalar from each value, and create the String from a String.UnicodeScalarView of those scalars:

let arr = ["0023", "FE0F", "20E3"]
let values = arr.compactMap{ UInt32($0, radix: 16) }
let unicodeScalars = values.compactMap(Unicode.Scalar.init)
let string = String(String.UnicodeScalarView(unicodeScalars)) // "#️⃣"

This can also be written as a one-liner:

let arr = ["0023", "FE0F", "20E3"]
let string = String(String.UnicodeScalarView(arr.compactMap{ UInt32($0, radix: 16) }.compactMap(Unicode.Scalar.init)))

edit/update:

If all your strings can be represented by UInt16 values, you can also use the String initializer init(utf16CodeUnits: UnsafePointer<unichar>, count: Int), as shown by @MartinR here:

let arr = ["0023", "FE0F", "20E3"]
let values = arr.compactMap { UInt16($0, radix: 16) }
let string = String(utf16CodeUnits: values, count: values.count) // "#️⃣"
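One caveat worth adding here: UInt16($0, radix: 16) returns nil for values above 0xFFFF, so compactMap silently drops any code point outside the Basic Multilingual Plane. For those, use the UInt32/Unicode.Scalar approach above:

let supplementary = ["1F1EF", "1F1F4"] // 🇯🇴, code points outside the BMP
let units = supplementary.compactMap { UInt16($0, radix: 16) }
print(units) // [], both values overflow UInt16 and are dropped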

Convert UnicodeScalar index to String.Index

The various String views share a common index. If you have a position given as an offset into the unicodeScalars view, use String.unicodeScalars.index() to convert it to a String.Index. Example:

let s = "br>print(Array(s.unicodeScalars))
// ["\u{0001F1E6}", "\u{0001F1F9}", "\u{0001F1E7}", "\u{0001F1EA}"]

let ucOffset = 2
let sIndex = s.unicodeScalars.index(s.startIndex, offsetBy: ucOffset)
print(s[sIndex...]) // 🇧🇪

The reverse calculation is done with distance(from:to:). Example:

let s = "br>
if let sIndex = s.index(of:") {
let ucOffset = s.unicodeScalars.distance(from: s.startIndex, to: sIndex)
print(ucOffset) // 2
}

Building Unicode scalar String in Swift

You can pad your string to 4 hex digits (a 2-byte UInt16), add the \u prefix (giving \uXXXX), and use a string transform to convert the Unicode hex value to the corresponding character:

extension StringProtocol where Self: RangeReplaceableCollection {
    func paddingToLeft(upTo length: Int = 4, using character: Character = "0") -> Self {
        repeatElement(character, count: Swift.max(0, length - count)) + self
    }
    var decodingUnicodeCharacters: String {
        applyingTransform(.init("Hex-Any"), reverse: false) ?? ""
    }
}


let omegaHexadecimal: String = "3A9"
let omega = "\\u" + omegaHexadecimal.paddingToLeft() // "\\u03A9"

omega.decodingUnicodeCharacters // "Ω"
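The ICU "Hex-Any" transform decodes every \uXXXX escape it finds, so the same property should work on longer sequences as well; a small usage sketch of my own:

let escaped = "\\u0048\\u0065\\u006C\\u006C\\u006F"
escaped.decodingUnicodeCharacters // "Hello"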

Swift convert 'Character' to 'Unicode.Scalar'

CharacterSet has an unfortunate name inherited from Objective-C. In reality, it is a set of Unicode.Scalars, not of Characters (“extended grapheme clusters” in Unicode parlance). This is necessary because while there is a finite set of Unicode scalars, there is an infinite number of possible grapheme clusters. For example, e + ◌̄ + ◌̄ + ◌̄ ... ad infinitum is still just one cluster. As such, it is impossible to exhaustively list all possible clusters, and it is often impossible to list the subset of them that has a particular property. Set operations such as those in the question must use scalars instead (or at least use definitions derived from the component scalars).
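To make the distinction concrete, here is a sketch of my own (it assumes Swift 4's Character.unicodeScalars property, covered later in this document, and Swift 4.2's allSatisfy): a single Character can contain several scalars, and CharacterSet.contains(_:) accepts only a Unicode.Scalar, so a cluster has to be tested one scalar at a time:

import Foundation

let cluster: Character = "e\u{0304}\u{0304}" // one Character, three scalars
// CharacterSet.contains(_:) takes a Unicode.Scalar, not a Character:
let allLetters = cluster.unicodeScalars.allSatisfy { CharacterSet.letters.contains($0) }
print(allLetters) // true, .letters covers categories L* and M*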

In Swift, Strings have a unicodeScalars property for operating on the string at the scalar level, and the property is directly mutable. That enables you to do things like this:

// Assuming...
var name: String = "..."

// ...then...
name.unicodeScalars.removeAll(where: { !CharacterSet.alphanumerics.contains($0) })
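A quick usage sketch, with an input of my own choosing:

var name = "user name 42!"
name.unicodeScalars.removeAll(where: { !CharacterSet.alphanumerics.contains($0) })
print(name) // username42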

Converting Unicode in Swift

This answer suggests using the NSString method stringByFoldingWithOptions.
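In Swift, that folding is exposed directly on String as folding(options:locale:); a minimal sketch (the example string is mine, not from the linked answer):

import Foundation

let folded = "Héllo Wörld".folding(options: .diacriticInsensitive, locale: .current)
print(folded) // Hello World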

The Swift String class has a concept called a "view" which lets you operate on the string under different encodings. It's pretty neat, and there are some views that might help you.

If you're dealing with strings in Swift, read this excellent post by Mike Ash. He discusses in great detail what a string really is, and he has some helpful hints for Swift 2.

Is there a way to create a String from a UTF-16 array in Swift?

Update for Swift 2.1:

You can create a String from an array of UTF-16 characters
with the

public init(utf16CodeUnits: UnsafePointer<unichar>, count: Int)

initializer. Example:

let str = "H€llo br>
// String to UTF16 array:
let utf16array = Array(str.utf16)
print(utf16array)
// Output: [72, 8364, 108, 108, 111, 32, 55357, 56836]

// UTF16 array to string:
let str2 = String(utf16CodeUnits: utf16array, count: utf16array.count)
print(str2)
// H€llo 😄

Previous answer:

There is nothing "built-in" (as far as I know), but you can use the UTF16 struct
which provides a decode() method:

extension String {

    init?(utf16chars: [UInt16]) {
        var str = ""
        var generator = utf16chars.generate()
        var utf16 = UTF16()
        var done = false
        while !done {
            let r = utf16.decode(&generator)
            switch r {
            case .EmptyInput:
                done = true
            case let .Result(val):
                str.append(Character(val))
            case .Error:
                return nil
            }
        }
        self = str
    }
}

Example:

let str = "H€llo br>
// String to UTF16 array:
let utf16array = Array(str.utf16)
print(utf16array)
// Output: [72, 8364, 108, 108, 111, 32, 55357, 56836]

// UTF16 array to string:
if let str2 = String(utf16chars: utf16array) {
    print(str2)
    // Output: H€llo 😄
}

Slightly more generic, you could define a method that creates a string
from an array (or any sequence) of code points, using a given codec:

extension String {
    init?<S : SequenceType, C : UnicodeCodecType where S.Generator.Element == C.CodeUnit>
        (codeUnits: S, var codec: C) {
        var str = ""
        var generator = codeUnits.generate()
        var done = false
        while !done {
            let r = codec.decode(&generator)
            switch r {
            case .EmptyInput:
                done = true
            case let .Result(val):
                str.append(Character(val))
            case .Error:
                return nil
            }
        }
        self = str
    }
}

Then the conversion from UTF16 is done as

if let str2a = String(codeUnits: utf16array, codec: UTF16()) {
    print(str2a)
}

Here is another possible solution. While the previous methods are "pure Swift", this one uses the Foundation framework and the automatic
bridging between NSString and Swift String:

extension String {

    init?(utf16chars: [UInt16]) {
        let data = NSData(bytes: utf16chars, length: utf16chars.count * sizeof(UInt16))
        if let ns = NSString(data: data, encoding: NSUTF16LittleEndianStringEncoding) {
            self = ns as String
        } else {
            return nil
        }
    }
}
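Update: recent Swift versions (Swift 4 and later) also offer a dedicated initializer, String(decoding:as:), which performs the same decoding but substitutes U+FFFD REPLACEMENT CHARACTER for invalid input instead of failing. A sketch of that route:

let utf16array: [UInt16] = [72, 8364, 108, 108, 111, 32, 55357, 56836]
let str3 = String(decoding: utf16array, as: UTF16.self)
print(str3) // H€llo 😄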

Convert unichar to String?

let str: String = "#ffffff"
let unichar = str[str.startIndex]        // despite the name, this is a Character: "#"
let unicharString = "\(unichar)"         // converted to a String: "#"
let containsHash = unicharString == "#"  // true
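If the starting point is an actual unichar (a UInt16 UTF-16 code unit, e.g. from an NSString API) rather than a Character, one possible sketch: Unicode.Scalar has a failable UInt16 initializer (surrogate code units yield nil), and the scalar can then be wrapped in a String:

import Foundation

let code: unichar = 0x23 // "#"
if let scalar = Unicode.Scalar(code) { // nil for surrogate code units
    let unicharString = String(Character(scalar))
    print(unicharString == "#") // true
}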

How to get unicode code point(s) representation of character/string in Swift?

Generally, the unicodeScalars property of a String returns a collection
of its unicode scalar values. (A Unicode scalar value is any
Unicode code point except high-surrogate and low-surrogate code points.)

Example:

print(Array("Á".unicodeScalars))  // ["A", "\u{0301}"]
print(Array(".unicodeScalars)) // ["\u{0001F496}"]

Up to Swift 3 there is no way to access the Unicode scalar values of a Character directly; it has to be converted to a String first (for the Swift 4 status, see below).

If you want to see all Unicode scalar values as hexadecimal numbers
then you can access the value property (which is a UInt32 number)
and format it according to your needs.

Example (using the U+NNNN notation for Unicode values):

extension String {
    func getUnicodeCodePoints() -> [String] {
        return unicodeScalars.map { "U+" + String($0.value, radix: 16, uppercase: true) }
    }
}

extension Character {
    func getUnicodeCodePoints() -> [String] {
        return String(self).getUnicodeCodePoints()
    }
}

print("A".getUnicodeCodePoints()) // ["U+41"]
print("Á".getUnicodeCodePoints()) // ["U+41", "U+301"]
print(".getUnicodeCodePoints()) // ["U+1F496"]
print("SWIFT".getUnicodeCodePoints()) // ["U+53", "U+57", "U+49", "U+46", "U+54"]
print(".getUnicodeCodePoints()) // ["U+1F1EF", "U+1F1F4"]

Update for Swift 4:

As of Swift 4, the unicodeScalars of a Character can be accessed directly; see SE-0178 Add unicodeScalars property to Character. This makes the conversion to a String obsolete:

let c: Character = "br>print(Array(c.unicodeScalars)) // ["\u{0001F1EF}", "\u{0001F1F4}"]

