Convert Array of UnicodeScalar into String in Swift
If the input is already a UnicodeScalarView rather than an array, the conversion is simple, because String can be initialized directly from that view:
let array2 = "bar".unicodeScalars
let str2 = String(array2)
print(str2) // bar
If you have an array (or any sequence) of Unicode scalars then you can start with an empty string and append the elements to its unicodeScalars view:
let array = [UnicodeScalar("f")!, UnicodeScalar("o")!, UnicodeScalar("o")!]
// Or: let array: [UnicodeScalar] = ["f", "o", "o"]
var str1 = ""
str1.unicodeScalars.append(contentsOf: array)
print(str1) // foo
Of course you can define a custom extension for that purpose:
extension String {
    init<S: Sequence>(unicodeScalars ucs: S)
        where S.Iterator.Element == UnicodeScalar
    {
        var s = ""
        s.unicodeScalars.append(contentsOf: ucs)
        self = s
    }
}
let str1 = String(unicodeScalars: array)
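Since Swift 4, String.UnicodeScalarView itself conforms to RangeReplaceableCollection, so it can be built directly from any sequence of scalars. A minimal alternative sketch, assuming Swift 4 or later, with no custom extension needed:

```swift
// Swift 4+: build a UnicodeScalarView from a scalar array, then wrap it in a String.
let scalars: [Unicode.Scalar] = ["f", "o", "o"]
let str = String(String.UnicodeScalarView(scalars))
print(str) // foo
```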
Convert array of unicode code points to string
You can convert each of your hex strings to UInt32, initialise a Unicode.Scalar for each element, and create a String from a String.UnicodeScalarView built from them:
let arr = ["0023", "FE0F", "20E3"]
let values = arr.compactMap{ UInt32($0, radix: 16) }
let unicodeScalars = values.compactMap(Unicode.Scalar.init)
let string = String(String.UnicodeScalarView(unicodeScalars))
This can also be written as a one-liner:
let arr = ["0023", "FE0F", "20E3"]
let string = String(String.UnicodeScalarView(arr.compactMap{ UInt32($0, radix: 16) }.compactMap(Unicode.Scalar.init)))
edit/update:
If all your values can be represented as UInt16 you can also use the String initializer init(utf16CodeUnits: UnsafePointer&lt;unichar&gt;, count: Int), as shown by @MartinR here:
let arr = ["0023", "FE0F", "20E3"]
let values = arr.compactMap { UInt16($0, radix: 16) }
let string = String(utf16CodeUnits: values, count: values.count) // "#️⃣"
Convert UnicodeScalar index to String.Index
The various String views share a common index. If you have a position given as an offset into the unicodeScalars view then use String.unicodeScalars.index() to convert it to a String.Index. Example:
let s = "🇦🇹🇧🇪"
print(Array(s.unicodeScalars))
// ["\u{0001F1E6}", "\u{0001F1F9}", "\u{0001F1E7}", "\u{0001F1EA}"]
let ucOffset = 2
let sIndex = s.unicodeScalars.index(s.startIndex, offsetBy: ucOffset)
print(s[sIndex...]) // 🇧🇪
The reverse calculation is done with distance(from:to:). Example:
let s = "🇦🇹🇧🇪"
if let sIndex = s.index(of: "🇧🇪") {
    let ucOffset = s.unicodeScalars.distance(from: s.startIndex, to: sIndex)
    print(ucOffset) // 2
}
Building Unicode scalar String in Swift
You can pad your string up to 4 hex digits (2 bytes, UInt16), add the \u prefix (giving \uXXXX) and use a string transform to convert the Unicode hex value to the corresponding character:
extension StringProtocol where Self: RangeReplaceableCollection {
    func paddingToLeft(upTo length: Int = 4, using character: Character = "0") -> Self {
        repeatElement(character, count: Swift.max(0, length - count)) + self
    }
    var decodingUnicodeCharacters: String { applyingTransform(.init("Hex-Any"), reverse: false) ?? "" }
}
let omegaHexadecimal: String = "3A9"
let omega = "\\u" + omegaHexadecimal.paddingToLeft() // "\\u03A9"
omega.decodingUnicodeCharacters // "Ω"
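If you would rather avoid ICU string transforms, the same result can be sketched by parsing the hex value directly and building a Character from a Unicode.Scalar (assuming the input is valid hex within the scalar range):

```swift
// Parse the hex string to UInt32, then construct a scalar and a Character from it.
let omegaHexadecimal = "3A9"
if let value = UInt32(omegaHexadecimal, radix: 16),
   let scalar = Unicode.Scalar(value) {
    print(Character(scalar)) // Ω
}
```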
Swift convert 'Character' to 'Unicode.Scalar'
CharacterSet has an unfortunate name inherited from Objective-C. In reality, it is a set of Unicode.Scalars, not of Characters (“extended grapheme clusters” in Unicode parlance). This is necessary, because while there is a finite set of Unicode scalars, there is an infinite number of possible grapheme clusters. For example, e + ◌̄ + ◌̄ + ◌̄ ... ad infinitum is still just one cluster. As such, it is impossible to exhaustively list all possible clusters, and it is often impossible to list the subset of them that has a particular property. Set operations such as those in the question must use scalars instead (or at least use definitions derived from the component scalars).
In Swift, Strings have a unicodeScalars property for operating on the string at the scalar level, and the property is directly mutable. That enables you to do things like this:
// Assuming...
var name: String = "..."
// ...then...
name.unicodeScalars.removeAll(where: { !CharacterSet.alphanumerics.contains($0) })
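And to answer the title question directly: since Swift 4 (SE-0178), a Character exposes its own unicodeScalars view, so a single-scalar character can be unwrapped without a round-trip through String. A sketch, assuming the character consists of exactly one scalar:

```swift
let ch: Character = "Ω"
// Simple characters consist of exactly one Unicode scalar.
if ch.unicodeScalars.count == 1, let scalar = ch.unicodeScalars.first {
    print(scalar.value) // 937 (U+03A9)
}
```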
Converting Unicode in Swift
This answer suggests using the NSString method stringByFoldingWithOptions
.
The Swift String
class has a concept called a "view" which lets you operate on the string under different encodings. It's pretty neat, and there are some views that might help you.
If you're dealing with strings in Swift, read this excellent post by Mike Ash. He discusses the idea of what a string really is with great detail and has some helpful hints for Swift 2.
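In current Swift the NSString folding API mentioned above is bridged as folding(options:locale:) on String. A minimal sketch (Foundation required) that removes diacritics:

```swift
import Foundation

let s = "Héllo Wörld"
// .diacriticInsensitive strips combining marks; a nil locale uses locale-independent folding.
let folded = s.folding(options: .diacriticInsensitive, locale: nil)
print(folded) // Hello World
```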
Is there a way to create a String from utf16 array in swift?
Update for Swift 2.1:
You can create a String from an array of UTF-16 characters with the public init(utf16CodeUnits: UnsafePointer&lt;unichar&gt;, count: Int) initializer. Example:
let str = "H€llo 💖"
// String to UTF16 array:
let utf16array = Array(str.utf16)
print(utf16array)
// Output: [72, 8364, 108, 108, 111, 32, 55357, 56836]
// UTF16 array to string:
let str2 = String(utf16CodeUnits: utf16array, count: utf16array.count)
print(str2)
// H€llo 💖
Previous answer:
There is nothing "built-in" (as far as I know), but you can use the UTF16 struct which provides a decode() method:
extension String {
    init?(utf16chars: [UInt16]) {
        var str = ""
        var generator = utf16chars.generate()
        var utf16 = UTF16()
        var done = false
        while !done {
            let r = utf16.decode(&generator)
            switch (r) {
            case .EmptyInput:
                done = true
            case let .Result(val):
                str.append(Character(val))
            case .Error:
                return nil
            }
        }
        self = str
    }
}
Example:
let str = "H€llo 💖"
// String to UTF16 array:
let utf16array = Array(str.utf16)
print(utf16array)
// Output: [72, 8364, 108, 108, 111, 32, 55357, 56836]
// UTF16 array to string:
if let str2 = String(utf16chars: utf16array) {
    print(str2)
    // Output: H€llo 💖
}
Slightly more generic, you could define a method that creates a string from an array (or any sequence) of code points, using a given codec:
extension String {
    init?<S : SequenceType, C : UnicodeCodecType where S.Generator.Element == C.CodeUnit>
        (codeUnits: S, var codec: C) {
        var str = ""
        var generator = codeUnits.generate()
        var done = false
        while !done {
            let r = codec.decode(&generator)
            switch (r) {
            case .EmptyInput:
                done = true
            case let .Result(val):
                str.append(Character(val))
            case .Error:
                return nil
            }
        }
        self = str
    }
}
Then the conversion from UTF-16 is done as:
if let str2a = String(codeUnits: utf16array, codec: UTF16()) {
    print(str2a)
}
Here is another possible solution. While the previous methods are "pure Swift", this one uses the Foundation framework and the automatic bridging between NSString and Swift String:
extension String {
    init?(utf16chars: [UInt16]) {
        let data = NSData(bytes: utf16chars, length: utf16chars.count * sizeof(UInt16))
        if let ns = NSString(data: data, encoding: NSUTF16LittleEndianStringEncoding) {
            self = ns as String
        } else {
            return nil
        }
    }
}
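As a side note not in the original answer: newer Swift versions (Swift 5 and later) provide the non-failable String(decoding:as:) initializer, which performs the same UTF-16 decoding but substitutes U+FFFD for invalid code units instead of failing:

```swift
// Swift 5+: decode UTF-16 code units; invalid sequences become U+FFFD instead of nil.
let utf16array: [UInt16] = [72, 8364, 108, 108, 111]
let str = String(decoding: utf16array, as: UTF16.self)
print(str) // H€llo
```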
Convert unichar to String?
let str: String = "#ffffff"
let unichar = str[str.startIndex]       // first Character of the string
let unicharString = "\(unichar)"        // convert the Character to a String
let containsHash = unicharString == "#" // true
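If the value in question is an actual unichar (a UInt16 UTF-16 code unit) rather than a Character, it can be turned into a String with the utf16CodeUnits initializer. A sketch, assuming Foundation is available:

```swift
import Foundation

let u: unichar = 0x23 // UTF-16 code unit for "#"
let s = String(utf16CodeUnits: [u], count: 1)
print(s) // #
```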
How to get unicode code point(s) representation of character/string in Swift?
Generally, the unicodeScalars property of a String returns a collection of its unicode scalar values. (A Unicode scalar value is any Unicode code point except high-surrogate and low-surrogate code points.) Example:
print(Array("Á".unicodeScalars)) // ["A", "\u{0301}"]
print(Array("💖".unicodeScalars)) // ["\u{0001F496}"]
Up to Swift 3 there is no way to access the unicode scalar values of a Character directly; it has to be converted to a String first (for the Swift 4 status, see below).
If you want to see all Unicode scalar values as hexadecimal numbers then you can access the value property (which is a UInt32 number) and format it according to your needs. Example (using the U+NNNN notation for Unicode values):
extension String {
    func getUnicodeCodePoints() -> [String] {
        return unicodeScalars.map { "U+" + String($0.value, radix: 16, uppercase: true) }
    }
}

extension Character {
    func getUnicodeCodePoints() -> [String] {
        return String(self).getUnicodeCodePoints()
    }
}
print("A".getUnicodeCodePoints()) // ["U+41"]
print("Á".getUnicodeCodePoints()) // ["U+41", "U+301"]
print("💖".getUnicodeCodePoints()) // ["U+1F496"]
print("SWIFT".getUnicodeCodePoints()) // ["U+53", "U+57", "U+49", "U+46", "U+54"]
print("🇯🇴".getUnicodeCodePoints()) // ["U+1F1EF", "U+1F1F4"]
Update for Swift 4:
As of Swift 4, the unicodeScalars of a Character can be accessed directly, see SE-0178 Add unicodeScalars property to Character. This makes the conversion to a String obsolete:
let c: Character = "🇯🇴"
print(Array(c.unicodeScalars)) // ["\u{0001F1EF}", "\u{0001F1F4}"]