How to Initialize a Unichar Variable in Swift

How to initialize UniChar with too many bytes in Swift?

The function expects (the pointer to) an array of UniChar aka UInt16,
containing the UTF-16 representation of the string.
As @rmaddy said, UniChar can hold only values up to 0xFFFF.
Larger Unicode scalars need to be represented as “surrogate pairs”.

The .utf16 view of a string provides the UTF-16 representation:

let c = "\u{1F63E}" // POUTING CAT FACE (U+1F63E)
let utf16Chars = Array(c.utf16)
event.keyboardSetUnicodeString(stringLength: utf16Chars.count, unicodeString: utf16Chars)
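As an aside, the surrogate-pair encoding that the `.utf16` view performs can be sketched by hand (illustration only; `String.utf16` does this for you):

```swift
// Manual surrogate-pair computation for a scalar above 0xFFFF
let scalar: UInt32 = 0x1F63E            // U+1F63E, POUTING CAT FACE
let v = scalar - 0x10000                // 20-bit offset into the supplementary planes
let high = UInt16(0xD800 + (v >> 10))   // high (lead) surrogate: 0xD83D
let low  = UInt16(0xDC00 + (v & 0x3FF)) // low (trail) surrogate: 0xDE3E
// [high, low] matches Array("\u{1F63E}".utf16)
```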

How to set value of (initialize) unichar/UInt16 in hexadecimal in Swift

If you had read The Basics chapter of the Swift documentation a little more carefully, you would have found it.

Integer literals can be written as:

  • A decimal number, with no prefix
  • A binary number, with a 0b prefix
  • An octal number, with a 0o prefix
  • A hexadecimal number, with a 0x prefix

All of these integer literals have a decimal value of 17:

let decimalInteger = 17  
let binaryInteger = 0b10001 // 17 in binary notation
let octalInteger = 0o21 // 17 in octal notation
let hexadecimalInteger = 0x11 // 17 in hexadecimal notation

So you would do

let myValue: UInt16 = 0x1820

or

let myValue: unichar = 0x1820 // unichar is a type alias of UInt16

Unichar variable in Swift

In Swift, character literals are of type Character.

let exclamationMark: Character = "!"

See Strings and Characters

For advanced readers:

You can also extend unichar with the capability to accept character literals:

extension unichar: ExpressibleByUnicodeScalarLiteral {
    // (named UnicodeScalarLiteralConvertible before Swift 3)
    public typealias UnicodeScalarLiteralType = UnicodeScalar

    public init(unicodeScalarLiteral scalar: UnicodeScalar) {
        self.init(scalar.value)
    }
}

let newLine: unichar = "\n"
// bridge to NSCharacterSet so we can call characterIsMember(_:), which takes a unichar
let whitespaces = CharacterSet.whitespacesAndNewlines as NSCharacterSet

print("newLine is whitespace: \(whitespaces.characterIsMember(newLine))") // true

However, note that a Swift UnicodeScalar covers the full 21-bit Unicode range (up to U+10FFFF), while unichar holds only 16 bits, so initializing a unichar from a scalar above U+FFFF will trap at runtime rather than produce a valid value.
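A quick way to check whether a scalar fits in a single unichar, assuming Swift 4's failable `UInt16(exactly:)` initializer:

```swift
let apple: UnicodeScalar = "\u{1F34E}"  // U+1F34E, above 0xFFFF
let exclaim: UnicodeScalar = "!"        // U+0021, fits in 16 bits

let truncated = UInt16(exactly: apple.value)  // nil, would need a surrogate pair
let fits = UInt16(exactly: exclaim.value)     // 33
```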

How to convert unicode hex number variable to character in NSString?

You can initialize the string from a buffer containing the bytes of the hex value (you simply provide a pointer to it). The important point is that you must specify the character encoding to be applied, and in particular the byte order.

Here's an example:

UInt32 hex = 0x095f;
NSString *unicodeString = [[NSString alloc] initWithBytes:&hex length:sizeof(hex) encoding:NSUTF32LittleEndianStringEncoding];

Note that solutions using the %C format specifier are fine as long as the character fits in 16 bits; 32-bit Unicode characters such as emoji (for example 0x1F601, 0x1F41A) cannot be produced with simple formatting.

Working with Unicode code points in Swift

Updated for Swift 3

String and Character

For almost everyone in the future who visits this question, String and Character will be the answer for you.

Set Unicode values directly in code:

var str: String = "I want to visit 北京, Москва, मुंबई, القاهرة, and 서울시."
var character: Character = "🌍"

Use hexadecimal to set values

var str: String = "\u{61}\u{5927}\u{1F34E}\u{3C0}" // a大🍎π
var character: Character = "\u{65}\u{301}" // é = "e" + accent mark

Note that the Swift Character can be composed of multiple Unicode code points, but appears to be a single character. This is called an Extended Grapheme Cluster.

See this question also.
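The difference shows up when comparing counts (Swift 4+, where String is itself a collection of Characters; in Swift 3 use `str.characters.count`):

```swift
let e = "\u{65}\u{301}"                   // "e" + combining acute accent = "é"
let graphemeCount = e.count               // 1, a single user-perceived character
let scalarCount = e.unicodeScalars.count  // 2, two Unicode code points
```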

Convert to Unicode values:

str.utf8
str.utf16
str.unicodeScalars // UTF-32

String(character).utf8
String(character).utf16
String(character).unicodeScalars
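For example, each view yields different code units for the same string (sample values shown in the comments):

```swift
let sample = "a大\u{1F34E}"           // "a大🍎"
let utf8Units  = Array(sample.utf8)   // [97, 229, 164, 167, 240, 159, 141, 142]
let utf16Units = Array(sample.utf16)  // [97, 22823, 55356, 57166] (the apple is a surrogate pair)
let scalars = sample.unicodeScalars.map { $0.value } // [97, 22823, 127822]
```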

Convert from Unicode hex values:

let hexValue: UInt32 = 0x1F34E

// convert hex value to UnicodeScalar
guard let scalarValue = UnicodeScalar(hexValue) else {
    // early exit if hex does not form a valid unicode value
    return
}

// convert UnicodeScalar to String
let myString = String(scalarValue) // "🍎"

Or alternatively:

let hexValue: UInt32 = 0x1F34E
if let scalarValue = UnicodeScalar(hexValue) {
    let myString = String(scalarValue) // "🍎"
}

A few more examples

let value0: UInt8 = 0x61
let value1: UInt16 = 0x5927
let value2: UInt32 = 0x1F34E

let string0 = String(UnicodeScalar(value0))  // a
let string1 = String(UnicodeScalar(value1)!) // 大 (the UInt16 initializer is failable)
let string2 = String(UnicodeScalar(value2)!) // 🍎 (the UInt32 initializer is failable)

// convert hex array to String
let myHexArray: [UInt32] = [0x43, 0x61, 0x74, 0x203C, 0x1F431]
var myString = ""
for hexValue in myHexArray {
    if let scalar = UnicodeScalar(hexValue) {
        myString.append(Character(scalar))
    }
}
print(myString) // Cat‼🐱

Note that for UTF-8 and UTF-16 the conversion is not always this easy. (See UTF-8, UTF-16, and UTF-32 questions.)
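For UTF-16 specifically, Swift 5's `String(decoding:as:)` initializer handles surrogate pairs for you (a sketch; note that invalid sequences are replaced with U+FFFD rather than failing):

```swift
let units: [UInt16] = [0x0043, 0xD83D, 0xDE00] // "C" followed by the surrogate pair for U+1F600
let decoded = String(decoding: units, as: UTF16.self)
// decoded == "C\u{1F600}" ("C😀")
```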

NSString and unichar

It is also possible to work with NSString and unichar in Swift, but you should realize that unless you are familiar with Objective-C and good at converting its syntax to Swift, it will be difficult to find good documentation.

Also, an NSString is conceptually an array of unichar (UInt16) values, and as mentioned above the conversion from UInt16 code units to Unicode scalar values is not always easy (i.e., converting surrogate pairs for emoji and other characters in the upper code planes).

Custom string structure

For the reasons mentioned in the question, I ended up not using any of the above methods. Instead I wrote my own string structure, which was basically an array of UInt32 to hold Unicode scalar values.

Again, this is not the solution for most people. First consider using extensions if you only need to extend the functionality of String or Character a little.

But if you really need to work exclusively with Unicode scalar values, you could write a custom struct.

The advantages are:

  • Don't need to constantly switch between Types (String, Character, UnicodeScalar, UInt32, etc.) when doing string manipulation.
  • After Unicode manipulation is finished, the final conversion to String is easy.
  • Easy to add more methods when they are needed
  • Simplifies converting code from Java or other languages

Disadvantages are:

  • makes code less portable and less readable for other Swift developers
  • not as well tested and optimized as the native Swift types
  • it is yet another file that has to be included in a project every time you need it

You can make your own, but here is mine for reference. The hardest part was making it Hashable.

// This struct is an array of UInt32 to hold Unicode scalar values
// Version 3.4.0 (Swift 3 update)

struct ScalarString: Sequence, Hashable, CustomStringConvertible {

    fileprivate var scalarArray: [UInt32] = []

    init() {
        // does anything need to go here?
    }

    init(_ character: UInt32) {
        self.scalarArray.append(character)
    }

    init(_ charArray: [UInt32]) {
        for c in charArray {
            self.scalarArray.append(c)
        }
    }

    init(_ string: String) {
        for s in string.unicodeScalars {
            self.scalarArray.append(s.value)
        }
    }

    // Iterator in order to conform to the Sequence protocol
    // (to allow users to iterate as in `for myScalarValue in myScalarString { ... }`)
    func makeIterator() -> AnyIterator<UInt32> {
        return AnyIterator(scalarArray.makeIterator())
    }

    // append
    mutating func append(_ scalar: UInt32) {
        self.scalarArray.append(scalar)
    }

    mutating func append(_ scalarString: ScalarString) {
        for scalar in scalarString {
            self.scalarArray.append(scalar)
        }
    }

    mutating func append(_ string: String) {
        for s in string.unicodeScalars {
            self.scalarArray.append(s.value)
        }
    }

    // charAt
    func charAt(_ index: Int) -> UInt32 {
        return self.scalarArray[index]
    }

    // clear
    mutating func clear() {
        self.scalarArray.removeAll(keepingCapacity: true)
    }

    // contains
    func contains(_ character: UInt32) -> Bool {
        for scalar in self.scalarArray {
            if scalar == character {
                return true
            }
        }
        return false
    }

    // description (to implement the CustomStringConvertible protocol)
    var description: String {
        return self.toString()
    }

    // endsWith
    func endsWith() -> UInt32? {
        return self.scalarArray.last
    }

    // indexOf
    // returns first index of scalar string match
    func indexOf(_ string: ScalarString) -> Int? {

        if scalarArray.count < string.length {
            return nil
        }

        for i in 0...(scalarArray.count - string.length) {

            for j in 0..<string.length {

                if string.charAt(j) != scalarArray[i + j] {
                    break // substring mismatch
                }
                if j == string.length - 1 {
                    return i
                }
            }
        }

        return nil
    }

    // insert
    mutating func insert(_ scalar: UInt32, atIndex index: Int) {
        self.scalarArray.insert(scalar, at: index)
    }
    mutating func insert(_ string: ScalarString, atIndex index: Int) {
        var newIndex = index
        for scalar in string {
            self.scalarArray.insert(scalar, at: newIndex)
            newIndex += 1
        }
    }
    mutating func insert(_ string: String, atIndex index: Int) {
        var newIndex = index
        for scalar in string.unicodeScalars {
            self.scalarArray.insert(scalar.value, at: newIndex)
            newIndex += 1
        }
    }

    // isEmpty
    var isEmpty: Bool {
        return self.scalarArray.count == 0
    }

    // hashValue (to implement the Hashable protocol)
    var hashValue: Int {
        // DJB hash function
        return self.scalarArray.reduce(5381) {
            ($0 << 5) &+ $0 &+ Int($1)
        }
    }

    // length
    var length: Int {
        return self.scalarArray.count
    }

    // remove character
    mutating func removeCharAt(_ index: Int) {
        self.scalarArray.remove(at: index)
    }
    func removingAllInstancesOfChar(_ character: UInt32) -> ScalarString {

        var returnString = ScalarString()

        for scalar in self.scalarArray {
            if scalar != character {
                returnString.append(scalar)
            }
        }

        return returnString
    }
    func removeRange(_ range: CountableRange<Int>) -> ScalarString? {

        if range.lowerBound < 0 || range.upperBound > scalarArray.count {
            return nil
        }

        var returnString = ScalarString()

        for i in 0..<scalarArray.count {
            if i < range.lowerBound || i >= range.upperBound {
                returnString.append(scalarArray[i])
            }
        }

        return returnString
    }

    // replace
    func replace(_ character: UInt32, withChar replacementChar: UInt32) -> ScalarString {

        var returnString = ScalarString()

        for scalar in self.scalarArray {
            if scalar == character {
                returnString.append(replacementChar)
            } else {
                returnString.append(scalar)
            }
        }
        return returnString
    }
    func replace(_ character: UInt32, withString replacementString: String) -> ScalarString {

        var returnString = ScalarString()

        for scalar in self.scalarArray {
            if scalar == character {
                returnString.append(replacementString)
            } else {
                returnString.append(scalar)
            }
        }
        return returnString
    }
    func replaceRange(_ range: CountableRange<Int>, withString replacementString: ScalarString) -> ScalarString {

        var returnString = ScalarString()

        for i in 0..<scalarArray.count {
            if i < range.lowerBound || i >= range.upperBound {
                returnString.append(scalarArray[i])
            } else if i == range.lowerBound {
                returnString.append(replacementString)
            }
        }
        return returnString
    }

    // set (an alternative to myScalarString = "some string")
    mutating func set(_ string: String) {
        self.scalarArray.removeAll(keepingCapacity: false)
        for s in string.unicodeScalars {
            self.scalarArray.append(s.value)
        }
    }

    // split
    func split(atChar splitChar: UInt32) -> [ScalarString] {
        var partsArray: [ScalarString] = []
        if self.scalarArray.count == 0 {
            return partsArray
        }
        var part = ScalarString()
        for scalar in self.scalarArray {
            if scalar == splitChar {
                partsArray.append(part)
                part = ScalarString()
            } else {
                part.append(scalar)
            }
        }
        partsArray.append(part)
        return partsArray
    }

    // startsWith
    func startsWith() -> UInt32? {
        return self.scalarArray.first
    }

    // substring
    func substring(_ startIndex: Int) -> ScalarString {
        // from startIndex to end of string
        var subArray = ScalarString()
        for i in startIndex..<self.length {
            subArray.append(self.scalarArray[i])
        }
        return subArray
    }
    func substring(_ startIndex: Int, _ endIndex: Int) -> ScalarString {
        // (startIndex is inclusive, endIndex is exclusive)
        var subArray = ScalarString()
        for i in startIndex..<endIndex {
            subArray.append(self.scalarArray[i])
        }
        return subArray
    }

    // toString
    func toString() -> String {
        var string = ""

        for scalar in self.scalarArray {
            if let validScalar = UnicodeScalar(scalar) {
                string.append(Character(validScalar))
            }
        }
        return string
    }

    // trim
    // removes leading and trailing whitespace (space, tab, newline)
    func trim() -> ScalarString {

        let space: UInt32 = 0x00000020
        let tab: UInt32 = 0x00000009
        let newline: UInt32 = 0x0000000A

        var startIndex = self.scalarArray.count
        var endIndex = 0

        // leading whitespace
        for i in 0..<self.scalarArray.count {
            if self.scalarArray[i] != space &&
                self.scalarArray[i] != tab &&
                self.scalarArray[i] != newline {

                startIndex = i
                break
            }
        }

        // trailing whitespace
        for i in stride(from: (self.scalarArray.count - 1), through: 0, by: -1) {
            if self.scalarArray[i] != space &&
                self.scalarArray[i] != tab &&
                self.scalarArray[i] != newline {

                endIndex = i + 1
                break
            }
        }

        if endIndex <= startIndex {
            return ScalarString()
        }

        return self.substring(startIndex, endIndex)
    }

    // values
    func values() -> [UInt32] {
        return self.scalarArray
    }
}

func ==(left: ScalarString, right: ScalarString) -> Bool {
    return left.scalarArray == right.scalarArray
}

func +(left: ScalarString, right: ScalarString) -> ScalarString {
    var returnString = ScalarString()
    for scalar in left.values() {
        returnString.append(scalar)
    }
    for scalar in right.values() {
        returnString.append(scalar)
    }
    return returnString
}

combining multiple unicode character

The escape sequences in "\u{0C95}\u{0CBE}" are processed by the Swift compiler at compile time, which unfortunately means you can't build such a string up using string interpolation, since that is a runtime operation. So you need to look for another approach.

String has an initializer which takes UTF16 code units:

/// Returns an initialized `String` object that contains a
/// given number of characters from a given array of Unicode
/// characters.
public init(utf16CodeUnits: UnsafePointer<unichar>, count: Int)

When you see something taking an UnsafePointer&lt;T&gt;, you can pass it a Swift array [T]. Since the initializer needs the count as well, just build a Swift array of unichar values and pass the array together with its count to the initializer:

let point1: unichar = 0x0C95
let point2: unichar = 0x0CBE

let units = [point1, point2]
let str = String(utf16CodeUnits: units, count: units.count)
let combinedChar = Character(str)
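Alternatively, you can stay in pure Swift and append to the string's unicodeScalars view, avoiding the pointer-based NSString initializer entirely:

```swift
var s = ""
s.unicodeScalars.append(UnicodeScalar(0x0C95)!) // ಕ
s.unicodeScalars.append(UnicodeScalar(0x0CBE)!) // ಾ (combining vowel sign)
let combined = Character(s) // the two scalars form a single grapheme cluster
```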

Get nth character of a string in Swift

Attention: Please see Leo Dabus' answer for a proper implementation for Swift 4 and Swift 5.

Swift 4 or later

The Substring type was introduced in Swift 4 to make substrings
faster and more efficient by sharing storage with the original string, so that's what the subscript functions should return.


extension StringProtocol {
    subscript(offset: Int) -> Character {
        self[index(startIndex, offsetBy: offset)]
    }
    subscript(range: Range<Int>) -> SubSequence {
        let startIndex = index(self.startIndex, offsetBy: range.lowerBound)
        return self[startIndex..<index(startIndex, offsetBy: range.count)]
    }
    subscript(range: ClosedRange<Int>) -> SubSequence {
        let startIndex = index(self.startIndex, offsetBy: range.lowerBound)
        return self[startIndex..<index(startIndex, offsetBy: range.count)]
    }
    subscript(range: PartialRangeFrom<Int>) -> SubSequence {
        self[index(startIndex, offsetBy: range.lowerBound)...]
    }
    subscript(range: PartialRangeThrough<Int>) -> SubSequence {
        self[...index(startIndex, offsetBy: range.upperBound)]
    }
    subscript(range: PartialRangeUpTo<Int>) -> SubSequence {
        self[..<index(startIndex, offsetBy: range.upperBound)]
    }
}
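A quick usage sketch (the single-Character subscript is repeated here so the snippet stands alone):

```swift
extension StringProtocol {
    subscript(offset: Int) -> Character {
        self[index(startIndex, offsetBy: offset)]
    }
}

let greeting = "Hello, world!"
let ch = greeting[7]             // "w"
let tail = greeting.dropFirst(7) // "world!" as a storage-sharing Substring
```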

To convert the Substring into a String, you can simply do String(string[0..<2]), but you should only do that if you plan to keep the substring around. Otherwise, it's more efficient to keep it as a Substring.

Note: this answer has already been edited; the extension is now implemented on StringProtocol, so it works for substrings as well. Just make sure to use a valid range to avoid crashing when subscripting your StringProtocol type. For subscripting with ranges that may contain out-of-range values, you would need a bounds-checked variant of these subscripts.


Why is this not built-in?

The error message says "see the documentation comment for discussion". Apple provides the following explanation in the file UnavailableStringAPIs.swift:

Subscripting strings with integers is not available.

The concept of "the ith character in a string" has
different interpretations in different libraries and system
components. The correct interpretation should be selected
according to the use case and the APIs involved, so String
cannot be subscripted with an integer.

Swift provides several different ways to access the character
data stored inside strings.

  • String.utf8 is a collection of UTF-8 code units in the
    string. Use this API when converting the string to UTF-8.
    Most POSIX APIs process strings in terms of UTF-8 code units.

  • String.utf16 is a collection of UTF-16 code units in
    string. Most Cocoa and Cocoa touch APIs process strings in
    terms of UTF-16 code units. For example, instances of
    NSRange used with NSAttributedString and
    NSRegularExpression store substring offsets and lengths in
    terms of UTF-16 code units.

  • String.unicodeScalars is a collection of Unicode scalars.
    Use this API when you are performing low-level manipulation
    of character data.

  • String.characters is a collection of extended grapheme
    clusters, which are an approximation of user-perceived
    characters.


Note that when processing strings that contain human-readable text,
character-by-character processing should be avoided to the largest extent
possible. Use high-level locale-sensitive Unicode algorithms instead, for example,
String.localizedStandardCompare(),
String.localizedLowercaseString,
String.localizedStandardRangeOfString() etc.

How to perform some arithmetic operations on binary bit number?

Swift allows you to write integer literals in different bases:

  • Binary: 0b100101
  • Octal: 0o47
  • Hexadecimal: 0xd4f
  • Decimal: 3456 (no prefix)

This is covered in the Numeric Literals section of the Swift book.

So your binary number would be written as:

let numb = 0b100100010000001010001

Or you can create the Int from a string using a radix:

let numb = Int("100100010000001010001", radix: 2) // returns Int? (nil if the string is not valid binary)
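Once you have the value, ordinary arithmetic and bitwise operators apply regardless of which base the literal was written in, for example:

```swift
let a = 0b1010        // 10
let b = 0b0110        // 6
let andBits = a & b   // 0b0010 (2)
let orBits  = a | b   // 0b1110 (14)
let xorBits = a ^ b   // 0b1100 (12)
let shifted = a << 1  // 0b10100 (20)
let sum = a + b       // 16
```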

