See characters in an NSCharacterSet (Swift)

Based on another answer, here is a cleaner derived version. The extension and the usage below are written with Swift 3 names:

extension NSCharacterSet {
    var characters: [String] {
        var chars = [String]()
        for plane: UInt8 in 0...16 {
            if self.hasMemberInPlane(plane) {
                let p0 = UInt32(plane) << 16
                let p1 = (UInt32(plane) + 1) << 16
                for c: UTF32Char in p0..<p1 {
                    if self.longCharacterIsMember(c) {
                        var c1 = c.littleEndian
                        let s = NSString(bytes: &c1, length: 4, encoding: String.Encoding.utf32LittleEndian.rawValue)!
                        chars.append(String(s))
                    }
                }
            }
        }
        return chars
    }
}

Usage:

let charset = CharacterSet.urlQueryAllowed as NSCharacterSet
print(charset.characters.joined(separator: ""))

How to know which CharacterSet contains a given character?

What you want is defined in the Unicode standard. It is referred to as Unicode General Categories. Each Unicode character is in a category.

The Unicode website provides a complete character list showing each character's code, category, and name. It also provides a complete list of the Unicode categories.

The - is U+002D (HYPHEN-MINUS). It is listed as being in the "Pd" (Punctuation, Dash) category.

If you look at the documentation for CharacterSet, you will see punctuationCharacters which is documented as:

Returns a character set containing the characters in Unicode General Category P*.

The "Pd" category is included in "P*" (which means any "P" category).

I also found https://www.compart.com/en/unicode/category which is a third-party list of each character by category. It is a bit more user-friendly than the Unicode reference.

To summarize: if you want to know which CharacterSet to use for a given character, look up the character's category using one of the charts I linked. Once you know its category, look at the documentation for CharacterSet to see which predefined character set applies to that category.
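As a rough illustration, the same lookup can be done in code. This sketch assumes Swift 5 or later, where Unicode.Scalar exposes properties.generalCategory. The helper predefinedSets(containing:) and its short candidate list are hypothetical, not part of Foundation:

```swift
import Foundation

// Hypothetical helper: names of a few predefined sets that contain `scalar`.
// The candidate list is deliberately short; extend it as needed.
func predefinedSets(containing scalar: Unicode.Scalar) -> [String] {
    let candidates: [(String, CharacterSet)] = [
        ("letters", .letters),
        ("decimalDigits", .decimalDigits),
        ("punctuationCharacters", .punctuationCharacters),
        ("symbols", .symbols),
        ("whitespaces", .whitespaces),
    ]
    return candidates.filter { $0.1.contains(scalar) }.map { $0.0 }
}

let hyphen: Unicode.Scalar = "-"

// "Pd" corresponds to the dashPunctuation general category.
print(hyphen.properties.generalCategory == .dashPunctuation) // true
print(predefinedSets(containing: hyphen)) // ["punctuationCharacters"]
```

Checking the category first and the set membership second mirrors the manual chart lookup described above.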

How to print the contents of CharacterSet.decimalDigits?

This is not easy. Character sets are not made to be iterated, they are made to check whether a character is inside them or not. They don't contain the characters themselves and the ranges cannot be accessed.

The only thing you can do is to iterate over all characters and check every one of them against the character set, e.g.:

let set = CharacterSet.decimalDigits
let allCharacters = UInt32.min ... UInt32.max

allCharacters
    .lazy
    .compactMap { UnicodeScalar($0) }
    .filter { set.contains($0) }
    .map { String($0) }
    .forEach { print($0) }

However, note that such a thing takes significant time and shouldn't be used inside a production application.
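Since Unicode scalar values end at U+10FFFF, the scan can stop there instead of running to UInt32.max. A minimal sketch of that narrower loop follows; the names allScalars, digitScalars and firstTen are illustrative only:

```swift
import Foundation

let digits = CharacterSet.decimalDigits

// Valid Unicode scalar values lie in 0...0x10FFFF (surrogates excluded),
// so there is no need to test anything above that bound.
let allScalars: ClosedRange<UInt32> = 0...0x10FFFF

let digitScalars = allScalars
    .lazy
    .compactMap { Unicode.Scalar($0) }   // nil for surrogate code points
    .filter { digits.contains($0) }

// Only the first ten matches are materialized here.
let firstTen = Array(digitScalars.prefix(10)).map { String($0) }
print(firstTen) // ["0", "1", "2", "3", "4", "5", "6", "7", "8", "9"]
```

This is still a brute-force scan over more than a million code points, so the caveat about production use continues to apply.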

How can I check if a string contains letters in Swift?

You can use CharacterSet as follows:

import Foundation

let letters = CharacterSet.letters

let phrase = "Test case"
let range = phrase.rangeOfCharacter(from: letters)

// range will be nil if no letter is found
if range != nil {
    print("letters found")
} else {
    print("letters not found")
}

Or you can do this too (note that it accepts only the ASCII letters a-z and A-Z):

func containsOnlyLetters(input: String) -> Bool {
    for chr in input {
        if (!(chr >= "a" && chr <= "z") && !(chr >= "A" && chr <= "Z")) {
            return false
        }
    }
    return true
}

In Swift 2:

func containsOnlyLetters(input: String) -> Bool {
    for chr in input.characters {
        if (!(chr >= "a" && chr <= "z") && !(chr >= "A" && chr <= "Z")) {
            return false
        }
    }
    return true
}

Choose whichever approach suits you. I hope this helps.
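For newer toolchains, a possible alternative is the Character.isLetter property. This is a sketch that assumes Swift 5 or later; the function names containsLetter and containsOnlyUnicodeLetters are made up for the example. Unlike the ASCII range comparison, isLetter also accepts non-ASCII letters:

```swift
// Assumes Swift 5+, where Character has an `isLetter` property.

// True if at least one character is a letter.
func containsLetter(_ s: String) -> Bool {
    return s.contains(where: { $0.isLetter })
}

// True if every character is a letter (also true for an empty string,
// matching the behavior of the loop-based version).
func containsOnlyUnicodeLetters(_ s: String) -> Bool {
    return s.allSatisfy { $0.isLetter }
}

print(containsLetter("Test case"))             // true
print(containsLetter("1234 !"))                // false
print(containsOnlyUnicodeLetters("Test case")) // false (contains a space)
print(containsOnlyUnicodeLetters("Über"))      // true
```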

NSArray from NSCharacterSet

The following code creates an array containing all characters of a given character set. It also works for characters outside of the "basic multilingual plane" (characters > U+FFFF, e.g. U+10400 DESERET CAPITAL LETTER LONG I).

NSCharacterSet *charset = [NSCharacterSet uppercaseLetterCharacterSet];
NSMutableArray *array = [NSMutableArray array];
for (int plane = 0; plane <= 16; plane++) {
    if ([charset hasMemberInPlane:plane]) {
        UTF32Char c;
        for (c = plane << 16; c < (plane+1) << 16; c++) {
            if ([charset longCharacterIsMember:c]) {
                UTF32Char c1 = OSSwapHostToLittleInt32(c); // To make it byte-order safe
                NSString *s = [[NSString alloc] initWithBytes:&c1 length:4 encoding:NSUTF32LittleEndianStringEncoding];
                [array addObject:s];
            }
        }
    }
}

For uppercaseLetterCharacterSet this gives an array of 1467 elements. But note that characters > U+FFFF are stored as a UTF-16 surrogate pair in NSString, so for example U+10400 is actually stored in NSString as two UTF-16 code units, "\uD801\uDC00".
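The surrogate-pair point can be checked from Swift through the utf16 view of a String. A small sketch (the variable names are illustrative):

```swift
// U+10400 DESERET CAPITAL LETTER LONG I lies outside the basic multilingual plane.
let deseret = "\u{10400}"

// One user-visible character and one Unicode scalar...
print(deseret.count)                // 1
print(deseret.unicodeScalars.count) // 1

// ...but two UTF-16 code units, the surrogate pair D801 DC00.
let units = Array(deseret.utf16)
print(units.map { String($0, radix: 16) }) // ["d801", "dc00"]
```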

Swift 2 code can be found in other answers to this question.
Here is a Swift 3 version, written as an extension method:

extension CharacterSet {
    func allCharacters() -> [Character] {
        var result: [Character] = []
        for plane: UInt8 in 0...16 where self.hasMember(inPlane: plane) {
            for unicode in UInt32(plane) << 16 ..< UInt32(plane + 1) << 16 {
                if let uniChar = UnicodeScalar(unicode), self.contains(uniChar) {
                    result.append(Character(uniChar))
                }
            }
        }
        return result
    }
}

Example:

let charset = CharacterSet.uppercaseLetters
let chars = charset.allCharacters()
print(chars.count) // 1521
print(chars) // ["A", "B", "C", ...]

(Note that some characters may not be present in the font used to
display the result.)

NSCharacterSet.characterIsMember() with Swift's Character type

My understanding is that unichar is a typealias for UInt16. A unichar is just a number.

I think the problem you are facing is that a Character in Swift can be composed of more than one Unicode code unit. Thus, it cannot always be converted to a single unichar value, because it may be composed of two or more unichars. You can decompose a Character into its individual unichar values by converting it to a String and using the utf16 property, like this:

let c: Character = "a"
let s = String(c)
var codeUnits = [unichar]()
for codeUnit in s.utf16 {
    codeUnits.append(codeUnit)
}

This will produce an array - codeUnits - of unichar values.

EDIT: Initial code had for codeUnit in s when it should have been for codeUnit in s.utf16

You can tidy things up and test for whether or not each individual unichar value is in a character set like this:

let char: Character = "\u{63}\u{20dd}" // This is a 'c' inside of an enclosing circle
for codeUnit in String(char).utf16 {
    if NSCharacterSet(charactersInString: "c").characterIsMember(codeUnit) {
        dude.abide()
    } // dude will abide() for codeUnits[0] = "c", but not for codeUnits[1] = 0x20dd (the enclosing circle)
}

Or, if you are only interested in the first (and often only) unichar value:

if NSCharacterSet(charactersInString: "c").characterIsMember(String(char).utf16[0]) {
    dude.abide()
}

Or, wrap it in a function:

func isChar(char: Character, inSet set: NSCharacterSet) -> Bool {
    return set.characterIsMember(String(char).utf16[0])
}

let xSet = NSCharacterSet(charactersInString: "x")
isChar("x", inSet: xSet) // This returns true
isChar("y", inSet: xSet) // This returns false

Now make the function check for all unichar values in a composed character - that way, if you have a composed character, the function will only return true if both the base character and the combining character are present:

func isChar(char: Character, inSet set: NSCharacterSet) -> Bool {
    var found = true
    for ch in String(char).utf16 {
        if !set.characterIsMember(ch) { found = false }
    }
    return found
}

let acuteA: Character = "\u{e1}" // An "a" with an accent
let acuteAComposed: Character = "\u{61}\u{301}" // Also an "a" with an accent

// A character set that includes both the composed and uncomposed unichar values
let charSet = NSCharacterSet(charactersInString: "\u{61}\u{301}\u{e1}")

isChar(acuteA, inSet: charSet) // returns true
isChar(acuteAComposed, inSet: charSet) // returns true (both unichar values were matched)

The last version is important. If your Character is a composed character you have to check for the presence of both the base character ("a") and the combining character (the acute accent) in the character set or you will get false positives.
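The same idea can be written against the CharacterSet value type by comparing Unicode scalars rather than UTF-16 code units. This is only a sketch, assuming Swift 4.2 or later for allSatisfy; the function name isCharacter(_:in:) and the two example sets are hypothetical:

```swift
import Foundation

// True only if every Unicode scalar that makes up `char` is in `set`.
func isCharacter(_ char: Character, in set: CharacterSet) -> Bool {
    return String(char).unicodeScalars.allSatisfy { set.contains($0) }
}

let baseOnly = CharacterSet(charactersIn: "a")               // just U+0061
let baseAndAccent = CharacterSet(charactersIn: "a\u{301}")   // U+0061 and U+0301

let plainA: Character = "a"
let composedA: Character = "\u{61}\u{301}" // "a" followed by a combining acute accent

print(isCharacter(plainA, in: baseOnly))         // true
print(isCharacter(composedA, in: baseOnly))      // false: the accent is not in the set
print(isCharacter(composedA, in: baseAndAccent)) // true
```

Working on scalars also covers characters outside the basic multilingual plane, which a single unichar cannot represent.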

Check if string contains special characters in Swift

Your code checks whether no character in the string is from the given set.
What you want is to check whether any character is not in the given set:

if (searchTerm!.rangeOfCharacterFromSet(characterSet.invertedSet).location != NSNotFound) {
    println("Could not handle special characters")
}

You can also achieve this using regular expressions:

let regex = NSRegularExpression(pattern: ".*[^A-Za-z0-9].*", options: nil, error: nil)!
if regex.firstMatchInString(searchTerm!, options: nil, range: NSMakeRange(0, searchTerm!.length)) != nil {
    println("could not handle special characters")
}

The pattern [^A-Za-z0-9] matches a character which is not from the ranges A-Z,
a-z, or 0-9.
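In Swift 3 and later the same pattern can be applied without constructing an NSRegularExpression, by passing the .regularExpression option to range(of:options:). A minimal sketch; the function name containsSpecialCharacters is chosen here for illustration:

```swift
import Foundation

// True if `text` contains at least one character outside A-Z, a-z and 0-9.
func containsSpecialCharacters(_ text: String) -> Bool {
    return text.range(of: "[^A-Za-z0-9]", options: .regularExpression) != nil
}

print(containsSpecialCharacters("a+b"))    // true
print(containsSpecialCharacters("abc123")) // false
```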

Update for Swift 2:

let searchTerm = "a+b"

let characterset = NSCharacterSet(charactersInString: "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789")
if searchTerm.rangeOfCharacterFromSet(characterset.invertedSet) != nil {
    print("string contains special characters")
}

Update for Swift 3:

let characterset = CharacterSet(charactersIn: "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789")
if searchTerm.rangeOfCharacter(from: characterset.inverted) != nil {
    print("string contains special characters")
}

How to create a custom NSCharacterSet?

The most typical way to create a new character set is using
CharacterSet(charactersIn:), giving a String with all the characters of the set.

Adding some characters to an existing set can be achieved using:

let characterSet = NSMutableCharacterSet() //create an empty mutable set
characterSet.formUnionWithCharacterSet(NSCharacterSet.URLQueryAllowedCharacterSet())
characterSet.addCharactersInString("?&")

or in Swift 3+ simply:

var characterSet = CharacterSet.urlQueryAllowed
characterSet.insert(charactersIn: "?&")

For URL encoding, see also "Objective-C and Swift URL encoding".
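A predefined set can be narrowed as well as widened. As a hedged illustration of why a custom set matters for URL encoding, the sketch below removes a few characters from urlQueryAllowed so that they get percent-encoded inside a query value. The choice of "&=+" and the sample string are assumptions made for the example:

```swift
import Foundation

// Start from the predefined set and remove characters that must not
// appear unescaped inside a single query value.
var allowed = CharacterSet.urlQueryAllowed
allowed.remove(charactersIn: "&=+")

let value = "a&b=c d"
let encoded = value.addingPercentEncoding(withAllowedCharacters: allowed)
print(encoded ?? "encoding failed") // a%26b%3Dc%20d
```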

Remove all non-numeric characters from a string in swift

I was hoping there would be something like stringFromCharactersInSet() which would allow me to specify only valid characters to keep.

You can either use trimmingCharacters with the inverted character set to remove characters from the start or the end of the string. In Swift 3 and later:

let result = string.trimmingCharacters(in: CharacterSet(charactersIn: "0123456789.").inverted)

Or, if you want to remove non-numeric characters anywhere in the string (not just the start or end), you can filter the characters, e.g. in Swift 4.2.1:

let result = string.filter("0123456789.".contains)

Or, if you want to remove characters from a CharacterSet from anywhere in the string, use:

let result = String(string.unicodeScalars.filter(CharacterSet.whitespaces.inverted.contains))

Or, if you want to match only valid strings of a certain format (e.g. ####.##), you could use a regular expression. For example:

if let range = string.range(of: #"\d+(\.\d*)?"#, options: .regularExpression) {
    let result = string[range] // or `String(string[range])` if you need `String`
}

The behavior of these different approaches differs slightly, so it just depends on precisely what you're trying to do. Include or exclude the decimal point depending on whether you want decimal numbers or just integers. There are lots of ways to accomplish this.
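To make the differences concrete, here is a small side-by-side sketch of three of the approaches applied to one sample input. It assumes Swift 5 for the raw string literal; the input "a1b2.5c" and the variable names are made up for the example:

```swift
import Foundation

let input = "a1b2.5c"
let numeric = CharacterSet(charactersIn: "0123456789.")

// Trimming only strips from both ends, so the inner "b" survives.
let trimmed = input.trimmingCharacters(in: numeric.inverted)
print(trimmed) // 1b2.5

// Filtering removes non-numeric characters anywhere in the string.
let filtered = input.filter { "0123456789.".contains($0) }
print(filtered) // 12.5

// The regular expression returns the first well-formed number only.
var matched = ""
if let range = input.range(of: #"\d+(\.\d*)?"#, options: .regularExpression) {
    matched = String(input[range])
}
print(matched) // 1
```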


For older, Swift 2 syntax, see previous revision of this answer.


