See characters in an NSCharacterSet (Swift)
Based on another answer, here is a cleaner derived version, written with the Swift 2 API names:
extension NSCharacterSet {
    var characters: [String] {
        var chars = [String]()
        for plane: UInt8 in 0...16 {
            if self.hasMemberInPlane(plane) {
                let p0 = UInt32(plane) << 16
                let p1 = (UInt32(plane) + 1) << 16
                for c: UTF32Char in p0..<p1 {
                    if self.longCharacterIsMember(c) {
                        var c1 = c.littleEndian // fixed byte order, matching the encoding below
                        let s = NSString(bytes: &c1, length: 4, encoding: NSUTF32LittleEndianStringEncoding)!
                        chars.append(String(s))
                    }
                }
            }
        }
        return chars
    }
}
Usage:
let charset = NSCharacterSet.URLQueryAllowedCharacterSet()
print(charset.characters.joinWithSeparator(""))
How to know which CharacterSet contains a given character?
What you want is defined in the Unicode standard, under the name Unicode General Categories. Each Unicode character belongs to one category.
The Unicode website provides a complete character list showing each character's code, category, and name. It also provides a complete list of the categories themselves.
The - is U+002D (HYPHEN-MINUS). It is listed as being in the "Pd" (dash punctuation) category.
If you look at the documentation for CharacterSet, you will see punctuationCharacters, which is documented as:
Returns a character set containing the characters in Unicode General Category P*.
The "Pd" category is included in "P*" (which means any "P" category).
I also found https://www.compart.com/en/unicode/category which is a third-party list of each character by category. It is a bit more user-friendly than the Unicode reference.
To summarize: if you want to know which CharacterSet to use for a given character, look up the character's category using one of the charts I linked. Once you know its category, look at the documentation for CharacterSet to see which predefined character set applies to that category.
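The lookup can also be done in code rather than from the charts. The following is a minimal sketch, assuming Swift 5 or later, where `Unicode.Scalar.properties.generalCategory` exposes the General Category directly; the variable names are illustrative only.

```swift
import Foundation

// U+002D HYPHEN-MINUS as a single Unicode scalar.
let hyphen: Unicode.Scalar = "-"

// Ask the standard library for the scalar's Unicode General Category.
let category = hyphen.properties.generalCategory
let isDashPunctuation = (category == .dashPunctuation)   // "Pd"

// Confirm membership in the predefined sets.
let inPunctuation = CharacterSet.punctuationCharacters.contains(hyphen)
let inLetters = CharacterSet.letters.contains(hyphen)

print(isDashPunctuation, inPunctuation, inLetters)
```

Checking `contains` on a few candidate sets is often the quickest way to confirm that the category you read from the chart maps to the set you expect.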
How to print a content of the CharacterSet.decimalDigits?
This is not easy. Character sets are not made to be iterated; they are made to check whether a character is inside them or not. They don't expose the characters themselves, and the underlying ranges cannot be accessed.
The only thing you can do is to iterate over all characters and check every one of them against the character set, e.g.:
let set = CharacterSet.decimalDigits
let allCharacters = UInt32.min ... UInt32.max
allCharacters
    .lazy
    .compactMap { UnicodeScalar($0) }
    .filter { set.contains($0) }
    .map { String($0) }
    .forEach { print($0) }
However, note that this walks every possible UInt32 value (more than four billion of them), so it takes significant time and shouldn't be used inside a production application.
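The scan can be shortened considerably, because Unicode scalar values only exist in the range 0 through 0x10FFFF. The following is a sketch of the same brute-force idea restricted to that range; the names `digitSet`, `scalarValues`, and `digits` are illustrative, and the exact number of digits found depends on the Unicode version of the platform.

```swift
import Foundation

let digitSet = CharacterSet.decimalDigits

// Every valid Unicode scalar value lies in this range.
let scalarValues: ClosedRange<UInt32> = 0...0x10FFFF

var digits: [Unicode.Scalar] = []
for value in scalarValues {
    // Unicode.Scalar(_:) returns nil for the surrogate range, which is skipped.
    if let scalar = Unicode.Scalar(value), digitSet.contains(scalar) {
        digits.append(scalar)
    }
}

// The ASCII digits come first because the scan runs in code point order.
let firstTen = digits.prefix(10).map { String($0) }.joined()
print(digits.count)
print(firstTen)
```

This is still a linear scan over about 1.1 million values, so it is better suited to debugging or a one-off dump than to a hot path.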
How can I check if a string contains letters in Swift?
You can use CharacterSet in the following way:
import Foundation

let letters = CharacterSet.letters
let phrase = "Test case"
let range = phrase.rangeOfCharacter(from: letters)
// range will be nil if no letter is found
if range != nil {
    print("letters found")
} else {
    print("letters not found")
}
Or, to check that a string consists only of ASCII letters, you can do this:
func containsOnlyLetters(input: String) -> Bool {
    for chr in input {
        if (!(chr >= "a" && chr <= "z") && !(chr >= "A" && chr <= "Z")) {
            return false
        }
    }
    return true
}
In Swift 2:
func containsOnlyLetters(input: String) -> Bool {
    for chr in input.characters {
        if (!(chr >= "a" && chr <= "z") && !(chr >= "A" && chr <= "Z")) {
            return false
        }
    }
    return true
}
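Choose whichever fits your needs. Note that the two approaches answer different questions: the first reports whether the string contains any letter (including non-ASCII letters), while containsOnlyLetters requires every character to be an ASCII letter. I hope this helps.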
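In Swift 5 and later, both checks can be written with `Character.isLetter`, which is Unicode-aware. This is a minimal sketch; the function names `hasAnyLetter` and `hasOnlyLetters` are hypothetical and chosen so they do not clash with the function above.

```swift
// True if at least one character in the text is a letter.
func hasAnyLetter(_ text: String) -> Bool {
    return text.contains(where: { $0.isLetter })
}

// True if every character is a letter. Like the loop version above,
// this returns true for the empty string.
func hasOnlyLetters(_ text: String) -> Bool {
    return text.allSatisfy { $0.isLetter }
}

print(hasAnyLetter("Test case"))     // the space does not matter here
print(hasOnlyLetters("Test case"))   // the space makes this fail
```

Unlike the ASCII range comparison, `isLetter` also accepts letters such as "Ü", so pick the version that matches the input you actually want to allow.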
NSArray from NSCharacterSet
The following code creates an array containing all characters of a given character set. It works also for characters outside of the "basic multilingual plane" (characters > U+FFFF, e.g. U+10400 DESERET CAPITAL LETTER LONG I).
NSCharacterSet *charset = [NSCharacterSet uppercaseLetterCharacterSet];
NSMutableArray *array = [NSMutableArray array];
for (int plane = 0; plane <= 16; plane++) {
    if ([charset hasMemberInPlane:plane]) {
        UTF32Char c;
        for (c = plane << 16; c < (plane+1) << 16; c++) {
            if ([charset longCharacterIsMember:c]) {
                UTF32Char c1 = OSSwapHostToLittleInt32(c); // To make it byte-order safe
                NSString *s = [[NSString alloc] initWithBytes:&c1 length:4 encoding:NSUTF32LittleEndianStringEncoding];
                [array addObject:s];
            }
        }
    }
}
For the uppercaseLetterCharacterSet this gives an array of 1467 elements. But note that characters > U+FFFF are stored as a UTF-16 surrogate pair in NSString, so for example U+10400 is actually stored in NSString as the 2 characters "\uD801\uDC00".
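The surrogate-pair behaviour is easy to observe from Swift, whose String exposes the same UTF-16 view that NSString stores. A small sketch, with illustrative variable names:

```swift
// U+10400 DESERET CAPITAL LETTER LONG I lies outside the basic multilingual plane.
let deseret = "\u{10400}"

let characterCount = deseret.count                  // user-perceived characters
let scalarCount = deseret.unicodeScalars.count      // Unicode scalars
let utf16Count = deseret.utf16.count                // UTF-16 code units, as NSString counts them
let utf16Hex = deseret.utf16.map { String($0, radix: 16) }

print(characterCount, scalarCount, utf16Count, utf16Hex)
// prints: 1 1 2 ["d801", "dc00"]
```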
Swift 2 code can be found in other answers to this question.
Here is a Swift 3 version, written as an extension method:
extension CharacterSet {
    func allCharacters() -> [Character] {
        var result: [Character] = []
        for plane: UInt8 in 0...16 where self.hasMember(inPlane: plane) {
            for unicode in UInt32(plane) << 16 ..< UInt32(plane + 1) << 16 {
                if let uniChar = UnicodeScalar(unicode), self.contains(uniChar) {
                    result.append(Character(uniChar))
                }
            }
        }
        return result
    }
}
Example:
let charset = CharacterSet.uppercaseLetters
let chars = charset.allCharacters()
print(chars.count) // 1521
print(chars) // ["A", "B", "C", ... "]
(Note that some characters may not be present in the font used to
display the result.)
NSCharacterSet.characterIsMember() with Swift's Character type
My understanding is that unichar is a typealias for UInt16. A unichar is just a number.
I think the problem you are facing is that a Character in Swift can be composed of more than one UTF-16 code unit. Thus, it cannot be converted to a single unichar value, because it may be composed of two or more unichars. You can decompose a Character into its individual unichar values by converting it to a String and using the utf16 property, like this:
let c: Character = "a"
let s = String(c)
var codeUnits = [unichar]()
for codeUnit in s.utf16 {
    codeUnits.append(codeUnit)
}
This will produce an array, codeUnits, of unichar values.
EDIT: The initial code had for codeUnit in s when it should have been for codeUnit in s.utf16.
You can tidy things up and test whether each individual unichar value is in a character set like this:
let char: Character = "\u{63}\u{20dd}" // This is a 'c' inside of an enclosing circle
for codeUnit in String(char).utf16 {
    if NSCharacterSet(charactersInString: "c").characterIsMember(codeUnit) {
        dude.abide()
    } // dude will abide() for codeUnits[0] = "c", but not for codeUnits[1] = 0x20dd (the enclosing circle)
}
Or, if you are only interested in the first (and often only) unichar value:
if NSCharacterSet(charactersInString: "c").characterIsMember(String(char).utf16[0]) {
    dude.abide()
}
Or, wrap it in a function:
func isChar(char: Character, inSet set: NSCharacterSet) -> Bool {
    return set.characterIsMember(String(char).utf16[0])
}
let xSet = NSCharacterSet(charactersInString: "x")
isChar("x", inSet: xSet) // This returns true
isChar("y", inSet: xSet) // This returns false
Now make the function check all unichar values in a composed character. That way, if you have a composed character, the function will only return true if both the base character and the combining character are present:
func isChar(char: Character, inSet set: NSCharacterSet) -> Bool {
    var found = true
    for ch in String(char).utf16 {
        if !set.characterIsMember(ch) { found = false }
    }
    return found
}
let acuteA: Character = "\u{e1}" // An "a" with an accent
let acuteAComposed: Character = "\u{61}\u{301}" // Also an "a" with an accent
// A character set that includes both the composed and uncomposed unichar values
let charSet = NSCharacterSet(charactersInString: "\u{61}\u{301}\u{e1}")
isChar(acuteA, inSet: charSet) // returns true
isChar(acuteAComposed, inSet: charSet) // returns true (both unichar values were matched)
The last version is important. If your Character is a composed character, you have to check for the presence of both the base character ("a") and the combining character (the acute accent) in the character set, or you will get false positives.
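With the Swift 3+ CharacterSet value type, the same "every part must be a member" check can be expressed over Unicode scalars instead of UTF-16 code units, which also covers scalars above U+FFFF. This is a sketch assuming Swift 5, where `Character.unicodeScalars` is available; `isMember(_:of:)` and the set names are hypothetical.

```swift
import Foundation

// True only if every Unicode scalar that makes up the character is in the set.
func isMember(_ character: Character, of set: CharacterSet) -> Bool {
    return character.unicodeScalars.allSatisfy { set.contains($0) }
}

let base: Unicode.Scalar = "a"
let combiningAcute: Unicode.Scalar = "\u{301}"
let precomposed: Unicode.Scalar = "\u{e1}"

// A set holding only the base letter.
var baseOnly = CharacterSet()
baseOnly.insert(base)

// A set holding the base letter, the combining accent, and the precomposed form.
var full = baseOnly
full.insert(combiningAcute)
full.insert(precomposed)

let decomposedA: Character = "a\u{301}"   // "a" followed by a combining acute accent
let precomposedA: Character = "\u{e1}"    // single precomposed scalar

print(isMember(decomposedA, of: baseOnly))   // false: the accent is not in the set
print(isMember(decomposedA, of: full))       // true
print(isMember(precomposedA, of: full))      // true
```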
Check if string contains special characters in Swift
Your code checks whether no character in the string is from the given set.
What you want is to check if any character is not in the given set:
if (searchTerm!.rangeOfCharacterFromSet(characterSet.invertedSet).location != NSNotFound) {
    println("Could not handle special characters")
}
You can also achieve this using regular expressions:
let regex = NSRegularExpression(pattern: ".*[^A-Za-z0-9].*", options: nil, error: nil)!
if regex.firstMatchInString(searchTerm!, options: nil, range: NSMakeRange(0, searchTerm!.length)) != nil {
    println("could not handle special characters")
}
The pattern [^A-Za-z0-9] matches a character which is not from the ranges A-Z, a-z, or 0-9.
Update for Swift 2:
let searchTerm = "a+b"
let characterset = NSCharacterSet(charactersInString: "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789")
if searchTerm.rangeOfCharacterFromSet(characterset.invertedSet) != nil {
    print("string contains special characters")
}
Update for Swift 3:
let characterset = CharacterSet(charactersIn: "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789")
if searchTerm.rangeOfCharacter(from: characterset.inverted) != nil {
    print("string contains special characters")
}
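For reuse, the Swift 3 check can be wrapped in a small function. This is a sketch only; `allowedCharacters` and `containsSpecialCharacters` are hypothetical names, and "special" here simply means "not an ASCII letter or digit", as in the answer above.

```swift
import Foundation

// The characters that are considered acceptable.
let allowedCharacters = CharacterSet(
    charactersIn: "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789")

// True if any character of the text falls outside the allowed set.
func containsSpecialCharacters(_ text: String) -> Bool {
    return text.rangeOfCharacter(from: allowedCharacters.inverted) != nil
}

print(containsSpecialCharacters("a+b"))      // true: "+" is not allowed
print(containsSpecialCharacters("abc123"))   // false
```

An empty string contains no disallowed character, so the function returns false for it; add an explicit `isEmpty` check if empty input should be rejected.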
How to create a custom NSCharacterSet?
The most typical way to create a new character set is using CharacterSet(charactersIn:), giving a String with all the characters of the set.
Adding some characters to an existing set can be achieved using:
let characterSet = NSMutableCharacterSet() //create an empty mutable set
characterSet.formUnionWithCharacterSet(NSCharacterSet.URLQueryAllowedCharacterSet())
characterSet.addCharactersInString("?&")
or in Swift 3+ simply:
var characterSet = CharacterSet.urlQueryAllowed
characterSet.insert(charactersIn: "?&")
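A custom set can also be derived by removing characters from a predefined one. The sketch below assumes a common use case, percent-encoding a single query value, where the separators should be escaped; the name `queryValueAllowed` and the sample input are illustrative only.

```swift
import Foundation

// Start from the predefined set and take out the characters that
// would otherwise be read as query separators.
var queryValueAllowed = CharacterSet.urlQueryAllowed
queryValueAllowed.remove(charactersIn: "&=+?")

// Everything outside the custom set is percent-encoded.
let encoded = "a b&c".addingPercentEncoding(withAllowedCharacters: queryValueAllowed)
print(encoded ?? "encoding failed")   // a%20b%26c
```

`insert(charactersIn:)` and `remove(charactersIn:)` mutate the value in place, so the set must be declared with `var`.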
For URL encoding, also note Objective-C and Swift URL encoding
Remove all non-numeric characters from a string in swift
I was hoping there would be something like stringFromCharactersInSet() which would allow me to specify only valid characters to keep.
You can use trimmingCharacters with the inverted character set to remove characters from the start or the end of the string. In Swift 3 and later:
let result = string.trimmingCharacters(in: CharacterSet(charactersIn: "0123456789.").inverted)
Or, if you want to remove non-numeric characters anywhere in the string (not just the start or end), you can filter the characters, e.g. in Swift 4.2.1:
let result = string.filter("0123456789.".contains)
Or, if you want to remove characters from a CharacterSet from anywhere in the string, use:
let result = String(string.unicodeScalars.filter(CharacterSet.whitespaces.inverted.contains))
Or, if you only want to match valid strings of a certain format (e.g. ####.##), you could use a regular expression. For example:
if let range = string.range(of: #"\d+(\.\d*)?"#, options: .regularExpression) {
    let result = string[range] // or `String(string[range])` if you need `String`
}
The behavior of these approaches differs slightly, so the right choice depends on precisely what you're trying to do. Include the decimal point in the allowed characters if you want decimal numbers, or exclude it if you want only integers. There are lots of ways to accomplish this.
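To make the differences concrete, here is a sketch that runs three of the approaches on the same input. The sample string and variable names are made up for illustration, and the code assumes Swift 5 for the raw string literal.

```swift
import Foundation

let input = "a1b2.5c"
let numeric = CharacterSet(charactersIn: "0123456789.")

// 1. Trimming only strips non-numeric characters from both ends.
let trimmed = input.trimmingCharacters(in: numeric.inverted)

// 2. Filtering drops non-numeric characters everywhere.
let keep = Set("0123456789.")
let filtered = input.filter { keep.contains($0) }

// 3. The regular expression returns the first run that looks like a number.
var firstNumber = ""
if let range = input.range(of: #"\d+(\.\d*)?"#, options: .regularExpression) {
    firstNumber = String(input[range])
}

print(trimmed)       // 1b2.5
print(filtered)      // 12.5
print(firstNumber)   // 1
```

The three results differ for the same input, which is why the intended meaning of "numeric" should be decided before picking an approach.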
For older, Swift 2 syntax, see previous revision of this answer.