What is the correct length: argument to provide to NSRange for NSRegularExpression using a (Swift) String?
The utf16 count is correct, not the utf8 count. Or, better yet, use the convenience initializers, which convert a Range of String.Index to an NSRange:
let range = NSRange(str.startIndex..., in: str)
And to convert an NSRange to a Range<String.Index>:
let range = Range(nsRange, in: str)
Thus, putting that together:
import Foundation

let str = "#tweak #wow #gaming"
if let regex = try? NSRegularExpression(pattern: "#[a-z0-9]+", options: .caseInsensitive) {
    let nsRange = NSRange(str.startIndex..., in: str)
    let strings = regex.matches(in: str, range: nsRange).compactMap {
        Range($0.range, in: str).map { str[$0] }
    }
    print(strings)
}
See the WWDC 2017 session Efficient Interactions with Frameworks, which discusses (a) our historical use of UTF-16 when dealing with ranges and (b) the fact that we don’t have to do that anymore.
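To see the difference concretely, here is a small sketch (the sample string is made up, with the emoji written as a \u{...} escape) that compares the three counts and round-trips a match through NSRange:

```swift
import Foundation

// Hypothetical sample: "\u{1F436}" (dog face) is one Character but
// two UTF-16 code units and four UTF-8 bytes.
let sample = "dog \u{1F436} #tag"

// NSRange(_:in:) measures in UTF-16 code units, the unit NSString and NSRange use.
let fullRange = NSRange(sample.startIndex..., in: sample)

print(sample.count)        // 10 Characters
print(sample.utf16.count)  // 11 UTF-16 code units: what NSRange expects
print(sample.utf8.count)   // 13 UTF-8 bytes: too long for this string's NSRange
print(fullRange.length)    // 11, same as sample.utf16.count

// Round trip: find the hashtag, then convert its NSRange back to Range<String.Index>.
let regex = try! NSRegularExpression(pattern: "#[a-z]+")
var tag: String? = nil
if let match = regex.firstMatch(in: sample, range: fullRange),
   let range = Range(match.range, in: sample) {
    tag = String(sample[range])
}
print(tag ?? "no match")   // #tag
```

Using the utf8 count here would produce an NSRange that extends past the end of the string as NSString sees it.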
Which Swift character count should I use when interacting with NSString APIs?
TL;DR
The documentation for NSString.length specifies:
The number of UTF-16 code units in the receiver.
Thus, if you want to interop between String and NSString:
- Use string.utf16.count; it will match (string as NSString).length exactly.

If you want to count the number of visible characters:
- Use string.count; it will match the number of times you need to press the → (right arrow) key to get from the beginning to the end of the string. Note: this is not always 100% accurate, but Apple appears to keep improving the implementation.
Here's a Swift 4.0 playground to test a bunch of strings and functions:
let header = "NSString .utf16❔ encodedOffset❔ NSRange❔ .count❔ .characters❔ distance❔ .unicodeScalars❔ .utf8❔ Description"
var format = " %3d %3d ❓ %3d ❓ %3d ❓ %3d ❓ %3d ❓ %3d ❓ %3d ❓ %3d ❓ %@"
format = format.replacingOccurrences(of: "❓", with: "%@") // "❓" acts as a placeholder for "%@" to align the text perfectly
print(header)
test("")
test("abc")
test("❌")
test(")
test("☾test")
test(")
test("\u{200d}\u{200d}\u{200d})
test(")
test("\u{1F468}")
test("♀️♂️)
test("你好吗")
test("مرحبا", "Arabic word")
test("م", "Arabic letter")
test("שלום", "Hebrew word")
test("ם", "Hebrew letter")
func test(_ s: String, _ description: String? = nil) {
    func icon(for length: Int) -> String {
        return length == (s as NSString).length ? "✅" : "❌"
    }
    let description = description ?? "'" + s + "'"
    let string = String(
        format: format,
        (s as NSString).length,
        s.utf16.count, icon(for: s.utf16.count),
        s.endIndex.encodedOffset, icon(for: s.endIndex.encodedOffset),
        NSRange(s.startIndex..<s.endIndex, in: s).upperBound, icon(for: NSRange(s.startIndex..<s.endIndex, in: s).upperBound),
        s.count, icon(for: s.count),
        s.characters.count, icon(for: s.characters.count),
        s.distance(from: s.startIndex, to: s.endIndex), icon(for: s.distance(from: s.startIndex, to: s.endIndex)),
        s.unicodeScalars.count, icon(for: s.unicodeScalars.count),
        s.utf8.count, icon(for: s.utf8.count),
        description)
    print(string)
}
And here is the output:
NSString .utf16❔ encodedOffset❔ NSRange❔ .count❔ .characters❔ distance❔ .unicodeScalars❔ .utf8❔ Description
0 0 ✅ 0 ✅ 0 ✅ 0 ✅ 0 ✅ 0 ✅ 0 ✅ 0 ✅ ''
3 3 ✅ 3 ✅ 3 ✅ 3 ✅ 3 ✅ 3 ✅ 3 ✅ 3 ✅ 'abc'
1 1 ✅ 1 ✅ 1 ✅ 1 ✅ 1 ✅ 1 ✅ 1 ✅ 3 ❌ '❌'
4 4 ✅ 4 ✅ 4 ✅ 1 ❌ 1 ❌ 1 ❌ 2 ❌ 8 ❌ ''
5 5 ✅ 5 ✅ 5 ✅ 5 ✅ 5 ✅ 5 ✅ 5 ✅ 7 ❌ '☾test'
11 11 ✅ 11 ✅ 11 ✅ 1 ❌ 1 ❌ 1 ❌ 7 ❌ 25 ❌ ''
11 11 ✅ 11 ✅ 11 ✅ 1 ❌ 1 ❌ 1 ❌ 7 ❌ 25 ❌ ''
8 8 ✅ 8 ✅ 8 ✅ 4 ❌ 4 ❌ 4 ❌ 4 ❌ 16 ❌ ''
2 2 ✅ 2 ✅ 2 ✅ 1 ❌ 1 ❌ 1 ❌ 1 ❌ 4 ❌ ''
58 58 ✅ 58 ✅ 58 ✅ 13 ❌ 13 ❌ 13 ❌ 32 ❌ 122 ❌ '♀️♂️'
3 3 ✅ 3 ✅ 3 ✅ 3 ✅ 3 ✅ 3 ✅ 3 ✅ 9 ❌ '你好吗'
5 5 ✅ 5 ✅ 5 ✅ 5 ✅ 5 ✅ 5 ✅ 5 ✅ 10 ❌ Arabic word
1 1 ✅ 1 ✅ 1 ✅ 1 ✅ 1 ✅ 1 ✅ 1 ✅ 2 ❌ Arabic letter
4 4 ✅ 4 ✅ 4 ✅ 4 ✅ 4 ✅ 4 ✅ 4 ✅ 8 ❌ Hebrew word
1 1 ✅ 1 ✅ 1 ✅ 1 ✅ 1 ✅ 1 ✅ 1 ✅ 2 ❌ Hebrew letter
Conclusions:
- To get a length that is compatible with NSString/NSRange, use (s as NSString).length, s.utf16.count (preferred), s.endIndex.encodedOffset (deprecated in Swift 5 in favor of s.endIndex.utf16Offset(in: s)), or NSRange(s.startIndex..<s.endIndex, in: s).length.
- To get the number of visible characters, use s.count (preferred), s.characters.count (deprecated), or s.distance(from: s.startIndex, to: s.endIndex).
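In Swift 5, encodedOffset and characters are deprecated. A sketch of the same measurements using what remains available, with a made-up sample string written entirely in escapes so the counts don't depend on how the source is rendered:

```swift
import Foundation

// Hypothetical sample: "e" + U+0301 (combining acute accent) forms one Character from
// two scalars, and U+1F468 (man) is one scalar stored as a UTF-16 surrogate pair.
let s = "cafe\u{301} \u{1F468}"

// UTF-16 based lengths, all compatible with NSString/NSRange.
let nsLength = (s as NSString).length
let utf16Count = s.utf16.count                                   // preferred
let utf16Offset = s.endIndex.utf16Offset(in: s)                  // Swift 5 replacement for encodedOffset
let rangeLength = NSRange(s.startIndex..<s.endIndex, in: s).length

// Character (grapheme cluster) based lengths.
let visible = s.count                                            // preferred
let characterDistance = s.distance(from: s.startIndex, to: s.endIndex)

print(nsLength, utf16Count, utf16Offset, rangeLength)  // 8 8 8 8
print(visible, characterDistance)                       // 6 6
```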
A helpful extension to get the full range of a String:
import Foundation

public extension String {
    // The NSRange covering the whole string, measured in UTF-16 code units.
    var nsrange: NSRange {
        return NSRange(startIndex..<endIndex, in: self)
    }
}
Thus, you can call the original method like so:
replace(", characterAtIndex: ".utf16.count - 1) // �!
Swift regex for searching format specifiers in a string
First of all, you have to replace the matches in reverse order; otherwise each replacement shifts the positions of the matches after it, and you will run into index trouble.
A possible pattern is
%([.0-9]+)?[@df]
which also accounts for the (optional) decimal places specifier.
import Foundation

var description = "I am %@. My age is %d and my height is %.02f. (%@)"
let pattern = "%([.0-9]+)?[@df]"
let regex = try NSRegularExpression(pattern: pattern)
let nsrange = NSRange(description.startIndex..., in: description)
// Walk the matches back to front so earlier ranges stay valid after each replacement.
for match in regex.matches(in: description, range: nsrange).reversed() {
    let range = Range(match.range, in: description)!
    description.replaceSubrange(range, with: "MATCH")
}
print(description)
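If you need the specifiers themselves rather than replacing them, the same pattern can collect each match. A sketch reusing the example string from above; the variable names are illustrative:

```swift
import Foundation

let text = "I am %@. My age is %d and my height is %.02f. (%@)"
let regex = try! NSRegularExpression(pattern: "%([.0-9]+)?[@df]")
let fullRange = NSRange(text.startIndex..., in: text)

// Convert each match's NSRange back to a Range<String.Index> and copy the substring.
let specifiers: [String] = regex.matches(in: text, range: fullRange).compactMap { match in
    Range(match.range, in: text).map { String(text[$0]) }
}
print(specifiers) // ["%@", "%d", "%.02f", "%@"]
```

Reading matches does not mutate the string, so no reversal is needed here.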
Which NSRegularExpression was found using the | operator
If you use a combined pattern, each alternative's capture group is reported at a different range index of the match result.
If you want to access the first capture group (the bold pattern), you need to access the range at index 1. When the match comes from the second alternative, the first group has an invalid range (its location is NSNotFound), so you need to check whether it's valid or not this way:
results.forEach {
    // A capture group that did not take part in this match has location NSNotFound.
    var range = $0.range(at: 1)
    if range.location != NSNotFound {
        self.applyAttributes(range: range, type: .bold)
    }
    range = $0.range(at: 2)
    if range.location != NSNotFound {
        self.applyAttributes(range: range, type: .italic)
    }
}
After that, you can generalize this by pairing each TypeAttributes case with its capture group (group index = position in the array + 1), using a small NSRange extension for the validity check:
extension NSRange {
    // Valid if the group took part in the match and lies inside the string.
    // NSRange counts UTF-16 code units, so compare against utf16.count, not count.
    func isValid(for string: String) -> Bool {
        return location != NSNotFound && location + length <= string.utf16.count
    }
}
let attributes: [TypeAttributes] = [.bold, .italic]
results.forEach { match in
    attributes.enumerated().forEach { index, attribute in
        let range = match.range(at: index + 1)
        if range.isValid(for: str) {
            self.applyAttributes(range: range, type: attribute)
        }
    }
}
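Here's a self-contained sketch of the same idea that can run outside the question's view code. The bold/italic patterns, the Style enum, and styledRuns(in:) are hypothetical stand-ins for the question's TypeAttributes and applyAttributes:

```swift
import Foundation

// Hypothetical stand-in for the question's TypeAttributes enum.
enum Style { case bold, italic }

func styledRuns(in text: String) -> [(Style, String)] {
    // Group 1 captures **bold** text, group 2 captures _italic_ text.
    let regex = try! NSRegularExpression(pattern: "\\*\\*(.+?)\\*\\*|_(.+?)_")
    let styles: [Style] = [.bold, .italic]   // styles[i] belongs to capture group i + 1
    let fullRange = NSRange(text.startIndex..., in: text)
    var runs: [(Style, String)] = []
    for match in regex.matches(in: text, range: fullRange) {
        for (index, style) in styles.enumerated() {
            let groupRange = match.range(at: index + 1)
            // Skip the alternative that did not participate in this match.
            guard groupRange.location != NSNotFound,
                  let range = Range(groupRange, in: text) else { continue }
            runs.append((style, String(text[range])))
        }
    }
    return runs
}

let runs = styledRuns(in: "some **loud** and _quiet_ words")
for (style, content) in runs {
    print(style, content)   // "bold loud", then "italic quiet"
}
```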
NSRegularExpression to capture portions of invalid JSON
NSString *string = @"{\"code\":5,\"id\":104,\"message\":\"Not working\"}{\"code\":5,\"id\":101,\"message\":\"some message\"}{\"code\":5,\"id\":105,\"message\":\"test\"}";
NSLog(@"String: %@", string);
NSMutableArray *allSubDictAsStr = [[NSMutableArray alloc] init];
NSString *pattern = @"\\{.*?\\}";
NSError *errorRegex = nil;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:&errorRegex];
NSArray *results = [regex matchesInString:string options:0 range:NSMakeRange(0, [string length])];
for (NSTextCheckingResult *aResult in results)
{
    NSString *subJSONStr = [string substringWithRange:[aResult range]];
    [allSubDictAsStr addObject:subJSONStr];
}
NSString *bigJSONStr = [NSString stringWithFormat:@"[%@]", [allSubDictAsStr componentsJoinedByString:@","]];
NSError *errorJSON = nil;
NSArray *jsonArray = [NSJSONSerialization JSONObjectWithData:[bigJSONStr dataUsingEncoding:NSUTF8StringEncoding] options:0 error:&errorJSON];
NSLog(@"JsonArray: %@", jsonArray);
That's one possible solution. It's not really clean, but I got to play with regex. Your issue may lie more in your WebSocket parsing.
Idea:
• Use a regex to isolate each JSON dictionary.
• Then construct an "array of dictionaries" JSON string (bigJSONStr, an NSString in our case).
For the pattern, note that you have to escape { and } because they are special characters in regular expressions. Also, the lazy .*? stops at the first }, so this only works as long as the objects contain no nested objects.
I didn't check the NSError parameters, which is of course not recommended.
EDIT:
Additional note/modification: rather than constructing bigJSONStr (which is quite ugly), you can parse each match on its own:
NSMutableArray *allResponses = [[NSMutableArray alloc] init];
...
for (NSTextCheckingResult *aResult in results)
{
    NSString *subJSONStr = [string substringWithRange:[aResult range]];
    NSError *errorJSON = nil;
    NSDictionary *aResponseDict = [NSJSONSerialization JSONObjectWithData:[subJSONStr dataUsingEncoding:NSUTF8StringEncoding] options:0 error:&errorJSON];
    // Check the return value (not the error) and keep the parsed dictionary.
    if (aResponseDict) [allResponses addObject:aResponseDict];
}
NSLog(@"allResponses: %@", allResponses);
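For comparison, a Swift sketch of the per-match approach, using the question's sample data. Like the Objective-C version, it assumes the objects contain no nested objects:

```swift
import Foundation

// Several JSON objects concatenated without separators (the sample from the question).
let raw = "{\"code\":5,\"id\":104,\"message\":\"Not working\"}{\"code\":5,\"id\":101,\"message\":\"some message\"}{\"code\":5,\"id\":105,\"message\":\"test\"}"

// Lazy match from "{" to the first "}".
let regex = try! NSRegularExpression(pattern: "\\{.*?\\}")
let fullRange = NSRange(raw.startIndex..., in: raw)

var responses: [[String: Any]] = []
for match in regex.matches(in: raw, range: fullRange) {
    guard let range = Range(match.range, in: raw) else { continue }
    let data = Data(raw[range].utf8)
    // Keep only fragments that actually parse as a JSON dictionary.
    if let object = try? JSONSerialization.jsonObject(with: data),
       let dict = object as? [String: Any] {
        responses.append(dict)
    }
}
print(responses.count)   // 3
```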
NSRange to Range<String.Index>
The NSString version (as opposed to the Swift String version) of replacingCharacters(in: NSRange, with: NSString) accepts an NSRange, so one simple solution is to convert String to NSString first. The delegate and replacement method names are slightly different in Swift 3 and 2, so depending on which Swift you're using:
Swift 3.0
func textField(_ textField: UITextField,
               shouldChangeCharactersIn range: NSRange,
               replacementString string: String) -> Bool {
    let nsString = textField.text as NSString?
    let newString = nsString?.replacingCharacters(in: range, with: string)
    // Inspect newString here; return false to reject the edit.
    return true
}
Swift 2.x
func textField(textField: UITextField,
               shouldChangeCharactersInRange range: NSRange,
               replacementString string: String) -> Bool {
    let nsString = textField.text as NSString?
    let newString = nsString?.stringByReplacingCharactersInRange(range, withString: string)
    // Inspect newString here; return false to reject the edit.
    return true
}
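Since Swift 4, you can also stay with String and convert the NSRange directly with Range(_:in:). A sketch of the replacement logic, pulled out into a hypothetical helper (applyingEdit) so it runs without UIKit; in the delegate you would call it with textField.text ?? "", range, and string:

```swift
import Foundation

// Hypothetical helper: applies a UITextField-style edit, whose NSRange counts UTF-16 code units,
// to a Swift String. Returns nil if the NSRange can't be converted to a Range<String.Index>.
func applyingEdit(to text: String, range: NSRange, replacement: String) -> String? {
    guard let swiftRange = Range(range, in: text) else { return nil }
    var result = text
    result.replaceSubrange(swiftRange, with: replacement)
    return result
}

// "\u{1F436}" occupies UTF-16 offsets 4 and 5, so "!" is at UTF-16 offset 6
// even though it is the Character at position 5.
let original = "dog \u{1F436}!"
let edited = applyingEdit(to: original, range: NSRange(location: 6, length: 1), replacement: "?")
print(edited ?? "invalid range")
```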
Regex not working in swift. Giving an error invalid regex
Curly braces are special characters which have to be escaped:
\{[^}]*\}
or, in a Swift string literal:
\\{[^}]*\\}
By the way, don't create the NSRange covering the string with a hand-computed length (for example with text.count, which counts Characters rather than UTF-16 code units); the highly recommended way is:
static func matches(regex: String, text: String) -> Bool {
    do {
        let regex = try NSRegularExpression(pattern: regex, options: .caseInsensitive)
        let match = regex.firstMatch(in: text, options: [],
                                     range: NSRange(text.startIndex..., in: text))
        return match != nil
    } catch {
        print("invalid regex: \(error.localizedDescription)")
        return false
    }
}
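A quick sketch of the escaped pattern in use, collecting every brace group from a made-up template string with the NSRange built the recommended way:

```swift
import Foundation

// Hypothetical template; \{[^}]*\} matches "{", any run of non-"}" characters, then "}".
let template = "Hello {name}, your order {id} ships {day}."
let regex = try! NSRegularExpression(pattern: "\\{[^}]*\\}")
let fullRange = NSRange(template.startIndex..., in: template)

let placeholders = regex.matches(in: template, range: fullRange).compactMap {
    Range($0.range, in: template).map { String(template[$0]) }
}
print(placeholders) // ["{name}", "{id}", "{day}"]
```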
How can I use NSRegularExpression on Swift strings with variable-width Unicode characters?
Turns out you can fight fire with fire. Using the Swift-native string's utf16Count property (str.utf16.count in current Swift) and the substringWithRange: method of NSString (not String) gets the right result. Here's the full working code:
let str = "dogcow"
let cowRegex = NSRegularExpression(pattern: "c.w", options: nil, error: nil)!
if let cowMatch = cowRegex.firstMatchInString(str, options: nil,
        range: NSRange(location: 0, length: str.utf16Count)) {
    println((str as NSString).substringWithRange(cowMatch.range))
    // prints "cow"
}
(I figured this out in the process of writing the question; score one for rubber duck debugging.)
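For reference, in Swift 4 and later the same lookup no longer needs NSString: the convenience initializers handle the conversion in both directions. A sketch with a made-up string that includes non-BMP characters (U+1F436 dog face, U+1F42E cow face), so the UTF-16 and Character counts differ before the match:

```swift
import Foundation

let str = "dog\u{1F436} cow\u{1F42E}"
let cowRegex = try! NSRegularExpression(pattern: "c.w")

var cow: String? = nil
if let cowMatch = cowRegex.firstMatch(in: str, range: NSRange(str.startIndex..., in: str)),
   let range = Range(cowMatch.range, in: str) {
    cow = String(str[range])
}
print(cow ?? "no match")   // prints "cow"
```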