What Is the Correct Length: Argument to Provide to NSRange for NSRegularExpression Using a (Swift) String

What is the correct length: argument to provide to NSRange for NSRegularExpression using a (Swift) String?

Use the utf16 count, not the utf8 count. Better still, use the convenience initializer, which converts a Range of String.Index to an NSRange:

let range = NSRange(str.startIndex..., in: str)

And to convert an NSRange back to a Range of String.Index (the result is optional, because the NSRange may not be valid for the string):

let range = Range(nsRange, in: str)

Thus, putting that together:

import Foundation

let str = "#tweak #wow #gaming"
if let regex = try? NSRegularExpression(pattern: "#[a-z0-9]+", options: .caseInsensitive) {
    // Build the NSRange from the String itself, so the length is in UTF-16 units
    let nsRange = NSRange(str.startIndex..., in: str)
    let strings = regex.matches(in: str, range: nsRange).compactMap {
        Range($0.range, in: str).map { str[$0] }
    }
    print(strings)
}

See WWDC 2017 Efficient Interactions with Frameworks, which talks about (a) our historical use of UTF16 when dealing with ranges; and (b) the fact that we don’t have to do that any more.
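
As a quick check of the claim above, here is a minimal sketch that compares the different counts on a string containing a character outside the Basic Multilingual Plane, then round-trips a match range. The sample string and the names sample, fullRange, tagRegex and tags are made up for illustration.

```swift
import Foundation

// Sample string (an assumption for illustration): U+1F600 followed by " #wow".
let sample = "\u{1F600} #wow"

// The NSRange built from the String has the UTF-16 length, not the UTF-8 one.
let fullRange = NSRange(sample.startIndex..., in: sample)
print(fullRange.length, sample.utf16.count, sample.utf8.count, sample.count)
// prints "7 7 9 6"

// Round trip: NSRange from a match, then back to Range<String.Index>.
let tagRegex = try! NSRegularExpression(pattern: "#[a-z0-9]+", options: .caseInsensitive)
let tags = tagRegex.matches(in: sample, range: fullRange).compactMap { match in
    Range(match.range, in: sample).map { String(sample[$0]) }
}
print(tags)
// prints ["#wow"]
```

Passing sample.utf8.count (9) as the length here would describe a range that runs past the end of the string as NSString sees it.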

Which Swift character count should I use when interacting with NSString APIs?

TL;DR

The documentation for NSString.length specifies:

The number of UTF-16 code units in the receiver.

Thus, if you want to interop between String and NSString:

  • You should use string.utf16.count, and it will match up perfectly with (string as NSString).length.

If you want to count the number of visible characters:

  • You should use string.count, and it will match the number of times you need to press the right arrow key to move from the beginning of the string to its end.

    Note: This is not always 100% accurate, but it appears Apple is constantly improving the implementation to make it more and more accurate.


Here's a Swift 4.0 playground to test a bunch of strings and functions:

let header = "NSString   .utf16❔   encodedOffset❔   NSRange❔   .count❔   .characters❔   distance❔   .unicodeScalars❔   .utf8❔   Description"
var format = " %3d %3d ❓ %3d ❓ %3d ❓ %3d ❓ %3d ❓ %3d ❓ %3d ❓ %3d ❓ %@"
format = format.replacingOccurrences(of: "❓", with: "%@") // "❓" acts as a placeholder for "%@" to align the text perfectly

print(header)

test("")
test("abc")
test("❌")
test("")
test("☾test")
test("‍‍‍")
test("\u{200d}\u{200d}\u{200d}")
test("")
test("\u{1F468}")
test("‍♀️‍♂️")
test("你好吗")
test("مرحبا", "Arabic word")
test("م", "Arabic letter")
test("שלום", "Hebrew word")
test("ם", "Hebrew letter")

func test(_ s: String, _ description: String? = nil) {
    func icon(for length: Int) -> String {
        return length == (s as NSString).length ? "✅" : "❌"
    }

    let description = description ?? "'" + s + "'"
    let string = String(
        format: format,
        (s as NSString).length,
        s.utf16.count, icon(for: s.utf16.count),
        s.endIndex.encodedOffset, icon(for: s.endIndex.encodedOffset),
        NSRange(s.startIndex..<s.endIndex, in: s).upperBound, icon(for: NSRange(s.startIndex..<s.endIndex, in: s).upperBound),
        s.count, icon(for: s.count),
        s.characters.count, icon(for: s.characters.count),
        s.distance(from: s.startIndex, to: s.endIndex), icon(for: s.distance(from: s.startIndex, to: s.endIndex)),
        s.unicodeScalars.count, icon(for: s.unicodeScalars.count),
        s.utf8.count, icon(for: s.utf8.count),
        description)
    print(string)
}

And here is the output:

NSString   .utf16❔   encodedOffset❔   NSRange❔   .count❔   .characters❔   distance❔   .unicodeScalars❔   .utf8❔   Description
0 0 ✅ 0 ✅ 0 ✅ 0 ✅ 0 ✅ 0 ✅ 0 ✅ 0 ✅ ''
3 3 ✅ 3 ✅ 3 ✅ 3 ✅ 3 ✅ 3 ✅ 3 ✅ 3 ✅ 'abc'
1 1 ✅ 1 ✅ 1 ✅ 1 ✅ 1 ✅ 1 ✅ 1 ✅ 3 ❌ '❌'
4 4 ✅ 4 ✅ 4 ✅ 1 ❌ 1 ❌ 1 ❌ 2 ❌ 8 ❌ ''
5 5 ✅ 5 ✅ 5 ✅ 5 ✅ 5 ✅ 5 ✅ 5 ✅ 7 ❌ '☾test'
11 11 ✅ 11 ✅ 11 ✅ 1 ❌ 1 ❌ 1 ❌ 7 ❌ 25 ❌ '‍‍‍'
11 11 ✅ 11 ✅ 11 ✅ 1 ❌ 1 ❌ 1 ❌ 7 ❌ 25 ❌ '‍‍‍'
8 8 ✅ 8 ✅ 8 ✅ 4 ❌ 4 ❌ 4 ❌ 4 ❌ 16 ❌ ''
2 2 ✅ 2 ✅ 2 ✅ 1 ❌ 1 ❌ 1 ❌ 1 ❌ 4 ❌ ''
58 58 ✅ 58 ✅ 58 ✅ 13 ❌ 13 ❌ 13 ❌ 32 ❌ 122 ❌ '‍♀️‍♂️'
3 3 ✅ 3 ✅ 3 ✅ 3 ✅ 3 ✅ 3 ✅ 3 ✅ 9 ❌ '你好吗'
5 5 ✅ 5 ✅ 5 ✅ 5 ✅ 5 ✅ 5 ✅ 5 ✅ 10 ❌ Arabic word
1 1 ✅ 1 ✅ 1 ✅ 1 ✅ 1 ✅ 1 ✅ 1 ✅ 2 ❌ Arabic letter
4 4 ✅ 4 ✅ 4 ✅ 4 ✅ 4 ✅ 4 ✅ 4 ✅ 8 ❌ Hebrew word
1 1 ✅ 1 ✅ 1 ✅ 1 ✅ 1 ✅ 1 ✅ 1 ✅ 2 ❌ Hebrew letter

Conclusions:

  • To get a length that is compatible with NSString/NSRange, use s.utf16.count (preferred), (s as NSString).length, or NSRange(s.startIndex..<s.endIndex, in: s).length. s.endIndex.encodedOffset gives the same number in Swift 4, but it is deprecated as of Swift 5 in favor of s.endIndex.utf16Offset(in: s).
  • To get the number of visible characters, use either s.count (preferred), s.characters.count (deprecated), or s.distance(from: s.startIndex, to: s.endIndex).
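
For Swift 5 and later, the same comparison can be sketched as follows. The family emoji is written with escapes and is an assumed stand-in for one of the playground strings; its numbers agree with the 11 / 7 / 25 rows in the output above. NSString(string:) is used instead of an as cast only to keep the sketch portable.

```swift
import Foundation

// Four people joined by three zero-width joiners (U+200D).
let family = "\u{1F468}\u{200D}\u{1F469}\u{200D}\u{1F467}\u{200D}\u{1F467}"

// All of these are measured in UTF-16 code units and agree with NSString.
let nsLength = NSString(string: family).length
print(nsLength)                                          // 11
print(family.utf16.count)                                // 11
print(family.endIndex.utf16Offset(in: family))           // 11
print(NSRange(family.startIndex..., in: family).length)  // 11

// These are measured in other units and do not match NSString.length.
print(family.unicodeScalars.count)                       // 7
print(family.utf8.count)                                 // 25
print(family.count)                                      // number of visible characters
```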

A helpful extension to get the full range of a String:

public extension String {

    var nsrange: NSRange {
        return NSRange(startIndex..<endIndex, in: self)
    }
}

Thus, you can call the original method like so:

replace("‍‍‍", characterAtIndex: "‍‍‍".utf16.count - 1) // ‍‍‍�!

Swift regex for searching format specifiers in a string

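First of all, you have to replace the matches in reverse order. Otherwise each replacement shifts the text behind it, and the ranges of the remaining matches no longer point at the right characters.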

A possible pattern is

%([.0-9]+)?[@df]

It also allows for the (optional) decimal places specifier.

import Foundation

var description = "I am %@. My age is %d and my height is %.02f. (%@)"
let pattern = "%([.0-9]+)?[@df]"
let regex = try NSRegularExpression(pattern: pattern)
let nsrange = NSRange(description.startIndex..., in: description)

// Work from the last match to the first so earlier ranges stay valid
for match in regex.matches(in: description, range: nsrange).reversed() {
    let range = Range(match.range, in: description)!
    description.replaceSubrange(range, with: "MATCH")
}
print(description)
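
The reverse-order idea also works when each specifier gets its own replacement. The following is only a sketch: fill(_:with:) is a hypothetical helper, not part of Foundation, and it simply pairs the n-th match with the n-th entry of values.

```swift
import Foundation

// Hypothetical helper: replaces each format specifier with the matching
// entry of `values`, working from the last match to the first.
func fill(_ template: String, with values: [String]) -> String {
    guard let regex = try? NSRegularExpression(pattern: "%([.0-9]+)?[@df]") else {
        return template
    }
    var result = template
    let matches = regex.matches(in: result, range: NSRange(result.startIndex..., in: result))
    for (index, match) in matches.enumerated().reversed() {
        // Skip specifiers that have no value, and ranges that cannot be converted.
        guard index < values.count, let range = Range(match.range, in: result) else { continue }
        result.replaceSubrange(range, with: values[index])
    }
    return result
}

print(fill("I am %@. My age is %d.", with: ["Ann", "30"]))
// prints "I am Ann. My age is 30."
```

Because the replacements start at the end of the string, the UTF-16 offsets of the matches that are still pending never change, even when a value is longer or shorter than the specifier it replaces.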

Which NSRegularExpression was found using the | operator

If you use the combined pattern, each alternative is reported in its own capture group of the match result.

If you want to access the first capture group (the bold pattern), you need to read the range at index 1. When a match comes from the second group, the first group has an invalid range (its location is NSNotFound), so you need to check whether each range is valid, like this:

results.forEach {
    // A group that did not take part in the match has location NSNotFound
    var range = $0.range(at: 1)
    if range.location != NSNotFound {
        self.applyAttributes(range: range, type: .bold)
    }
    range = $0.range(at: 2)
    if range.location != NSNotFound {
        self.applyAttributes(range: range, type: .italic)
    }
}

After that, you can move the check into an NSRange extension and iterate over your TypeAttributes cases, using each case's position in the array to pick the matching capture group:

extension NSRange {
    func isValid(for string: String) -> Bool {
        // NSRange counts UTF-16 code units, so compare against utf16.count
        return location != NSNotFound && location + length <= string.utf16.count
    }
}

let attributes: [TypeAttributes] = [.bold, .italic]

results.forEach { match in
    attributes.enumerated().forEach { index, attribute in
        // Capture groups are numbered from 1; group 0 is the whole match
        let range = match.range(at: index + 1)
        if range.isValid(for: str) {
            self.applyAttributes(range: range, type: attribute)
        }
    }
}
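
To see the per-group ranges in isolation, here is a small self-contained sketch. The pattern, the sample text, and the names markup, markupRegex and found are assumptions for illustration; they are not the pattern from the question.

```swift
import Foundation

// Group 1 matches **bold**, group 2 matches _italic_ (illustrative pattern).
let markup = "**bold** and _italic_"
let markupRegex = try! NSRegularExpression(pattern: "(\\*\\*.+?\\*\\*)|(_.+?_)")

var found: [String] = []
for match in markupRegex.matches(in: markup, range: NSRange(markup.startIndex..., in: markup)) {
    for group in 1...2 {
        let nsRange = match.range(at: group)
        // Only the alternative that actually matched has a usable range.
        if nsRange.location != NSNotFound, let range = Range(nsRange, in: markup) {
            found.append("group \(group): \(markup[range])")
        }
    }
}
print(found)
// prints ["group 1: **bold**", "group 2: _italic_"]
```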

NSRegularExpression to capture portions of invalid JSON

NSString *string = @"{\"code\":5,\"id\":104,\"message\":\"Not working\"}{\"code\":5,\"id\":101,\"message\":\"some message\"}{\"code\":5,\"id\":105,\"message\":\"test\"}";

NSLog(@"String: %@", string);

NSMutableArray *allSubDictAsStr = [[NSMutableArray alloc] init];

NSString *pattern = @"\\{.*?\\}";
NSError *errorRegex = nil;
NSRegularExpression *regex = [NSRegularExpression regularExpressionWithPattern:pattern options:0 error:&errorRegex];
NSArray *results = [regex matchesInString:string options:0 range:NSMakeRange(0, [string length])];
for (NSTextCheckingResult *aResult in results)
{
    NSString *subJSONStr = [string substringWithRange:[aResult range]];
    [allSubDictAsStr addObject:subJSONStr];
}
NSString *bigJSONStr = [NSString stringWithFormat:@"[%@]", [allSubDictAsStr componentsJoinedByString:@","]];

NSError *errorJSON = nil;
NSArray *jsonArray = [NSJSONSerialization JSONObjectWithData:[bigJSONStr dataUsingEncoding:NSUTF8StringEncoding] options:0 error:&errorJSON];
NSLog(@"JsonArray: %@", jsonArray);

That's one possible solution. It's not really clean, but I got to play with regex. The real issue may lie in how you parse the WebSocket data.

Idea:

• Use a regex to isolate each JSON dictionary.

• Then construct a JSON "array of dictionaries" (bigJSONStr, an NSString in our case).

For the pattern, note that you have to escape { and } because they are reserved in regular expressions. Also note that the lazy .*? stops at the first }, so this only works for flat objects: a nested dictionary, or a } inside a string value, would be cut short.
I didn't check the NSError parameters, which is of course not recommended.

EDIT:
Additional note/modification: Rather than constructing "bigJSONStr" (which is quite ugly), you can parse each matched substring on its own and collect the results:

NSMutableArray *allResponses = [[NSMutableArray alloc] init];
...
for (NSTextCheckingResult *aResult in results)
{
    NSString *subJSONStr = [string substringWithRange:[aResult range]];
    NSError *errorJSON = nil;
    NSDictionary *aResponseDict = [NSJSONSerialization JSONObjectWithData:[subJSONStr dataUsingEncoding:NSUTF8StringEncoding] options:0 error:&errorJSON];
    // Keep the parsed dictionary, and only if parsing succeeded
    if (aResponseDict) [allResponses addObject:aResponseDict];
}
NSLog(@"allResponses: %@", allResponses);
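
The same approach can be sketched in Swift. This is an illustration under the same assumption as the Objective-C code (flat objects, no } inside values); the names stream, objectRegex, objects and ids are made up here.

```swift
import Foundation

// Two JSON objects glued together without a separator, as in the example above.
let stream = "{\"code\":5,\"id\":104,\"message\":\"Not working\"}{\"code\":5,\"id\":101,\"message\":\"some message\"}"

let objectRegex = try! NSRegularExpression(pattern: "\\{.*?\\}")
let objects: [[String: Any]] = objectRegex
    .matches(in: stream, range: NSRange(stream.startIndex..., in: stream))
    .compactMap { match -> [String: Any]? in
        // Convert the NSRange back to a String range, then parse that slice alone.
        guard let range = Range(match.range, in: stream),
              let data = String(stream[range]).data(using: .utf8) else { return nil }
        return (try? JSONSerialization.jsonObject(with: data)) as? [String: Any]
    }

let ids = objects.compactMap { $0["id"] as? Int }
print(ids)
// prints [104, 101]
```

Slices that fail to parse are dropped by compactMap rather than aborting the whole run, which mirrors the per-item check in the Objective-C loop.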

NSRange to Range String.Index

The NSString version (as opposed to Swift String) of replacingCharacters(in: NSRange, with: NSString) accepts an NSRange, so one simple solution is to convert String to NSString first. The delegate and replacement method names are slightly different in Swift 3 and 2, so depending on which Swift you're using:

Swift 3.0

func textField(_ textField: UITextField,
               shouldChangeCharactersIn range: NSRange,
               replacementString string: String) -> Bool {

    let nsString = textField.text as NSString?
    let newString = nsString?.replacingCharacters(in: range, with: string)
    // Returning true lets the text field apply the change
    return true
}

Swift 2.x

func textField(textField: UITextField,
               shouldChangeCharactersInRange range: NSRange,
               replacementString string: String) -> Bool {

    let nsString = textField.text as NSString?
    let newString = nsString?.stringByReplacingCharactersInRange(range, withString: string)
    // Returning true lets the text field apply the change
    return true
}
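
In Swift 4 and later there is also the option of staying with String and converting the range instead. A minimal sketch, assuming a free function named applying(_:to:in:) (hypothetical, introduced only so the example runs without UIKit):

```swift
import Foundation

// Hypothetical helper: applies an NSRange-based edit to a Swift String.
// Returns nil when the NSRange cannot be converted for this string.
func applying(_ replacement: String, to text: String, in nsRange: NSRange) -> String? {
    guard let range = Range(nsRange, in: text) else { return nil }
    return text.replacingCharacters(in: range, with: replacement)
}

// U+1F600 occupies two UTF-16 units, so it sits at location 1 with length 2.
let edited = applying("-", to: "a\u{1F600}b", in: NSRange(location: 1, length: 2))
print(edited ?? "nil")
// prints "a-b"
```

Inside the delegate method, the same two lines would take textField.text and the range parameter as their inputs.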

Regex not working in swift. Giving an error invalid regex

Curly braces are special characters, which have to be escaped:

\{[^}]*\}, which in a Swift string literal is written \\{[^}]*\\}

By the way, don't build the NSRange from a literal location and a string length. The highly recommended way is NSRange(text.startIndex..., in: text):

static func matches(regex: String, text: String) -> Bool {
    do {
        let regex = try NSRegularExpression(pattern: regex, options: .caseInsensitive)
        let match = regex.firstMatch(in: text, options: [],
                                     range: NSRange(text.startIndex..., in: text))
        return match != nil
    } catch {
        print("invalid regex: \(error.localizedDescription)")
        return false
    }
}
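
A short usage sketch follows. The enclosing enum RegexCheck is a hypothetical namespace added so that static func has a type to live in, and the raw string literal assumes Swift 5 or later.

```swift
import Foundation

// Hypothetical namespace for the helper shown above.
enum RegexCheck {
    static func matches(regex: String, text: String) -> Bool {
        do {
            let regex = try NSRegularExpression(pattern: regex, options: .caseInsensitive)
            let range = NSRange(text.startIndex..., in: text)
            return regex.firstMatch(in: text, options: [], range: range) != nil
        } catch {
            print("invalid regex: \(error.localizedDescription)")
            return false
        }
    }
}

let escaped = "\\{[^}]*\\}"   // regular string literal, backslashes doubled
let raw = #"\{[^}]*\}"#       // raw string literal, backslashes written as-is
print(escaped == raw)                                     // true
print(RegexCheck.matches(regex: raw, text: "id = {42}"))  // true
print(RegexCheck.matches(regex: raw, text: "no braces"))  // false
```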

How can I use NSRegularExpression on Swift strings with variable-width Unicode characters?

Turns out you can fight fire with fire. Using the Swift-native string's utf16Count property and the substringWithRange: method of NSString (not String) gets the right result. Here's the full working code (written for Swift 1.x, hence utf16Count and println):

let str = "dogcow"
let cowRegex = NSRegularExpression(pattern: "c.w", options: nil, error: nil)!

if let cowMatch = cowRegex.firstMatchInString(str, options: nil,
        range: NSRange(location: 0, length: str.utf16Count)) {
    println((str as NSString).substringWithRange(cowMatch.range))
    // prints "cow"
}

(I figured this out in the process of writing the question; score one for rubber duck debugging.)
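
For current Swift, a roughly equivalent sketch looks like this. The two emoji in the sample string are an assumption, added so that the string actually contains variable-width characters; the names text, modernCowRegex and cow are illustrative.

```swift
import Foundation

// "dog", two non-BMP emoji (U+1F436, U+1F42E), then "cow".
let text = "dog\u{1F436}\u{1F42E}cow"
let modernCowRegex = try! NSRegularExpression(pattern: "c.w")

var cow: String? = nil
if let cowMatch = modernCowRegex.firstMatch(in: text, range: NSRange(text.startIndex..., in: text)),
   let range = Range(cowMatch.range, in: text) {
    // Each emoji takes two UTF-16 units, so the match starts at location 7.
    cow = String(text[range])
    print(cow!)
    // prints "cow"
}
```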


