Swift 3 - Search Result Also with Diacritics


You can use range(of:options:) with options for a diacritic-insensitive
(and optionally case-insensitive) search. Example:

let list = ["holesovice", "holešovice"]
let searchTerm = "sovi"

let filtered = list.filter {
    $0.range(of: searchTerm, options: [.diacriticInsensitive, .caseInsensitive]) != nil
}

print(filtered) // ["holesovice", "holešovice"]
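If the same list is searched repeatedly (for example, on every keystroke), one variant is to fold each string once with folding(options:locale:) and then do a plain substring search. This is a sketch under the assumption that an unlocalized fold (locale: nil) is acceptable; the helper names normalizedForSearch and search are hypothetical.

```swift
import Foundation

// Hypothetical helper: remove diacritics and case in one pass.
func normalizedForSearch(_ s: String) -> String {
    return s.folding(options: [.diacriticInsensitive, .caseInsensitive], locale: nil)
}

let places = ["holesovice", "holešovice", "Praha"]

// Fold every entry once, keeping the original for display.
let foldedPlaces = places.map { (original: $0, folded: normalizedForSearch($0)) }

// Hypothetical search: fold the term, then use an ordinary substring test.
func search(_ term: String) -> [String] {
    let foldedTerm = normalizedForSearch(term)
    return foldedPlaces.filter { $0.folded.contains(foldedTerm) }.map { $0.original }
}

print(search("ŠOVI")) // ["holesovice", "holešovice"]
```

The trade-off is memory for the folded copies versus repeating the diacritic-insensitive comparison on every search.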

Search arabic text while ignoring diacritics or accents

NSStringCompareOptions also has a DiacriticInsensitiveSearch option (.diacriticInsensitive in Swift 3 and later) that you can use in the same way as CaseInsensitiveSearch.
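A small Swift sketch of that option follows; the helper name containsIgnoringDiacritics is hypothetical. The Arabic example assumes that harakat (short-vowel marks such as fatha, damma and shadda) are treated as diacritics by this option, which is what the answer relies on; verify the result on your target OS.

```swift
import Foundation

// Hypothetical helper: substring search that ignores diacritics,
// and optionally case, via range(of:options:).
func containsIgnoringDiacritics(_ text: String, _ term: String, ignoreCase: Bool = false) -> Bool {
    var options: String.CompareOptions = [.diacriticInsensitive]
    if ignoreCase {
        options.insert(.caseInsensitive)
    }
    return text.range(of: term, options: options) != nil
}

// Latin example.
print(containsIgnoringDiacritics("Crème brûlée", "creme", ignoreCase: true)) // true

// Arabic word written with harakat, searched for without them.
let voweled = "مُحَمَّد"
let plain = "محمد"
print(containsIgnoringDiacritics(voweled, plain))
```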

Diacritic Sensitive sort for objects fetched from core data?

There are two approaches I can think of:

  1. Sort the results in memory after fetching them from Core Data. That's one way of doing it. Apple doesn't document the complexity of its string sorting, but I think the bigger problem is that you first have to fetch all objects from the persistent store. If you have a lot of data, this can hurt performance. It's best to profile first and only then decide whether the performance is acceptable.

  2. You can try NSString comparison methods that Core Data translates into SQL: localizedStandardCompare:, localizedCompare:, or localizedCaseInsensitiveCompare:. A sort descriptor using any of these methods can be created as follows:

    sortDescriptor = [NSSortDescriptor sortDescriptorWithKey:@"sortTitle"
                                                   ascending:YES
                                                    selector:@selector(localizedCaseInsensitiveCompare:)];

    If none of these methods sorts the data the way you want, you can also preprocess it beforehand, e.g. when the title changes, remove the diacritics and store the result in a separate attribute (see Normalize User-Generated Content at NSHipster - CFStringTransform). UPDATE: Let me assume the title attribute is named title and the title for sorting is named sortTitle. In a subclass of NSManagedObject you can override didChangeValueForKey: as follows:

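    - (void)didChangeValueForKey:(NSString *)key
    {
        [super didChangeValueForKey:key];

        // Keep sortTitle in sync whenever title changes.
        if ([key isEqualToString:@"title"]) {
            NSMutableString *cleanTitle = [self.title mutableCopy];
            if (cleanTitle != nil) {
                // Remove combining marks (accents) so "é" sorts like "e".
                CFStringTransform((__bridge CFMutableStringRef)cleanTitle, NULL, kCFStringTransformStripCombiningMarks, NO);
            }
            // A nil title yields a nil sortTitle.
            self.sortTitle = [cleanTitle copy];
        }
    }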
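The same preprocessing idea can be sketched in Swift without Core Data: derive a normalized sort key whenever the title changes, then sort on that key. The Item type and the sortKey(for:) function are hypothetical stand-ins for an NSManagedObject subclass with title and sortTitle attributes, and folding(options:locale:) is used here in place of CFStringTransform.

```swift
import Foundation

// Hypothetical sort-key function: drop diacritics and case.
func sortKey(for title: String) -> String {
    return title.folding(options: [.diacriticInsensitive, .caseInsensitive], locale: nil)
}

// Hypothetical model mirroring the title/sortTitle pair.
struct Item {
    var title: String {
        // Recompute the key on every change, like didChangeValueForKey:.
        didSet { sortTitle = sortKey(for: title) }
    }
    private(set) var sortTitle: String

    init(title: String) {
        // Property observers don't run during init, so set both here.
        self.title = title
        self.sortTitle = sortKey(for: title)
    }
}

let items = [Item(title: "Zebra"), Item(title: "Éclair"), Item(title: "apple")]

// Plain < is not locale-aware; it works here because the keys are folded.
let sortedTitles = items.sorted { $0.sortTitle < $1.sortTitle }.map { $0.title }
print(sortedTitles) // ["apple", "Éclair", "Zebra"]
```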

How to remove diacritics from a String in Swift?

This can also be done by applying a StringTransform:

let foo = "één"
let bar = foo.applyingTransform(.stripDiacritics, reverse: false)!
print(bar) // een

Or implement a custom property on StringProtocol:

extension StringProtocol {
    var strippingDiacritics: String {
        applyingTransform(.stripDiacritics, reverse: false)!
    }
}


let bar = foo.strippingDiacritics
print(bar) // een
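An alternative sketch uses folding(options:locale:), which returns a non-optional String, so no force unwrap is needed. The property name foldingDiacritics is hypothetical. The two APIs are implemented differently, so if exact output matters, compare them on your own data.

```swift
import Foundation

extension StringProtocol {
    // Hypothetical property: diacritic folding without an optional result.
    var foldingDiacritics: String {
        return folding(options: .diacriticInsensitive, locale: nil)
    }
}

print("één".foldingDiacritics) // een
print("Crème brûlée".foldingDiacritics) // Creme brulee
```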

swift string diacriticInsensitive not working correct

This precisely matches the meaning of diacriticInsensitive. UTR #30 covers this. "Diacritic removal" includes "stroke, hook, descender" and all other "diacritics" returning the "related base character." While in Swedish å is considered a separate letter for sorting purposes, it still has a "base character" of (Latin) a. (Similarly for ä and ö.) This is a complex problem in Swedish, but the results should not be surprising.

The ultimate rules are in Unicode's DiacriticFolding. These rules are not locale specific. It's possible that Foundation applies some additional locale rules, but clearly not in this case. The relevant Unicode folding rule is:

0061 030A;  0061    # å → a LATIN SMALL LETTER A, COMBINING RING ABOVE → LATIN SMALL LETTER A

Many cultures have subtle definitions of what is "a letter" vs "an extension of another letter" vs "a half-letter" vs "a non-letter symbol." When folding diacritics, the Turkish "İ" has a base form of "I", but "i" does not have a base form of "ı". That's bizarre, but true, because it treats "Basic Latin" as the base alphabet. ("Basic Latin" is itself a bizarre classification, with the letters j, u, and w being relatively modern additions. But we still call it "Latin.")

Unicode tries to "thread the needle" on these complex issues, with varying success. It tends to be biased towards Romance languages (and particularly Western European countries). But it does try, and it focuses on what users will expect. So should a search for "halla" find "Hallå"? I'm betting that most Swedes would consider that "close enough."
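The behavior described above can be checked directly with range(of:options:). This is a small sketch; the sample strings are only illustrations.

```swift
import Foundation

let options: String.CompareOptions = [.diacriticInsensitive, .caseInsensitive]

// With diacritics ignored, å folds to its base letter a, so "halla" matches.
let foundDiacriticInsensitive = "Hallå".range(of: "halla", options: options) != nil
print(foundDiacriticInsensitive) // true

// Without .diacriticInsensitive, å and a are different characters.
let foundCaseOnly = "Hallå".range(of: "halla", options: .caseInsensitive) != nil
print(foundCaseOnly) // false

// The same folding applies to ö.
let foundGoteborg = "Göteborg".range(of: "goteborg", options: options) != nil
print(foundGoteborg) // true
```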

Keyboards are designed to be useful to the cultures they're created for, so the presence of a symbol on a keyboard shouldn't be read as a strong claim about how the alphabet works. The iOS Arabic keyboard includes the half-letter "ء" not because of how the alphabet is structured, but simply because ء is fairly commonly typed when writing Arabic.

FMDB: SQLite Statement ORDER BY orders diacritics incorrectly

You can define your own SQLite function that uses folding(options:locale:) to remove the accents. Using FMDB 2.7:

db.makeFunctionNamed("unaccented", arguments: 1) { context, argc, argv in
    guard db.valueType(argv[0]) == .text || db.valueType(argv[0]) == .null else {
        db.resultError("Expected string parameter", context: context)
        return
    }

    if let string = db.valueString(argv[0])?.folding(options: .diacriticInsensitive, locale: nil) {
        db.resultString(string, context: context)
    } else {
        db.resultNull(context: context)
    }
}

You can then use this new unaccented function in your SQL:

do {
    let rs = try db.executeQuery("SELECT * FROM spesenValues ORDER BY unaccented(country) ASC", values: nil)

    while rs.next() {
        // do what you want with results
    }

    rs.close()
} catch {
    NSLog("executeQuery error: %@", db.lastErrorMessage())
}

You suggest that you want to replace "ä", "ö", and "ü" with "ae", "oe", and "ue", respectively. This is generally only done with proper names and geographical names (see Wikipedia's entry for German orthography), but if you wanted to do that, have your custom function (which I've renamed "sortstring") replace these values as appropriate:

db.makeFunctionNamed("sortstring", arguments: 1) { context, argc, argv in
    guard argc == 1 && (db.valueType(argv[0]) == .text || db.valueType(argv[0]) == .null) else {
        db.resultError("Expected string parameter", context: context)
        return
    }

    // SQL NULL in, SQL NULL out (avoids force-unwrapping a nil string).
    guard let value = db.valueString(argv[0]) else {
        db.resultNull(context: context)
        return
    }

    let replacements = ["ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"]

    var string = value.lowercased()

    for (searchString, replacement) in replacements {
        string = string.replacingOccurrences(of: searchString, with: replacement)
    }

    db.resultString(string.folding(options: .diacriticInsensitive, locale: nil), context: context)
}

Since you're using this only for sorting, the function also converts the string to lowercase, so that uppercase values are not separated from lowercase values.

But the idea is the same, define whatever function you want for sorting, and then you can use FMDB's makeFunctionNamed to make it available in SQLite.
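Before registering a function like sortstring with FMDB, it can help to test the string logic on its own. The following sketch applies the same transformation as a plain Swift function; germanSortKey is a hypothetical name, and the country names are only sample data.

```swift
import Foundation

// Hypothetical FMDB-independent version of the "sortstring" logic.
func germanSortKey(_ input: String) -> String {
    let replacements = ["ä": "ae", "ö": "oe", "ü": "ue", "ß": "ss"]
    var string = input.lowercased()
    for (searchString, replacement) in replacements {
        string = string.replacingOccurrences(of: searchString, with: replacement)
    }
    // Fold any remaining accents (é, à, ...) to their base letters.
    return string.folding(options: .diacriticInsensitive, locale: nil)
}

let countries = ["Österreich", "Oman", "Deutschland", "Äthiopien"]
let sortedCountries = countries.sorted { germanSortKey($0) < germanSortKey($1) }
print(sortedCountries) // ["Äthiopien", "Deutschland", "Österreich", "Oman"]
```

Note that "Österreich" sorts before "Oman" because its key is "oesterreich".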

Bounding box of character with diacritics using CoreText

In the first example, you seem to ignore the fact that the bounding rect for glyphs most likely has a negative y origin: the returned rect treats y = 0 as the text baseline, so any part of a glyph that extends below the baseline gets negative y values. You thus set an offset in the bounds rect, and that is probably also why the layer shows an offset in the text. (I haven't tested this, but I believe that's the cause.)

If you're not interested in the bounds of a specific text but want a height that encloses all kinds of text, you might want to use CTFontGetBoundingBox instead.
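To illustrate the negative-origin point, here is a sketch of the coordinate shift involved. It assumes you already have a glyph's bounding rect (for example from CTFontGetBoundingRectsForGlyphs) in baseline-relative, y-up Core Graphics coordinates; the helper name drawingOrigin(for:) and the sample rect values are hypothetical.

```swift
import Foundation

// Glyph bounds use the baseline as y = 0, so a glyph with a descender or a
// cedilla (for example "ç") has a negative origin.y.
// Returns where to place the glyph's baseline origin so that its bounding
// rect starts at (0, 0) in a y-up layer.
func drawingOrigin(for glyphBounds: CGRect) -> CGPoint {
    return CGPoint(x: -glyphBounds.minX, y: -glyphBounds.minY)
}

// Hypothetical bounds: 3 points of the glyph hang below the baseline.
let glyphBounds = CGRect(x: 0.5, y: -3, width: 6, height: 12)
let origin = drawingOrigin(for: glyphBounds)

// The layer only needs glyphBounds.size; the baseline is raised by 3 points.
print(origin.x, origin.y)
```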

Check if string contains special characters in Swift

Your code checks whether no character in the string is from the given set.
What you want is to check whether any character is not in the given set:

if (searchTerm!.rangeOfCharacterFromSet(characterSet.invertedSet).location != NSNotFound){
    println("Could not handle special characters")
}

You can also achieve this using regular expressions:

let regex = NSRegularExpression(pattern: ".*[^A-Za-z0-9].*", options: nil, error: nil)!
if regex.firstMatchInString(searchTerm!, options: nil, range: NSMakeRange(0, searchTerm!.length)) != nil {
    println("could not handle special characters")
}

The pattern [^A-Za-z0-9] matches a character which is not from the ranges A-Z, a-z, or 0-9.

Update for Swift 2:

let searchTerm = "a+b"

let characterset = NSCharacterSet(charactersInString: "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789")
if searchTerm.rangeOfCharacterFromSet(characterset.invertedSet) != nil {
print("string contains special characters")
}

Update for Swift 3:

let characterset = CharacterSet(charactersIn: "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789")
if searchTerm.rangeOfCharacter(from: characterset.inverted) != nil {
print("string contains special characters")
}
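As a small sketch, the Swift 3 approach can be wrapped so that the allowed set is a parameter; the function name containsSpecialCharacters is hypothetical. It also shows a point relevant to diacritics: an accented letter such as é counts as "special" against the explicit ASCII set, but not against Foundation's Unicode-aware CharacterSet.alphanumerics.

```swift
import Foundation

// Same ASCII-only set as the Swift 3 answer: a-z, A-Z and 0-9.
let asciiAlphanumerics = CharacterSet(charactersIn: "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789")

// Hypothetical helper: true if any character falls outside `allowed`.
func containsSpecialCharacters(_ s: String, allowed: CharacterSet) -> Bool {
    return s.rangeOfCharacter(from: allowed.inverted) != nil
}

print(containsSpecialCharacters("a+b", allowed: asciiAlphanumerics))    // true
print(containsSpecialCharacters("abc123", allowed: asciiAlphanumerics)) // false

// Accented letters are "special" against the ASCII-only set...
print(containsSpecialCharacters("café", allowed: asciiAlphanumerics))   // true
// ...but not against the Unicode-aware .alphanumerics set.
print(containsSpecialCharacters("café", allowed: .alphanumerics))       // false
```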

