How to Remove Diacritics from a String in Swift

How to remove diacritics from a String in Swift?

This can be done by applying a StringTransform:

let foo = "één"
let bar = foo.applyingTransform(.stripDiacritics, reverse: false)!
print(bar) // een

Or by adding a custom property to StringProtocol:

extension StringProtocol {
    var strippingDiacritics: String {
        applyingTransform(.stripDiacritics, reverse: false)!
    }
}

let baz = foo.strippingDiacritics
print(baz) // een
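
Both snippets above force-unwrap the result of applyingTransform, which returns an optional. The following is a minimal sketch of a variant that avoids the force unwrap by falling back to the original string; the property name withoutDiacritics is a hypothetical choice for illustration, not part of Foundation.

```swift
import Foundation

extension String {
    /// Hypothetical helper: strips diacritics and returns the receiver
    /// unchanged if the transform cannot be applied.
    var withoutDiacritics: String {
        applyingTransform(.stripDiacritics, reverse: false) ?? self
    }
}

print("één".withoutDiacritics)          // een
print("Crème Brûlée".withoutDiacritics) // Creme Brulee
```

Whether a silent fallback is preferable to a crash depends on the caller; returning an optional instead would be an equally reasonable design.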

Swift: remove diacritics from Arabic text

You can use a regular expression:

import Foundation

let myString = "الْحَمْدُ لِلَّهِ رَبِّ الْعَالَمِينَ"
// U+064B through U+0652 covers the marks from fathatan to sukun.
let regex = try! NSRegularExpression(pattern: "[\\u064b-\\u0652]")
// NSRange counts UTF-16 code units, so derive it from the string itself.
let range = NSRange(myString.startIndex..., in: myString)
let modString = regex.stringByReplacingMatches(in: myString, options: [], range: range, withTemplate: "")
print(modString)

Output : الحمد لله رب العالمين
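
The same filtering can be sketched without a regular expression by dropping the unwanted Unicode scalars directly. This assumes the marks to remove are exactly the range U+064B to U+0652 that the pattern above targets; the function name is hypothetical.

```swift
/// Hypothetical helper: removes the Arabic marks in U+064B...U+0652
/// (an assumed range, matching the regex pattern in the answer above).
func removingArabicDiacritics(_ text: String) -> String {
    let marks: ClosedRange<UInt32> = 0x064B...0x0652
    var kept = String.UnicodeScalarView()
    for scalar in text.unicodeScalars where !marks.contains(scalar.value) {
        kept.append(scalar)
    }
    return String(kept)
}

// "مَدّ" written with explicit escapes: meem, fatha, dal, shadda.
let vocalized = "\u{0645}\u{064E}\u{062F}\u{0651}"
print(removingArabicDiacritics(vocalized)) // مد
```

Other marks outside this range, if present in the input, would be left in place.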

Replace accented characters with their base letters in a String (ą → a, ć → c)

Try localizedStandardRange(of:), which performs a locale-aware search that is both case-insensitive and diacritic-insensitive:

let arr = dataSetArray.filter{ $0.localizedStandardRange(of: searchText) != nil }

This works without any custom extensions.
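
When the locale-aware behavior is not wanted, the two options can also be passed explicitly to range(of:options:). The sketch below uses hypothetical sample data in place of dataSetArray and searchText, which are not shown in the answer.

```swift
import Foundation

// Hypothetical sample data standing in for dataSetArray.
let words = ["Mąka", "Ćma", "dom"]

/// Returns the items that contain `query`, ignoring case and diacritics.
func matches(_ query: String, in items: [String]) -> [String] {
    items.filter {
        $0.range(of: query, options: [.caseInsensitive, .diacriticInsensitive]) != nil
    }
}

print(matches("maka", in: words)) // ["Mąka"]
print(matches("cma", in: words))  // ["Ćma"]
```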

Swift String diacriticInsensitive not working correctly

Treating å as a match for a is precisely what diacriticInsensitive means. UTR #30 covers this: "diacritic removal" includes "stroke, hook, descender" and all other "diacritics," returning the "related base character." While in Swedish å is considered a separate letter for sorting purposes, it still has a "base character" of (Latin) a. (Similarly for ä and ö.) This is a complex problem in Swedish, but the results should not be surprising.

The ultimate rules are in Unicode's DiacriticFolding. These rules are not locale specific. It's possible that Foundation applies some additional locale rules, but clearly not in this case. The relevant Unicode folding rule is:

0061 030A;  0061    # å → a LATIN SMALL LETTER A, COMBINING RING ABOVE → LATIN SMALL LETTER A

Many cultures have subtle definitions of what is "a letter" vs "an extension of another letter" vs "a half-letter" vs "a non-letter symbol." When computing diacritics, the Turkish "İ" has a base form of "I", but "i" does not have a base form of "ı". That's bizarre, but true, because it's treating "basic latin" as the base alphabet. ("Basic Latin" is itself a bizarre classification, with letters j, u, and w being somewhat modern additions. But still we call it "Latin.")

Unicode tries to "thread the needle" on these complex issues, with varying success. It tends to be biased towards Romance languages (and particularly Western European countries). But it does try. And it has a focus on what users will expect. So should a search for "halla" find "Hallå"? I'm betting that most Swedes would consider that "close enough."
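
The "halla" versus "Hallå" case can be checked with a short sketch. It passes nil as the locale to folding, which is an assumption made here to keep the result independent of the user's settings.

```swift
import Foundation

// Folding removes the ring above, leaving the base letter.
let folded = "Hallå".folding(options: .diacriticInsensitive, locale: nil)
print(folded) // Halla

// A search that ignores case and diacritics finds "halla" inside "Hallå".
let found = "Hallå".range(of: "halla",
                          options: [.caseInsensitive, .diacriticInsensitive]) != nil
print(found) // true

// Plain equality still treats the two spellings as different strings.
print("Hallå" == "Halla") // false
```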

Keyboards are designed to be useful to the cultures they're created for, so whether a particular symbol appears on the keyboard shouldn't be assumed to be making any strong claim about how the alphabet works. The iOS Arabic keyboard includes the half-letter "ء". That isn't making a claim about how the alphabet works. It's just saying that ء is somewhat commonly typed when writing Arabic.

Is there a way to convert special characters to normal characters in Swift?

Use folding(options:locale:) and pass .diacriticInsensitive (in Swift 2 this was spelled stringByFoldingWithOptions with .DiacriticInsensitiveSearch):

import Foundation

let s = "éáüîāṁṇār̥"
let r = s.folding(options: .diacriticInsensitive, locale: nil)
print(r) // prints eauiamnar

NSString : easy way to remove UTF-8 accents from a string?

NSString *str = @"Être ou ne pas être. C'était là-bas.";
NSData *data = [str dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES];
NSString *newStr = [[NSString alloc] initWithData:data encoding:NSASCIIStringEncoding];
NSLog(@"%@", newStr);

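Note that substituting NSUTF8StringEncoding would not strip anything: UTF-8 can represent the accented characters, so they survive the round trip. It is the lossy conversion to ASCII that removes the accents.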

List of encoding types here:

https://developer.apple.com/documentation/foundation/nsstringencoding


For the record, the same answer can be written as a single expression:

yourString = [[NSString alloc]
    initWithData:[yourString dataUsingEncoding:NSASCIIStringEncoding allowLossyConversion:YES]
    encoding:NSASCIIStringEncoding];
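
For Swift callers, the same lossy ASCII round trip might look like the sketch below. The function name is hypothetical, and the exact replacement chosen for characters with no ASCII counterpart is left to Foundation, so no expected output is shown for the accented sample.

```swift
import Foundation

/// Hypothetical helper mirroring the Objective-C snippet above:
/// encode to ASCII with lossy conversion, then decode again.
func asciiApproximation(of text: String) -> String? {
    guard let data = text.data(using: .ascii, allowLossyConversion: true) else {
        return nil
    }
    return String(data: data, encoding: .ascii)
}

if let result = asciiApproximation(of: "Être ou ne pas être. C'était là-bas.") {
    print(result)
}
```

Unlike folding or a StringTransform, this approach also discards or substitutes every other non-ASCII character, so it only suits cases where plain ASCII output is the goal.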

How to find accented characters in a string in iOS?

You can create a custom CharacterSet that contains all the characters you need, and then search your target string for occurrences of characters from that set:

let str = "Hello, plåÅyground"
let characters = "åÅ" // put all required characters in that string
let characterSet = CharacterSet(charactersIn: characters)
if str.rangeOfCharacter(from: characterSet) != nil {
    // do stuff here
}

If you want to differentiate between occurrences of uppercase/lowercase, create separate character sets for uppercase/lowercase characters:

let str = "Hello, plåÅyground"
let uppercaseCharacters = "Å"
let lowercaseCharacters = "å"
let uppercaseCharacterSet = CharacterSet(charactersIn: uppercaseCharacters)
let lowercaseCharacterSet = CharacterSet(charactersIn: lowercaseCharacters)
if str.rangeOfCharacter(from: uppercaseCharacterSet) != nil {
    // uppercase character found
} else if str.rangeOfCharacter(from: lowercaseCharacterSet) != nil {
    // lowercase character found
}
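
If listing every accented character by hand is impractical, one alternative is to collect the characters that change when diacritics are folded away. This is a sketch of a different technique from the CharacterSet approach above, and the function name is hypothetical.

```swift
import Foundation

/// Hypothetical helper: returns the characters whose diacritic-folded
/// form differs from the original character.
func accentedCharacters(in text: String) -> [Character] {
    Array(text.filter { character in
        let original = String(character)
        return original.folding(options: .diacriticInsensitive, locale: nil) != original
    })
}

print(accentedCharacters(in: "Hello, plåÅyground")) // ["å", "Å"]
```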

How to remove all values within square brackets, including the brackets | Swift

You could try something simple like this, if the bracketed prefix always has the same length:

var sampleString = "[2049A30-3930Q4] The Rest of the String"

// Only the length matters, so any placeholder of the same length,
// such as "[0000000-000000] ", gives the same result.
sampleString.removeFirst("[2049A30-3930Q4] ".count)
print("----> sampleString: \(sampleString)")


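EDIT-1: a more general approach, if needed.

if let from = sampleString.range(of: "[")?.lowerBound,
   let to = sampleString.range(of: "]", range: from..<sampleString.endIndex)?.upperBound {
    // Half-open range: `to` is the index just past "]" and may equal endIndex.
    sampleString.removeSubrange(from..<to)
    sampleString = sampleString.trimmingCharacters(in: .whitespaces)
    print("----> sampleString: \(sampleString)")
}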
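
When a string may contain several bracketed groups, a regular-expression replacement is another option. The sketch below assumes brackets are not nested and that whitespace right after each group should go too; the function name is hypothetical.

```swift
import Foundation

/// Hypothetical helper: removes every "[...]" group together with any
/// whitespace that immediately follows it. Assumes no nested brackets.
func removingBracketedText(_ text: String) -> String {
    text.replacingOccurrences(of: "\\[[^\\]]*\\]\\s*",
                              with: "",
                              options: .regularExpression)
}

print(removingBracketedText("[2049A30-3930Q4] The Rest of the String"))
// The Rest of the String
```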

