Java change áéőűú to aeouu
I think your question is the same as these:
- Java - getting rid of accents and converting them to regular letters
- Converting Java String to ascii
and hence the answer is also the same:
String convertedString =
    Normalizer
        .normalize(input, Normalizer.Form.NFD)
        .replaceAll("[^\\p{ASCII}]", "");
See
- JavaDoc: Normalizer.normalize(String, Normalizer.Form)
- JavaDoc: Normalizer.Form.NFD
- Sun Java Tutorial: Normalizer's API
Example Code:
final String input = "Tĥïŝ ĩš â fůňķŷ Šťŕĭńġ";
System.out.println(
    Normalizer
        .normalize(input, Normalizer.Form.NFD)
        .replaceAll("[^\\p{ASCII}]", "")
);
Output:
This is a funky String
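One caveat is worth a quick sketch: letters that have no canonical decomposition are not split into base letter plus accent, so the ASCII filter deletes them outright. The class and method names below (AsciiFoldDemo, toAscii) are made up for illustration; only Normalizer and replaceAll come from the answer above.

```java
import java.text.Normalizer;

class AsciiFoldDemo {
    // Same technique as above: decompose, then drop everything outside ASCII.
    static String toAscii(String input) {
        return Normalizer
            .normalize(input, Normalizer.Form.NFD)
            .replaceAll("[^\\p{ASCII}]", "");
    }

    public static void main(String[] args) {
        System.out.println(toAscii("áéőűú"));      // aeouu
        System.out.println(toAscii("Łódź"));       // odz (Ł has no decomposition)
        System.out.println(toAscii("Straße"));     // Strae (ß has no decomposition)
        System.out.println(toAscii("Smørrebrød")); // Smrrebrd (ø has no decomposition)
    }
}
```

If such letters matter for your data, they need an explicit mapping (for example ø to o) before or after the normalization step.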
Android/Java change áéőűú to aeouu
String.replaceAll(String, String) ?
Is there a way to get rid of accents and convert a whole string to regular letters?
Start with java.text.Normalizer.
string = Normalizer.normalize(string, Normalizer.Form.NFD);
// or Normalizer.Form.NFKD for a more "compatible" deconstruction
This separates the accent marks from most characters. Then you just need to throw out every character that is not plain ASCII:
string = string.replaceAll("[^\\p{ASCII}]", "");
If your text contains non-ASCII characters that should survive (other scripts, for example), remove only the combining marks instead:
string = string.replaceAll("\\p{M}", "");
For Unicode, \\P{M} matches the base glyph and \\p{M} (lowercase) matches each accent.
Thanks to GarretWilson for the pointer and regular-expressions.info for the great Unicode guide.
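To make the difference between the two regexes concrete, here is a small sketch that runs both on mixed Greek and Latin text. The class and method names (MarkStripDemo, stripMarks, stripNonAscii) are hypothetical; the two patterns are the ones quoted above.

```java
import java.text.Normalizer;

class MarkStripDemo {
    // Removes only combining marks; base characters of any script survive.
    static String stripMarks(String input) {
        return Normalizer.normalize(input, Normalizer.Form.NFD)
            .replaceAll("\\p{M}", "");
    }

    // Removes everything outside ASCII, marks and non-Latin letters alike.
    static String stripNonAscii(String input) {
        return Normalizer.normalize(input, Normalizer.Form.NFD)
            .replaceAll("[^\\p{ASCII}]", "");
    }

    public static void main(String[] args) {
        String text = "Ελλάδα café";
        System.out.println(stripMarks(text));    // Ελλαδα cafe
        System.out.println(stripNonAscii(text)); // prints " cafe" with a leading space
    }
}
```

The \\p{M} version is usually the safer choice when the input may contain non-Latin scripts, since the ASCII filter silently discards them.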
It is important to note that Normalizer by itself is insufficient to remove diacritics. For example, the following will not replace the accented é with the unaccented e:
import static java.text.Normalizer.normalize;
import static java.text.Normalizer.Form.*;

public class T {
    public static void main( final String[] args ) {
        final var text = "Brévis";
        System.out.println(
            normalize( text, NFD ) + " " +
            normalize( text, NFC ) + " " +
            normalize( text, NFKD ) + " " +
            normalize( text, NFKC )
        );
    }
}
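The reason is easier to see when the code points are printed: NFD only splits é into e plus a combining acute accent (U+0301), which still renders as é. The following sketch, with a hypothetical helper named codePoints, shows the decomposed form and then the extra step that actually removes the mark.

```java
import java.text.Normalizer;

class NormalizeInspectDemo {
    // Renders each code point as U+XXXX so invisible combining marks show up.
    static String codePoints(String s) {
        StringBuilder sb = new StringBuilder();
        s.codePoints().forEach(cp -> sb.append(String.format("U+%04X ", cp)));
        return sb.toString().trim();
    }

    public static void main(String[] args) {
        String text = "Br\u00e9vis"; // precomposed é (U+00E9)
        String nfd = Normalizer.normalize(text, Normalizer.Form.NFD);
        System.out.println(text.length()); // 6
        System.out.println(nfd.length());  // 7: 'e' is now followed by U+0301
        System.out.println(codePoints(nfd));
        // U+0042 U+0072 U+0065 U+0301 U+0076 U+0069 U+0073
        System.out.println(nfd.replaceAll("\\p{M}", "")); // Brevis
    }
}
```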
Converting Java String to ascii
I think your question is the same as this one:
Java - getting rid of accents and converting them to regular letters
and hence the answer is also the same:
Solution
String convertedString =
    Normalizer
        .normalize(input, Normalizer.Form.NFD)
        .replaceAll("[^\\p{ASCII}]", "");
References
See
- JavaDoc: Normalizer.normalize(String, Normalizer.Form)
- JavaDoc: Normalizer.Form.NFD
- Sun Java Tutorial: Normalizer's API
Example Code:
final String input = "Tĥïŝ ĩš â fůňķŷ Šťŕĭńġ";
System.out.println(
    Normalizer
        .normalize(input, Normalizer.Form.NFD)
        .replaceAll("[^\\p{ASCII}]", "")
);
Output:
This is a funky String
How to remove special char from spanish word
Use StringUtils.stripAccents from Apache Commons Lang:
String output = StringUtils.stripAccents("VIRÚ");
System.out.println(output); // VIRU
Maven dependency:
<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-lang3</artifactId>
    <version>3.6</version>
</dependency>
How to get an alphanumeric String from any string in Java?
You can use the java.text.Normalizer class to convert your text into normal Latin characters followed by diacritic marks (accents), where possible. So for example, the single-character string "é" would become the two-character string ['e', {COMBINING ACUTE ACCENT}].
After you've done this, your String would be a combination of unaccented characters, accent modifiers, and the other special characters you've mentioned. At this point you could filter the characters in your string using only a whitelist to keep what you want (which could be as simple as [A-Za-z0-9] for a regex, depending on what you're after).
An approach might look like:
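String name = "I>télé"; // example
// Split accented letters into base letter + combining mark
String normalized = Normalizer.normalize(name, Normalizer.Form.NFD);
// Keep only whitelisted characters; the combining marks are dropped here
String result = normalized.replaceAll("[^A-Za-z0-9]", "");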
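Wrapped into a helper, the approach above might be sketched as follows. The names AlphanumericDemo and toAlphanumeric are invented for this example, and the whitelist is the simple [A-Za-z0-9] one, so spaces and punctuation are removed as well.

```java
import java.text.Normalizer;

class AlphanumericDemo {
    // Decompose, then delete every character the whitelist does not allow.
    static String toAlphanumeric(String input) {
        String normalized = Normalizer.normalize(input, Normalizer.Form.NFD);
        return normalized.replaceAll("[^A-Za-z0-9]", "");
    }

    public static void main(String[] args) {
        System.out.println(toAlphanumeric("I>télé"));         // Itele
        System.out.println(toAlphanumeric("Crème brûlée 2")); // Cremebrulee2
    }
}
```

To keep spaces, add a space to the character class, for example [^A-Za-z0-9 ].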
What is the better approach to trim unprintable characters from a string
Use Normalizer (available since Java 6):
public static final Pattern DIACRITICS_AND_FRIENDS
    = Pattern.compile("[\\p{InCombiningDiacriticalMarks}\\p{IsLm}\\p{IsSk}]+");

private static String stripDiacritics(String str) {
    str = Normalizer.normalize(str, Normalizer.Form.NFD);
    str = DIACRITICS_AND_FRIENDS.matcher(str).replaceAll("");
    return str;
}
Complete solutions are available here and here.
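For reference, a self-contained version of the stripDiacritics helper with a few sample inputs could look like the sketch below (the wrapper class StripDiacriticsDemo is hypothetical). Note one side effect of including \\p{IsSk}: the ASCII characters ^ and ` belong to the Unicode category Sk (modifier symbols), so they are removed too.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

class StripDiacriticsDemo {
    static final Pattern DIACRITICS_AND_FRIENDS =
        Pattern.compile("[\\p{InCombiningDiacriticalMarks}\\p{IsLm}\\p{IsSk}]+");

    static String stripDiacritics(String str) {
        // Decompose first so that accents become separate combining marks.
        str = Normalizer.normalize(str, Normalizer.Form.NFD);
        return DIACRITICS_AND_FRIENDS.matcher(str).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(stripDiacritics("Tĥïŝ ĩš â fůňķŷ Šťŕĭńġ")); // This is a funky String
        System.out.println(stripDiacritics("été"));                    // ete
        // '^' and '`' are category Sk, so the pattern removes them as well.
        System.out.println(stripDiacritics("a^b`c"));                  // abc
    }
}
```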
If you only want to remove all non-printable characters from a string, use the following, which keeps only printable ASCII (space through ~):
rawString.replaceAll("[^\\x20-\\x7e]", "")
Refs: "Replace special characters in string in Java" and "How to remove high-ASCII characters from string like ®, ©, ™ in Java"
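A short sketch of what the printable-ASCII filter keeps and drops, assuming a hypothetical helper named keepPrintableAscii. Because no normalization happens first, accented letters are deleted rather than converted to their base letter.

```java
class PrintableAsciiDemo {
    // Keeps only U+0020 (space) through U+007E (~); everything else is deleted.
    static String keepPrintableAscii(String raw) {
        return raw.replaceAll("[^\\x20-\\x7e]", "");
    }

    public static void main(String[] args) {
        System.out.println(keepPrintableAscii("Tab\there\r\n"));      // Tabhere
        System.out.println(keepPrintableAscii("Java\u00ae 8\u2122")); // Java 8
        System.out.println(keepPrintableAscii("caf\u00e9"));          // caf
    }
}
```

Running Normalizer.normalize(raw, Normalizer.Form.NFD) before the filter would turn the last example into cafe instead of caf.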