Java Change áéŐűú to Aeouu

Java change áéőűú to aeouu

I think your question is the same as these:

  • Java - getting rid of accents and converting them to regular letters
  • Converting Java String to ascii

and hence the answer is also the same:

String convertedString =
    Normalizer
        .normalize(input, Normalizer.Form.NFD)
        .replaceAll("[^\\p{ASCII}]", "");

See

  • JavaDoc: Normalizer.normalize(String, Normalizer.Form)
  • JavaDoc: Normalizer.Form.NFD
  • Sun Java Tutorial: Normalizer's API

Example Code:

final String input = "Tĥïŝ ĩš â fůňķŷ Šťŕĭńġ";
System.out.println(
    Normalizer
        .normalize(input, Normalizer.Form.NFD)
        .replaceAll("[^\\p{ASCII}]", "")
);

Output:

This is a funky String
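
As a rough sketch, the same approach can be wrapped in a small runnable class. The class and method names (AsciiFolding, toAscii) are made up for illustration. The second call shows a limitation worth knowing: a letter such as 'Ł' has no canonical decomposition, so the ASCII filter removes it entirely instead of turning it into 'L'.

```java
import java.text.Normalizer;

// Hypothetical helper class, not part of any library.
final class AsciiFolding {

    // Decompose to NFD, then drop every character outside the ASCII range.
    static String toAscii(String input) {
        return Normalizer
                .normalize(input, Normalizer.Form.NFD)
                .replaceAll("[^\\p{ASCII}]", "");
    }

    public static void main(String[] args) {
        // "áéőűú": each letter decomposes into a base letter plus a combining mark
        System.out.println(toAscii("\u00e1\u00e9\u0151\u0171\u00fa")); // aeouu

        // "Łódź": 'Ł' does not decompose, so it is removed rather than converted
        System.out.println(toAscii("\u0141\u00f3d\u017a")); // odz
    }
}
```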

Android/Java change áéőűú to aeouu

String.replaceAll(String, String) ?

Is there a way to get rid of accents and convert a whole string to regular letters?

Start with java.text.Normalizer.

string = Normalizer.normalize(string, Normalizer.Form.NFD);
// or Normalizer.Form.NFKD for a more "compatible" deconstruction

This separates the accent marks from most accented characters, leaving a base letter followed by one or more combining marks. Then you only need to throw out the characters you do not want, for example everything outside the ASCII range:

string = string.replaceAll("[^\\p{ASCII}]", "");

If the text contains non-ASCII characters that you want to keep (for example letters from non-Latin scripts), remove only the combining marks instead:

string = string.replaceAll("\\p{M}", "");

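In a regular expression, \p{M} (lowercase p) matches any combining mark such as an accent, while \P{M} (uppercase P) matches everything that is not a mark, including the base letters. Inside a Java string literal the backslash has to be doubled, as in the snippet above.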
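
A minimal sketch of the difference, using a made-up class name (MarkStripper): removing only \p{M} keeps letters that have no ASCII equivalent, whereas the [^\p{ASCII}] filter would delete them.

```java
import java.text.Normalizer;

// Hypothetical helper class, not part of any library.
final class MarkStripper {

    // Decompose to NFD, then remove only combining marks (Unicode category M).
    static String stripMarks(String input) {
        return Normalizer
                .normalize(input, Normalizer.Form.NFD)
                .replaceAll("\\p{M}", "");
    }

    public static void main(String[] args) {
        // "Łódź": the marks on 'ó' and 'ź' are removed, while 'Ł' is kept unchanged
        System.out.println(stripMarks("\u0141\u00f3d\u017a"));

        // "café": plain Latin text behaves the same as with the ASCII filter
        System.out.println(stripMarks("caf\u00e9")); // cafe
    }
}
```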

Thanks to GarretWilson for the pointer and regular-expressions.info for the great Unicode guide.


It is important to note that Normalizer by itself is insufficient to remove diacritics. For example, the following does not replace the accented é with an unaccented e; all four normalization forms still contain the accent:

import static java.text.Normalizer.normalize;
import static java.text.Normalizer.Form.*;

public class T {
    public static void main( final String[] args ) {
        final var text = "Brévis";

        System.out.println(
            normalize( text, NFD ) + " " +
            normalize( text, NFC ) + " " +
            normalize( text, NFKD ) + " " +
            normalize( text, NFKC )
        );
    }
}
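
To make the effect visible without relying on how a console renders combining marks, the following sketch compares string lengths. The class and method names are illustrative only. It assumes the input uses the precomposed 'é' (U+00E9), which is why the literal is written with an escape.

```java
import java.text.Normalizer;

// Hypothetical demo class, not part of any library.
final class NormalizerOnly {

    // Canonical decomposition only; no characters are removed here.
    static String nfd(String s) {
        return Normalizer.normalize(s, Normalizer.Form.NFD);
    }

    public static void main(String[] args) {
        final String text = "Br\u00e9vis"; // "Brévis" with a precomposed 'é'
        final String decomposed = nfd(text);

        System.out.println(text.length());               // 6
        System.out.println(decomposed.length());         // 7: 'e' and U+0301 are separate chars
        System.out.println(decomposed.equals("Brevis")); // false: the mark is still present

        // Removing the combining marks is the step that actually drops the accent.
        System.out.println(decomposed.replaceAll("\\p{M}", "")); // Brevis
    }
}
```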

Converting Java String to ascii

I think your question is the same as this one:

Java - getting rid of accents and converting them to regular letters

and hence the answer is also the same:

Solution

String convertedString =
    Normalizer
        .normalize(input, Normalizer.Form.NFD)
        .replaceAll("[^\\p{ASCII}]", "");

References

See

  • JavaDoc: Normalizer.normalize(String, Normalizer.Form)
  • JavaDoc: Normalizer.Form.NFD
  • Sun Java Tutorial: Normalizer's API

Example Code:

final String input = "Tĥïŝ ĩš â fůňķŷ Šťŕĭńġ";
System.out.println(
    Normalizer
        .normalize(input, Normalizer.Form.NFD)
        .replaceAll("[^\\p{ASCII}]", "")
);

Output:

This is a funky String

How to remove special char from spanish word

Use StringUtils.stripAccents from Apache Commons Lang:

String output = StringUtils.stripAccents("VIRÚ"); 
System.out.println(output); // VIRU

Maven dependency:

<dependency>
    <groupId>org.apache.commons</groupId>
    <artifactId>commons-lang3</artifactId>
    <version>3.6</version>
</dependency>

How to get an alphanumeric String from any string in Java?

You can use the java.text.Normalizer class to convert your text into plain Latin characters followed by diacritic marks (accents), where possible. For example, the single-character string "é" becomes the two-character string ['e', {COMBINING ACUTE ACCENT}].

After you've done this, your String would be a combination of unaccented characters, accent modifiers, and the other special characters you've mentioned. At this point you could filter the characters in your string using only a whitelist to keep what you want (which could be as simple as [A-Za-z0-9] for a regex, depending on what you're after).

An approach might look like:

// requires: import java.text.Normalizer; import java.text.Normalizer.Form;
String name = "I>télé"; // example input
String normalized = Normalizer.normalize(name, Form.NFD);
String result = normalized.replaceAll("[^A-Za-z0-9]", ""); // "Itele"
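
For repeated use, the whitelist approach could be sketched as a small helper with a precompiled pattern. The names below (AlphanumericFilter, toAlphanumeric) are hypothetical, and the [A-Za-z0-9] whitelist is just the simple choice mentioned above; adjust it to whatever you need to keep.

```java
import java.text.Normalizer;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of any library.
final class AlphanumericFilter {

    // Everything that is NOT an ASCII letter or digit.
    private static final Pattern NOT_ALPHANUMERIC = Pattern.compile("[^A-Za-z0-9]");

    static String toAlphanumeric(String input) {
        // Split accented letters into base letter + combining mark first,
        // so the base letter survives the whitelist filter.
        String normalized = Normalizer.normalize(input, Normalizer.Form.NFD);
        return NOT_ALPHANUMERIC.matcher(normalized).replaceAll("");
    }

    public static void main(String[] args) {
        System.out.println(toAlphanumeric("I>t\u00e9l\u00e9"));   // Itele
        System.out.println(toAlphanumeric("Hello, World 42!"));   // HelloWorld42
    }
}
```

Note that spaces and punctuation are removed as well, since they are not in the whitelist.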

What is the better approach to trim unprintable characters from a string

Use Normalizer (available since Java 6):

public static final Pattern DIACRITICS_AND_FRIENDS
    = Pattern.compile("[\\p{InCombiningDiacriticalMarks}\\p{IsLm}\\p{IsSk}]+");

private static String stripDiacritics(String str) {
    str = Normalizer.normalize(str, Normalizer.Form.NFD);
    str = DIACRITICS_AND_FRIENDS.matcher(str).replaceAll("");
    return str;
}

Complete solutions are available here and here.

If you only want to remove every character outside the printable ASCII range (0x20 to 0x7E) from a string, use

rawString.replaceAll("[^\\x20-\\x7e]", "")
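
A short sketch of what this filter does in practice (the class and method names are invented for the example). Unlike the Normalizer-based approaches, it does not convert accented letters; it simply deletes them along with control characters and symbols.

```java
// Hypothetical helper class, not part of any library.
final class PrintableAscii {

    // Keep only characters from space (0x20) to tilde (0x7E).
    static String keepPrintableAscii(String raw) {
        return raw.replaceAll("[^\\x20-\\x7e]", "");
    }

    public static void main(String[] args) {
        // Tab, newline and 'é' are all outside the range and are removed.
        System.out.println(keepPrintableAscii("a\tb\u00e9c\n")); // abc

        // "café © 2020": 'é' and '©' are deleted, the two spaces around '©' remain.
        System.out.println(keepPrintableAscii("caf\u00e9 \u00a9 2020")); // caf  2020
    }
}
```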

Ref : replace special characters in string in java and How to remove high-ASCII characters from string like ®, ©, ™ in Java


