Remove Non English Characters PHP

// \255 is read as octal (U+00AD), not decimal 255; \x7F is the last ASCII code point
$str = preg_replace('/[^\x00-\x7F]+/u', '', $str);

PHP how to Remove non-language Characters from a String?

No regex will be perfect for what you want: language and writing are just too complex for that. But an approximation could be:

preg_replace('/[^\p{L}\p{M}\p{Z}\p{N}\p{P}]/u', ' ', $text);

This replaces every character that does not have one of the Unicode properties “letter”, “mark”, “separator”, “number” or “punctuation” with a space.
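As a rough illustration of that approximation, the pattern can be wrapped in a small function. The name strip_non_language_chars is hypothetical, and the input is assumed to be valid UTF-8:

```php
<?php
// Hypothetical helper; the pattern itself is the one from the answer above.
function strip_non_language_chars(string $text): string
{
    // Replace each character outside the letter, mark, separator,
    // number and punctuation categories with a single space.
    $clean = preg_replace('/[^\p{L}\p{M}\p{Z}\p{N}\p{P}]/u', ' ', $text);

    // preg_replace() returns null on failure, e.g. when $text is not valid UTF-8.
    return $clean ?? '';
}

echo strip_non_language_chars("a☺b"), "\n"; // a b
```

Note that tabs and newlines are control characters, not separators, so this pattern turns them into spaces as well.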

How to use substring on non english characters?

For UTF-8 strings, use the multibyte-safe counterpart of substr(), which counts characters rather than bytes:

mb_substr();

http://php.net/manual/en/function.mb-substr.php
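A minimal sketch of the difference, assuming the mbstring extension is loaded and the source file is saved as UTF-8:

```php
<?php
$word = "Привет"; // 6 characters, 12 bytes in UTF-8

// substr() counts bytes: three bytes is one and a half Cyrillic letters,
// so the result ends in a broken byte sequence.
$byBytes = substr($word, 0, 3);

// mb_substr() counts characters when told the encoding.
$byChars = mb_substr($word, 0, 3, 'UTF-8');

var_dump(mb_check_encoding($byBytes, 'UTF-8')); // bool(false)
echo $byChars, "\n";                            // При
```

Passing the encoding explicitly avoids depending on the configured default internal encoding.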

Remove special characters from a url, but not other language characters

[^\p{L} 0-9]

\p{L} matches any kind of letter from any language
You can try this. It will preserve words from other languages and remove special symbols. See demo.

https://regex101.com/r/qH1uG3/8

$re = '/[^\p{L} 0-9]/u'; // u modifier: treat pattern and subject as UTF-8
$str = "@#\$#\$sadsadस्टैक ओवरफ्लो";
$subst = "";

$result = preg_replace($re, $subst, $str);

or

[^\p{L}\p{Z}\p{N}\p{M}]
  • \p{L} matches any kind of letter from any language
  • \p{Z} matches any kind of whitespace or invisible separator
  • \p{N} matches any kind of numeric character in any script
  • \p{M} matches a character intended to be combined with another character (e.g. accents, umlauts, enclosing boxes)

This variant is more precise. See demo.

https://regex101.com/r/qH1uG3/11
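To see why \p{M} matters, here is a small comparison of the two character classes on the sample string from this answer. It is only a sketch and the variable names are illustrative. Both patterns use the u modifier so that the input is read as UTF-8:

```php
<?php
$input = '@#$#$sadsadस्टैक ओवरफ्लो';

// Letters, spaces and ASCII digits only: combining marks such as the
// Devanagari vowel signs and virama are removed along with the symbols.
$lettersOnly = preg_replace('/[^\p{L} 0-9]/u', '', $input);

// Letters, separators, numbers and marks: the Hindi words stay intact.
$withMarks = preg_replace('/[^\p{L}\p{Z}\p{N}\p{M}]/u', '', $input);

echo $withMarks, "\n"; // sadsadस्टैक ओवरफ्लो

// The first result contains no combining marks at all.
var_dump(preg_match('/\p{M}/u', $lettersOnly)); // int(0)
```

For scripts that rely on combining marks, the second class is usually the safer choice.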

PHP - remove all non-numeric characters from a string

You can use preg_replace in this case:

$res = preg_replace("/[^0-9]/", "", "Every 6 Months" );

$res is "6" in this case.

If you also want to keep a decimal or thousands separator, see this example:

$res = preg_replace("/[^0-9.]/", "", "$ 123.099");

$res is "123.099" in this case.

Keep the period as decimal or thousands separator: "/[^0-9.]/"

Keep the comma as decimal or thousands separator: "/[^0-9,]/"

Keep both the period and the comma as decimal and thousands separators: "/[^0-9,.]/"
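If the cleaned string should then become a number, the code has to decide which separator means what. The sketch below uses a hypothetical helper, extract_number, and assumes "." is the decimal separator and "," the thousands separator. Swap the two for locales that use the opposite convention:

```php
<?php
// Hypothetical helper: assumes "." is decimal and "," is thousands.
function extract_number(string $input): ?float
{
    // Keep only digits, commas and periods.
    $digits = preg_replace('/[^0-9,.]/', '', $input);

    // Drop the thousands separators.
    $digits = str_replace(',', '', $digits);

    // Reject leftovers such as "" or "1.2.3" instead of guessing.
    return is_numeric($digits) ? (float) $digits : null;
}

var_dump(extract_number('$ 1,234.50')); // float(1234.5)
var_dump(extract_number('n/a'));        // NULL
```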

Remove all non-numeric characters from a string; [^0-9] doesn't match as expected

Try this:

preg_replace('/[^0-9]/', '', '604-619-5135');

preg_replace uses PCRE patterns, which must be enclosed in delimiters, most commonly /.
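The delimiter does not have to be a slash. As a quick sketch, the following three calls are equivalent for this input:

```php
<?php
$phone = '604-619-5135';

$a = preg_replace('/[^0-9]/', '', $phone); // slash delimiters
$b = preg_replace('#[^0-9]#', '', $phone); // hash delimiters
$c = preg_replace('~\D~', '', $phone);     // \D is shorthand for "not a digit"

echo $a, "\n"; // 6046195135
```

A different delimiter is mainly useful when the pattern itself contains slashes, as in URLs or paths.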

Getting first letter of word for non-English characters

The string is a multibyte string, meaning that one character can occupy more than one byte. You have to use multibyte functions to get at the character:

$name = "Афанасий Никитин";
echo mb_substr($name,0,1);

http://php.net/manual/en/function.mb-substr.php

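In PHP 7.2 and later there is also mb_chr(), though note that it builds a character from a Unicode code point rather than extracting one from a string: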

http://php.net/manual/en/function.mb-chr.php
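The same idea extends to the first letter of every word. This is only a sketch, assuming UTF-8 input and the mbstring extension. The variable names are illustrative:

```php
<?php
$name = "Афанасий Никитин";

$initials = '';
// Split on whitespace; the u modifier keeps the split UTF-8 aware.
foreach (preg_split('/\s+/u', $name, -1, PREG_SPLIT_NO_EMPTY) as $word) {
    // Take one character, not one byte, from each word.
    $initials .= mb_substr($word, 0, 1, 'UTF-8');
}

echo $initials, "\n"; // АН

// For comparison, mb_chr() goes the other way, from code point to character.
echo mb_chr(0x0410, 'UTF-8'), "\n"; // А (U+0410)
```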

preg_replace error with regex with non-english characters, character is not recognized

By default, the PHP regex engine treats your string as a sequence of bytes (i.e. as a sequence of one-byte characters).

When you use the u modifier, the regex engine changes two things:

  • Strings are treated as UTF-8 strings (so a character may be encoded with several bytes)
  • The meaning of shorthand character classes (like \s, \w, \d...) changes to include Unicode characters instead of only ASCII characters.

Note that these two changes can also be written explicitly at the start of the pattern, instead of using the u modifier:

(*UTF8)(*UCP)yourpattern
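A small sketch of both effects of the u modifier, assuming the source file is saved as UTF-8:

```php
<?php
$char = 'é'; // two bytes in UTF-8 (0xC3 0xA9)

// Without u, "." matches a single byte, so "é" is not one character.
$withoutU = preg_match('/^.$/', $char);
// With u, "." matches a whole code point.
$withU = preg_match('/^.$/u', $char);

var_dump($withoutU, $withU); // int(0) int(1)

// With u, \w also covers non-ASCII letters, so accented words match whole.
preg_match_all('/\w+/u', 'naïve café', $m);
print_r($m[0]); // naïve, café
```

Without the u modifier, the second pattern would typically break each word at the accented letter.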

You can find the complete documentation of the pcre regex engine used by PHP here.


