PHP and regexp to accept only Greek characters in form
I'm not too current on the Greek alphabet, but if you wanted to do this with the Roman alphabet, you would do this:
/^[a-zA-Z\s]*$/
So to do this with Greek, you replace a and z with the first and last letters of the Greek alphabet. If I remember right, those are α and ω. Because the pattern then contains multibyte UTF-8 characters, PHP also needs the /u modifier. So the code would be:
/^[α-ωΑ-Ω\s]*$/u
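For illustration, the same range-based check can be sketched in Python, where strings are already Unicode and no extra modifier is needed. The names GREEK_ONLY and is_greek_only are made up for this sketch and are not part of the original answer.

```python
import re

# Range-based class: basic lowercase (α-ω) and uppercase (Α-Ω) Greek
# letters plus whitespace. Hypothetical names, for illustration only.
GREEK_ONLY = re.compile(r"[α-ωΑ-Ω\s]*")


def is_greek_only(text: str) -> bool:
    """Return True if text holds only basic Greek letters and whitespace."""
    # fullmatch() plays the role of the ^ and $ anchors.
    return GREEK_ONLY.fullmatch(text) is not None


print(is_greek_only("καλημερα κοσμε"))  # True
print(is_greek_only("hello"))           # False
print(is_greek_only("καλημέρα"))        # False: the accented letter is outside α-ω
```

The last call hints at a limitation of plain ranges: accented Greek letters are not covered by α-ω or Α-Ω.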
Regular expression to accept only greek characters
You can match characters with \p{Greek}, and you must use the /u modifier:
'~^\p{Greek}{2,3}[0-9]{3,4}$~u'
Pattern details
^ - start of string
\p{Greek}{2,3} - 2 or 3 Greek chars
[0-9]{3,4} - 3 or 4 ASCII digits
$ - end of string.
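As a rough cross-check of the pattern, here is a Python sketch. Python's standard re module does not support \p{Greek}, so this sketch assumes that two Unicode blocks (Greek and Coptic, Greek Extended) are a good enough approximation. That is a simplification, not an exact equivalent of the script property, and the names CODE and is_valid_code are hypothetical.

```python
import re

# Assumption: approximate \p{Greek} with two Unicode blocks,
# Greek and Coptic (U+0370-U+03FF) and Greek Extended (U+1F00-U+1FFF).
GREEK = r"[\u0370-\u03FF\u1F00-\u1FFF]"

# 2 or 3 Greek characters followed by 3 or 4 ASCII digits.
CODE = re.compile(GREEK + r"{2,3}[0-9]{3,4}")


def is_valid_code(text: str) -> bool:
    """Return True if the whole string is 2-3 Greek chars then 3-4 digits."""
    return CODE.fullmatch(text) is not None


print(is_valid_code("ΑΒ123"))    # True
print(is_valid_code("ΑΒΓ1234"))  # True
print(is_valid_code("AB123"))    # False: Latin A and B
print(is_valid_code("ΑΒΓΔ123"))  # False: four Greek letters
```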
Regexp Greek chars by number
Are you using the UTF-8 pattern modifier?
/\p{Greek}{4,}/u
Regular expression testing with Greek characters php
The problem with your regex, /^[\p{Greek}\s\d a-zA-Z]+/u, is that it tells the engine where to start matching but gives no instruction about where the match must end, so any string that merely begins with allowed characters passes. Changing your regex to /^[\p{Greek}\s\d a-zA-Z]+$/u (notice the $ at the end) should fix the problem.
The ^ and $ combo instructs the regex engine to start matching at the beginning of the string (^) and to finish at its end ($).
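The effect of the missing end anchor can be reproduced with a small Python sketch. It uses plain α-ω and Α-Ω ranges in place of \p{Greek}, which Python's re module lacks, and the sample string is invented for the demonstration.

```python
import re

# Simplified character class: basic Greek letters, Latin letters,
# whitespace and digits (a stand-in for the class in the answer).
ALLOWED = r"[α-ωΑ-Ωa-zA-Z\s\d]+"

start_only = re.compile("^" + ALLOWED)        # anchored at the start only
both_ends = re.compile("^" + ALLOWED + "$")   # anchored at both ends

text = "γεια sou 123 <script>"

# The start-only pattern is satisfied by the valid prefix "γεια sou 123 ".
print(bool(start_only.search(text)))  # True
# With $ the whole string must consist of allowed characters.
print(bool(both_ends.search(text)))   # False
```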
Python regex greek characters
Just like the Latin letters, the basic Greek letters occupy a contiguous range of Unicode code points, so you can use \([α-ωΑ-Ω]*\) instead of \([A-Za-z]*\) to construct your regex.
I would personally prefer to use a regex like "[A-Za-z]* \([α-ωΑ-Ω]*\)" to check if the pattern holds and use string functions to do the splitting. But I believe it depends on your personal preference.
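A minimal sketch of that approach follows, assuming the input looks like a Latin word followed by a Greek word in parentheses. The input format and the name split_pair are assumptions, since the original question is not shown here.

```python
import re

# Validate the overall shape first: Latin word, space, Greek word in parentheses.
PAIR = re.compile(r"[A-Za-z]* \([α-ωΑ-Ω]*\)")


def split_pair(text: str):
    """Return (latin, greek) if text has the expected shape, else None."""
    if PAIR.fullmatch(text) is None:
        return None
    # Plain string functions do the actual splitting.
    latin, _, rest = text.partition(" (")
    return latin, rest.rstrip(")")


print(split_pair("water (νερο)"))  # ('water', 'νερο')
print(split_pair("water νερο"))    # None
```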
Regular expression - preg_match Latin and Greek characters
Ok, can this replace your function?
$subject = 'OnCEΨΩ é-+@àupon</span> aαθ tIME !#%@$ in MEXIco in the year 1874 <or 1875';
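function format($str, $excludeRE = '/[^a-z0-9]+/u', $separator = '-') {
    $str = strip_tags($str);                           // drop HTML tags
    $str = strtolower($str);                           // lowercase; strtolower() is not multibyte-aware
    $str = preg_replace($excludeRE, $separator, $str); // collapse each run of excluded chars into one separator
    $str = trim($str, $separator);                     // remove leading and trailing separators
    return $str;
}
echo format($subject);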
Note that you will lose all characters after a < (because of strip_tags) until a > is met.
// Old answer, from when I thought you wanted to preserve Greek characters
It's possible to build a character range such as α-ω or any strange characters you want! The reason your pattern doesn't work is that you don't inform the regex engine you are dealing with a Unicode string. To do that, you must add the u modifier at the end of the pattern, like this:
/[^a-z0-9α-ω]+/u
You can use the characters' hexadecimal code points too:
/[^a-z0-9\x{3B1}-\x{3C9}]+/u
Note that if you are sure your string has no uppercase Greek characters, or if you want to preserve them, you can use the character class \p{Greek} like this:
/[^a-z0-9\p{Greek}]+/u
(It's a little longer but more explicit)
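The idea of keeping lowercase Greek letters while replacing everything else can be sketched in Python. This is a loose port rather than the PHP function itself: it skips the tag-stripping step, the name slugify is hypothetical, and Python's str.lower() also lowercases Greek letters, unlike a byte-based lowercase.

```python
import re


def slugify(text: str, exclude: str = r"[^a-z0-9α-ω]+", separator: str = "-") -> str:
    """Lowercase text and replace each run of excluded characters with separator."""
    text = text.lower()
    text = re.sub(exclude, separator, text)
    # Trim separators left over at either end.
    return text.strip(separator)


print(slugify("Hello, World!"))  # hello-world
print(slugify("ΨΩ test"))        # ψω-test

# The same range written with hexadecimal code points (α is U+03B1, ω is U+03C9).
print(slugify("ΨΩ test", exclude=r"[^a-z0-9\u03b1-\u03c9]+"))  # ψω-test
```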
Greek characters, Regular Expressions, and C#
In .NET languages, you can use \p{IsGreekandCoptic} to match Greek characters. So the resulting regex is:
[^a-zA-Z0-9-()/\s\p{IsGreekandCoptic}]
\p{IsGreekandCoptic} matches the characters of the Greek and Coptic Unicode block (U+0370 to U+03FF), shown in this image: http://img203.imageshack.us/img203/3760/greekcoptic.png
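For readers outside .NET, the same negated class can be approximated by spelling out the block's code point range. The Python sketch below assumes that U+0370 to U+03FF is the intended set; the name find_disallowed is hypothetical.

```python
import re

# Assumption: \u0370-\u03FF spells out the Greek and Coptic block that
# \p{IsGreekandCoptic} names in .NET. The hyphen is escaped to keep it literal.
DISALLOWED = re.compile(r"[^a-zA-Z0-9\-()/\s\u0370-\u03FF]")


def find_disallowed(text: str) -> list:
    """Return every character that falls outside the allowed set."""
    return DISALLOWED.findall(text)


print(find_disallowed("Ωμέγα (1/2)-x"))  # []
print(find_disallowed("α*β=γ"))          # ['*', '=']
```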
Javascript - regex to remove special characters but also keep greek characters
The way these ranges are defined is based on their character code. So, since A has char code 65, and z has char code 122, the following regex:
[A-z]
would match every letter, but also every character whose char code falls between those of Z and a, namely codes 91 through 96, which are the characters [\]^_`.
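The stray characters inside the A-z range can be listed directly from their code points. A small Python check, using only the standard library:

```python
import re

# Every character whose code lies strictly between "Z" (90) and "a" (97).
between = "".join(chr(code) for code in range(ord("Z") + 1, ord("a")))
print(between)  # [\]^_`

# All of them are accepted by [A-z] ...
print(all(re.fullmatch(r"[A-z]", ch) for ch in between))     # True
# ... and none of them by the explicit pair of ranges.
print(any(re.fullmatch(r"[A-Za-z]", ch) for ch in between))  # False
```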
Now, for Greek letters, the character codes for the uppercase characters are 913-937 for alpha through omega, and the lowercase characters are 945-969 for alpha through omega (this includes both lowercase variants of sigma, namely ς (962) and σ (963)).
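These code points can be confirmed with Python's unicodedata module. The sketch looks the letters up by their official Unicode names, so it does not depend on how the characters are typed; the helper name greek_code_points is made up.

```python
import unicodedata

NAMES = [
    "GREEK CAPITAL LETTER ALPHA",
    "GREEK CAPITAL LETTER OMEGA",
    "GREEK SMALL LETTER ALPHA",
    "GREEK SMALL LETTER OMEGA",
    "GREEK SMALL LETTER FINAL SIGMA",
    "GREEK SMALL LETTER SIGMA",
]


def greek_code_points() -> dict:
    """Map each Unicode name to the decimal code point of its character."""
    return {name: ord(unicodedata.lookup(name)) for name in NAMES}


for name, code in greek_code_points().items():
    print(code, chr(code), name)

# Code 930 sits inside the uppercase range but has no assigned character.
print(unicodedata.category(chr(930)))  # Cn
```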
So, to match Latin letters, Greek letters, and Arabic numerals, you need the following character class (put a ^ right after the opening bracket to match every character except those):
[a-zA-Z0-9α-ωΑ-Ω]
So, for greek characters, it works just like latin letters.
Edit: I've tested this with a Google-translated Lipsum, and it looks like this doesn't take accented letters into account. I've checked the character codes of these accented letters, and it turns out they are placed right before or right after the unaccented lowercase letters. So, the following regex works for Greek letters including these accented ones:
[a-zA-Z0-9ά-ωΑ-ώ]
This expanded range now also includes άέήίΰ (char codes 940 through 944) and ϊϋόύώ (codes 970 through 974).
To also include whitespace (spaces, tabs, newlines), simply include a \s in the character class:
[a-zA-Z0-9ά-ωΑ-ώ\s]
Edit: Apparently there are more Greek letters that needed to be included in this range, namely those in the range [Ά-Ϋ], which is the range of letters right before the ά, so the new regex would look like this:
[a-zA-Z0-9Ά-ωΑ-ώ\s]
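To actually remove the special characters, the class has to be negated. Here is a hedged Python sketch of that step: it writes the Greek part as one escaped range, U+0386 (Ά) to U+03CE (ώ), which is the union of the two Greek ranges above, and the name remove_special is hypothetical.

```python
import re

# Negated class: anything that is not a Latin letter, a digit, whitespace,
# or a Greek letter between U+0386 (Ά) and U+03CE (ώ).
NOT_ALLOWED = re.compile(r"[^a-zA-Z0-9\u0386-\u03CE\s]")


def remove_special(text: str) -> str:
    """Strip every character outside the allowed set."""
    return NOT_ALLOWED.sub("", text)


print(remove_special("Καλημέρα, κόσμε! 123 #ok"))  # Καλημέρα κόσμε 123 ok
```

Escaped code points are used here only to avoid ambiguity between look-alike characters; the literal ranges behave the same way.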