Translate Perl regular expressions to .NET
There is a big comparison table in http://www.regular-expressions.info/refflavors.html.
Most of the basic elements are the same; the differences are:
Minor differences:
- Unicode escape sequences. In .NET it is \u200A, in Perl it is \x{200A}.
- \v in .NET is just the vertical tab (U+000B); in Perl it stands for the "vertical whitespace" class. Of course there is \V in Perl because of this.
- The conditional expression for a named reference is (?(name)yes|no) in .NET, but (?(<name>)yes|no) in Perl.
Some elements are Perl-only:
- Possessive quantifiers (x?+, x*+, x++ etc). Use a non-backtracking subexpression ((?>…)) instead.
- Named Unicode escape sequences \N{LATIN SMALL LETTER X}, \N{U+200A}.
- Case folding and escaping:
  - \l (lower case next char), \u (upper case next char).
  - \L (lower case), \U (upper case), \Q (quote meta characters) until \E.
- Shorthand notation for Unicode properties \pL and \PL. You have to include the braces in .NET, e.g. \p{L}.
- Odd things like \X, \C.
- Special character classes like \v, \V, \h, \H, \N, \R.
- Backreference to a specific or previous group \g1, \g{-1}. You can only use an absolute group index in .NET.
- Named backreference \g{name}. Use \k<name> instead.
- POSIX character class [[:alpha:]].
- Branch-reset pattern (?|…).
- \K. Use look-behind ((?<=…)) instead.
- Code evaluation assertion (?{…}), postponed subexpression (??{…}).
- Subexpression reference (recursive pattern) (?0), (?R), (?1), (?-1), (?+1), (?&name).
- Some conditional-expression predicates are Perl-specific:
  - code (?{…})
  - recursive (R), (R1), (R&name)
  - define (DEFINE)
- Special backtracking control verbs (*VERB:ARG).
- Python syntax:
  - (?P<name>…). Use (?<name>…) instead.
  - (?P=name). Use \k<name> instead.
  - (?P>name). No equivalent in .NET.
Some elements are .NET-only:
- Variable-length look-behind. In Perl, for positive look-behind, use \K instead.
- Arbitrary regular expression in a conditional expression (?(pattern)yes|no).
- Character class subtraction (undocumented?) [a-z-[d-w]].
- Balancing group (?<-name>…). This could be simulated with a code evaluation assertion (?{…}) followed by a (?&name).
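The purely mechanical rewrites in the lists above can be sketched as a small translator. This is a minimal Python sketch, not a complete converter: the function name perl_to_dotnet is hypothetical, it covers only five of the rules (\x{…}, \pL, \g{name}, (?P<name>…) and (?P=name)), it does not track whether it is inside a character class, and anything without a direct .NET equivalent is passed through unchanged.

```python
import re

# One alternation scanned left to right. Specific Perl constructs come first;
# the final branch swallows any other escaped character (e.g. \\ or \()
# so that its second character is never mistaken for the start of a construct.
_PERL_TOKEN = re.compile(
    r"""
      \\x\{(?P<hex>[0-9A-Fa-f]{1,4})\}        # \x{200A}
    | \\(?P<kind>[pP])(?P<prop>[A-Za-z])      # \pL, \PL
    | \\g\{(?P<gname>[A-Za-z_]\w*)\}          # \g{name}
    | \(\?P<(?P<pyname>[A-Za-z_]\w*)>         # (?P<name>
    | \(\?P=(?P<pyref>[A-Za-z_]\w*)\)         # (?P=name)
    | (?P<other>\\.)                          # any other escape: keep as-is
    """,
    re.VERBOSE | re.DOTALL,
)


def _rewrite(m):
    """Return the .NET spelling for one matched Perl token."""
    if m.group("hex") is not None:
        # .NET's \u escape needs exactly four hex digits.
        return "\\u" + m.group("hex").zfill(4).upper()
    if m.group("kind") is not None:
        return "\\" + m.group("kind") + "{" + m.group("prop") + "}"
    if m.group("gname") is not None:
        return "\\k<" + m.group("gname") + ">"
    if m.group("pyname") is not None:
        return "(?<" + m.group("pyname") + ">"
    if m.group("pyref") is not None:
        return "\\k<" + m.group("pyref") + ">"
    return m.group(0)  # unrelated escape, left untouched


def perl_to_dotnet(pattern):
    """Apply the mechanical Perl-to-.NET rewrites to a regex source string."""
    return _PERL_TOKEN.sub(_rewrite, pattern)


print(perl_to_dotnet(r"(?P<ch>\pL)\x{200a}(?P=ch)"))
```

Code points above U+FFFF, relative backreferences such as \g{-1}, and recursion constructs are deliberately left alone, since they need more than a textual substitution.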
References:
- .NET Framework 4: Regular Expression Language Elements
- perlre
.NET equivalent to Perl regular expressions
In Perl, you can think of the slashes as something like double-quotes with the added meaning of "between these slashes is a regex-string". The first block of code is a Perl find/replace regular expression:
$stringvar =~ s/findregex/replaceregex/;
It takes findregex and replaces it with replaceregex, in place. The given example is a very simple search, and the .NET Regex class would be overkill. The String.Replace() method will do the job:
letter = letter.Replace("Users ", "")
letter = letter.Replace("Mailboxes ", "")
The second part is Perl for find only. It returns true if the findregex string is found and leaves the actual string itself untouched.
$stringvar =~ /findregex/;
String.Contains() can handle this in .NET:
If Not (storegroup.Contains("Recovery") _
        Or storegroup.Contains("Users U V W X Y Z") _
        Or storegroup.Contains("you get the idea")) Then
    ...
End If
How could I translate regular expressions in Javascript syntax to .NET syntax
Although it is commercial (i.e. non-free, but cheap) I could not recommend "RegexBuddy" http://www.regexbuddy.com/ highly enough.
You build and test the expression interactively in a single "standard" regex syntax, and it then generates source code in the correct syntax for several environments and many "scenarios", including .NET, JavaScript, Perl, PHP, Python etc.
With my lacklustre knowledge of Regex, this program is a lifesaver.
* disclaimer: No affiliation whatsoever - just a very happy multi-year customer
** Extra note: I just noticed that Jeff Atwood has a testimonial on their homepage!
Just for fun: here is the RFC 2822 email verification source generated by RegexBuddy for both .NET (C#) and JavaScript.
JavaScript:
if (/(?:[a-z0-9!#$%&'*+\/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+\/=?^_`{|}~-]+)*|"(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])/im.test(subject)) {
// Successful match
} else {
// Match attempt failed
}
.NET (C#):
try {
if (Regex.IsMatch(subjectString, @"(?:[a-z0-9!#$%&'*+/=?^_`{|}~-]+(?:\.[a-z0-9!#$%&'*+/=?^_`{|}~-]+)*|""(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21\x23-\x5b\x5d-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])*"")@(?:(?:[a-z0-9](?:[a-z0-9-]*[a-z0-9])?\.)+[a-z0-9](?:[a-z0-9-]*[a-z0-9])?|\[(?:(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?)\.){3}(?:25[0-5]|2[0-4][0-9]|[01]?[0-9][0-9]?|[a-z0-9-]*[a-z0-9]:(?:[\x01-\x08\x0b\x0c\x0e-\x1f\x21-\x5a\x53-\x7f]|\\[\x01-\x09\x0b\x0c\x0e-\x7f])+)\])", RegexOptions.IgnoreCase | RegexOptions.Multiline)) {
// Successful match
} else {
// Match attempt failed
}
} catch (ArgumentException ex) {
// Syntax error in the regular expression
}
Regular expression to match any vertical whitespace
As you say, the Perl character class \v matches [\x0A-\x0D] (linefeed, vertical tab, form feed and carriage return (although I would dispute that CR is vertical white space)) in addition to the non-ASCII code points [\x{85}\x{2028}\x{2029}] (next line, line separator and paragraph separator).
You can hand-build this character class in .NET like this (note that the .NET \u escape takes exactly four hex digits):
[\u000A-\u000D\u0085\u2028\u2029]
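A quick way to sanity-check a hand-built class like this is to run it against each code point that Perl's \v covers (U+000A to U+000D, U+0085, U+2028 and U+2029). The sketch below uses Python's re module rather than .NET, on the assumption that its four-digit \uXXXX escape behaves the same way for these characters; the variable names are illustrative only.

```python
import re

# Same class as the .NET pattern: four-digit \u escapes, written in a raw string
# so that the regex engine (not the Python string literal) interprets them.
VERTICAL_WS = re.compile(r"[\u000A-\u000D\u0085\u2028\u2029]")

# The seven code points in Perl's vertical-whitespace class.
vertical = ["\n", "\x0b", "\x0c", "\r", "\x85", "\u2028", "\u2029"]
# A few horizontal whitespace characters that must not match.
horizontal = [" ", "\t", "\u00a0"]

all_vertical_match = all(VERTICAL_WS.fullmatch(ch) for ch in vertical)
no_horizontal_match = not any(VERTICAL_WS.fullmatch(ch) for ch in horizontal)

print(all_vertical_match, no_horizontal_match)
```

The same list of test characters can be carried over to a .NET unit test to confirm the translated pattern there.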
Regular expressions - C# behaves differently than Perl / Python
In your example the difference seems to be in the semantics of the 'replace' function rather than in the regular expression processing itself.
.NET performs a "global" replace by default, i.e. it replaces all matches rather than just the first match.
Global Replace in Perl
(notice the small 'g' at the end of the =~s line)
$a="This is a test";
$a=~s/(.*)/George/g;
print $a;
which produces
GeorgeGeorge
Single Replace in .NET
var re = new Regex("(.*)");
var replacePattern = "George";
var newValue = re.Replace("This is nice", replacePattern, 1);
Console.WriteLine(newValue);
which produces
George
since it stops after the first replacement.
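The same contrast can be reproduced in Python, whose re.sub is also global by default and takes a count argument much like the third parameter of Regex.Replace above. This is a small sketch assuming Python 3.7 or later, where an empty match directly after a non-empty one is also replaced; the variable names are illustrative.

```python
import re

text = "This is a test"

# Global replace (the default): (.*) matches the whole string, then matches
# the empty string at the very end, so the replacement is applied twice.
replaced_all = re.sub(r"(.*)", "George", text)

# Limiting the number of replacements stops after the first match.
replaced_once = re.sub(r"(.*)", "George", text, count=1)

# Requiring at least one character avoids the trailing empty match entirely.
replaced_nonempty = re.sub(r"(.+)", "George", text)

print(replaced_all)       # GeorgeGeorge (Python 3.7+)
print(replaced_once)      # George
print(replaced_nonempty)  # George
```

Which of the two fixes fits (capping the count or tightening the pattern) depends on whether later non-empty matches should still be replaced.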