Regex Backreference to Match Different Values

Note that \g{N} is equivalent to \N, that is, a backreference that matches the same value, not the pattern, that the corresponding capturing group matched. This syntax is a bit more flexible, though, since you can refer to capture groups relative to the current position by putting - before the number: \g{-2} refers to the second group back, so (\p{L})(\d)\g{-2} will match a1a.
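
The "same value, not the pattern" point can be checked quickly. A minimal sketch in Python, assuming the standard re module; it has no \g{-2} or \p{L} in patterns, so the plain numbered form \1 and the class [a-z] stand in for them here:

```python
import re

# Group 1 captures one letter, group 2 one digit; \1 must then repeat
# the exact text group 1 captured, not merely "any letter".
pattern = re.compile(r"([a-z])(\d)\1")

same_value = pattern.fullmatch("a1a") is not None   # letter repeated verbatim
other_value = pattern.fullmatch("a1b") is not None  # a different letter

print(same_value, other_value)
```

Only the first string matches, because the backreference is tied to the captured text.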

The PCRE engine also supports subroutine calls, which re-run a group's pattern rather than its matched text. Use (?1) to repeat the pattern of Group 1, and (?&Val) to repeat the pattern of the named group Val.

Also, you may use character classes to match single characters, and consider using the ? quantifier to make parts of the regex optional:

(\(\s*(?P<Val>[a-zA-Z]+[0-9]*|[0-9]+|\'.*\'|\[.*\])\s*(ni|in|[*\/+-]|[=!><]=|[><])\s*((?&Val))\s*\))

Note that \'.*\' and \[.*\] can match too much; consider replacing them with \'[^\']*\' and \[[^][]*\].
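
To see the overmatching, here is a small sketch in Python's re module. The input string is made up for illustration and only the quoted alternative is compared:

```python
import re

# Hypothetical input with two quoted operands on one line.
text = "('a' in 'b')"

greedy = re.findall(r"'.*'", text)      # .* runs on to the last quote
negated = re.findall(r"'[^']*'", text)  # [^']* stops at the next quote

print(greedy)
print(negated)
```

The greedy version returns one match spanning both operands, while the negated class returns each quoted operand separately.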

Regular expression - back reference to match exact first match

Seems to me that what you really want to do is eliminate any closing and opening tags that are adjacent to each other.

In this:

This is a <strong>test</strong><strong>string</strong>.

You're not wanting to combine the contents of the first tag with the contents of the second tag. You just want to get rid of the </strong><strong> in the middle.

So do something like

s/<\/(\w+)><\1>//;

If you want to limit it to certain tags, do:

s/<\/(strong|emphasis)><\1>//;

(You didn't specify what language you're using so I used sed substitutions.)
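
If you happen to be in Python, the same substitution might look like the sketch below (re.sub replaces every occurrence by default; the variable names are just for illustration):

```python
import re

html = "This is a <strong>test</strong><strong>string</strong>."

# A closing tag immediately followed by an opening tag of the same
# name (\1) is dropped, which merges the two elements.
merged = re.sub(r"</(\w+)><\1>", "", html)

print(merged)
```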

Using a Regex Back-reference In a Repetition Construct ({N})

Regular expressions can't calculate, so you can't do this with a regex only.

You could match the string to /^\{(\d+)\}(.*)$/, then check whether len($2)==int($1).

In Python, for example:

>>> import re
>>> t1 = "{3}abc"
>>> t2 = "{3}abcd"
>>> r = re.compile(r"^\{(\d+)\}(.*)$")
>>> m1 = r.match(t1)
>>> m2 = r.match(t2)
>>> len(m1.group(2)) == int(m1.group(1))
True
>>> len(m2.group(2)) == int(m2.group(1))
False

Regex backreference to match opposite case

You want to use the following pattern with the Python regex module:

^(?=(\p{L})(\p{L})(\p{L})(\p{L})(\p{L}))(?=.*(?!\1)(?i:\1)(?!\2)(?i:\2)(?!\3)(?i:\3)(?!\4)(?i:\4)(?!\5)(?i:\5)$)

Details

  • ^ - start of string
  • (?=(\p{L})(\p{L})(\p{L})(\p{L})(\p{L})) - a positive lookahead with a sequence of five capturing groups that capture the first five letters individually
  • (?=.*(?!\1)(?i:\1)(?!\2)(?i:\2)(?!\3)(?i:\3)(?!\4)(?i:\4)(?!\5)(?i:\5)$) - a positive lookahead that makes sure that, at the end of the string, there are 5 letters that are the same as the ones captured at the start but are of different case.

In brief, the first (\p{L}) in the first lookahead captures the first a in abcdeABCDE. Then, inside the second lookahead, (?i:\1) makes sure the fifth char from the end is the same letter (with the case insensitive mode on), while the (?!\1) negative lookahead makes sure this letter is not identical to the one captured, so it has to differ in case.

Before Python 3.6 the re module did not support inline modifier groups, and it still does not support \p{L}, so this expression won't work with that module.

Demo with the Python regex module:

import regex

strs = ['abcdeABCDE', 'abcdEABCDe', 'zYxWvZyXwV', 'abcdeABCDZ', 'abcdeABCDe']
rx = r'^(?=(\p{L})(\p{L})(\p{L})(\p{L})(\p{L}))(?=.*(?!\1)(?i:\1)(?!\2)(?i:\2)(?!\3)(?i:\3)(?!\4)(?i:\4)(?!\5)(?i:\5)$)'
for s in strs:
    print("Testing {}...".format(s))
    if regex.search(rx, s):
        print("Matched")

Output:

Testing abcdeABCDE...
Matched
Testing abcdEABCDe...
Matched
Testing zYxWvZyXwV...
Matched
Testing abcdeABCDZ...
Testing abcdeABCDe...
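
If the third-party regex module is not available, the results can be cross-checked without any regex. This is a different technique from the lookahead pattern, and the helper name is hypothetical. It compares letters with str.lower(), which should agree with the pattern for ASCII letters but is not guaranteed to for every Unicode letter:

```python
def ends_with_case_flipped_prefix(s, n=5):
    """Hypothetical helper: do the last n letters repeat the first n
    letters with the case of every letter flipped?"""
    if len(s) < n:
        return False
    head, tail = s[:n], s[-n:]
    # Each pair must be the same letter (ignoring case) but not identical.
    return all(
        a.isalpha() and a != b and a.lower() == b.lower()
        for a, b in zip(head, tail)
    )

strs = ['abcdeABCDE', 'abcdEABCDe', 'zYxWvZyXwV', 'abcdeABCDZ', 'abcdeABCDe']
results = [ends_with_case_flipped_prefix(s) for s in strs]
print(results)
```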

Regex to match repeated attribute values in a string

You may use this regex replacement in Javascript:

str = str.replace(/(={\d+})(?:\s+\w+\1)+/g, '$1')

RegEx Details

  • (={\d+}): Match = followed by 1+ digits wrapped in { and } in capture group #1
  • (?:\s+\w+\1)+: Match 1 or more instances of key=value pairs separated by 1+ whitespaces, where the value is the back-reference \1 to ensure we match the same number in value.
  • Replacement is $1 to put captured value back in input
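
For comparison, a sketch of the same replacement in Python. The input line is hypothetical, since the question's data is not shown here, and the braces are escaped so that they are read as literals:

```python
import re

# Hypothetical input: k1 and k2 carry the same value, k3 a different one.
line = "k1={1} k2={1} k3={2}"

# Group 1 is the first ={N}; every following key with the same value is
# swallowed by the repeated group and replaced by group 1 alone.
collapsed = re.sub(r"(=\{\d+\})(?:\s+\w+\1)+", r"\1", line)

print(collapsed)
```

Only the repeated value is collapsed; the k3 pair is left alone because its value differs.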

Regex expression to back reference more than 9 values in a replace

Most of the simple Regex engines used by editors aren't equipped to handle more than 10 matching groups; it doesn't seem like UltraEdit can. I just tried Notepad++ and it won't even match a regex with 10 groups.

Your best bet, I think, is to write something fast in a quick language with a decent regex parser, but that wouldn't answer the question as asked.

Here's something in Python:

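import re

# 15 single-character groups, more than \1..\9 can address
pattern = re.compile('(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)')
with open('input.txt', 'r') as f:
    for line in f:
        m = pattern.match(line)
        if m:  # lines shorter than 15 characters do not match
            print(m.groups())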

Note that Python allows two-digit backreferences such as \20 (group 20). To write a backreference to group 2 followed by a literal 0 in a replacement string, you need to use \g<2>0, which is unambiguous.
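
A small sketch of that ambiguity with Python's re.sub, using a made-up two-group pattern:

```python
import re

text = "ab"

# \g<2>0 is group 2 followed by a literal "0".
unambiguous = re.sub(r"(a)(b)", r"\g<2>0", text)

# \20 is read as a reference to group 20, which this pattern lacks,
# so the replacement template is rejected.
try:
    re.sub(r"(a)(b)", r"\20", text)
    raised = False
except re.error:
    raised = True

print(unambiguous, raised)
```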

Edit:
Most flavors of regex, and editors which include a regex engine, should follow the replace syntax as follows:

input:  abcdefghijklmnop
search: (.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(.)(?<name>.)(.)
group:   1  2  3  4  5  6  7  8  9 10 11     12    13
value:   a  b  c  d  e  f  g  h  i  j  k     l      m

replace   result   note
\11       a1       i.e.: match 1, then the character "1"
${12}     l        most should support this
${name}   l        few support named references, but use them where you can.

Named references are usually only available in specific flavors of regex libraries; test your tool to know for sure.

Alphabetic order regex using backreferences

I'm posting this answer more as a comment than an answer since it has better formatting than comments.

Related to your questions:

  1. Can I use back references in my character classes to check for ascending order strings?

No, you can't. If you take a look at the regular-expressions backref section, you will find the documentation below:

Parentheses and Backreferences Cannot Be Used Inside Character Classes

Parentheses cannot be used inside character classes, at least not as metacharacters. When you put a parenthesis in a character class, it is treated as a literal character. So the regex [(a)b] matches a, b, (, and ).

Backreferences, too, cannot be used inside a character class. The \1 in a regex like (a)[\1b] is either an error or a needlessly escaped literal 1. In JavaScript it's an octal escape.
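
What a given engine does with \1 inside a class is easy to test. A sketch for Python's re module, where numeric escapes inside [...] are treated as characters, so \1 there is the octal escape for U+0001:

```python
import re

# Inside [...] the \1 is not a backreference to group 1.
pattern = re.compile(r"(a)[\1b]")

repeats_group = pattern.fullmatch("aa") is not None    # would need a real backreference
literal_b = pattern.fullmatch("ab") is not None        # "b" is in the class
control_char = pattern.fullmatch("a\x01") is not None  # U+0001 is in the class

print(repeats_group, literal_b, control_char)
```

So "aa" is rejected even though group 1 captured "a".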

Regarding your 2nd question:


  2. Is there any less-hacky solution to this puzzle?

IMHO, your regex is perfectly fine; you could shorten it very slightly at the beginning like this:

(?=^.{5}$)^a*b*c*d*e*f*g*h*i*j*k*l*m*n*o*p*q*r*s*t*u*v*w*x*y*z*$
^--- Here
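
If typing out the alphabet bothers you, the pattern can be generated. A sketch in Python, assuming lowercase ASCII input; is_ascending is a hypothetical helper name, and the ^ is placed before the lookahead, which should be equivalent here:

```python
import re
from string import ascii_lowercase

# Build a*b*c*...z* programmatically; the lookahead fixes the length at 5.
ordered = re.compile(
    r"^(?=.{5}$)" + "".join(c + "*" for c in ascii_lowercase) + "$"
)

def is_ascending(word):
    """Hypothetical helper: five lowercase letters in non-descending order."""
    return ordered.match(word) is not None

checks = [is_ascending(w) for w in ("abcde", "aabcz", "abcda", "abcdef")]
print(checks)
```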

How to Perform Operations on Regex Backreference Matches in Javascript?

Use string.replace with a function as the second argument:

var s1 = "This is a string with numbers 1 2 3 4 5 6 7 8 9 10";
var s2 = s1.replace(/\d+/g, function(x) { return Number(x)+1; });
s2; // => "This is a string with numbers 2 3 4 5 6 7 8 9 10 11"

Note that if you use matching groups then the first argument to the function will be the entire match and each following argument will be the numbered matching group.

var x = "This is x1, x2, x3.";
var y = x.replace(/x(\d+)/g, function(m, g1) {
  return "y" + (Number(g1)+1);
});
y; // => "This is y2, y3, y4."

RegEx modify Backreference's value

Many implementations (javascript, python etc) let you specify a function as the replace parameter. The function normally takes the whole matched string, its position in the input string, and the captured groups as arguments. The string returned by this function is used as the replacement text.

Here is how to do it using JavaScript: the replace function takes the whole matched substring as its first argument, value of captured groups as the next n arguments, followed by the index of the matched string in the original input string and the whole input string.

var s = "this is a test. and this is another one.";
console.log("replacing");
var r = s.replace(/(this is) ([^.]+)/g, function(match, first, second, pos, input) {
  console.log("matched :" + match);
  console.log("1st group :" + first);
  console.log("2nd group :" + second);
  console.log("position :" + pos);
  console.log("input :" + input);
  return "That is " + second.toUpperCase();
});
console.log("replaced string is");
console.log(r);

Output:

replacing
matched :this is a test
1st group :this is
2nd group :a test
position :0
input :this is a test. and this is another one.
matched :this is another one
1st group :this is
2nd group :another one
position :20
input :this is a test. and this is another one.
replaced string is
That is A TEST. and That is ANOTHER ONE.

And here is the Python version - it even gives you start/end values for each group:

#!/usr/bin/env python3
import re

s = "this is a test. and this is another one."
print("replacing")

def repl(match):
    # match.group(0) is the whole matched substring
    print("matched :%s" % match.group(0))
    print("1st group :%s" % match.group(1))
    print("2nd group :%s" % match.group(2))
    print("position :%d %d %d" % (match.start(), match.start(1), match.start(2)))
    print("input :%s" % match.string)
    return "That is %s" % match.group(2).upper()

print("replaced string is \n%s" % re.sub(r"(this is) ([^.]+)", repl, s))

Output:

replacing
matched :this is a test
1st group :this is
2nd group :a test
position :0 0 8
input :this is a test. and this is another one.
matched :this is another one
1st group :this is
2nd group :another one
position :20 20 28
input :this is a test. and this is another one.
replaced string is
That is A TEST. and That is ANOTHER ONE.

Match every character but backreference in JS

You may use a tempered greedy token to emulate a negated character class with multicharacter alternatives:

__\s*(['"])((?:(?!\1).)*)\1
            ^^^^^^^^^^^^

If there can be a newline in between the quotes, replace . with [\s\S]:

__\s*(['"])((?:(?!\1)[\s\S])*)\1

Here is a working snippet:

var re = /__\s*(['"])((?:(?!\1).)*)\1/g;
var str = '__ \'Anything1\' and in __ "Anything2"';
var m;

while ((m = re.exec(str)) !== null) {
  document.body.innerHTML += m[2] + "<br/>"; // demo: show the text between the quotes
}
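
The same tempered greedy token also works in Python's re module. A sketch with a made-up input where the second value contains the other quote character, which a plain [^'"]* would stop at:

```python
import re

text = '__ \'Anything1\' and in __ "it\'s fine"'

# (?:(?!\1).)* consumes any character that is not the quote captured
# in group 1, so the other quote character is allowed inside the value.
pattern = re.compile(r"""__\s*(['"])((?:(?!\1).)*)\1""")

contents = [m.group(2) for m in pattern.finditer(text)]
print(contents)
```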

