Recursive Matching with Regular Expressions in JavaScript

Recursive matching with regular expressions in Javascript

Annoyingly, Javascript does not provide the PCRE recursive parameter (?R), so it is far from easy to deal with the nested issue. It can be done however.

I won't reproduce code, but if you check out Steve Levithan's blog, he has some good articles on the subject. He should do, he is probably the leading authority on RegExp in JS. He wrote XRegExp, which replaces most of the PCRE bits that are missing, there is even a Match Recursive plugin!

Recursive regex pattern in JavaScript

If you're intending on using a regular expression for this task, you can consider the following:

var r = s.replace(/((`(?:``)?)[^`]*\2)|>/g, '$1<')
.replace(/((`(?:``)?)[^`]*\2)|</g, '$1>')
.replace(/`[<>]+/g, '`');

Working Demo

Regex to group recursive pattern in string and return capture group as array

Added a ^ to match beginning of string only. Using lookaheads.

Note that lookaheads doesn't have good browser support.

Also, I don't know/haven't tested the performance characteristics of this versus exec looping. And to be entirely honest, it's not much cleaner looking than split. If you want a one-liner, this'll suffice I guess.

I remember for very long strings on other platforms using RegExs like this didn't do well. But I ended up resorting to other methods to searching anyways.

console.log("Hello:A1,A3,A4,AA04".match(/(^[A-Za-z]+(?=:))|((?=,?)([A-Za-z0-9]+))/g))

Recursive Regex expression to split expression by character

Use String#split

var array = '23;hello;2452;ad34aa;bye'.split(';');document.write('<pre>' + JSON.stringify(array, 0, 4) + '</pre>');
// as mentioned from Avinash Rajvar array2 = '23;hello;2452;ad34aa;bye'.match(/[^;]+/g);document.write('<pre>' + JSON.stringify(array2, 0, 4) + '</pre>');

RegExp to be recursive

This is not so much about recursion as getting all matches. In Javascript you have to make your regular expression global

/([^>]+)/g

This will match all sub strings in your string:

string
string25
string89 (including space at the end)

Or you could easily just split your string with > delimiter and collect individuals:

yourString.split(">");

Edit

After you've written your desired result I'd suggest you go with @HamZa's solution that uses positive lookahead. And you'll get pairs back.

/(?=([^>]+>[^>]+))[^>]+>/g

Some explanation

Regular expressions parse strings from from left to right iterating over each character (to simplify the process). Positive lookaheads on the other hand don't progress the current parsing position but rather do what they say: they lookahead if their expression is found:

t(?=s) will match second t in streets because it sees that s is followed by t. But parsing after this match will continue from t on.

I hope this explains it a bit.

Actual solution expression

But to explain the actual regular expression it's a rather clever one how it progresses string parsing:

  1. It first has a positive lookahead (it doesn't increment parsing position) to check if at current parsing position there's a pair you're looking for:

    (?=([^>]+>[^>]+))
  2. If lookahead matches such a pair it stores it as a match (hence the inner parentheses)
  3. Then after the lookahead we have the single string expression [^>]+> that doesn't get stored as a match (not within parentheses) but rather takes care that parsing progresses for a single string up to and including the next > character.
  4. Because this regular expression is global it then starts to do a match all over again but this time from the next character position after > character as the previous parsing progressed/incremented/advanced to it.

Javascript parse multiple brackets recursively from a string

String is just a stream of character.

  • At first we need a lexical analyzer. It take stream of character and generate token. Example:
Input  = `"Hello world" (!)`
Output = Lex(Input) -> ["Hello world", "(" , "!", ")"]
  • We can use a recursive function. where we can create a stack on top and return new array.
Tokens = ["Hello world", "(" , "!", ")"]
Result = [ "Hello world", ["!"] ]

let input = `this OR (this AND (this OR (that AND this))) "this is a text" OR this`

function* Lex(characters) {
let token = "", char;
while (char = characters.shift()) switch (char) {
case '(': case ')':
if (token) { yield token; token = '' }
yield char
break
case '"':
let index = characters.indexOf(char);
token = characters.splice(0, index).join('');
case ' ':
if (token) yield token
token = ''
break
default: token += char
}
}
function fnName(tokens) {
let result = [], item;
while (item = tokens.shift()) {
if (item == ')') return result
result.push(item == '(' ? fnName(tokens) : item)
}
return result
}
let tokens = [...Lex(input.split(''))];
console.log(fnName(tokens))


Related Topics



Leave a reply



Submit