Cancelling a Long Running Regex Match

Cancel Regex match if timeout

You could spawn a child process that does the regex matching and kill it off if it hasn't completed in 10 seconds. Might be a bit overkill, but it should work.

fork is probably what you should use, if you go down this road.

If you'll forgive my non-pure functions, this code would demonstrate the gist of how you could communicate back and forth between the forked child process and your main process:

index.js

const { fork } = require('child_process');
const processPath = __dirname + '/regex-process.js';
const regexProcess = fork(processPath);
let received = null;

regexProcess.on('message', function(data) {
  console.log('received message from child:', data);
  clearTimeout(timeout);
  received = data;
  regexProcess.kill(); // or however you want to end it. just as an example.
  // you have access to the regex data here.
  // send to a callback, or resolve a promise with the value,
  // so the original calling code can access it as well.
});

const timeoutInMs = 10000;
let timeout = setTimeout(() => {
  if (!received) {
    console.error('regexProcess is still running!');
    regexProcess.kill(); // or however you want to shut it down.
  }
}, timeoutInMs);

regexProcess.send('message to match against');

regex-process.js

function respond(data) {
  process.send(data);
}

function handleMessage(data) {
  console.log('handling message:', data);
  // run your regex calculations in here
  // then respond with the data when it's done.

  // the following is just to emulate
  // a synchronous computational delay
  for (let i = 0; i < 500000000; i++) {
    // spin!
  }
  respond('return regex process data in here');
}

process.on('message', handleMessage);

This might just end up masking the real problem, though. You may want to consider reworking your regex like other posters have suggested.
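The comments in index.js mention resolving a promise with the result. Here is a minimal sketch of that idea. To keep it in a single file it uses a worker thread (worker_threads) rather than fork, which is a substitution and not what the answer above uses. The helper name matchWithTimeout and its parameters are made up for illustration.

```javascript
const { Worker } = require('worker_threads');

// Source of the worker, inlined so the sketch fits in one file.
// It runs one regex match and posts the result back to the parent.
const workerSource = `
  const { parentPort, workerData } = require('worker_threads');
  const { pattern, flags, input } = workerData;
  const match = new RegExp(pattern, flags).exec(input);
  parentPort.postMessage(match ? Array.from(match) : null);
`;

// Hypothetical helper: resolves with the match (or null),
// rejects if the worker has not answered within timeoutInMs.
function matchWithTimeout(pattern, flags, input, timeoutInMs) {
  return new Promise((resolve, reject) => {
    const worker = new Worker(workerSource, {
      eval: true,
      workerData: { pattern, flags, input },
    });

    const timer = setTimeout(() => {
      worker.terminate(); // stop the runaway match
      reject(new Error(`regex timed out after ${timeoutInMs} ms`));
    }, timeoutInMs);

    worker.once('message', (result) => {
      clearTimeout(timer);
      worker.terminate();
      resolve(result);
    });

    worker.once('error', (err) => {
      clearTimeout(timer);
      reject(err);
    });
  });
}
```

The calling code can then await matchWithTimeout(...) inside a try/catch and treat a rejection as "this pattern took too long".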

How to stop regex matching after 1 match without using non-greedy character

The really bad degenerate patterns never match. And if you find a good way of detecting the degenerate cases, well, you will probably be due a lot of money. You are probably better off with a timeout. In Perl I would use alarm combined with a block eval.

You may also be looking for (*COMMIT) in Perl, which prevents backtracking past that point.

Lightweight long-running method cancel pattern for Java

I am not aware of such a mechanism. Since you have to track your work in order to be able to perform rollbackWork(), a well-designed object-oriented solution is your best choice anyway, if you want to further evolve this logic! Typically, such a scenario could be implemented using the command pattern, which I still find pretty lightweight:

// Task or Command
public interface Command {
    void redo();
    void undo();
}

A scheduler or queue could then take care of executing such task / command implementations, and of rolling them back in order.
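As a rough illustration of such a queue, here is a sketch in JavaScript rather than Java. It assumes each command exposes the same redo() and undo() methods as the interface above. The CommandQueue class and the shouldCancel callback are hypothetical names, not part of any library.

```javascript
// Hypothetical queue: executes commands in order and, on failure or
// cancellation, undoes the completed ones in reverse order.
class CommandQueue {
  constructor() {
    this.done = []; // commands that completed successfully
  }

  // Returns true if every command ran, false if cancelled.
  run(commands, shouldCancel = () => false) {
    for (const command of commands) {
      if (shouldCancel()) {
        this.rollback();
        return false;
      }
      try {
        command.redo();
        this.done.push(command);
      } catch (err) {
        this.rollback();
        throw err;
      }
    }
    return true;
  }

  rollback() {
    while (this.done.length > 0) {
      this.done.pop().undo();
    }
  }
}

// Usage: the third step fails, so the first two are rolled back.
const log = [];
const step = (name, fail = false) => ({
  redo() {
    if (fail) throw new Error(`${name} failed`);
    log.push(`do ${name}`);
  },
  undo() {
    log.push(`undo ${name}`);
  },
});

const queue = new CommandQueue();
try {
  queue.run([step('a'), step('b'), step('c', true)]);
} catch (err) {
  log.push(err.message);
}
console.log(log);
```

A command whose redo() throws is assumed to have left nothing behind, so it is not undone itself.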

Limit a variety of regex patterns to execute by user on the server

Try/catch StackOverflowError, plus wrapping the whole thing in an aggressive timeout (say, 1 second), is almost certainly your best bet. It's also by far the simplest option. As you develop your implementation you'll probably need to catch other exception types as well.

Note that, for the timeout to work, you will need to use an interruptible CharSequence implementation rather than a plain String. I've used this strategy successfully with poorly written, regex-heavy third-party libraries before.

The Q&A linked above should help get you started: Cancelling a long running regex match?


The original approach you suggested – to try to detect "bad" patterns up-front – is a very hard problem to solve indeed. Are you familiar with the halting problem? It's only a little less hard than solving that (which is impossible to solve in the general case).
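The answer above is about Java. As a loose JavaScript analogue of "try/catch plus an aggressive timeout", Node's vm module accepts a timeout option that aborts a script that runs too long. The sketch below relies on that. The helper name safeMatch and the shape of its return value are assumptions made for illustration, and vm should not be treated as a security sandbox.

```javascript
const vm = require('vm');

// Hypothetical helper: runs a user-supplied pattern against `input`
// and gives up after `timeoutInMs` milliseconds.
function safeMatch(pattern, flags, input, timeoutInMs = 1000) {
  const sandbox = { pattern, flags, input, result: null };
  try {
    vm.runInNewContext(
      'const m = new RegExp(pattern, flags).exec(input);' +
        'result = m ? m[0] : null;',
      sandbox,
      { timeout: timeoutInMs }
    );
    return { ok: true, match: sandbox.result };
  } catch (err) {
    // Timeouts carry a specific error code; anything else
    // (for example an invalid pattern) is reported as a plain error.
    const reason =
      err && err.code === 'ERR_SCRIPT_EXECUTION_TIMEOUT' ? 'timeout' : 'error';
    return { ok: false, reason };
  }
}

console.log(safeMatch('c.t', 'i', 'a CAT sat'));
console.log(safeMatch('^(a+)+$', '', 'a'.repeat(40) + '!', 100));
```

The second call uses a pattern with nested quantifiers, a classic catastrophic-backtracking case, so it should hit the timeout rather than finish.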

Regex Negation: Handling conditional if statements that cancel the match if fulfilled

Use

/(?<=(?<!\*)\*\*)\w+(?=\*\*(?!\*))|(?<=(?<!_)__)\w+(?=__(?!_))/gi

Explanation

--------------------------------------------------------------------------------
(?<=                     look behind to see if there is:
--------------------------------------------------------------------------------
  (?<!                     look behind to see if there is not:
--------------------------------------------------------------------------------
    \*                       '*'
--------------------------------------------------------------------------------
  )                        end of look-behind
--------------------------------------------------------------------------------
  \*                       '*'
--------------------------------------------------------------------------------
  \*                       '*'
--------------------------------------------------------------------------------
)                        end of look-behind
--------------------------------------------------------------------------------
\w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                         more times (matching the most amount
                         possible))
--------------------------------------------------------------------------------
(?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
  \*                       '*'
--------------------------------------------------------------------------------
  \*                       '*'
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    \*                       '*'
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
)                        end of look-ahead
--------------------------------------------------------------------------------
|                        OR
--------------------------------------------------------------------------------
(?<=                     look behind to see if there is:
--------------------------------------------------------------------------------
  (?<!                     look behind to see if there is not:
--------------------------------------------------------------------------------
    _                        '_'
--------------------------------------------------------------------------------
  )                        end of look-behind
--------------------------------------------------------------------------------
  __                       '__'
--------------------------------------------------------------------------------
)                        end of look-behind
--------------------------------------------------------------------------------
\w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                         more times (matching the most amount
                         possible))
--------------------------------------------------------------------------------
(?=                      look ahead to see if there is:
--------------------------------------------------------------------------------
  __                       '__'
--------------------------------------------------------------------------------
  (?!                      look ahead to see if there is not:
--------------------------------------------------------------------------------
    _                        '_'
--------------------------------------------------------------------------------
  )                        end of look-ahead
--------------------------------------------------------------------------------
)                        end of look-ahead

JavaScript code:

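const string = 'hello world **ant*** lorem **cat** opposum** *** ***antelope*** *rabbit __dog__';
// Only words wrapped in exactly two asterisks or exactly two underscores match.
console.log(string.match(/(?<=(?<!\*)\*\*)\w+(?=\*\*(?!\*))|(?<=(?<!_)__)\w+(?=__(?!_))/gi)); // [ 'cat', 'dog' ]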

