Using Regex to Generate Strings Rather Than Match Them

Using Regex to generate Strings rather than match them

Edit:

Complete list of suggested libraries on this question:

  1. Xeger* - Java
  2. Generex* - Java
  3. Rgxgen - Java
  4. rxrdg - C#

* - Depends on dk.brics.automaton

Edit:
As mentioned in the comments, there is a library available at Google Code to achieve this:
https://code.google.com/archive/p/xeger/

See also https://github.com/mifmif/Generex as suggested by Mifmif

Original message:

Firstly, for a sufficiently complex regexp, I believe this may be impossible, but you should be able to put something together for simple regexps.

If you take a look at the source code of the class java.util.regex.Pattern, you'll see that it uses an internal representation of Node instances. Each of the different pattern components has its own implementation of a Node subclass. These Nodes are organised into a tree.

By producing a visitor that traverses this tree, you should be able to call an overloaded generator method or some kind of Builder that cobbles something together.
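As a rough illustration of that idea, here is a minimal sketch. Since Pattern does not expose its Node classes publicly, it assumes a small hand-written tree instead. All of the types below (RegexNode, Literal, CharClass, Sequence, Repeat, RegexTreeDemo) are hypothetical names invented for this example, and for brevity each node generates its own text rather than going through a separate visitor class.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Random;

// Hypothetical node type; java.util.regex.Pattern does not expose its own nodes.
interface RegexNode {
    void generate(StringBuilder out, Random rnd);
}

// Emits one fixed piece of text.
final class Literal implements RegexNode {
    private final String text;
    Literal(String text) { this.text = text; }
    public void generate(StringBuilder out, Random rnd) { out.append(text); }
}

// Emits one randomly chosen character from a set, e.g. [ab].
final class CharClass implements RegexNode {
    private final String chars;
    CharClass(String chars) { this.chars = chars; }
    public void generate(StringBuilder out, Random rnd) {
        out.append(chars.charAt(rnd.nextInt(chars.length())));
    }
}

// Emits its children one after another.
final class Sequence implements RegexNode {
    private final List<RegexNode> parts;
    Sequence(RegexNode... parts) { this.parts = Arrays.asList(parts); }
    public void generate(StringBuilder out, Random rnd) {
        for (RegexNode part : parts) {
            part.generate(out, rnd);
        }
    }
}

// Emits its child between min and max times (inclusive), e.g. {4,6}.
final class Repeat implements RegexNode {
    private final RegexNode child;
    private final int min;
    private final int max;
    Repeat(RegexNode child, int min, int max) {
        this.child = child;
        this.min = min;
        this.max = max;
    }
    public void generate(StringBuilder out, Random rnd) {
        int count = min + rnd.nextInt(max - min + 1);
        for (int i = 0; i < count; i++) {
            child.generate(out, rnd);
        }
    }
}

final class RegexTreeDemo {
    // Hand-built tree standing in for the pattern [ab]{4,6}c.
    static String sample(Random rnd) {
        RegexNode tree = new Sequence(
                new Repeat(new CharClass("ab"), 4, 6),
                new Literal("c"));
        StringBuilder out = new StringBuilder();
        tree.generate(out, rnd);
        return out.toString();
    }

    public static void main(String[] args) {
        System.out.println(sample(new Random()));
    }
}
```

A real implementation would also need a parser to build this tree from the regex text, which is the part the libraries listed above take care of.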

java use Regular Expressions to generate a string

I'd stick to a Java-solution in this case, something along the lines of:

import java.util.Random;

private String allowedChars = "abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789";

// Returns a random alphanumeric string whose length is between min and max, inclusive.
public String getRandomValue(int min, int max) {
    Random random = new Random();
    int length = random.nextInt(max - min + 1) + min;
    StringBuilder sb = new StringBuilder();
    for (int i = 0; i < length; i++) {
        sb.append(allowedChars.charAt(random.nextInt(allowedChars.length())));
    }
    return sb.toString();
}

You can call this with getRandomValue(5, 10);

I have not tried this code, since I have no IDE available.

Note: if you're not opposed to using third-party libraries, there are numerous ones available.

Random Text generator based on regex

Xeger is capable of doing it:

String regex = "[ab]{4,6}c";
Xeger generator = new Xeger(regex);
String result = generator.generate();
assert result.matches(regex);

Generating strings from regular expression in JavaScript

If you're using JavaScript, there's Randexp which generates random strings that match a given regex.

How to generate random strings that match a given regexp?

Parse your regular expression into a DFA, then traverse your DFA randomly until you end up in an accepting state, outputting a character for each transition. Each walk will yield a new string that matches the expression.

This doesn't work for "regular" expressions that aren't really regular, though, such as expressions with backreferences. It depends on what kind of expression you're after.
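The random walk can be sketched as follows. This is only an illustration under stated assumptions: the DFA is built by hand for the pattern ab*c rather than derived from a regex, the class DfaWalkDemo and its methods are hypothetical names, every state is assumed to be able to reach an accepting state, and the walk stops in an accepting state on a coin flip (or when no transitions remain) so that it does not loop forever.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

final class DfaWalkDemo {
    // transitions.get(state) maps an input character to the next state.
    private final List<Map<Character, Integer>> transitions = new ArrayList<>();
    private final boolean[] accepting;

    DfaWalkDemo(int stateCount) {
        accepting = new boolean[stateCount];
        for (int i = 0; i < stateCount; i++) {
            transitions.add(new HashMap<>());
        }
    }

    void addTransition(int from, char symbol, int to) {
        transitions.get(from).put(symbol, to);
    }

    void setAccepting(int state) {
        accepting[state] = true;
    }

    // Walks from state 0, emitting one character per transition taken.
    String randomWalk(Random rnd) {
        StringBuilder out = new StringBuilder();
        int state = 0;
        while (true) {
            Map<Character, Integer> edges = transitions.get(state);
            // In an accepting state, stop if there is nowhere to go, or on a coin flip.
            if (accepting[state] && (edges.isEmpty() || rnd.nextBoolean())) {
                return out.toString();
            }
            // Assumes no dead states: a non-accepting state always has an outgoing edge.
            List<Character> symbols = new ArrayList<>(edges.keySet());
            char symbol = symbols.get(rnd.nextInt(symbols.size()));
            out.append(symbol);
            state = edges.get(symbol);
        }
    }

    // Hand-built DFA for ab*c: 0 -a-> 1, 1 -b-> 1, 1 -c-> 2 (accepting).
    static DfaWalkDemo forAbStarC() {
        DfaWalkDemo dfa = new DfaWalkDemo(3);
        dfa.addTransition(0, 'a', 1);
        dfa.addTransition(1, 'b', 1);
        dfa.addTransition(1, 'c', 2);
        dfa.setAccepting(2);
        return dfa;
    }

    public static void main(String[] args) {
        System.out.println(forAbStarC().randomWalk(new Random()));
    }
}
```

Choosing transitions uniformly is the simplest option, but it biases the output towards short strings; weighting the choices differently would change the distribution without affecting correctness.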

Regular Expression to generate a string

No, regex does not generate text, it matches text.

However, if you're using Java, take a look at Xeger which can do what you want.


Also, see these similar questions:

Using Regex to generate Strings rather than match them

How do I generate text matching a regular expression from a regular expression?

Reverse regular expressions to generate data

Generate random string from regex character set

Paul McGuire, author of Pyparsing, has written an inverse regex parser, with which you could do this:

import invRegex
print(''.join(invRegex.invert('[a-z]')))
# abcdefghijklmnopqrstuvwxyz

If you do not want to install Pyparsing, there is also a regex inverter that uses only modules from the standard library with which you could write:

import inverse_regex
print(''.join(inverse_regex.ipermute('[a-z]')))
# abcdefghijklmnopqrstuvwxyz

Note: neither module can invert all regex patterns.


And there are differences between the two modules:

import invRegex
import inverse_regex
print(repr(''.join(invRegex.invert('.'))))
print(repr(''.join(inverse_regex.ipermute('.'))))

yields

'0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~'
'0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ!"#$%&\'()*+,-./:;<=>?@[\\]^_`{|}~ \t\n\r\x0b\x0c'

Here is another difference: this time the Pyparsing-based invRegex enumerates a larger set of matches:

x = list(invRegex.invert('[a-z][0-9]?.'))
y = list(inverse_regex.ipermute('[a-z][0-9]?.'))
print(len(x))
# 26884
print(len(y))
# 1100


