Is there an equivalent of java.util.regex for glob type patterns?
There's nothing built-in, but it's pretty simple to convert something glob-like to a regex:
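public static String createRegexFromGlob(String glob)
{
    // Anchor the pattern so the whole input has to match.
    StringBuilder out = new StringBuilder("^");
    for (int i = 0; i < glob.length(); ++i)
    {
        final char c = glob.charAt(i);
        switch (c)
        {
            case '*': out.append(".*"); break;     // any run of characters
            case '?': out.append('.'); break;      // exactly one character
            case '.': out.append("\\."); break;    // literal dot
            case '\\': out.append("\\\\"); break;  // literal backslash
            default: out.append(c);                // everything else is copied as-is
        }
    }
    out.append('$');
    return out.toString();
}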
This works for me, but I'm not sure whether it covers the glob "standard", if there is one :)
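The converter above escapes only '.' and '\', so a glob containing other regex metacharacters (for example '+' or '(') would still be interpreted as regex syntax. A minimal sketch of one way around that, assuming only '*' and '?' are treated as wildcards, is to pass every literal run through Pattern.quote. The class and method names here (GlobRegexSketch, toRegex, matches) are hypothetical and not part of the original answer.

```java
import java.util.regex.Pattern;

// Hypothetical helper: translates '*' and '?' and quotes everything else.
final class GlobRegexSketch {

    static String toRegex(String glob) {
        StringBuilder out = new StringBuilder("^");
        StringBuilder literal = new StringBuilder();
        for (int i = 0; i < glob.length(); i++) {
            char c = glob.charAt(i);
            if (c == '*' || c == '?') {
                // Flush the literal text collected so far, quoted as a unit.
                if (literal.length() > 0) {
                    out.append(Pattern.quote(literal.toString()));
                    literal.setLength(0);
                }
                out.append(c == '*' ? ".*" : ".");
            } else {
                literal.append(c);
            }
        }
        if (literal.length() > 0) {
            out.append(Pattern.quote(literal.toString()));
        }
        return out.append('$').toString();
    }

    // Convenience wrapper: true if the whole input matches the glob.
    static boolean matches(String glob, String input) {
        return Pattern.matches(toRegex(glob), input);
    }
}
```

With this sketch, GlobRegexSketch.matches("a+b?.c", "a+bX.c") treats the '+' and '.' as literal characters rather than regex operators. Character classes and brace groups are deliberately left out.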
Update by Paul Tomblin: I found a Perl program that does glob conversion, and adapting it to Java I ended up with:
private String convertGlobToRegEx(String line)
{
    LOG.info("got line [" + line + "]");
    line = line.trim();
    int strLen = line.length();
    StringBuilder sb = new StringBuilder(strLen);
    // Remove beginning and ending * globs because they're useless
    // (the result is not anchored with ^ and $)
    if (line.startsWith("*"))
    {
        line = line.substring(1);
        strLen--;
    }
    if (line.endsWith("*"))
    {
        line = line.substring(0, strLen-1);
        strLen--;
    }
    boolean escaping = false;
    int inCurlies = 0;
    for (char currentChar : line.toCharArray())
    {
        switch (currentChar)
        {
        case '*':
            if (escaping)
                sb.append("\\*");
            else
                sb.append(".*");
            escaping = false;
            break;
        case '?':
            if (escaping)
                sb.append("\\?");
            else
                sb.append('.');
            escaping = false;
            break;
        case '.':
        case '(':
        case ')':
        case '+':
        case '|':
        case '^':
        case '$':
        case '@':
        case '%':
            // Regex metacharacters with no glob meaning: always escape
            sb.append('\\');
            sb.append(currentChar);
            escaping = false;
            break;
        case '\\':
            if (escaping)
            {
                sb.append("\\\\");
                escaping = false;
            }
            else
                escaping = true;
            break;
        case '{':
            if (escaping)
            {
                sb.append("\\{");
            }
            else
            {
                sb.append('(');
                inCurlies++;
            }
            escaping = false;
            break;
        case '}':
            if (inCurlies > 0 && !escaping)
            {
                sb.append(')');
                inCurlies--;
            }
            else if (escaping)
                sb.append("\\}");
            else
                sb.append("}");
            escaping = false;
            break;
        case ',':
            if (inCurlies > 0 && !escaping)
            {
                sb.append('|');
            }
            else if (escaping)
                sb.append("\\,");
            else
                sb.append(",");
            // Reset here too, so an escaped comma does not escape the next character
            escaping = false;
            break;
        default:
            escaping = false;
            sb.append(currentChar);
        }
    }
    return sb.toString();
}
I'm editing into this answer rather than making my own because this answer put me on the right track.
Match path string using glob in Java
If you have Java 7 or later, you can use FileSystem.getPathMatcher, called on the file system returned by FileSystems.getDefault():
final PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:**/*.txt");
This requires converting your strings into instances of Path:
final Path myPath = Paths.get("/foo/bar.txt");
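Putting the two pieces together, a minimal sketch of a complete check might look like the following. The class and method names (PathGlobSketch, matchesGlob) are hypothetical; the calls themselves are the standard java.nio.file API.

```java
import java.nio.file.FileSystems;
import java.nio.file.PathMatcher;
import java.nio.file.Paths;

// Hypothetical wrapper around the default file system's glob matcher.
final class PathGlobSketch {

    // Returns true if the path string matches the glob, using "glob:" syntax.
    static boolean matchesGlob(String glob, String path) {
        PathMatcher matcher = FileSystems.getDefault().getPathMatcher("glob:" + glob);
        return matcher.matches(Paths.get(path));
    }
}
```

Note that in this syntax a single * does not cross directory boundaries, so "*.txt" does not match "/foo/bar.txt", while ** does.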
For earlier versions of Java, you might get some mileage out of Apache Commons IO's WildcardFileFilter. You could also try to borrow some code from Spring's AntPathMatcher; that's very close to the glob-to-regex approach, though.
Find file with a pattern
String.matches() takes a regular expression, not a glob pattern.
It so happens that ENV20120517*.*DAT is a valid regex. It does, however, have a different meaning from what you're expecting: it matches any string that starts with ENV2012051 and ends in DAT (the .* matches anything, and the 7* is effectively a no-op).
The following regex is equivalent to the pattern in your question: ENV20120517.*[.].*DAT
For some ideas on how to do glob matching in Java, see Is there an equivalent of java.util.regex for "glob" type patterns?
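The difference between the two patterns can be checked directly with String.matches(). This is a small sketch; the class name, constant names, and sample file names are made up for illustration.

```java
// Hypothetical holder for the two patterns discussed above.
final class EnvPatternSketch {

    // The glob, mistakenly used as a regex: "7*" means "zero or more 7s".
    static final String AS_WRITTEN = "ENV20120517*.*DAT";

    // The regex that expresses what the glob was meant to say.
    static final String INTENDED = "ENV20120517.*[.].*DAT";

    static boolean matchesAsWritten(String name) {
        return name.matches(AS_WRITTEN);
    }

    static boolean matchesIntended(String name) {
        return name.matches(INTENDED);
    }
}
```

A name such as ENV2012051_x_DAT is accepted by the pattern as written, because the 7 is optional and no dot is required, but it is rejected by the intended regex.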
C equivalent to java.util.regex
I've had some luck using PCRE for complicated regexes from C or C++.
It's pretty widely used and compliant. It used to have some issues with Unicode data, but it looks like some of those have been resolved now.
PCRE supports named captures as used in your example, using the pcre_copy_named_substring function.
glob patterns difference between {} and +()
{} implements something similar to Bash's brace expansion. Essentially, src/**/*.{js,jsx,ts,tsx,json,css} will become:
[
'src/**/*.js',
'src/**/*.jsx',
'src/**/*.ts',
'src/**/*.tsx',
'src/**/*.json',
'src/**/*.css'
]
There is a time and place for this, but you can see this might be less efficient as now you are processing multiple patterns.
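To make the expansion step concrete, here is a rough Java sketch of how a single brace group could be turned into several patterns. It is only an illustration of the idea, not how any particular glob library implements it; the names BraceExpansionSketch and expand are hypothetical, and only the first group is expanded (no nesting, no escapes).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: expands the first {a,b,c} group of a pattern.
final class BraceExpansionSketch {

    static List<String> expand(String pattern) {
        List<String> result = new ArrayList<>();
        int open = pattern.indexOf('{');
        int close = open < 0 ? -1 : pattern.indexOf('}', open + 1);
        if (open < 0 || close < 0) {
            // No complete brace group: the pattern stands for itself.
            result.add(pattern);
            return result;
        }
        String prefix = pattern.substring(0, open);
        String suffix = pattern.substring(close + 1);
        // One output pattern per comma-separated alternative.
        for (String alternative : pattern.substring(open + 1, close).split(",", -1)) {
            result.add(prefix + alternative + suffix);
        }
        return result;
    }
}
```

Each resulting pattern then has to be matched separately, which is where the extra cost mentioned above comes from.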
You can think of +() as working more like a regular expression, where +(js|jsx|ts|tsx|json|css) is roughly equivalent to (js|jsx|ts|tsx|json|css)+.
So it would match things like js or jsjsxtsjson, which is not really equivalent to {}.
What you are probably interested in, if you are looking for a more efficient counterpart to {}, is @(js|jsx|ts|tsx|json|css), which is equivalent to the regular expression (js|jsx|ts|tsx|json|css). That matches exactly one occurrence, so it would match js but not jsjsxtsjson. The reason this may be more efficient is simply that you get a single pattern as opposed to multiple patterns.
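The regex equivalents described above can be tried out in Java. This sketch only demonstrates the regular expressions, not a glob library; the class and field names are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical demo of the regex equivalents of +( ... ) and @( ... ).
final class ExtglobSketch {

    static final String ALTERNATIVES = "js|jsx|ts|tsx|json|css";

    // Regex counterpart of +(js|jsx|...): one or more alternatives in a row.
    static final Pattern ONE_OR_MORE = Pattern.compile("(" + ALTERNATIVES + ")+");

    // Regex counterpart of @(js|jsx|...): exactly one alternative.
    static final Pattern EXACTLY_ONE = Pattern.compile("(" + ALTERNATIVES + ")");

    static boolean oneOrMore(String extension) {
        return ONE_OR_MORE.matcher(extension).matches();
    }

    static boolean exactlyOne(String extension) {
        return EXACTLY_ONE.matcher(extension).matches();
    }
}
```

A repeated extension such as tsxjson is accepted by oneOrMore but rejected by exactlyOne, while a single extension such as tsx is accepted by both.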
Pattern searching
Have a look at the regular expression package java.util.regex. You'll find a good starting point here.
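As a first step with that package, a search for every occurrence of a pattern might look like the sketch below. The class and method names (RegexSearchSketch, findAll) are hypothetical; Pattern and Matcher are the standard java.util.regex classes.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper: collects every substring of text that matches regex.
final class RegexSearchSketch {

    static List<String> findAll(String regex, String text) {
        List<String> hits = new ArrayList<>();
        Matcher matcher = Pattern.compile(regex).matcher(text);
        // find() advances to the next match; group() returns the matched text.
        while (matcher.find()) {
            hits.add(matcher.group());
        }
        return hits;
    }
}
```

For example, findAll("\\d+", "a1b22c333") collects the three digit runs in the order they appear.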