How to use regex in String.contains() method in Java
String.contains
String.contains works with String, period. It doesn't work with regex. It checks whether the exact String specified appears in the current String or not.
Note that String.contains does not check for word boundaries; it simply checks for a substring.
Regex solution
Regex is more powerful than String.contains, since you can enforce word boundary on the keywords (among other things). This means you can search for the keywords as words, rather than just substrings.
Use String.matches with the following regex:
"(?s).*\\bstores\\b.*\\bstore\\b.*\\bproduct\\b.*"
The RAW regex (remove the escaping done in string literal - this is what you get when you print out the string above):
(?s).*\bstores\b.*\bstore\b.*\bproduct\b.*
The \b checks for a word boundary, so that you don't get a match for restores store products. Note that stores 3store_product is also rejected, since digits and _ are considered part of a word, but I doubt this case appears in natural text.
Since the word boundary is checked on both sides, the regex above will search for exact words. In other words, stores stores product will not match the regex above, since you are searching for the word store without the s.
The dot (.) normally matches any character except a number of new line characters. (?s) at the beginning makes . match any character without exception (thanks to Tim Pietzcker for pointing this out).
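As a quick check of the behaviour described above, here is a minimal sketch that runs the same regex against the example strings. The class and method names (WordBoundaryDemo, hasKeywordsInOrder) and the multi-line sample are my own, just for illustration.

```java
class WordBoundaryDemo {
    // Same regex as above, written as a Java string literal.
    static final String REGEX = "(?s).*\\bstores\\b.*\\bstore\\b.*\\bproduct\\b.*";

    // True if the three keywords appear as whole words, in this order.
    static boolean hasKeywordsInOrder(String text) {
        return text.matches(REGEX);
    }

    public static void main(String[] args) {
        String[] samples = {
            "stores store product",
            "restores store products",
            "stores 3store_product",
            "stores stores product",
            // (?s) lets .* run across the line breaks in this one
            "our stores\nhave a store\nfor every product"
        };
        for (String s : samples) {
            System.out.println(hasKeywordsInOrder(s) + " <- " + s.replace("\n", "\\n"));
        }
    }
}
```

Only the first and last samples should be accepted; the other three fail on a word boundary.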
Can I use regular expression in contains method
I want to use a regular expression in the contains method.
How can I use it?
You cannot use a regex in the contains method.
Using Java Regex, how to check if a string contains any of the words in a set?
TL;DR For simple substrings contains() is best, but for matching only whole words Regular Expressions are probably better.
The best way to see which method is more efficient is to test it.
You can use String.contains() instead of String.indexOf() to simplify your non-regexp code.
To search for different words the Regular Expression looks like this:
apple|orange|pear|banana|kiwi
The | works as an OR in Regular Expressions.
My very simple test code looks like this:
import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TestContains {

    // Plain substring search: returns the first word found in the sentence, or null.
    private static String containsWord(Set<String> words, String sentence) {
        for (String word : words) {
            if (sentence.contains(word)) {
                return word;
            }
        }
        return null;
    }

    // Regex search: returns the first match of the pattern in the sentence, or null.
    private static String matchesPattern(Pattern p, String sentence) {
        Matcher m = p.matcher(sentence);
        if (m.find()) {
            return m.group();
        }
        return null;
    }

    public static void main(String[] args) {
        Set<String> words = new HashSet<String>();
        words.add("apple");
        words.add("orange");
        words.add("pear");
        words.add("banana");
        words.add("kiwi");

        Pattern p = Pattern.compile("apple|orange|pear|banana|kiwi");

        String noMatch = "The quick brown fox jumps over the lazy dog.";
        String startMatch = "An apple is nice";
        String endMatch = "This is a longer sentence with the match for our fruit at the end: kiwi";

        // Time the contains() version.
        long start = System.currentTimeMillis();
        int iterations = 10000000;
        for (int i = 0; i < iterations; i++) {
            containsWord(words, noMatch);
            containsWord(words, startMatch);
            containsWord(words, endMatch);
        }
        System.out.println("Contains took " + (System.currentTimeMillis() - start) + "ms");

        // Time the regex version with the same inputs.
        start = System.currentTimeMillis();
        for (int i = 0; i < iterations; i++) {
            matchesPattern(p, noMatch);
            matchesPattern(p, startMatch);
            matchesPattern(p, endMatch);
        }
        System.out.println("Regular Expression took " + (System.currentTimeMillis() - start) + "ms");
    }
}
The results I got were as follows:
Contains took 5962ms
Regular Expression took 63475ms
Obviously timings will vary depending on the number of words being searched for and the Strings being searched, but contains() does seem to be ~10 times faster than regular expressions for a simple search like this.
By using Regular Expressions to search for Strings inside another String you're using a sledgehammer to crack a nut so I guess we shouldn't be surprised that it's slower. Save Regular Expressions for when the patterns you want to find are more complex.
One case where you may want to use Regular Expressions is if indexOf() and contains() won't do the job because you only want to match whole words and not just substrings, e.g. you want to match pear but not spears. Regular Expressions handle this case well as they have the concept of word boundaries.
In this case we'd change our pattern to:
\b(apple|orange|pear|banana|kiwi)\b
The \b matches only at the beginning or end of a word, and the brackets group the OR expressions together.
Note, when defining this pattern in your code you need to escape the backslashes with another backslash:
Pattern p = Pattern.compile("\\b(apple|orange|pear|banana|kiwi)\\b");
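To see the difference from contains(), here is a small sketch using that word-boundary pattern. The class and method names (WholeWordDemo, firstFruit) are made up for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeWordDemo {
    static final Pattern FRUIT = Pattern.compile("\\b(apple|orange|pear|banana|kiwi)\\b");

    // Returns the first fruit that appears as a whole word, or null if there is none.
    static String firstFruit(String sentence) {
        Matcher m = FRUIT.matcher(sentence);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstFruit("I ate a pear today"));    // whole word
        System.out.println(firstFruit("He threw two spears"));   // only a substring
        System.out.println("He threw two spears".contains("pear")); // contains() still says yes
    }
}
```

The regex ignores spears (and pineapple, for the same reason), whereas contains("pear") reports a hit on it.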
How to check if a string contains only digits in Java
Try
String regex = "[0-9]+";
or
String regex = "\\d+";
As per Java regular expressions, the + means "one or more times" and \d means "a digit".
Note: the "double backslash" is an escape sequence to get a single backslash - therefore, \\d in a Java String gives you the actual result: \d
References:
Java Regular Expressions
Java Character Escape Sequences
Edit: due to some confusion in other answers, I am writing a test case and will explain some more things in detail.
Firstly, if you are in doubt about the correctness of this solution (or others), please run this test case:
String regex = "\\d+";
// positive test cases, should all be "true"
System.out.println("1".matches(regex));
System.out.println("12345".matches(regex));
System.out.println("123456789".matches(regex));
// negative test cases, should all be "false"
System.out.println("".matches(regex));
System.out.println("foo".matches(regex));
System.out.println("aa123bb".matches(regex));
Question 1:
Isn't it necessary to add ^ and $ to the regex, so it won't match "aa123bb"?
No. In Java, the matches method (which was specified in the question) matches a complete string, not fragments. In other words, it is not necessary to use ^\\d+$ (even though it is also correct). Please see the last negative test case.
Please note that if you use an online "regex checker" then this may behave differently. To match fragments of a string in Java, you can use the find method instead, described in detail here:
Difference between matches() and find() in Java Regex
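A minimal sketch of that difference, using the same \\d+ regex. The class and method names (MatchesVsFind, isAllDigits, containsDigits) are just for illustration.

```java
import java.util.regex.Pattern;

class MatchesVsFind {
    static final Pattern DIGITS = Pattern.compile("\\d+");

    // matches(): the whole string must consist of digits.
    static boolean isAllDigits(String s) {
        return DIGITS.matcher(s).matches();
    }

    // find(): a run of digits anywhere in the string is enough.
    static boolean containsDigits(String s) {
        return DIGITS.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(isAllDigits("aa123bb"));
        System.out.println(containsDigits("aa123bb"));
    }
}
```

For "aa123bb", matches() rejects the string while find() locates the 123 fragment.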
Question 2:
Won't this regex also match the empty string, ""?
No. A regex \\d* would match the empty string, but \\d+ does not. The star * means zero or more, whereas the plus + means one or more. Please see the first negative test case.
Question 3:
Isn't it faster to compile a regex Pattern?
Yes. It is indeed faster to compile a regex Pattern once, rather than on every invocation of matches, and so if performance implications are important then a Pattern can be compiled and used like this:
Pattern pattern = Pattern.compile(regex);
System.out.println(pattern.matcher("1").matches());
System.out.println(pattern.matcher("12345").matches());
System.out.println(pattern.matcher("123456789").matches());
java regular expression for String.contains
You can use the LITERAL flag when compiling your pattern to tell the engine you're using a literal string, e.g.:
Pattern p = Pattern.compile(yourString, Pattern.LITERAL);
But are you really sure that doing that and then reusing the result is faster than just String#contains? Enough to make the complexity worth it?
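For what it's worth, here is a rough sketch of what a contains-style check with the LITERAL flag could look like. The LiteralContains class and containsLiteral method are hypothetical names; pairing the flag with find() is my assumption about how you'd use it for a substring test.

```java
import java.util.regex.Pattern;

class LiteralContains {
    // With Pattern.LITERAL, metacharacters such as . lose their special meaning.
    static boolean containsLiteral(String input, String needle) {
        return Pattern.compile(needle, Pattern.LITERAL).matcher(input).find();
    }

    public static void main(String[] args) {
        System.out.println(containsLiteral("price: a.b", "a.b"));
        System.out.println(containsLiteral("price: axb", "a.b"));
        // Without the flag, the dot is a wildcard and "axb" is accepted.
        System.out.println(Pattern.compile("a.b").matcher("price: axb").find());
    }
}
```

With the flag the results line up with what String#contains returns for the same arguments, which is rather the point of the question above.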
How to check that a string contains characters other than those specified. (in Java)
To look for characters that are NOT a, b, or c, use something like the following:
if (!s.matches("[abc]+")) {
    System.out.println("The string you entered has some incorrect characters");
}
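An alternative you could try is a negated character class with find(), which looks directly for an offending character. This is only a sketch; the AllowedChars class and hasDisallowedChar method are names I made up. One difference to be aware of: the matches("[abc]+") check above also flags the empty string (since + needs at least one character), whereas this version does not.

```java
import java.util.regex.Pattern;

class AllowedChars {
    // [^abc] matches any single character that is not a, b, or c.
    private static final Pattern DISALLOWED = Pattern.compile("[^abc]");

    static boolean hasDisallowedChar(String s) {
        return DISALLOWED.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(hasDisallowedChar("abcabc"));
        System.out.println(hasDisallowedChar("abxc"));
        System.out.println(hasDisallowedChar(""));
    }
}
```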
Using Java Regex, how to check if a String contains any | but not ||
To match 'a' or 'aaa' but not 'aa' you need a regex with a negative look-behind and a negative look-ahead; e.g.
((?<!a)a(?!a))|(a{3,})
That says "find either an 'a' that is not preceded by an 'a' and not followed by an 'a', or a sequence of 3 or more 'a's".
However, find with the above regex and this string "a bb aa" will give a hit. If you want to check that the string contains some 'a's and no 'aa', you will need to test the two conditions separately.
To match '|' characters instead of 'a' characters, replace 'a' with '\|' in the above.
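Here is a sketch of the '|' version written as Java string literals (note the doubled backslashes). The class and method names are hypothetical, and the second pattern, which looks for a run of exactly two '|', is my own reading of "no '||'" for the two-step check; adjust it if a '|||' run should also count as containing '||'.

```java
import java.util.regex.Pattern;

class PipeCheck {
    // A lone '|' (no '|' on either side), or a run of three or more.
    static final Pattern SINGLE_OR_TRIPLE =
            Pattern.compile("((?<!\\|)\\|(?!\\|))|(\\|{3,})");

    // Assumption: "||" means a run of exactly two '|' characters.
    static final Pattern EXACTLY_TWO =
            Pattern.compile("(?<!\\|)\\|\\|(?!\\|)");

    static boolean hasSingleOrTriple(String s) {
        return SINGLE_OR_TRIPLE.matcher(s).find();
    }

    // The two conditions tested separately, as suggested above.
    static boolean hasSingleOrTripleButNoDouble(String s) {
        return hasSingleOrTriple(s) && !EXACTLY_TWO.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(hasSingleOrTriple("a | b || c"));            // hit on the lone '|'
        System.out.println(hasSingleOrTripleButNoDouble("a | b || c")); // rejected by the second test
    }
}
```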
String#contains using Pattern
If you need to write a .contains-like method based on Pattern, you should choose the Matcher#find() version:
Pattern.compile(Pattern.quote(s)).matcher(input).find()
If you want to use .matches(), you should bear in mind that:
- .* will not match line breaks by default, so you need the (?s) inline modifier at the start of the pattern or the Pattern.DOTALL option
- The .* at the pattern start will cause too much backtracking and you may get a stack overflow exception, or the code execution might just freeze.
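The two approaches can be compared with a short sketch like the one below. The class and method names are invented for the example, and the .matches() variants are only there to show the line-break point; the find() version is the one recommended above.

```java
import java.util.regex.Pattern;

class PatternContains {
    // Recommended: quote the needle and let find() scan for it.
    static boolean containsViaFind(String input, String s) {
        return Pattern.compile(Pattern.quote(s)).matcher(input).find();
    }

    // matches() needs .* on both sides, plus DOTALL so .* can cross line breaks.
    static boolean containsViaMatches(String input, String s) {
        return Pattern.compile(".*" + Pattern.quote(s) + ".*", Pattern.DOTALL)
                .matcher(input).matches();
    }

    // Same as above but without DOTALL: fails on multi-line input.
    static boolean containsViaMatchesNoDotall(String input, String s) {
        return Pattern.compile(".*" + Pattern.quote(s) + ".*")
                .matcher(input).matches();
    }

    public static void main(String[] args) {
        String input = "line one\nfind (me)\nline three";
        System.out.println(containsViaFind(input, "(me)"));
        System.out.println(containsViaMatches(input, "(me)"));
        System.out.println(containsViaMatchesNoDotall(input, "(me)"));
    }
}
```

Pattern.quote keeps the parentheses in "(me)" from being treated as a group, so the search behaves like a plain substring test.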