How to Use Regex in String.Contains() Method in Java

String.contains

String.contains works with plain character sequences only; it does not accept a regex. It checks whether the exact sequence of characters you pass in appears somewhere in the current String.

Note that String.contains does not check for word boundary; it simply checks for substring.
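A minimal sketch of that behaviour follows. The class and method names (ContainsDemo, hasSubstring) are made up for illustration; only String.contains itself comes from the JDK.

```java
public class ContainsDemo {

    // Hypothetical helper: true when `needle` occurs anywhere in `haystack`
    // as a literal run of characters. No regex interpretation takes place.
    static boolean hasSubstring(String haystack, String needle) {
        return haystack.contains(needle);
    }

    public static void main(String[] args) {
        // Plain substring search: "store" sits inside "restores".
        System.out.println(hasSubstring("restores", "store")); // true

        // The dot is an ordinary character here, not a regex wildcard.
        System.out.println(hasSubstring("abc", "a.c")); // false
        System.out.println(hasSubstring("a.c", "."));   // true
    }
}
```

Because there is no notion of a word here, "restores" counts as containing "store".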

Regex solution

Regex is more powerful than String.contains, since you can enforce word boundary on the keywords (among other things). This means you can search for the keywords as words, rather than just substrings.

Use String.matches with the following regex:

"(?s).*\\bstores\\b.*\\bstore\\b.*\\bproduct\\b.*"

The RAW regex (remove the escaping done in string literal - this is what you get when you print out the string above):

(?s).*\bstores\b.*\bstore\b.*\bproduct\b.*

The \b checks for a word boundary, so you don't get a match for restores store products. Note that stores 3store_product is also rejected, since digits and _ are considered part of a word, but I doubt this case appears in natural text.

Since the word boundary is checked on both sides, the regex above searches for exact words. In other words, stores stores product will not match the regex above, since you are searching for the word store without the s.

. normally matches any character except line terminators. (?s) at the beginning makes . match any character without exception (thanks to Tim Pietzcker for pointing this out).
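The cases discussed above can be checked with a small sketch like the one below. The class and method names are hypothetical; the first regex is the one from this answer, and the second is the same pattern without (?s), included only for comparison.

```java
public class WordOrderDemo {

    // The regex from the answer: three whole words, in this order, anywhere in the text.
    static final String REGEX = "(?s).*\\bstores\\b.*\\bstore\\b.*\\bproduct\\b.*";

    // Same pattern without (?s), so . does not match line terminators.
    static final String REGEX_NO_DOTALL = ".*\\bstores\\b.*\\bstore\\b.*\\bproduct\\b.*";

    static boolean matchesKeywords(String text) {
        return text.matches(REGEX);
    }

    static boolean matchesKeywordsNoDotall(String text) {
        return text.matches(REGEX_NO_DOTALL);
    }

    public static void main(String[] args) {
        System.out.println(matchesKeywords("stores store product"));     // true
        System.out.println(matchesKeywords("restores store products"));  // false
        System.out.println(matchesKeywords("stores 3store_product"));    // false
        System.out.println(matchesKeywords("stores stores product"));    // false

        // With line breaks between the words, only the (?s) version matches.
        String multiLine = "stores\nstore\nproduct";
        System.out.println(matchesKeywords(multiLine));         // true
        System.out.println(matchesKeywordsNoDotall(multiLine)); // false
    }
}
```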

Can I use regular expression in contains method

I want to use a regular expression in the contains method.

How can I do that?

You cannot use a regex in the contains method.

Using Java Regex, how to check if a string contains any of the words in a set ?

TL;DR: For simple substrings, contains() is best, but for matching only whole words, regular expressions are probably better.

The best way to see which method is more efficient is to test it.

You can use String.contains() instead of String.indexOf() to simplify your non-regexp code.

To search for different words the Regular Expression looks like this:

apple|orange|pear|banana|kiwi

The | works as an OR in Regular Expressions.
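If the words come from a collection rather than being typed out by hand, the alternation can be assembled at run time. The sketch below is one possible way to do that, not part of the original answer; the class and method names are hypothetical, and Pattern.quote is used on the assumption that the words should be treated as literal text even if they contain regex metacharacters.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

public class AlternationBuilder {

    // Hypothetical helper: builds word1|word2|... with every word quoted,
    // so characters such as . or + in a word are matched literally.
    static Pattern anyOf(String... words) {
        String alternation = Arrays.stream(words)
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        return Pattern.compile(alternation);
    }

    // True when the sentence contains at least one of the words as a substring.
    static boolean containsAny(String sentence, String... words) {
        return anyOf(words).matcher(sentence).find();
    }

    public static void main(String[] args) {
        String[] fruit = {"apple", "orange", "pear", "banana", "kiwi"};
        System.out.println(containsAny("An apple is nice", fruit));   // true
        System.out.println(containsAny("The lazy dog sleeps", fruit)); // false
    }
}
```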

My very simple test code looks like this:

import java.util.HashSet;
import java.util.Set;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class TestContains {

    // Returns the first word that occurs in the sentence as a substring, or null.
    private static String containsWord(Set<String> words, String sentence) {
        for (String word : words) {
            if (sentence.contains(word)) {
                return word;
            }
        }

        return null;
    }

    // Returns the first text matched by the pattern in the sentence, or null.
    private static String matchesPattern(Pattern p, String sentence) {
        Matcher m = p.matcher(sentence);

        if (m.find()) {
            return m.group();
        }

        return null;
    }

    public static void main(String[] args) {
        Set<String> words = new HashSet<String>();
        words.add("apple");
        words.add("orange");
        words.add("pear");
        words.add("banana");
        words.add("kiwi");

        // Compile once, outside the timed loop.
        Pattern p = Pattern.compile("apple|orange|pear|banana|kiwi");

        String noMatch = "The quick brown fox jumps over the lazy dog.";
        String startMatch = "An apple is nice";
        String endMatch = "This is a longer sentence with the match for our fruit at the end: kiwi";

        long start = System.currentTimeMillis();
        int iterations = 10000000;

        // Time the contains() version.
        for (int i = 0; i < iterations; i++) {
            containsWord(words, noMatch);
            containsWord(words, startMatch);
            containsWord(words, endMatch);
        }

        System.out.println("Contains took " + (System.currentTimeMillis() - start) + "ms");
        start = System.currentTimeMillis();

        // Time the regular expression version.
        for (int i = 0; i < iterations; i++) {
            matchesPattern(p, noMatch);
            matchesPattern(p, startMatch);
            matchesPattern(p, endMatch);
        }

        System.out.println("Regular Expression took " + (System.currentTimeMillis() - start) + "ms");
    }
}

The results I got were as follows:

Contains took 5962ms
Regular Expression took 63475ms

Obviously timings will vary depending on the number of words being searched for and the Strings being searched, but contains() does seem to be ~10 times faster than regular expressions for a simple search like this.

By using Regular Expressions to search for Strings inside another String you're using a sledgehammer to crack a nut so I guess we shouldn't be surprised that it's slower. Save Regular Expressions for when the patterns you want to find are more complex.

One case where you may want to use Regular Expressions is if indexOf() and contains() won't do the job because you only want to match whole words and not just substrings, e.g. you want to match pear but not spears. Regular Expressions handle this case well as they have the concept of word boundaries.

In this case we'd change our pattern to:

\b(apple|orange|pear|banana|kiwi)\b

The \b matches only at the beginning or end of a word, and the parentheses group the alternatives together.

Note, when defining this pattern in your code you need to escape the backslashes with another backslash:

 Pattern p = Pattern.compile("\\b(apple|orange|pear|banana|kiwi)\\b");
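As a quick check of the whole-word behaviour, here is a small sketch using that pattern. The class and method names are hypothetical.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class WholeWordDemo {

    // The whole-word pattern from above, compiled once and reused.
    static final Pattern FRUIT =
            Pattern.compile("\\b(apple|orange|pear|banana|kiwi)\\b");

    // Returns the first fruit found as a whole word, or null if there is none.
    static String firstFruit(String sentence) {
        Matcher m = FRUIT.matcher(sentence);
        return m.find() ? m.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(firstFruit("I like pear tart"));  // pear
        System.out.println(firstFruit("spears are sharp"));  // null
        System.out.println(firstFruit("two pears, please")); // null
    }
}
```

Note that the plural "pears" is not matched either, because \b requires the word to end right after "pear".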

How to check if a string contains only digits in Java

Try

String regex = "[0-9]+";

or

String regex = "\\d+";

As per Java regular expressions, the + means "one or more times" and \d means "a digit".

Note: the "double backslash" is an escape sequence to get a single backslash - therefore, \\d in a java String gives you the actual result: \d

References:

  • Java Regular Expressions

  • Java Character Escape Sequences


Edit: due to some confusion in other answers, I am writing a test case and will explain some more things in detail.

Firstly, if you are in doubt about the correctness of this solution (or others), please run this test case:

String regex = "\\d+";

// positive test cases, should all be "true"
System.out.println("1".matches(regex));
System.out.println("12345".matches(regex));
System.out.println("123456789".matches(regex));

// negative test cases, should all be "false"
System.out.println("".matches(regex));
System.out.println("foo".matches(regex));
System.out.println("aa123bb".matches(regex));

Question 1:

Isn't it necessary to add ^ and $ to the regex, so it won't match "aa123bb" ?

No. In Java, the matches method (which was specified in the question) matches the complete string, not fragments. In other words, it is not necessary to use ^\\d+$ (even though it is also correct). Please see the last negative test case.

Please note that if you use an online "regex checker" then this may behave differently. To match fragments of a string in Java, you can use the find method instead, described in detail here:

Difference between matches() and find() in Java Regex
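To make the difference concrete, here is a minimal sketch that runs the same \d+ pattern through both methods. The class and method names are hypothetical.

```java
import java.util.regex.Pattern;

public class MatchesVsFind {

    static final Pattern DIGITS = Pattern.compile("\\d+");

    // matches(): the whole input must consist of digits.
    static boolean isAllDigits(String s) {
        return DIGITS.matcher(s).matches();
    }

    // find(): it is enough that some fragment of the input consists of digits.
    static boolean containsDigits(String s) {
        return DIGITS.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(isAllDigits("aa123bb"));    // false
        System.out.println(containsDigits("aa123bb")); // true
        System.out.println(isAllDigits("12345"));      // true
        System.out.println(containsDigits("foo"));     // false
    }
}
```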

Question 2:

Won't this regex also match the empty string, ""?

No. A regex \\d* would match the empty string, but \\d+ does not. The star * means zero or more, whereas the plus + means one or more. Please see the first negative test case.

Question 3:

Isn't it faster to compile a regex Pattern?

Yes. It is indeed faster to compile a regex Pattern once, rather than on every invocation of matches, and so if performance implications are important then a Pattern can be compiled and used like this:

Pattern pattern = Pattern.compile(regex);
System.out.println(pattern.matcher("1").matches());
System.out.println(pattern.matcher("12345").matches());
System.out.println(pattern.matcher("123456789").matches());

java regular expression for String.contains

You can use the LITERAL flag when compiling your pattern to tell the engine you're using a literal string, e.g.:

 Pattern p = Pattern.compile(yourString, Pattern.LITERAL);

But are you really sure that doing that and then reusing the result is faster than just String#contains? Enough to make the complexity worth it?
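For reference, a sketch of what the LITERAL approach looks like end to end. The class and method names are hypothetical, and the pattern is compiled on every call only to keep the example short.

```java
import java.util.regex.Pattern;

public class LiteralPatternDemo {

    // Hypothetical helper: a contains-like check built on Pattern.LITERAL,
    // so regex metacharacters in `needle` lose their special meaning.
    static boolean containsLiteral(String input, String needle) {
        return Pattern.compile(needle, Pattern.LITERAL).matcher(input).find();
    }

    public static void main(String[] args) {
        // "1+1" is searched for as text, not as "one or more 1s followed by 1".
        System.out.println(containsLiteral("price: 1+1", "1+1")); // true
        System.out.println(containsLiteral("price: 11", "1+1"));  // false

        // Same answers as the plain String method.
        System.out.println("price: 1+1".contains("1+1")); // true
        System.out.println("price: 11".contains("1+1"));  // false
    }
}
```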

How to check that a string contains characters other than those specified. (in Java)

To detect characters that are NOT a, b, or c, check whether the string consists solely of those characters and negate the result, with something like the following:

if (!s.matches("[abc]+")) {
    System.out.println("The string you entered has some incorrect characters");
}
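An alternative, not from the original answer, is to search directly for a single offending character with a negated character class and find(). The sketch below assumes that formulation; the class and method names are hypothetical. One difference to be aware of: the check with matches("[abc]+") also flags the empty string, whereas this version does not, since an empty string contains no characters at all.

```java
import java.util.regex.Pattern;

public class DisallowedChars {

    // [^abc] matches any single character that is not a, b, or c.
    static final Pattern NOT_ABC = Pattern.compile("[^abc]");

    // True when the string contains at least one character other than a, b, or c.
    static boolean hasOtherChars(String s) {
        return NOT_ABC.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(hasOtherChars("abcabc")); // false
        System.out.println(hasOtherChars("abxc"));   // true
        System.out.println(hasOtherChars(""));       // false
    }
}
```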

Using Java Regex, how to check if a String contains any | but not ||

To match 'a' or 'aaa' but not 'aa' you need a regex with a negative look-behind and a negative look-ahead; e.g.

    ((?<!a)a(?!a))|(a{3,})

That says "find either an 'a' that is not preceded by an 'a' and not followed by an 'a', or a sequence of 3 or more 'a's".

However, find with the above regex and this string "a bb aa" will give a hit. If you want to check that the string contains some 'a's and no 'aa', you will need to test the two conditions separately.

To match '|' characters instead of 'a' characters, replace 'a' with '\|' in the above (written as "\\|" inside a Java string literal).
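Putting the pieces together for '|', a sketch might look like the following. The class and method names are hypothetical, and the second pattern (a run of exactly two pipes) is one interpretation of "no ||" chosen for this example, not something stated in the original answer.

```java
import java.util.regex.Pattern;

public class SinglePipeDemo {

    // The regex from above with 'a' replaced by an escaped pipe:
    // a lone | or a run of three or more.
    static final Pattern LONE_OR_LONG_RUN =
            Pattern.compile("((?<!\\|)\\|(?!\\|))|(\\|{3,})");

    // Assumption for this sketch: "||" means a run of exactly two pipes.
    static final Pattern EXACT_DOUBLE =
            Pattern.compile("(?<!\\|)\\|\\|(?!\\|)");

    static boolean hasLoneOrLongRun(String s) {
        return LONE_OR_LONG_RUN.matcher(s).find();
    }

    // The two conditions are tested separately, as suggested above.
    static boolean hasPipeButNoDouble(String s) {
        return hasLoneOrLongRun(s) && !EXACT_DOUBLE.matcher(s).find();
    }

    public static void main(String[] args) {
        System.out.println(hasLoneOrLongRun("a | b || c"));   // true: the lone | is found
        System.out.println(hasPipeButNoDouble("a | b || c")); // false: a || is also present
        System.out.println(hasPipeButNoDouble("a|b"));        // true
        System.out.println(hasPipeButNoDouble("a||b"));       // false
    }
}
```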

String#contains using Pattern

If you need to write a .contains like method based on Pattern, you should choose the Matcher#find() version:

Pattern.compile(Pattern.quote(s)).matcher(input).find()

If you want to use .matches(), you should bear in mind that:

  • .* will not match line breaks by default, so you need either the (?s) inline modifier at the start of the pattern or the Pattern.DOTALL option.
  • A .* at the start of the pattern can cause excessive backtracking; you may get a StackOverflowError, or the code execution might just freeze.
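The two approaches can be compared side by side with a sketch like this one. The class and method names are hypothetical; the find() version is the expression given above, and the matches() versions wrap the quoted string in .* with and without DOTALL.

```java
import java.util.regex.Pattern;

public class PatternContains {

    // Recommended form: quote the search string and use find().
    static boolean containsWithFind(String input, String s) {
        return Pattern.compile(Pattern.quote(s)).matcher(input).find();
    }

    // matches() form: needs .* on both sides, plus DOTALL for multi-line input.
    static boolean containsWithMatches(String input, String s) {
        return Pattern.compile(".*" + Pattern.quote(s) + ".*", Pattern.DOTALL)
                .matcher(input).matches();
    }

    // Same as above but without DOTALL, to show the line-break pitfall.
    static boolean containsWithMatchesNoDotall(String input, String s) {
        return Pattern.compile(".*" + Pattern.quote(s) + ".*")
                .matcher(input).matches();
    }

    public static void main(String[] args) {
        String input = "line1\nfoo.bar\nline3";
        System.out.println(containsWithFind(input, "foo.bar"));            // true
        System.out.println(containsWithMatches(input, "foo.bar"));         // true
        System.out.println(containsWithMatchesNoDotall(input, "foo.bar")); // false
    }
}
```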

