How to Find a Whole Word in a String in Java

How to find a whole word in a String in Java?

The example below is based on your comments. It uses a List of keywords, which are searched for in a given String using word boundaries. It uses StringUtils.join from Apache Commons Lang to build the regular expression, then prints each matched group.

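import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.commons.lang3.StringUtils;

String text = "I will come and meet you at the woods 123woods and all the woods";

List<String> tokens = new ArrayList<>();
tokens.add("123woods");
tokens.add("woods");

// Builds \b(123woods|woods)\b: any keyword, but only as a whole word
String patternString = "\\b(" + StringUtils.join(tokens, "|") + ")\\b";
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(text);

// Prints woods, 123woods and woods, one per line
while (matcher.find()) {
    System.out.println(matcher.group(1));
}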
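If you would rather avoid the Commons Lang dependency, the same alternation can be built with the JDK alone. This is a minimal sketch assuming Java 8 or later; the class and method names (WholeWordFinder, findAll) are hypothetical. It also wraps each keyword in Pattern.quote, an addition not in the answer above, so that keywords containing regex metacharacters such as . or + are matched literally.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WholeWordFinder {

    // Hypothetical helper: returns every whole-word occurrence of any keyword, in text order.
    static List<String> findAll(String text, List<String> keywords) {
        List<String> quoted = new ArrayList<>();
        for (String keyword : keywords) {
            // Escape regex metacharacters so each keyword is matched literally.
            quoted.add(Pattern.quote(keyword));
        }
        // \b(kw1|kw2|...)\b : the alternation wrapped in word boundaries.
        Pattern pattern = Pattern.compile("\\b(" + String.join("|", quoted) + ")\\b");
        Matcher matcher = pattern.matcher(text);
        List<String> matches = new ArrayList<>();
        while (matcher.find()) {
            matches.add(matcher.group(1));
        }
        return matches;
    }

    public static void main(String[] args) {
        String text = "I will come and meet you at the woods 123woods and all the woods";
        System.out.println(findAll(text, Arrays.asList("123woods", "woods")));
        // prints [woods, 123woods, woods]
    }
}
```

Note that "woods" is not reported inside "123woods", because there is no word boundary between "3" and "w".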

If you need more performance, you could have a look at StringSearch, a library of high-performance pattern-matching algorithms for Java.

Finding the first matching whole word given a substring in a long text in Java

What could be wrong with my code?

Because your regex matches only "overflow", not the whole word that contains it.

Use the following regex instead:

\b\S*overflow\S*


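import java.util.regex.Matcher;
import java.util.regex.Pattern;

String fullText = "Stackoverflow is the best and stackoverflow.com rocks !!!";
String token = "\\b\\S*overflow\\S*";
Pattern pattern = Pattern.compile(token);
Matcher matcher = pattern.matcher(fullText);
if (matcher.find()) {
    // group() returns the entire match, i.e. the whole word
    System.out.println("Whole word is: " + matcher.group());
}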

Explanation:

  • \b matches a word boundary

  • \S* matches zero or more non-whitespace characters

  • overflow matches "overflow" literally

  • \S* matches zero or more non-whitespace characters
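Wrapped into a reusable method, the same regex might look like the sketch below. The class and method names (FirstWholeWord, firstWordContaining) are hypothetical, and quoting the fragment with Pattern.quote is an assumption added here so that a fragment containing regex metacharacters is taken literally.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FirstWholeWord {

    // Hypothetical helper: returns the first match of \b\S*<fragment>\S*, i.e. the
    // word containing fragment, or null if the fragment does not occur.
    static String firstWordContaining(String text, String fragment) {
        // Start at a word boundary, then extend over non-whitespace on both sides.
        Pattern pattern = Pattern.compile("\\b\\S*" + Pattern.quote(fragment) + "\\S*");
        Matcher matcher = pattern.matcher(text);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        String fullText = "Stackoverflow is the best and stackoverflow.com rocks !!!";
        System.out.println("Whole word is: " + firstWordContaining(fullText, "overflow"));
        // prints Whole word is: Stackoverflow
    }
}
```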


Alternative two: split the text into words, iterate through them, and stop at the first word that contains the substring.

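String fullText = "Stackoverflow is the best and stackoverflow.com rocks !!!";
// Split on runs of whitespace so repeated spaces do not produce empty strings
String[] strWords = fullText.split("\\s+");
for (String strWord : strWords) {
    if (strWord.contains("overflow")) {
        System.out.println(strWord);
        break; // stop at the first matching word
    }
}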
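On Java 8 or later, the same split-and-search loop can be expressed with a stream. This is a minimal sketch; the class and method names (SplitSearch, firstWordContaining) are hypothetical, and it splits on \s+ (one or more whitespace characters) so that runs of spaces do not yield empty strings.

```java
import java.util.Arrays;
import java.util.Optional;

class SplitSearch {

    // Hypothetical helper: first whitespace-separated word that contains fragment, if any.
    static Optional<String> firstWordContaining(String text, String fragment) {
        return Arrays.stream(text.split("\\s+"))
                .filter(word -> word.contains(fragment))
                .findFirst(); // short-circuits at the first match, like the break in the loop
    }

    public static void main(String[] args) {
        String fullText = "Stackoverflow is the best and stackoverflow.com rocks !!!";
        firstWordContaining(fullText, "overflow").ifPresent(System.out::println);
        // prints Stackoverflow
    }
}
```

Like the loop, String.contains is case-sensitive.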

Java match whole word in String

Since a word boundary does not match between a word character and an underscore (the underscore is itself a word character), you need:

String pattern = "(?<=_|\\b)" + str + "(?=_|\\b)";

Here, the (?<=_|\b) positive lookbehind requires a word boundary or an underscore to appear right before str, and the (?=_|\b) positive lookahead requires an underscore or a word boundary to appear right after str.
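The difference from a plain \b can be checked with a small sketch. The class and method names (UnderscoreBoundary, containsWord) and the sample file name are hypothetical; as in the pattern above, str is inserted unquoted, so it must not contain regex metacharacters.

```java
import java.util.regex.Pattern;

class UnderscoreBoundary {

    // Hypothetical helper: true if str occurs delimited by a word boundary or an underscore.
    // str is not quoted, so it must not contain regex metacharacters.
    static boolean containsWord(String text, String str) {
        String pattern = "(?<=_|\\b)" + str + "(?=_|\\b)";
        return Pattern.compile(pattern).matcher(text).find();
    }

    public static void main(String[] args) {
        String text = "report_final_v2.txt";
        // Plain \b fails here: '_' and 'f' are both word characters, so there is no boundary.
        System.out.println(Pattern.compile("\\bfinal\\b").matcher(text).find()); // false
        System.out.println(containsWord(text, "final"));                          // true
        System.out.println(containsWord("finally", "final"));                     // false
    }
}
```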


If your word may contain special characters, you might want to use a more straightforward word boundary:

"(?<![^\\W_])" + Pattern.quote(str) + "(?![^\\W_])"

Here, the negative lookbehind (?<![^\W_]) fails the match if str is preceded by a word character other than an underscore. [^\W_] is a negated character class: it matches any character that is neither a non-word character (\W) nor an underscore, that is, a letter or digit (ASCII only, with Java's default flags). Likewise, the negative lookahead (?![^\W_]) fails the match if str is followed by a word character other than an underscore.

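Note that the second pattern quotes the search string with Pattern.quote, so special characters in it are matched literally: for example, AA.A is found in AA.A_str.txt, and the dot does not match an arbitrary character.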
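A runnable sketch of the quoted variant, using the AA.A_str.txt example. The class and method names (QuotedBoundary, containsWord) and the two non-matching inputs are hypothetical, chosen to show the literal dot and the lookahead at work.

```java
import java.util.regex.Pattern;

class QuotedBoundary {

    // Hypothetical helper: true if str occurs with no ASCII letter or digit directly
    // before or after it (an underscore neighbour is allowed).
    static boolean containsWord(String text, String str) {
        // Pattern.quote makes '.' and any other metacharacters in str literal.
        String pattern = "(?<![^\\W_])" + Pattern.quote(str) + "(?![^\\W_])";
        return Pattern.compile(pattern).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsWord("AA.A_str.txt", "AA.A"));  // true
        System.out.println(containsWord("AAxA_str.txt", "AA.A"));  // false: '.' is literal
        System.out.println(containsWord("AA.AB_str.txt", "AA.A")); // false: followed by 'B'
    }
}
```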


How to find a whole word in java?

So you want to find what the user entered, while making sure it won't match part of a word?

In that case, stay far away from \b and its cousin \w, as those are utterly useless for anything you would consider a word (they are a rough approximation of what some programming languages consider identifiers, nothing more). Better to spell out explicitly what you want:

(?<=^|\s)search:(?=\s|$)

This requires your search term to be preceded by whitespace or the beginning of the string, and followed by whitespace or the end of the string. You may want to extend the lookahead to something like

(?=[\s.,:;'"!?)]|$)

to allow for trailing punctuation (and, likewise, add at least the opening parenthesis to the lookbehind).
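In Java, this approach might be sketched as follows. It assumes the search term comes from user input, so it is escaped with Pattern.quote, and it combines the punctuation-tolerant lookahead with an opening parenthesis in the lookbehind as suggested above. The class and method names (UserTermSearch, containsTerm) are hypothetical.

```java
import java.util.regex.Pattern;

class UserTermSearch {

    // Hypothetical helper: true if term is preceded by start of text, whitespace or '('
    // and followed by whitespace, common punctuation or end of text.
    static boolean containsTerm(String text, String term) {
        String regex = "(?<=^|[\\s(])" + Pattern.quote(term) + "(?=[\\s.,:;'\"!?)]|$)";
        return Pattern.compile(regex).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsTerm("use search: here", "search:")); // true
        System.out.println(containsTerm("research: done", "search:"));   // false
        System.out.println(containsTerm("(search:) again", "search:"));  // true
    }
}
```

A term ending in ':' is exactly where \b breaks down: there is no word boundary between ':' and a following space.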

Java Regex : match whole word with word boundary

It appears you only want to match "words" enclosed with whitespace (or at the start/end of strings).

Use

String pattern = "(?<!\\S)" + Pattern.quote(word) + "(?!\\S)";

The (?<!\S) negative lookbehind fails any match that is immediately preceded by a character other than whitespace, and the (?!\S) negative lookahead fails any match that is immediately followed by a character other than whitespace. Pattern.quote() is necessary to escape special characters so that they are treated as literal characters in the regex pattern.
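A short sketch of this pattern in use. The class and method names (WhitespaceDelimited, containsWord) and the sample inputs are hypothetical; the inputs include symbols such as $ and # where \b-based matching would not behave as expected.

```java
import java.util.regex.Pattern;

class WhitespaceDelimited {

    // Hypothetical helper: true if word occurs with only whitespace (or the text edges) around it.
    static boolean containsWord(String text, String word) {
        String pattern = "(?<!\\S)" + Pattern.quote(word) + "(?!\\S)";
        return Pattern.compile(pattern).matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsWord("price is $10 today", "$10"));  // true
        System.out.println(containsWord("price is $100 today", "$10")); // false
        System.out.println(containsWord("C++ and C#", "C"));            // false
    }
}
```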

Find the whole word from a Sentence with matching String

Do it with a regex, something like:

about pro.*?\b

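This matches "about pro", then as few characters as possible up to the next word boundary, i.e. the end of the word (a word boundary is the zero-width position between a word character and whitespace, punctuation, or the end of the text). This way you don't have to create multiple substrings, which can be a costly operation.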
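A minimal Java sketch of this regex. The class and method names (PrefixCompletion, complete) and the sample sentence are hypothetical; the prefix is passed through Pattern.quote as an added precaution, which does not change anything for "about pro" since it has no metacharacters.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PrefixCompletion {

    // Hypothetical helper: extends prefix to the end of the word it ends in, or returns null.
    static String complete(String sentence, String prefix) {
        // The lazy .*? grows one character at a time and stops at the first word boundary.
        Pattern pattern = Pattern.compile(Pattern.quote(prefix) + ".*?\\b");
        Matcher matcher = pattern.matcher(sentence);
        return matcher.find() ? matcher.group() : null;
    }

    public static void main(String[] args) {
        System.out.println(complete("Tell me about programming in Java", "about pro"));
        // prints about programming
    }
}
```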


