How to find a whole word in a String in Java?
The example below is based on your comments. It uses a List of keywords, which will be searched in a given String using word boundaries. It uses StringUtils from Apache Commons Lang to build the regular expression and print the matched groups.
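import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import org.apache.commons.lang3.StringUtils; // Commons Lang 3; Lang 2.x uses org.apache.commons.lang

String text = "I will come and meet you at the woods 123woods and all the woods";
List<String> tokens = new ArrayList<String>();
tokens.add("123woods");
tokens.add("woods");
// Join the keywords into one alternation wrapped in word boundaries: \b(123woods|woods)\b
String patternString = "\\b(" + StringUtils.join(tokens, "|") + ")\\b";
Pattern pattern = Pattern.compile(patternString);
Matcher matcher = pattern.matcher(text);
while (matcher.find()) {
    System.out.println(matcher.group(1)); // prints woods, 123woods, woods (one per line)
}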
If you are looking for more performance, you could have a look at StringSearch: high-performance pattern matching algorithms in Java.
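If you would rather not depend on Commons Lang, the same idea can be sketched with the JDK alone. The class and method names below (KeywordFinder, findWholeWords) are made up for illustration. The sketch also passes each keyword through Pattern.quote, on the assumption that keywords may contain regex metacharacters.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class KeywordFinder {
    // Hypothetical helper: returns every whole-word occurrence of any keyword, in text order.
    static List<String> findWholeWords(String text, List<String> keywords) {
        // Quote each keyword so that regex metacharacters in it are matched literally.
        String alternation = keywords.stream()
                .map(Pattern::quote)
                .collect(Collectors.joining("|"));
        Pattern pattern = Pattern.compile("\\b(?:" + alternation + ")\\b");
        Matcher matcher = pattern.matcher(text);
        List<String> found = new ArrayList<>();
        while (matcher.find()) {
            found.add(matcher.group());
        }
        return found;
    }

    public static void main(String[] args) {
        String text = "I will come and meet you at the woods 123woods and all the woods";
        // prints [woods, 123woods, woods]
        System.out.println(findWholeWords(text, Arrays.asList("123woods", "woods")));
    }
}
```

Note that \b only behaves as expected when each keyword starts and ends with a word character (letter, digit or underscore).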
Finding the first matching whole word given a substring in a long text in Java
What could be wrong with my code?
Your regex matches only overflow, not the word that contains it.
Use the following regex instead (written here as it appears in a Java string literal):
\\b\\S*overflow\\S*
String token = "\\b\\S*overflow\\S*";
Pattern pattern = Pattern.compile(token);
Matcher matcher = pattern.matcher(fullText);
if (matcher.find()) {
    System.out.println("Whole word is :" + matcher.group());
}
Explanation:
\\b matches a word boundary
\\S* matches zero or more non-space characters
overflow matches overflow literally
\\S* matches zero or more non-space characters
Alternative: split the text on whitespace, iterate over the words, and break as soon as a word contains the search string:
String fullText = "Stackoverflow is the best and stackoverflow.com rocks !!!";
String[] strWords = fullText.split("\\s");
for (String strWord : strWords) {
    if (strWord.contains("overflow")) {
        System.out.println(strWord);
        break;
    }
}
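Both alternatives can be wrapped in a small reusable method. The following is only a sketch: the names ContainingWord and firstWordContaining are invented here, the leading \b is dropped so that a word starting with punctuation is returned whole, and the fragment is quoted on the assumption that it may contain regex metacharacters.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class ContainingWord {
    // Hypothetical helper: first whitespace-delimited word that contains the fragment.
    static Optional<String> firstWordContaining(String text, String fragment) {
        // \S* on both sides extends the match to the surrounding non-space characters.
        Pattern pattern = Pattern.compile("\\S*" + Pattern.quote(fragment) + "\\S*");
        Matcher matcher = pattern.matcher(text);
        return matcher.find() ? Optional.of(matcher.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        String fullText = "Stackoverflow is the best and stackoverflow.com rocks !!!";
        // prints Stackoverflow
        System.out.println(firstWordContaining(fullText, "overflow").orElse("(no match)"));
    }
}
```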
Java match whole word in String
Since a word boundary does not match between a word char and an underscore, you need
String pattern = "(?<=_|\\b)" + str + "(?=_|\\b)";
Here, the (?<=_|\b) positive lookbehind requires a word boundary or an underscore to appear right before the str, and the (?=_|\b) positive lookahead requires an underscore or a word boundary to appear right after the str.
If your word may have special chars inside, you might want to use a more straightforward word boundary:
"(?<![^\\W_])" + Pattern.quote(str) + "(?![^\\W_])"
Here, the negative lookbehind (?<![^\W_]) fails the match if there is a word character other than an underscore immediately before the str ([^...] is a negated character class that matches any character other than the characters, ranges, etc. defined inside it; thus, [^\W_] matches every character that is neither a non-word char \W nor a _), and the (?![^\W_]) negative lookahead fails the match if there is a word char other than the underscore immediately after the str.
Note that the second example quotes the search string, so that even in AA.A_str.txt the search string AA.A is matched correctly.
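A minimal sketch of the quoted variant in use is shown below. The class and method names (UnderscoreBoundary, containsWord) are hypothetical, and the plain \b comparison in main is included only for contrast.

```java
import java.util.regex.Pattern;

class UnderscoreBoundary {
    // Hypothetical helper: true if word occurs in text with no letter or digit
    // directly before or after it (underscores count as separators here).
    static boolean containsWord(String text, String word) {
        Pattern pattern = Pattern.compile(
                "(?<![^\\W_])" + Pattern.quote(word) + "(?![^\\W_])");
        return pattern.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsWord("AA.A_str.txt", "AA.A"));   // prints true
        System.out.println(containsWord("AA.A_str.txt", "str"));    // prints true
        System.out.println(containsWord("AA.A_string.txt", "str")); // prints false
        // For contrast: a plain \b treats the underscore as a word character.
        System.out.println(Pattern.compile("\\bstr\\b").matcher("my_str_value").find()); // prints false
    }
}
```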
How to find a whole word in java?
Basically, you want to find what the user entered and want to make sure that it won't match part of a word?
In that case, stay far away from \b and its cousin \w, as those are utterly useless for anything you would consider a word (they are a rough approximation of what some programming languages think of as identifiers, nothing more). Best spell out explicitly what you want:
(?<=^|\s)search:(?=\s|$)
which means that your search term must be preceded and followed by either whitespace or the beginning/end of the string. You may want to alter the lookahead to something like
(?=[\s.,:;'"!?)]|$)
maybe to allow for punctuation (and likewise, at least the opening parenthesis, in the lookbehind).
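As a rough Java sketch of that idea: the names ExplicitDelimiters and containsTerm are made up, the lookbehind admits only whitespace and an opening parenthesis, and the term is quoted on the assumption that it comes straight from user input. Adjust both character classes to whatever you consider a delimiter.

```java
import java.util.regex.Pattern;

class ExplicitDelimiters {
    // Hypothetical helper: the term must be preceded by start-of-string, whitespace or "(",
    // and followed by end-of-string, whitespace or common punctuation.
    static boolean containsTerm(String text, String term) {
        Pattern pattern = Pattern.compile(
                "(?<=^|[\\s(])" + Pattern.quote(term) + "(?=[\\s.,:;'\"!?)]|$)");
        return pattern.matcher(text).find();
    }

    public static void main(String[] args) {
        System.out.println(containsTerm("I like C++.", "C++"));  // prints true
        System.out.println(containsTerm("I like C++11", "C++")); // prints false
    }
}
```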
Java Regex : match whole word with word boundary
It appears you only want to match "words" enclosed with whitespace (or at the start/end of strings).
Use
String pattern = "(?<!\\S)" + Pattern.quote(word) + "(?!\\S)";
The (?<!\S) negative lookbehind fails every match that is immediately preceded by a char other than whitespace, and (?!\S) is a negative lookahead that fails every match that is immediately followed by a char other than whitespace. Pattern.quote() is necessary to escape special chars that need to be treated as literal chars in the regex pattern.
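For illustration, here is a small sketch that counts such whitespace-delimited occurrences. The class and method names (WhitespaceDelimited, countOccurrences) are invented for the example.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class WhitespaceDelimited {
    // Hypothetical helper: counts occurrences of word that are surrounded by
    // whitespace or by the start/end of the string.
    static int countOccurrences(String text, String word) {
        Pattern pattern = Pattern.compile("(?<!\\S)" + Pattern.quote(word) + "(?!\\S)");
        Matcher matcher = pattern.matcher(text);
        int count = 0;
        while (matcher.find()) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) {
        // The middle "a.b," is followed by a comma, so it is not counted.
        System.out.println(countOccurrences("a.b a.b, a.b", "a.b")); // prints 2
    }
}
```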
Find the whole word from a Sentence with matching String
Do it with a regex, something like:
about pro.*?\b
This matches about pro, then as few further characters as possible, and stops at the next word boundary (the position between a word character and a non-word character such as whitespace or punctuation). This way you don't have to create multiple substrings (which is a costly operation).
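In Java, this could look roughly like the sketch below. The names PrefixCompletion and completeWord are hypothetical, and the prefix is quoted on the assumption that it is arbitrary user input.

```java
import java.util.Optional;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class PrefixCompletion {
    // Hypothetical helper: finds the prefix and extends the match lazily
    // up to the next word boundary.
    static Optional<String> completeWord(String sentence, String prefix) {
        Pattern pattern = Pattern.compile(Pattern.quote(prefix) + ".*?\\b");
        Matcher matcher = pattern.matcher(sentence);
        return matcher.find() ? Optional.of(matcher.group()) : Optional.empty();
    }

    public static void main(String[] args) {
        // prints about programming
        System.out.println(completeWord("I read about programming in Java", "about pro").orElse("(no match)"));
    }
}
```

If the prefix already ends at a word boundary (for example, it is followed by a space in the sentence), the lazy .*? matches nothing and only the prefix itself is returned.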