Regex: How to Get Words from a String (C#)


Simple Regex:

\w+

This matches a run of "word" characters (letters, digits, and the underscore). That is almost what you want.

This is slightly more accurate:

\w(?<!\d)[\w'-]*

It matches a word character that is not a digit, followed by any number of word characters, apostrophes, or hyphens, so contractions such as "YOU'VE" stay in one piece.

Here are my matches:

1 LOLOLOL
2 YOU'VE
3 BEEN
4 PWN3D
5 einszwei
6 drei

Now, that's more like it.
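For comparison, here is a minimal C# sketch that runs both patterns over a made-up sample string. The input and the names WordExtraction, Simple, Better, and Words are illustrative, not from the original answer.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class WordExtraction
{
    // The simple pattern: any run of word characters (letters, digits, underscore).
    public const string Simple = @"\w+";

    // A non-digit word character, then word characters, apostrophes, or hyphens.
    public const string Better = @"\w(?<!\d)[\w'-]*";

    // Returns every match of the pattern, in order of appearance.
    public static string[] Words(string text, string pattern) =>
        Regex.Matches(text, pattern)
             .Cast<Match>()
             .Select(m => m.Value)
             .ToArray();
}
```

On "You've been PWN3D 42 times", Simple yields You, ve, been, PWN3D, 42, times, while Better yields You've, been, PWN3D, times: the contraction stays whole and the bare number is skipped.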

EDIT:

The reason for the negative lookbehind is that some regex flavors, .NET included, make \w Unicode-aware. Using [a-zA-Z] would miss many desirable "word" characters, such as accented letters. Allowing \w while disallowing \d lets a word start with a letter from any script (note that the underscore is still allowed as a first character).
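A small sketch of the difference, assuming .NET's default (Unicode-aware) \w. The sample words are hypothetical and written with \u escapes so the source file's encoding does not matter; UnicodeWords and Find are illustrative names.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class UnicodeWords
{
    // ASCII letters only: accented letters act as separators.
    public const string AsciiOnly = "[a-zA-Z]+";

    // \w covers letters from any script; the lookbehind rejects a leading digit.
    public const string UnicodeAware = @"\w(?<!\d)[\w'-]*";

    public static string[] Find(string text, string pattern) =>
        Regex.Matches(text, pattern).Cast<Match>().Select(m => m.Value).ToArray();
}
```

For "caf\u00e9 na\u00efve" (café naïve), AsciiOnly returns caf, na, ve, whereas UnicodeAware returns the two whole words.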

EDIT 2:

I have found a more concise way to get the effect of the negative lookbehind: a double-negated character class. [^\W\d] means "neither a non-word character nor a digit", that is, a word character that is not a digit.

[^\W\d][\w'-]*(?<=\w)

This is the same as the above, except that it also ensures the word ends with a word character. And, finally, there is:

[^\W\d](\w|[-']{1,2}(?=\w))*

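This ensures that no more than two hyphens or apostrophes appear in a row, and that each such run is followed by a word character. So it matches "word-up" and "word--up" as single words, but splits "word---up" into "word" and "up". If you also want to match "word---up", change the 2 to a 3; to allow only a single hyphen or apostrophe between letters, change {1,2} to {1}.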
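To check the two patterns above, here is a sketch with made-up inputs. The class HyphenatedWords and its members are hypothetical names.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class HyphenatedWords
{
    // [^\W\d]: a word character that is not a digit. The trailing (?<=\w)
    // makes the engine backtrack until the match ends on a word character.
    public const string EndsOnWordChar = @"[^\W\d][\w'-]*(?<=\w)";

    // Runs of at most two hyphens/apostrophes, each followed by a word character.
    public const string LimitedRuns = @"[^\W\d](\w|[-']{1,2}(?=\w))*";

    public static string[] Find(string text, string pattern) =>
        Regex.Matches(text, pattern).Cast<Match>().Select(m => m.Value).ToArray();
}
```

On "rock-n-roll- don't'", EndsOnWordChar drops the trailing "-" and "'" and returns rock-n-roll and don't. On "word-up word--up word---up", LimitedRuns returns word-up, word--up, word, up.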

C# extract words using regex

In the general case, you can do this using capturing parentheses:

using System.Text.RegularExpressions;

string input = "aaaa 1234 ...... 1234 ::::: asgsgd";
string regex = @"\.\.\.\. (\d+) ::::: (\w+)";
Match m = Regex.Match(input, regex);

if (m.Success) {
    int numberAfterDots = int.Parse(m.Groups[1].Value);
    string wordAfterColons = m.Groups[2].Value;
    // ... Do something with these values
}

But the first part you asked (extract all the numbers) is a bit easier:

using System.Linq;
using System.Text.RegularExpressions;

string input = "aaaa 1234 ...... 1234 ::::: asgsgd";
var numbers = Regex.Matches(input, @"\d+")
    .Cast<Match>()
    .Select(m => int.Parse(m.Value))
    .ToList();

Now numbers will be a list of integers.

C# RegEx string extraction

This will get each of the values into separate ints for you:

using System.Text.RegularExpressions;

string text = "ImageDimension=655x0;ThumbnailDimension=0x0";
Regex pattern = new Regex(@"ImageDimension=(?<imageWidth>\d+)x(?<imageHeight>\d+);ThumbnailDimension=(?<thumbWidth>\d+)x(?<thumbHeight>\d+)");
Match match = pattern.Match(text);
if (match.Success) // on a failed match every group is "", and int.Parse("") throws
{
    int imageWidth = int.Parse(match.Groups["imageWidth"].Value);
    int imageHeight = int.Parse(match.Groups["imageHeight"].Value);
    int thumbWidth = int.Parse(match.Groups["thumbWidth"].Value);
    int thumbHeight = int.Parse(match.Groups["thumbHeight"].Value);
    // ... use the four values here
}
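If the string might not match, one variation is to wrap the same pattern in a helper that returns null instead of letting int.Parse throw on an empty group. This is a sketch: DimensionParser and Parse are hypothetical names, and it relies on nullable value tuples (C# 7 or later).

```csharp
using System.Text.RegularExpressions;

public static class DimensionParser
{
    private static readonly Regex Pattern = new Regex(
        @"ImageDimension=(?<imageWidth>\d+)x(?<imageHeight>\d+);ThumbnailDimension=(?<thumbWidth>\d+)x(?<thumbHeight>\d+)");

    // Returns the four dimensions, or null when the text does not match.
    public static (int ImageWidth, int ImageHeight, int ThumbWidth, int ThumbHeight)? Parse(string text)
    {
        Match match = Pattern.Match(text);
        if (!match.Success)
            return null;

        // Reads one named group as an int (the pattern guarantees digits only).
        int Get(string name) => int.Parse(match.Groups[name].Value);

        return (Get("imageWidth"), Get("imageHeight"), Get("thumbWidth"), Get("thumbHeight"));
    }
}
```

Note that a very long run of digits would still overflow int.Parse; use int.TryParse if that can happen in your data.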

C# RegEx to find a specific string or all words in a string

Just remove the second \ from the first literal, @"\b\", so that the pattern becomes:

var pattern = @"\b" + searchString + @"\b";


Note that if your searchString may contain regex metacharacters (such as (, ), [, +, or *), you can use Regex.Escape() to escape them:

var pattern = @"\b" + Regex.Escape(searchString) + @"\b";

And if those characters may appear at the start or end of searchString, use lookarounds rather than word boundaries, because \b only matches next to a word character:

var pattern = @"(?<!\w)" + Regex.Escape(searchString) + @"(?!\w)";
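A quick sketch of why the lookarounds matter, using "C++" as a hypothetical search string: \b cannot match between "+" and the following space, because both are non-word characters. WholeWordSearch and its methods are illustrative names.

```csharp
using System.Text.RegularExpressions;

public static class WholeWordSearch
{
    // Word boundaries: fail when searchString ends (or starts) with a non-word character.
    public static bool WithWordBoundaries(string text, string searchString) =>
        Regex.IsMatch(text, @"\b" + Regex.Escape(searchString) + @"\b");

    // Lookarounds: only require that no word character touches either end.
    public static bool WithLookarounds(string text, string searchString) =>
        Regex.IsMatch(text, @"(?<!\w)" + Regex.Escape(searchString) + @"(?!\w)");
}
```

For "I like C++ and C#", WithWordBoundaries(..., "C++") is false while WithLookarounds(..., "C++") is true; in "I like C++x" the lookaround version correctly reports no match.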

C# How to get Words From String

Try this one. It splits your string on every non-alphanumeric character: \W matches any non-word character, and _ is added because the underscore counts as a word character.

using System.Text.RegularExpressions;

string s = "A~B~C~D";
string[] strings = Regex.Split(s, @"\W|_");   // { "A", "B", "C", "D" }
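Because the pattern consumes one delimiter per split, consecutive delimiters produce empty strings. A sketch showing this, plus a variation (mine, not from the answer) that uses [\W_]+ to treat a run of delimiters as a single separator; SplitWords and its methods are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class SplitWords
{
    // The answer's pattern: one delimiter per split point.
    public static string[] SplitEach(string s) => Regex.Split(s, @"\W|_");

    // Variation: a whole run of non-alphanumeric characters is one separator.
    public static string[] SplitRuns(string s) => Regex.Split(s, @"[\W_]+");
}
```

SplitEach("A~~B_C") gives A, "", B, C, while SplitRuns("A~~B_C") gives A, B, C. With either pattern, a leading or trailing delimiter still yields an empty first or last element.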

Extract a word from string using regex

You can use the following regex.

using System;
using System.Text.RegularExpressions;

Match m = Regex.Match(input, @"\b(?i:com\d+)");  // input: the string from the question
if (m.Success)
    Console.WriteLine(m.Value); //=> "COM10"

Explanation:

\b        # the boundary between a word character (\w) and not a word character
(?i:      # group, but do not capture (case-insensitive)
  com     #   'com'
  \d+     #   digits (0-9) (1 or more times)
)         # end of grouping
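If the text can contain several ports, Regex.Matches returns all of them. A sketch with a made-up input string; ComPorts, First, and All are illustrative names.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class ComPorts
{
    // \b anchors the start of "com"; (?i:...) makes only that group case-insensitive.
    private const string Pattern = @"\b(?i:com\d+)";

    // First port name, or null if there is none.
    public static string First(string input)
    {
        Match m = Regex.Match(input, Pattern);
        return m.Success ? m.Value : null;
    }

    // Every port name, in order of appearance.
    public static string[] All(string input) =>
        Regex.Matches(input, Pattern).Cast<Match>().Select(m => m.Value).ToArray();
}
```

For "Ports: com3, COM10", First returns com3 and All returns com3 and COM10, keeping the original casing of each.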


C# regex pattern to getting words

To get all words that are at least 2 characters long you can use this pattern: \b[a-zA-Z]{2,}\b.

using System;
using System.Linq;
using System.Text.RegularExpressions;

string text = "HI/how.are.3.a.d.you.&/{}today 2z3";
var matches = Regex.Matches(text, @"\b[a-zA-Z]{2,}\b");
string result = String.Join(" ", matches.Cast<Match>().Select(m => m.Value));
Console.WriteLine(result);  // HI how are you today

As others have pointed out in the comments, "A" and "I" are valid words. If you decide to match those, you can use this pattern instead:

var matches = Regex.Matches(text, @"\b(?:[a-z]{2,}|[ai])\b",
                            RegexOptions.IgnoreCase);

In both patterns I've used \b to match word boundaries. With input such as "1abc2", "abc" wouldn't be matched, because there is no boundary between a digit and a letter (both are word characters). If you want it to be matched, remove the \b metacharacters. For the first pattern that is straightforward; the second pattern would become [a-z]{2,}|[ai].
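Both patterns applied to the sample string, as a sketch; LongWords and its methods are hypothetical names.

```csharp
using System.Linq;
using System.Text.RegularExpressions;

public static class LongWords
{
    // Words of two or more ASCII letters, bounded by \b on both sides.
    public static string TwoOrMoreLetters(string text) =>
        string.Join(" ", Regex.Matches(text, @"\b[a-zA-Z]{2,}\b")
                              .Cast<Match>().Select(m => m.Value));

    // Same, but also accepts the single-letter words "a" and "i" (any case).
    public static string WithAandI(string text) =>
        string.Join(" ", Regex.Matches(text, @"\b(?:[a-z]{2,}|[ai])\b", RegexOptions.IgnoreCase)
                              .Cast<Match>().Select(m => m.Value));
}
```

On "HI/how.are.3.a.d.you.&/{}today 2z3", the first returns "HI how are you today" and the second "HI how are a you today"; "d" and the "z" inside "2z3" are rejected by both.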

How to find a word directly after another word in a string C#

Use

using System;
using System.Text.RegularExpressions;

var word = "someword";
var regex = new Regex(string.Format(@"(?<!\w){0}\W+(\w+)", Regex.Escape(word)));
var match = regex.Match(text);  // text: the input string to search
if (match.Success)
{
    Console.WriteLine(match.Groups[1].Value);
}

Regex.Escape(word) handles a word that contains +, [, (, or other special characters. (?<!\w) is better than \b here because it still works when word starts with a special character. \W+ is used instead of \s+ so that any run of non-word characters between the two words is accepted, not only whitespace.


Explanation

--------------------------------------------------------------------------------
  (?<!                     look behind to see if there is not:
--------------------------------------------------------------------------------
    \w                       word characters (a-z, A-Z, 0-9, _)
--------------------------------------------------------------------------------
  )                        end of look-behind
--------------------------------------------------------------------------------
  someword                 'someword' (the escaped value of word)
--------------------------------------------------------------------------------
  \W+                      non-word characters (all but a-z, A-Z, 0-9,
                           _) (1 or more times (matching the most
                           amount possible))
--------------------------------------------------------------------------------
  (                        group and capture to \1:
--------------------------------------------------------------------------------
    \w+                      word characters (a-z, A-Z, 0-9, _) (1 or
                             more times (matching the most amount
                             possible))
--------------------------------------------------------------------------------
  )                        end of \1
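The same logic wrapped in a reusable method, as a sketch; NextWord and WordAfter are hypothetical names, and the method returns null when word does not occur.

```csharp
using System.Text.RegularExpressions;

public static class NextWord
{
    // Returns the word that follows `word` in `text`, or null if `word` is absent.
    public static string WordAfter(string text, string word)
    {
        // Escape the search word so characters like + or ( are taken literally.
        var regex = new Regex(string.Format(@"(?<!\w){0}\W+(\w+)", Regex.Escape(word)));
        Match match = regex.Match(text);
        return match.Success ? match.Groups[1].Value : null;
    }
}
```

For example, WordAfter("Error: Message -> disk full", "Message") returns "disk", and WordAfter("I use C++ daily", "C++") returns "daily".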

How can I extract a specific part of a string in C# with Regex.Match?

The part you need should be in the capture group (the part you put between parentheses). Try accessing it with:

Match1.Groups[1].Value
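A minimal illustration with a made-up pattern and input: Groups[0] always holds the whole match, while Groups[1] holds only what the first pair of parentheses captured. CaptureGroups and Extract are hypothetical names.

```csharp
using System.Text.RegularExpressions;

public static class CaptureGroups
{
    // Returns the whole match and the first captured group for a sample pattern.
    public static (string Whole, string Captured) Extract(string input)
    {
        Match match = Regex.Match(input, @"id=(\d+)");
        return (match.Groups[0].Value, match.Groups[1].Value);
    }
}
```

For "user id=42;", Whole is "id=42" and Captured is "42".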

C# Regex of value after certain words

You may use a regex to capture the necessary details in the input string:

using System;
using System.Text.RegularExpressions;

var pattern = @"Slot:(\d+)\s*Module:(.+)";
foreach (string config in backplaneConfig)
{
    var values = Regex.Match(config, pattern);
    if (values.Success)
    {
        modulesInfo.Add(new ModuleIdentyfication { ModuleSlot = Convert.ToInt32(values.Groups[1].Value), ModuleType = values.Groups[2].Value });
    }
}

Group 1 holds the ModuleSlot value and Group 2 holds the ModuleType value.

Details

  • Slot: - literal text
  • (\d+) - Capturing group 1: one or more digits
  • \s* - zero or more whitespace characters
  • Module: - literal text
  • (.+) - Capturing group 2: one or more characters other than a newline, i.e. the rest of the line.
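For a self-contained version of the loop, here is a sketch that substitutes a hypothetical ModuleInfo class for the asker's ModuleIdentyfication type and skips lines that don't match. BackplaneParser and Parse are also illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

// Hypothetical stand-in for the asker's ModuleIdentyfication class.
public class ModuleInfo
{
    public int ModuleSlot { get; set; }
    public string ModuleType { get; set; }
}

public static class BackplaneParser
{
    private static readonly Regex Pattern = new Regex(@"Slot:(\d+)\s*Module:(.+)");

    public static List<ModuleInfo> Parse(IEnumerable<string> backplaneConfig)
    {
        var modulesInfo = new List<ModuleInfo>();
        foreach (string config in backplaneConfig)
        {
            Match values = Pattern.Match(config);
            if (!values.Success)
                continue; // not a "Slot:N Module:..." line

            modulesInfo.Add(new ModuleInfo
            {
                ModuleSlot = Convert.ToInt32(values.Groups[1].Value), // group 1: digits
                ModuleType = values.Groups[2].Value                   // group 2: rest of line
            });
        }
        return modulesInfo;
    }
}
```

Because . matches everything except \n, Group 2 also picks up trailing spaces or a \r; call Trim() on it if your lines may carry them.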

