How to Count the Number of Matches for a Regex

Count the number of matches of a regex in JavaScript

tl;dr: Generic Pattern Counter

// THIS IS WHAT YOU NEED
const count = (str) => {
  const re = /YOUR_PATTERN_HERE/g
  return ((str || '').match(re) || []).length
}

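For those who arrived here looking for a generic way to count the occurrences of a regex pattern in a string, without it failing when there are zero occurrences, this code is what you need. String.prototype.match returns null when nothing matches, so the || [] fallback turns that into a count of 0, and str || '' guards against a null or undefined input. The g flag is required: without it, match returns only the first match (plus its capture groups), so .length would not be the match count. Here's a demonstration: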

/* Example */
const count = (str) => {
  const re = /[a-z]{3}/g
  return ((str || '').match(re) || []).length
}

const str1 = 'abc, def, ghi'
const str2 = 'ABC, DEF, GHI'

console.log(`'${str1}' has ${count(str1)} occurrences of pattern '/[a-z]{3}/g'`)
console.log(`'${str2}' has ${count(str2)} occurrences of pattern '/[a-z]{3}/g'`)
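For readers working in Python rather than JavaScript, here is a minimal sketch of the same zero-safe counter using the standard re module. The function name count and its default pattern simply mirror the demo above. re.findall always returns a list, possibly empty, so no null fallback is needed, and the text or "" guard plays the role of str || ''.

```python
import re


def count(text, pattern=r"[a-z]{3}"):
    """Count non-overlapping matches of pattern; None or "" counts as 0."""
    # re.findall returns a (possibly empty) list, never None, so unlike
    # String.prototype.match there is no null result to guard against.
    return len(re.findall(pattern, text or ""))


print(count("abc, def, ghi"))  # 3
print(count("ABC, DEF, GHI"))  # 0
print(count(None))             # 0
```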

How do I count the number of matches by a regex?

In .NET, Regex.Matches returns a MatchCollection, whose Count property is the number of matches:

Regex.Matches(text, pattern).Count

Count number of matches using regex.h in C

You've misunderstood the meaning of pmatch. It is not used for getting repeated pattern matches; it is used to get the location of the one match and of its possible subgroups. As the Linux manual page for regcomp(3) says:

The offsets of the subexpression starting at the ith open parenthesis are stored in pmatch[i]. The entire regular expression's match addresses are stored in pmatch[0]. (Note that to return the offsets of N subexpression matches, nmatch must be at least N+1.) Any unused structure elements will contain the value -1.

If you have the regular expression this (\w+) costs ([0-9]+) USD (glibc's regcomp accepts \w as a GNU extension, but POSIX regular expressions have no \d, so [0-9] is used), there are 2 capturing groups in parentheses, (\w+) and ([0-9]+). If nmatch is set to at least 3, pmatch[0] contains the start and end offsets of the whole match, pmatch[1] those of the (\w+) group, and pmatch[2] those of the ([0-9]+) group.
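To see the same offset layout outside C, here is a small Python sketch using the re module instead of regex.h; the sample sentence and the helper name match_offsets are made up for illustration. m.span(0) plays the role of pmatch[0] and m.span(i) the role of pmatch[i]. Python's re does support \d, so the pattern is written as this (\w+) costs (\d+) USD.

```python
import re

# Example pattern with two capturing groups (Python's re supports \d).
PATTERN = re.compile(r"this (\w+) costs (\d+) USD")


def match_offsets(text):
    """Return (start, end) for the whole match and each group, like pmatch[0..2]."""
    m = PATTERN.search(text)
    if m is None:
        return []
    # Index 0 is the whole match; 1..PATTERN.groups are the groups.
    return [m.span(i) for i in range(PATTERN.groups + 1)]


print(match_offsets("this pen costs 5 USD"))  # [(0, 20), (5, 8), (15, 16)]
```

Much like the -1 entries in unused pmatch elements, span(i) returns (-1, -1) for a group that did not take part in the match.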


The following code prints the range of each consecutive match, if any, or "<the input string>" does not contain a match if the pattern never matches.

It is constructed so that it also works for patterns that can match the empty string. An empty regular expression, or a regular expression such as #?, matches at every character position, including after the last character, so 28 matches would be reported for the input "the cat is in the bathroom." (27 characters plus the position after the last one).

#include <sys/types.h>
#include <regex.h>
#include <stdio.h>
#include <string.h>

void match(regex_t *pexp, const char *sz) {
    // we only need the whole match (pmatch[0]) in this example
    regmatch_t whole_match;

    // we store the eflags in a variable, so that we can make
    // ^ match the first time, but not for subsequent regexecs
    int eflags = 0;
    int found = 0;
    size_t offset = 0;
    size_t length = strlen(sz);

    while (regexec(pexp, sz + offset, 1, &whole_match, eflags) == 0) {
        // do not let ^ match again.
        eflags = REG_NOTBOL;
        found = 1;
        printf("range %zu - %zu matches\n",
               offset + (size_t)whole_match.rm_so,
               offset + (size_t)whole_match.rm_eo);

        // increase the starting offset
        offset += (size_t)whole_match.rm_eo;

        // a match can be a zero-length match, we must not fail
        // to advance the pointer, or we'd have an infinite loop!
        if (whole_match.rm_so == whole_match.rm_eo) {
            offset += 1;
        }

        // break the loop if we've consumed all characters. Note
        // that we run once for the terminating null, to let
        // a zero-length match occur at the end of the string.
        if (offset > length) {
            break;
        }
    }
    if (!found) {
        printf("\"%s\" does not contain a match\n", sz);
    }
}

int main(void) {
    regex_t exp;
    int rv = regcomp(&exp, "(the)", REG_EXTENDED | REG_ICASE);
    if (rv != 0) {
        // exp is unusable if compilation failed, so stop here
        fprintf(stderr, "regcomp failed\n");
        return 1;
    }
    match(&exp, "the cat is in the bathroom.");
    regfree(&exp);
    return 0;
}

P.S. The parentheses in your regex (the) are unnecessary in this case; you could just write the. Your initial confusion about getting 2 matches at the same position came from getting the whole match of (the) in pmatch[0] and the submatch for the group in pmatch[1]. Had you not had these parentheses, your code would have printed the location of the first match only once.
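For comparison, here is a Python sketch of the same loop, using the re module in place of regex.h (match_ranges is a hypothetical helper name). Pattern.search(text, pos) starts searching at pos, and ^ does not match there merely because the search starts there, which is similar in spirit to passing REG_NOTBOL. The same one-character step after an empty match keeps the loop from stalling.

```python
import re


def match_ranges(pattern, text, flags=0):
    """Return (start, end) for each consecutive match, like the C loop prints."""
    compiled = re.compile(pattern, flags)
    ranges = []
    offset = 0
    # Allow offset == len(text) so an empty match at the very end is found.
    while offset <= len(text):
        m = compiled.search(text, offset)
        if m is None:
            break
        ranges.append(m.span())
        # Resume where the match ended; after an empty match, step one
        # extra character so the same match is not found forever.
        offset = m.end() + 1 if m.start() == m.end() else m.end()
    return ranges


print(match_ranges("(the)", "the cat is in the bathroom.", re.IGNORECASE))
# [(0, 3), (14, 17)]
print(len(match_ranges("#?", "the cat is in the bathroom.")))  # 28
```

In everyday Python code, re.finditer already does this bookkeeping, and len(list(re.finditer(...))) gives the same counts for these two inputs.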

Number of regex matches

In Python, if you know you will want all the matches, you can use the re.findall function. It returns a list of all the matches, so len(result) gives the number of matches.
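A minimal sketch of two counting styles (the helper names are hypothetical). re.finditer yields match objects one at a time, so counting them avoids building the whole list. One detail worth knowing: when the pattern contains capturing groups, re.findall returns the group contents (a tuple per match if there are several groups) instead of the whole match, but it still returns one item per match, so len() is still the match count.

```python
import re


def count_findall(pattern, text):
    # Builds the full list of matches first, then takes its length.
    return len(re.findall(pattern, text))


def count_finditer(pattern, text):
    # Consumes match objects lazily; no list of matches is kept.
    return sum(1 for _ in re.finditer(pattern, text))


print(count_findall(r"\d+", "a1b22c333"))       # 3
print(count_finditer(r"\d+", "a1b22c333"))      # 3
print(count_findall(r"(\d)(\d)", "12 34 5"))    # 2 (one tuple per match)
```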

Count number of times a regular expression is matched in a string

For this particular scenario, you could do something like this:

Regex.Match(input, pattern).Groups[1].Captures.Count

The element in Groups[0] is the entire match, so it's not helpful for what you need. Groups[1] corresponds to the (\\tChannel [0-9]* \(mV\)) group; because that group is repeated with *, .NET records each repetition as a separate Capture, so .Captures.Count gives the number of times it repeats.

Sample based on your example:

Regex.Match(
@"Time\tChannel 1 (mV)\tChannel 2 (mV)\tChannel 3 (mV)\tChannel 4 (mV)\tChannel 5 (mV)\tChannel 6 (mV)\tChannel 7 (mV)\tChannel 8 (mV)\tChannel 1_cal (mg/L)\tChannel 2_cal ()\tChannel 3_cal ()\tChannel 4_cal ()\tChannel 5_cal ()\tChannel 6_cal ()\tChannel 7_cal ()\tChannel 8_cal ()\tMotor 1 (mm)\tMotor 2 (mm)",
@"Time(\\tChannel [0-9]* \(mV\))*"
).Groups[1].Captures.Count;

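For this sample the count is 8, one capture per Channel N (mV) column. Note that both strings here are verbatim (@) literals, so \t in the input is a literal backslash followed by t, which is what \\t in the pattern matches; if your real data contains actual tab characters, use \t in the pattern instead.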

Regex.Matches(...).Count won't work here, because the whole header is a single match. Regex.Match(...).Groups.Count doesn't help either: with only one group in the pattern, it always returns 2 (group 0 for the whole match plus group 1). You need to look at the specific group, Regex.Match(...).Groups[1], and get the count from the number of captures in that group.

You can also name the group, which may make it a little clearer what is happening. Here's an example:

Regex.Match(
@"Time\tChannel 1 (mV)\tChannel 2 (mV)\tChannel 3 (mV)\tChannel 4 (mV)\tChannel 5 (mV)\tChannel 6 (mV)\tChannel 7 (mV)\tChannel 8 (mV)\tChannel 1_cal (mg/L)\tChannel 2_cal ()\tChannel 3_cal ()\tChannel 4_cal ()\tChannel 5_cal ()\tChannel 6_cal ()\tChannel 7_cal ()\tChannel 8_cal ()\tMotor 1 (mm)\tMotor 2 (mm)",
@"Time(?<channelGroup>\\tChannel [0-9]* \(mV\))*"
).Groups["channelGroup"].Captures.Count;
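Python's built-in re module has no equivalent of .NET's Group.Captures: a repeated group only keeps its last repetition. (The third-party regex package adds a captures() method, but it is not used here.) One workaround, sketched below under the assumption that the header uses real tab characters, is to match the whole repeated run once and then count the unit pattern inside it; UNIT, RUN and count_repeats are hypothetical names.

```python
import re

# One "\tChannel N (mV)" column; \t here matches a real tab character.
UNIT = r"\tChannel [0-9]* \(mV\)"
# "Time" followed by the whole run of repeated units, captured as group 1.
RUN = re.compile(r"Time((?:" + UNIT + r")*)")


def count_repeats(header):
    m = RUN.match(header)
    if m is None:
        return 0
    # Units are back to back inside the run, so findall sees each one once.
    return len(re.findall(UNIT, m.group(1)))


print(count_repeats("Time\tChannel 1 (mV)\tChannel 2 (mV)\tChannel 1_cal (mg/L)"))  # 2
```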

How to count the number of matches captured by the regex in ASP.NET MVC?

MatchCollection matches = pathRegex.Matches(url);
var count = matches.Count;

See the .NET Regex class documentation for details.

How can I count the number of matches for a regex?

matcher.find() does not find all matches, only the next match.

Solution for Java 9+

long matches = matcher.results().count();

Solution for Java 8 and older

You'll have to do the following (starting from Java 9, there is a nicer solution):

int count = 0;
while (matcher.find())
    count++;

By the way, matcher.groupCount() is something completely different: it returns the number of capturing groups in the pattern, not the number of matches.

Complete example:

import java.util.regex.*;

class Test {
    public static void main(String[] args) {
        String hello = "HelloxxxHelloxxxHello";
        Pattern pattern = Pattern.compile("Hello");
        Matcher matcher = pattern.matcher(hello);

        int count = 0;
        while (matcher.find())
            count++;

        System.out.println(count); // prints 3
    }
}
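To make the groupCount() remark above concrete, here is a small Python sketch of the same distinction (Python's re rather than java.util.regex; the pattern and text are invented). Pattern.groups reports how many capturing groups the pattern has, the analogue of Matcher.groupCount(), while the number of matches has to be counted by iterating.

```python
import re

pattern = re.compile(r"([a-z])([0-9])")
text = "a1 b2 c3"

# Number of capturing groups in the pattern (like Matcher.groupCount()).
print(pattern.groups)                          # 2
# Number of matches in the text (what the find() loop computes).
print(sum(1 for _ in pattern.finditer(text)))  # 3
```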

Handling overlapping matches

When counting matches of aa in aaaa, the above snippet will give you 2:

aaaa
aa
  aa

To get 3 matches, i.e. this behavior:

aaaa
aa
 aa
  aa

You have to search for a match at index <start of last match> + 1 as follows:

String hello = "aaaa";
Pattern pattern = Pattern.compile("aa");
Matcher matcher = pattern.matcher(hello);

int count = 0;
int i = 0;
while (matcher.find(i)) {
    count++;
    i = matcher.start() + 1;
}

System.out.println(count); // prints 3
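The same start + 1 technique in Python, as a sketch (the function names are hypothetical): Pattern.search(text, pos) plays the role of matcher.find(i). A common alternative, shown as the second function, wraps the pattern in a zero-width lookahead (?=...) so that re.findall reports every position where a match starts.

```python
import re


def count_overlapping(pattern, text):
    """Count matches that may overlap by restarting at start + 1."""
    compiled = re.compile(pattern)
    count = 0
    pos = 0
    # The bound matters for patterns that can match the empty string:
    # search() clamps pos to len(text), so without it the loop never ends.
    while pos <= len(text):
        m = compiled.search(text, pos)
        if m is None:
            break
        count += 1
        pos = m.start() + 1
    return count


def count_overlapping_lookahead(pattern, text):
    """Count start positions of matches via a zero-width lookahead."""
    return len(re.findall("(?=" + pattern + ")", text))


print(len(re.findall("aa", "aaaa")))             # 2, non-overlapping
print(count_overlapping("aa", "aaaa"))            # 3
print(count_overlapping_lookahead("aa", "aaaa"))  # 3
```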

Regex match count of characters that are separated by non-matching characters

Hey, I think this would be a simple but working one:

( *?[0-9a-zA-Z] *?){10,}

Breaking the regex down:

  1. "( *?": it can start with space(s)
  2. "[0-9a-zA-Z]": followed by one alphanumeric character
  3. " *?)": it can end with space(s)
  4. "{10,}": the group must match 10 or more times

Key point: the {10,} count applies to the whole group, i.e. everything inside the parentheses "()". In this case, any spaces, followed by ONE alphanumeric character, followed by any spaces, still count as a single repetition. Hope it helps. :)
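A quick way to try the pattern, sketched in Python (has_ten_alnum is a hypothetical helper name; the sample strings are invented). Note that search() finds such a run anywhere in the string; if the whole string must consist of spaced alphanumerics, fullmatch() is the stricter choice.

```python
import re

# The pattern from the answer: a group of (optional spaces, one
# alphanumeric character, optional spaces), repeated 10 or more times.
SPACED_ALNUM = re.compile(r"( *?[0-9a-zA-Z] *?){10,}")


def has_ten_alnum(text):
    return SPACED_ALNUM.search(text) is not None


for sample in ["a b c d e f g h i j", "abc defg hij", "1 2 3 4 5 6 7 8 9", "ab-cdefghijk"]:
    print(repr(sample), has_ten_alnum(sample))
```

The first two samples contain ten alphanumeric characters separated only by spaces and match; the last two never reach ten in such a run and do not.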

Count number of character matches in a string (Regex only)?

Use a separate lookahead for each assertion:

^(?=(([^ac]*[ac]){2})*[^ac]*$)(?=(([^bd]*[bd]){2})*[^bd]*$).*$


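This works because ([^ac]*[ac]){2} matches a pair of [ac] characters. Repeating that pair and then allowing only non-[ac] characters up to $ means the string must contain an even number of a/c characters; the second lookahead applies the same rule to b/d, and .*$ then consumes the string.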
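One way to convince yourself the pattern is right is to compare it against plain counting. The sketch below does that with Python's re module (the helper names and sample strings are hypothetical), assuming single-line input, since . and $ treat newlines specially. It checks that the regex accepts exactly the strings with an even number of a/c characters and an even number of b/d characters.

```python
import re

EVEN_PAIRS = re.compile(r"^(?=(([^ac]*[ac]){2})*[^ac]*$)(?=(([^bd]*[bd]){2})*[^bd]*$).*$")


def regex_check(text):
    return EVEN_PAIRS.match(text) is not None


def count_check(text):
    # Even number of a/c characters and even number of b/d characters.
    ac = sum(ch in "ac" for ch in text)
    bd = sum(ch in "bd" for ch in text)
    return ac % 2 == 0 and bd % 2 == 0


for sample in ["", "ab", "abab", "acbd", "abc", "aaab", "xyz"]:
    print(repr(sample), regex_check(sample), count_check(sample))
```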

How to count the number of matches for a given string in MySQL?

I think you need something like this:

SELECT
    col,
    CASE WHEN COALESCE(col, '') = '' THEN 0
    ELSE
        length(col) - length(replace(col, ',', ''))
        + (length(col) - length(replace(col, ' and ', ''))) DIV 5
        + 1
    END
FROM
    yourtable
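To see what the expression computes, here is the same arithmetic as a Python sketch. It assumes, since the question isn't reproduced here, that col holds a list such as 'apples, pears and plums'; count_items is a hypothetical name. Each comma adds one item, each ' and ' (5 characters, hence the DIV 5) adds one, and the final + 1 accounts for the first item.

```python
def count_items(col):
    """Python mirror of the SQL CASE expression above."""
    # COALESCE(col, '') = '' -> 0 for NULL/None or an empty string.
    if not col:
        return 0
    commas = len(col) - len(col.replace(",", ""))
    # ' and ' is 5 characters long, so divide the removed length by 5.
    ands = (len(col) - len(col.replace(" and ", ""))) // 5
    return commas + ands + 1


print(count_items("apples, pears and plums"))  # 3
print(count_items("apples"))                   # 1
print(count_items(None))                       # 0
```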
