PostgreSQL Count Number of Times Substring Occurs in Text

Counting the number of occurrences of a substring within a string in PostgreSQL

A common solution is based on this logic: replace the search string with an empty string, then divide the difference between the old and new lengths by the length of the search string.

(CHAR_LENGTH(name) - CHAR_LENGTH(REPLACE(name, 'substring', ''))) 
/ CHAR_LENGTH('substring')
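As a rough illustration outside the database, the same arithmetic can be sketched in Python. This is only a model of the SQL expression, assuming that Python's str.replace, like PostgreSQL's REPLACE, removes non-overlapping occurrences from left to right, and that len, like CHAR_LENGTH, counts characters. The function name count_occurrences is hypothetical, and the empty-string guard is an addition: in SQL, dividing by CHAR_LENGTH('') would raise a division-by-zero error.

```python
def count_occurrences(text, sub):
    """Length-difference trick: (len(text) - len(text without sub)) / len(sub)."""
    if not sub:
        # Hypothetical guard: the SQL version would fail with division by zero here.
        raise ValueError("search string must not be empty")
    removed = len(text) - len(text.replace(sub, ""))
    # Each removed occurrence shortens the text by exactly len(sub) characters,
    # so integer division is exact (like integer / integer in PostgreSQL).
    return removed // len(sub)


print(count_occurrences("hello world", "o"))  # 2
print(count_occurrences("foobarbaz", "ba"))   # 2
```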

Hence:

UPDATE test."user"
SET result =
(CHAR_LENGTH(name) - CHAR_LENGTH(REPLACE(name, 'o', '')))
/ CHAR_LENGTH('o');

PostgreSQL count number of times substring occurs in text

I would highly suggest checking out this answer I posted to "How do you count the occurrences of an anchored string using PostgreSQL?". The chosen answer was shown to be massively slower than an adapted version of regexp_replace(). The overhead of creating the rows and then running the aggregate is simply too high.

The fastest way to do this is as follows:

SELECT
(length(str) - length(replace(str, replacestr, '')) )::int
/ length(replacestr)
FROM ( VALUES
('foobarbaz', 'ba')
) AS t(str, replacestr);

Here we:

  1. Take the length of the string, L1.
  2. Subtract from L1 the length of the string with all occurrences removed, L2, to get L3, the difference in string length.
  3. Divide L3 by the length of the search string to get the number of occurrences.
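A minimal Python sketch of these three steps follows, reusing the names L1, L2 and L3 from the list as lower-case variables; occurrences_by_steps is a hypothetical name. It also shows a consequence worth knowing, assuming Python's str.replace behaves like PostgreSQL's replace() here: because replacement removes non-overlapping matches, 'aa' is counted twice in 'aaaa', not three times.

```python
def occurrences_by_steps(s, sub):
    l1 = len(s)                   # step 1: length of the original string
    l2 = len(s.replace(sub, ""))  # length with every occurrence removed
    l3 = l1 - l2                  # step 2: difference in string length
    return l3 // len(sub)         # step 3: divide by the search-string length


print(occurrences_by_steps("foobarbaz", "ba"))  # 2
print(occurrences_by_steps("aaaa", "aa"))       # 2 (non-overlapping), not 3
```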

For comparison, that's about five times faster than the method using regexp_matches(), which looks like this:

SELECT count(*)
FROM ( VALUES
('foobarbaz', 'ba')
) AS t(str, replacestr)
CROSS JOIN LATERAL regexp_matches(str, replacestr, 'g');
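One thing to keep in mind with the regexp_matches() approach is that the search string is interpreted as a regular expression, not as literal text. The Python sketch below uses re.finditer as a stand-in for regexp_matches(..., 'g') and shows the effect with a dot: unescaped it matches every character, escaped it matches only literal dots. The helper name count_regex_matches is hypothetical.

```python
import re


def count_regex_matches(s, pattern):
    # Count every non-overlapping match, similar to regexp_matches(s, pattern, 'g')
    return sum(1 for _ in re.finditer(pattern, s))


print(count_regex_matches("foobarbaz", "ba"))        # 2
print(count_regex_matches("a.b.c", "."))             # 5: '.' matches any character
print(count_regex_matches("a.b.c", re.escape(".")))  # 2: only the literal dots
```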

PostgreSQL SQL query to find number of occurrences of substring in string

This is easily done without a custom function:

select count(*)
from (values ('Earth is my home planet and where my friends live')) v(str)
cross join lateral regexp_split_to_table(v.str, ' ') word
join patterns p on word = p.pattern;

Just break the original string into "words". Then match on the words.
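The split-then-join logic can be modelled in Python as below. The contents of the patterns table are not given in the question, so the list used here is hypothetical, as is the name count_word_matches. The nested loop mirrors the join, so a word is counted once for each pattern row it equals.

```python
def count_word_matches(text, patterns):
    words = text.split(" ")  # like regexp_split_to_table(v.str, ' ')
    # like `join patterns p on word = p.pattern`: one row per (word, pattern) match
    return sum(1 for w in words for p in patterns if w == p)


patterns = ["my", "home"]  # hypothetical contents of the patterns table
print(count_word_matches("Earth is my home planet and where my friends live", patterns))  # 3
```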

Another method uses regular expression matching:

select (select count(*) from regexp_matches(v.str, p.rpattern, 'g'))
from (values ('Earth is my home planet and where my friends live')) v(str) cross join
(select string_agg(pattern, '|') as rpattern
from patterns
) p;

This stuffs all the patterns into a single regular expression. Note that this version does not take word breaks into account.
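To see what ignoring word breaks means in practice, here is a Python sketch with a hypothetical pattern list ["my", "an"]. The plain alternation, like the string_agg(pattern, '|') expression, also finds "an" inside "planet" and "and"; wrapping the alternation in \b word-boundary anchors restricts it to whole words. This uses Python's regex syntax; PostgreSQL's regular expressions use \y for a word boundary, since \b there means backspace.

```python
import re

text = "Earth is my home planet and where my friends live"
patterns = ["my", "an"]  # hypothetical contents of the patterns table

alternation = "|".join(patterns)  # like string_agg(pattern, '|')
anywhere = len(re.findall(alternation, text))
whole_words = len(re.findall(r"\b(?:" + alternation + r")\b", text))

print(anywhere)     # 4: "my" twice, plus "an" in "planet" and "and"
print(whole_words)  # 2: only the two standalone "my"
```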

I want to count the number of occurrences of a value in a string

I solved it myself. Thank you for all the ideas!

SELECT count(something)
FROM unnest(
string_to_array(
'1,2,3,3,4,5,6,3'
, ',')
) something
WHERE something = '3'
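The string_to_array/unnest approach compares whole list elements rather than searching for a substring, which matters when one value is contained in another. The Python sketch below contrasts the two on the question's data and on a hypothetical list '13,3,31'; count_elements is a hypothetical name.

```python
def count_elements(csv, value):
    # like unnest(string_to_array(csv, ',')) ... WHERE something = value
    return sum(1 for item in csv.split(",") if item == value)


print(count_elements("1,2,3,3,4,5,6,3", "3"))  # 3
print(count_elements("13,3,31", "3"))          # 1: only the element '3'
print("13,3,31".count("3"))                    # 3: a substring count also hits '13' and '31'
```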

How do you count the number of occurrences of a certain substring in a SQL varchar?

The first way that comes to mind is to do it indirectly, by replacing the comma with an empty string and comparing the lengths:

Declare @string varchar(1000)
Set @string = 'a,b,c,d'
select len(@string) - len(replace(@string, ',', ''))

Count each next occurrence of string in substring

How to do this is described in the gsubfn vignette. Using the code there, we first define a proto object pwords with methods pre and fun. pre initializes the word list (which stores the current count for each word encountered), and fun updates it each time a word is encountered, suffixes the word with its count and returns the suffixed word.

Having defined the foregoing, run gsubfn using pwords. For each component of the input, gsubfn first runs pre; then, for each match of the regular expression \\w+, it passes the match to fun, runs fun and replaces the match with fun's output.

We have assumed that the words to be suffixed with a count are matched by \w+, which is the case for the example in the question. If your actual data is different, you may need to change the pattern.

library(gsubfn)
s <- rep("A > B > B > C > B > A > C > B > A", 3) # sample input

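pwords <- proto(
  # pre: reset the word -> count list before each input component
  pre = function(this) { this$words <- list() },
  # fun: bump the count for word x and return x suffixed with that count
  fun = function(this, x) {
    if (is.null(this$words[[x]])) this$words[[x]] <- 0
    this$words[[x]] <- this$words[[x]] + 1
    paste0(x, this$words[[x]])
  }
)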

gsubfn("\\w+", pwords, s)

giving:

[1] "A1 > B1 > B2 > C1 > B3 > A2 > C2 > B4 > A3"
[2] "A1 > B1 > B2 > C1 > B3 > A2 > C2 > B4 > A3"
[3] "A1 > B1 > B2 > C1 > B3 > A2 > C2 > B4 > A3"
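For comparison, the same pre/fun idea can be sketched in Python with re.sub and a replacement function. This is not part of gsubfn, just an analogous technique: a fresh counter per input string plays the role of pre, and the callback plays the role of fun. The function name suffix_running_counts is hypothetical.

```python
import re
from collections import defaultdict


def suffix_running_counts(text, pattern=r"\w+"):
    counts = defaultdict(int)  # like `pre`: a fresh word -> count table for each input string

    def fun(match):
        # like `fun`: bump this word's count and return the word with the count appended
        word = match.group(0)
        counts[word] += 1
        return f"{word}{counts[word]}"

    return re.sub(pattern, fun, text)


s = ["A > B > B > C > B > A > C > B > A"] * 3  # same sample input as the R code
for line in s:
    print(suffix_running_counts(line))
```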

Number of times a particular character appears in a string

There's no direct function for this, but you can do it with a replace:

declare @myvar varchar(20)
set @myvar = 'Hello World'

select len(@myvar) - len(replace(@myvar,'o',''))

Basically this tells you how many chars were removed, and therefore how many instances of it there were.
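One caveat with this T-SQL trick: SQL Server's LEN excludes trailing spaces, so counting spaces, or removing a character that leaves a trailing space behind, can give a wrong count. The Python sketch below models LEN with rstrip(' ') purely to illustrate the arithmetic; tsql_len and count_char are hypothetical stand-ins, not a real API.

```python
def tsql_len(s):
    # Model of SQL Server LEN: character count excluding trailing spaces
    return len(s.rstrip(" "))


def count_char(s, ch):
    # len(@myvar) - len(replace(@myvar, ch, '')) using the LEN model above
    return tsql_len(s) - tsql_len(s.replace(ch, ""))


print(count_char("Hello World", "o"))  # 2
print(count_char("a b ", " "))         # 1, although the string contains 2 spaces
print(count_char("a o", "o"))          # 2, although there is only one 'o'
```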

Extra:

The above can be extended to count the occurrences of a multi-character string by dividing by the length of the string being searched for. For example:

declare @myvar varchar(max), @tocount varchar(20)
set @myvar = 'Hello World, Hello World'
set @tocount = 'lo'

select (len(@myvar) - len(replace(@myvar,@tocount,''))) / LEN(@tocount)

