Counting the number of occurrences of a substring within a string in PostgreSQL
A common solution is based on this logic: replace the search string with an empty string, then divide the difference between the old and new lengths by the length of the search string:
(CHAR_LENGTH(name) - CHAR_LENGTH(REPLACE(name, 'substring', '')))
/ CHAR_LENGTH('substring')
Hence:
UPDATE test."user"
SET result =
(CHAR_LENGTH(name) - CHAR_LENGTH(REPLACE(name, 'o', '')))
/ CHAR_LENGTH('o');
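One caveat with this formula: if the search string is empty, CHAR_LENGTH returns 0 and the division raises a division-by-zero error. The following is a minimal sketch of one way to guard against that with NULLIF and COALESCE. The sample rows and the needle column are made up for illustration and are not part of the original answer.

```sql
-- Sketch: length-difference count, guarded against an empty search string.
-- The VALUES rows and the "needle" column name are hypothetical.
SELECT name,
       needle,
       COALESCE(
           (CHAR_LENGTH(name) - CHAR_LENGTH(REPLACE(name, needle, '')))
           / NULLIF(CHAR_LENGTH(needle), 0),  -- NULL instead of dividing by 0
           0                                  -- treat "empty needle" as 0 hits
       ) AS occurrences
FROM (VALUES
    ('foo boo', 'o'),
    ('foo boo', 'oo'),
    ('foo boo', '')
) AS t(name, needle);
-- occurrences: 4, 2, 0
```

Whether an empty search string should count as 0 or as NULL is a design choice; dropping the COALESCE keeps the NULL.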
PostgreSQL count number of times substring occurs in text
I would highly suggest checking out this answer I posted to "How do you count the occurrences of an anchored string using PostgreSQL?". The chosen answer was shown to be massively slower than an adapted version of regexp_replace(). The overhead of creating the rows and then running the aggregate is simply too high.
The fastest way to do this is as follows...
SELECT
(length(str) - length(replace(str, replacestr, '')) )::int
/ length(replacestr)
FROM ( VALUES
('foobarbaz', 'ba')
) AS t(str, replacestr);
Here we
- Take the length of the string, L1
- Subtract from L1 the length of the string with all of the replacements removed, L2, to get L3, the difference in string length.
- Divide L3 by the length of the replacement to get the occurrences
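To make the three steps concrete, here is a small sketch that exposes each intermediate value as its own column for the same sample input. The column aliases l1, l2 and l3 are only labels chosen to match the list above.

```sql
-- Sketch: the intermediate lengths from the steps above, one column each.
SELECT
  length(str)                                        AS l1,  -- original length
  length(replace(str, replacestr, ''))               AS l2,  -- length after removal
  length(str) - length(replace(str, replacestr, '')) AS l3,  -- characters removed
  (length(str) - length(replace(str, replacestr, '')))
    / length(replacestr)                             AS occurrences
FROM ( VALUES
  ('foobarbaz', 'ba')
) AS t(str, replacestr);
-- l1 = 9, l2 = 5, l3 = 4, occurrences = 2
```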
For comparison, that's about five times faster than the method of using regexp_matches(), which looks like this.
SELECT count(*)
FROM ( VALUES
('foobarbaz', 'ba')
) AS t(str, replacestr)
CROSS JOIN LATERAL regexp_matches(str, replacestr, 'g');
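Speed aside, the two methods are not interchangeable for every input: replace() treats the search string literally, while regexp_matches() interprets it as a regular expression. The sketch below, with a made-up sample string, shows how the counts can diverge when the search string contains a regex metacharacter such as a dot.

```sql
-- Sketch: literal counting vs. regex counting for a needle that is a metacharacter.
-- The sample string 'a.b.c' is hypothetical.
SELECT
  (length(str) - length(replace(str, needle, ''))) / length(needle) AS literal_count,
  (SELECT count(*) FROM regexp_matches(str, needle, 'g'))           AS regex_count
FROM ( VALUES
  ('a.b.c', '.')
) AS t(str, needle);
-- literal_count = 2 (two dots), regex_count = 5 ('.' matches every character)
```

If the regex route is needed for a literal search string, the metacharacters would have to be escaped first.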
PostgreSQL SQL query to find number of occurrences of substring in string
This is easily done without a custom function:
select count(*)
from (values ('Earth is my home planet and where my friends live')) v(str)
cross join lateral regexp_split_to_table(v.str, ' ') word
join patterns p
  on word = p.pattern
Just break the original string into "words". Then match on the words.
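The query above relies on a patterns table from the question. As a self-contained sketch, the same query can be tried with the table replaced by a CTE. The three pattern values here are invented for illustration.

```sql
-- Sketch: word-by-word matching with a stand-in for the patterns table.
-- The pattern values are hypothetical.
WITH patterns(pattern) AS (
    VALUES ('my'), ('home'), ('moon')
)
SELECT count(*)
FROM (VALUES ('Earth is my home planet and where my friends live')) v(str)
CROSS JOIN LATERAL regexp_split_to_table(v.str, ' ') AS word
JOIN patterns p
  ON word = p.pattern;
-- 3: "my" appears twice and "home" once; "moon" does not appear
```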
Another method uses regular expression matching:
select (select count(*) from regexp_matches(v.str, p.rpattern, 'g'))
from (values ('Earth is my home planet and where my friends live')) v(str) cross join
(select string_agg(pattern, '|') as rpattern
from patterns
) p;
This stuffs all the patterns into a regular expression. Note that this version does not take word breaks into account.
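If word breaks do matter, one possible adjustment is to wrap the combined pattern in PostgreSQL's word-boundary escapes \m and \M. This is a sketch rather than part of the original answer; it assumes the default standard_conforming_strings = on (so the backslashes are taken literally) and uses invented pattern values.

```sql
-- Sketch: the regex approach with and without word boundaries.
-- Pattern values are hypothetical; \m and \M anchor to the start and end of a word.
WITH patterns(pattern) AS (
    VALUES ('my'), ('he')
)
SELECT
  (SELECT count(*)
   FROM regexp_matches(v.str, p.rpattern, 'g'))                     AS any_position,
  (SELECT count(*)
   FROM regexp_matches(v.str, '\m(' || p.rpattern || ')\M', 'g'))   AS whole_words
FROM (VALUES ('Earth is my home planet and where my friends live')) v(str)
CROSS JOIN (
    SELECT string_agg(pattern, '|') AS rpattern
    FROM patterns
) p;
-- any_position = 3 ("my" twice, plus "he" inside "where"), whole_words = 2
```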
I want to count the number of occurrences of a value in a string
I solved it myself. Thank you for all the ideas!
SELECT count(something)
FROM unnest(
string_to_array(
'1,2,3,3,4,5,6,3'
, ',')
) something
WHERE something = '3'
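If the count of every value is wanted rather than just one, the same unnest/string_to_array combination can feed a GROUP BY. This is a sketch extending the answer above, not something the answer itself proposes.

```sql
-- Sketch: count each distinct value in the comma-separated list.
SELECT something, count(*) AS occurrences
FROM unnest(
       string_to_array('1,2,3,3,4,5,6,3', ',')
     ) AS something
GROUP BY something
ORDER BY something;
-- '3' occurs 3 times; every other value occurs once
```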
How do you count the number of occurrences of a certain substring in a SQL varchar?
The first way that comes to mind is to do it indirectly, by replacing the comma with an empty string and comparing the lengths:
Declare @string varchar(1000)
Set @string = 'a,b,c,d'
select len(@string) - len(replace(@string, ',', ''))
Count each next occurrence of string in substring
How to do this is described in the gsubfn vignette. Using the code there, we first define a proto object pwords with methods pre and fun. pre initializes the word list (which stores the current count for each word encountered), and fun updates it each time a word is encountered and suffixes the word with its count, returning the suffixed word.
Having defined the foregoing, run gsubfn using pwords. For each component of the input, gsubfn will first run pre and then, for each match of the regular expression \\w+, gsubfn will input the match to fun, run fun and replace the match with the output of fun.
We have assumed that the words to be suffixed with a count are matched by \w+, which is the case for the example in the question, but if your actual data is different you may need to change the pattern.
library(gsubfn)
s <- rep("A > B > B > C > B > A > C > B > A", 3) # sample input
pwords <- proto(
pre = function(this) { this$words <- list() },
fun = function(this, x) {
if (is.null(words[[x]])) this$words[[x]] <- 0
this$words[[x]] <- this$words[[x]] + 1
paste0(x, words[[x]])
}
)
gsubfn("\\w+", pwords, s)
giving:
[1] "A1 > B1 > B2 > C1 > B3 > A2 > C2 > B4 > A3"
[2] "A1 > B1 > B2 > C1 > B3 > A2 > C2 > B4 > A3"
[3] "A1 > B1 > B2 > C1 > B3 > A2 > C2 > B4 > A3"
Number of times a particular character appears in a string
There's no direct function for this, but you can do it with a replace:
declare @myvar varchar(20)
set @myvar = 'Hello World'
select len(@myvar) - len(replace(@myvar,'o',''))
Basically, this tells you how many characters were removed, and therefore how many instances of the character there were.
Extra:
The above can be extended to count the occurrences of a multi-character string by dividing by the length of the string being searched for. For example:
declare @myvar varchar(max), @tocount varchar(20)
set @myvar = 'Hello World, Hello World'
set @tocount = 'lo'
select (len(@myvar) - len(replace(@myvar,@tocount,''))) / LEN(@tocount)