How to Count the Number of Words in a String in Oracle

How can I count the number of words in a string in Oracle?

You can use something like this. It takes the length of the string and subtracts the length of the string with the spaces removed, which gives the number of spaces. Adding one to that gives the number of words:

Select length(yourCol) - length(replace(yourcol, ' ', '')) + 1 NumbofWords
from yourtable

See SQL Fiddle with Demo

If you use the following data:

CREATE TABLE yourtable
(yourCol varchar2(15))
;

INSERT ALL
INTO yourtable (yourCol)
VALUES ('Hello To Oracle')
INTO yourtable (yourCol)
VALUES ('oneword')
INTO yourtable (yourCol)
VALUES ('two words')
SELECT * FROM dual
;

And the query:

Select yourcol,
length(yourCol) - length(replace(yourcol, ' ', '')) + 1 NumbofWords
from yourtable

The result is:

|         YOURCOL | NUMBOFWORDS |
|-----------------|-------------|
| Hello To Oracle |           3 |
|         oneword |           1 |
|       two words |           2 |
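
Note that the expression above really counts spaces, so it is only exact when words are separated by exactly one space and the value has no leading or trailing spaces. As a hedged sketch, assuming Oracle 11g or later (where REGEXP_COUNT is available), you could count runs of non-whitespace characters instead. The sample_data rows below are made up for illustration and are not part of the original data:

```sql
-- Sketch: compare the space-based formula with a regex-based count.
-- sample_data is a hypothetical inline data set used only for this comparison.
WITH sample_data AS (
  SELECT 'Hello To Oracle' AS yourCol FROM dual UNION ALL
  SELECT 'two  words'      FROM dual UNION ALL   -- two spaces between the words
  SELECT ' padded '        FROM dual             -- leading and trailing space
)
SELECT yourCol,
       LENGTH(yourCol) - LENGTH(REPLACE(yourCol, ' ', '')) + 1 AS space_based,
       REGEXP_COUNT(yourCol, '\S+')                            AS regex_based
FROM   sample_data;
```

With these rows, space_based returns 3 for all three values, while regex_based returns 3, 2 and 1.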

How to count number of words in delimited string in Oracle SQL

Firstly, it is bad design to store multiple values in a single column as a delimited string. You should consider normalizing the data as a permanent solution.

With the denormalized data, you could count the distinct words in a single SQL statement using REGEXP_SUBSTR:

SELECT COUNT(DISTINCT(regexp_substr(country, '[^ ]+', 1, LEVEL))) as "COUNT"
FROM table_name
CONNECT BY LEVEL <= regexp_count(country, ' ')+1
/

Demo:

WITH sample_data AS
  ( SELECT 'japan singapore japan chinese chinese chinese' str FROM dual
  )
-- end of sample_data mocking real table
SELECT COUNT(DISTINCT(regexp_substr(str, '[^ ]+', 1, LEVEL))) as "COUNT"
FROM sample_data
CONNECT BY LEVEL <= regexp_count(str, ' ')+1
/

COUNT
----------
3

See Split single comma delimited string into rows in Oracle to understand how the query works.
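
To see what the aggregate is working on, it can help to look at the individual tokens before COUNT(DISTINCT ...) is applied. The sketch below reuses the sample_data string from the demo and simply lists one token per LEVEL; the column aliases token_no and token are names chosen here for illustration:

```sql
WITH sample_data AS
  ( SELECT 'japan singapore japan chinese chinese chinese' str FROM dual
  )
-- LEVEL runs from 1 to (number of spaces + 1); each LEVEL picks the
-- LEVEL-th run of non-space characters out of the string.
SELECT LEVEL AS token_no,
       regexp_substr(str, '[^ ]+', 1, LEVEL) AS token
FROM   sample_data
CONNECT BY LEVEL <= regexp_count(str, ' ') + 1
ORDER BY token_no;
```

This should list six tokens (japan, singapore, japan, chinese, chinese, chinese), which COUNT(DISTINCT ...) then collapses to the 3 distinct values reported in the demo.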


UPDATE

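When the table holds several delimited strings, one per row, you need to control the number of rows generated by the CONNECT BY clause. Without a per-row restriction it builds a hierarchy across all rows of the table and produces many duplicate rows.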

See Split comma delimited strings in a table in Oracle for more ways of doing the same task.

Setup

Let's say you have a table with 3 rows like this:

SQL> CREATE TABLE t(country VARCHAR2(200));

Table created.

SQL> INSERT INTO t VALUES('japan singapore japan chinese chinese chinese');

1 row created.

SQL> INSERT INTO t VALUES('singapore indian malaysia');

1 row created.

SQL> INSERT INTO t VALUES('french french french');

1 row created.

SQL> COMMIT;

Commit complete.

SQL> SELECT * FROM t;

COUNTRY
---------------------------------------------------------------------------
japan singapore japan chinese chinese chinese
singapore indian malaysia
french french french
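
Before looking at the solutions, here is a rough illustration of why the row count matters. It assumes the table t from the setup above and simply counts the rows that an uncorrelated CONNECT BY produces; the alias generated_rows is a name chosen here:

```sql
-- Without a PRIOR condition tying each level to its own source row,
-- every qualifying row of t becomes a child of every row at the previous level.
SELECT COUNT(*) AS generated_rows
FROM   t
CONNECT BY LEVEL <= regexp_count(country, ' ') + 1;
```

The three rows hold only 6 + 3 + 3 = 12 words, but this query should report far more rows than that, which is why the approaches below generate the tokens separately for each source row.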

  • Using REGEXP_SUBSTR and REGEXP_COUNT:

We expect the output to be 6, since the three rows contain 6 distinct words (japan, singapore, chinese, indian, malaysia, french).

SELECT COUNT(DISTINCT(regexp_substr(t.country, '[^ ]+', 1, lines.column_value))) count
FROM t,
  TABLE (CAST (MULTISET
  (SELECT LEVEL FROM dual
   CONNECT BY LEVEL <= regexp_count(t.country, ' ')+1
  ) AS sys.odciNumberList ) ) lines
ORDER BY lines.column_value
/

COUNT
----------
6

There are many other methods to achieve the desired output. Let's see how:

  • Using XMLTABLE

SELECT COUNT(DISTINCT(country)) COUNT
FROM
  (SELECT trim(COLUMN_VALUE) country
   FROM t,
     xmltable(('"'
     || REPLACE(country, ' ', '","')
     || '"'))
  )
/

COUNT
----------
6

  • Using MODEL clause

WITH
  model_param AS
  (
    SELECT country AS orig_str ,
      ' '
      || country
      || ' ' AS mod_str ,
      1 AS start_pos ,
      Length(country) AS end_pos ,
      (LENGTH(country) -
      LENGTH(REPLACE(country, ' '))) + 1 AS element_count ,
      0 AS element_no ,
      ROWNUM AS rn
    FROM t )
SELECT COUNT(DISTINCT(Substr(mod_str, start_pos, end_pos-start_pos))) count
FROM (
  SELECT *
  FROM model_param
  MODEL PARTITION BY (rn, orig_str, mod_str)
  DIMENSION BY (element_no)
  MEASURES (start_pos, end_pos, element_count)
  RULES ITERATE (2000)
  UNTIL (ITERATION_NUMBER+1 = element_count[0])
  ( start_pos[ITERATION_NUMBER+1] =
      instr(cv(mod_str), ' ', 1, cv(element_no)) + 1,
    end_pos[ITERATION_NUMBER+1] =
      instr(cv(mod_str), ' ', 1, cv(element_no) + 1) )
  )
WHERE element_no != 0
ORDER BY mod_str , element_no
/

COUNT
----------
6

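PL/SQL word count from string

Another option is the query below, which splits the string on spaces, strips every non-letter character from each token, and counts how often each resulting word occurs: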

select word, count(1) as repeating
from
(
with t(str) as
(
select 'Hello, I like ham pizza more than mozzarella pizza' from dual
)
select regexp_replace(regexp_substr(str, '[^\ ]+', 1, level),'[^a-zA-Z]','')
as word
from t
cross join dual
connect by level <= regexp_count(str, '[^\ ]+')
)
group by word
order by repeating desc, word;

WORD       REPEATING
---------- ---------
pizza              2
ham                1
Hello              1
I                  1
like               1
more               1
mozzarella         1
than               1

How to count the number of occurrences of a character in an Oracle varchar value?

Here you go:

select length('123-345-566') - length(replace('123-345-566','-',null)) 
from dual;

Technically, if the string you want to check contains only the character you want to count, the above query will return NULL, because REPLACE then yields an empty string, which Oracle treats as NULL, and LENGTH(NULL) is NULL. The following query gives the correct answer in all cases:

select coalesce(length('123-345-566') - length(replace('123-345-566','-',null)), length('123-345-566'), 0) 
from dual;

The final 0 in COALESCE catches the case where you are counting in an empty string (that is, NULL, because LENGTH(NULL) is NULL in Oracle).
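
As a possible alternative, assuming Oracle 11g or later, REGEXP_COUNT can count the character directly, which avoids the subtraction and leaves only the NULL input to handle. The sketch below compares both approaches on three made-up inputs (a normal value, a value made up entirely of the counted character, and NULL); the aliases via_length and via_regexp_count are names chosen here:

```sql
-- Sketch: both columns are expected to agree on every row.
SELECT str,
       COALESCE(LENGTH(str) - LENGTH(REPLACE(str, '-', NULL)),
                LENGTH(str), 0)          AS via_length,
       NVL(REGEXP_COUNT(str, '-'), 0)    AS via_regexp_count
FROM (
  SELECT '123-345-566' AS str FROM dual UNION ALL
  SELECT '---'                FROM dual UNION ALL   -- only the counted character
  SELECT CAST(NULL AS VARCHAR2(20)) FROM dual       -- empty string / NULL
);
```

Both columns should return 2, 3 and 0 for these rows. Keep in mind that the second argument of REGEXP_COUNT is a regular expression, so characters with a special meaning (such as . or *) would need to be escaped.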

Counting the number of words in a Column in Oracle SQL

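The query below counts how often each word occurs in the names column of table abc. It extracts the last, second, and first word of each value with SUBSTR and INSTR, so it is complete only for values of up to three space-separated words:

select * from (
  select x, count(*) as coun from (
    -- last word (the whole value when there is no space)
    select substr(names,
                  instr(names, ' ', -1, 1) + 1) as x
    from abc
    union all
    -- second word (NULL when there is no second space)
    select substr(names,
                  instr(names, ' ', 1, 1) + 1,
                  instr(names, ' ', 1, 2) - instr(names, ' ', 1, 1) - 1) as x
    from abc
    union all
    -- first word (NULL when there is no space)
    select substr(names, 1,
                  instr(names, ' ', 1, 1) - 1) as x
    from abc
  )
  where x is not null and x not in ('1','2','3','4','5','6','7')
  group by x
  order by coun desc)
where rownum < 4800;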

How do I Count the words in a string using regex

INSTR is also a viable option. If a second occurrence of a space exists, the string has at least 3 words (assuming the words are separated by single spaces).

WITH
books
AS
(SELECT 'Tom Sawyer' title FROM DUAL
UNION ALL
SELECT 'A tale of two cities' FROM DUAL
UNION ALL
SELECT 'The Little Prince' FROM DUAL
UNION ALL
SELECT 'Don Quixote' FROM DUAL)
SELECT title
FROM books
WHERE instr(title, ' ', 1, 2) > 0;
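
To make the INSTR check more concrete, the sketch below uses the same books data and shows the value INSTR returns for each title. INSTR returns 0 when the requested occurrence does not exist; the alias second_space_pos is a name chosen here for illustration:

```sql
WITH
books
AS
(SELECT 'Tom Sawyer' title FROM DUAL
UNION ALL
SELECT 'A tale of two cities' FROM DUAL
UNION ALL
SELECT 'The Little Prince' FROM DUAL
UNION ALL
SELECT 'Don Quixote' FROM DUAL)
-- Position of the 2nd space, searching from character 1; 0 means "not found".
SELECT title,
       INSTR(title, ' ', 1, 2) AS second_space_pos
FROM books;
```

Here the two-word titles return 0, while 'A tale of two cities' returns 7 and 'The Little Prince' returns 11, so only those two rows pass the > 0 filter.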

If you do wish to stick with regex, the expression below can be used to find books that have 3 or more words.

WITH
books
AS
(SELECT 'Tom Sawyer' title FROM DUAL
UNION ALL
SELECT 'A tale of two cities' FROM DUAL
UNION ALL
SELECT 'The Little Prince' FROM DUAL
UNION ALL
SELECT 'Don Quixote' FROM DUAL)
SELECT title
FROM books
WHERE REGEXP_LIKE (title, '(\S+\s){2,}');

(Thanks @Littlefoot for the books!)


