How can I count the number of words in a string in Oracle?
You can use something similar to this. This gets the length of the string, then substracts the length of the string with the spaces removed. By then adding the number one to that should give you the number of words:
Select length(yourCol) - length(replace(yourcol, ' ', '')) + 1 NumbofWords
from yourtable
See SQL Fiddle with Demo
If you use the following data:
CREATE TABLE yourtable
(yourCol varchar2(15))
;
INSERT ALL
INTO yourtable (yourCol)
VALUES ('Hello To Oracle')
INTO yourtable (yourCol)
VALUES ('oneword')
INTO yourtable (yourCol)
VALUES ('two words')
SELECT * FROM dual
;
And the query:
Select yourcol,
length(yourCol) - length(replace(yourcol, ' ', '')) + 1 NumbofWords
from yourtable
The result is:
| YOURCOL | NUMBOFWORDS |
---------------------------------
| Hello To Oracle | 3 |
| oneword | 1 |
| two words | 2 |
How to count number of words in delimited string in Oracle SQL
Firstly, it is a bad design to store multiple values in a single column as delimited string. You should consider normalizing the data as a permanent solution.
With the denormalized data, you could do it in a single SQL using REGEXP_SUBSTR:
SELECT COUNT(DISTINCT(regexp_substr(country, '[^ ]+', 1, LEVEL))) as "COUNT"
FROM table_name
CONNECT BY LEVEL <= regexp_count(country, ' ')+1
/
Demo:
SQL> WITH sample_data AS
2 ( SELECT 'japan singapore japan chinese chinese chinese' str FROM dual
3 )
4 -- end of sample_data mocking real table
5 SELECT COUNT(DISTINCT(regexp_substr(str, '[^ ]+', 1, LEVEL))) as "COUNT"
6 FROM sample_data
7 CONNECT BY LEVEL <= regexp_count(str, ' ')+1
8 /
COUNT
----------
3
See Split single comma delimited string into rows in Oracle to understand how the query works.
UPDATE
For multiple delimited string rows you need to take care of the number of rows formed by the CONNECT BY clause.
See Split comma delimited strings in a table in Oracle for more ways of doing the same task.
Setup
Let's say you have a table with 3 rows like this:
SQL> CREATE TABLE t(country VARCHAR2(200));
Table created.
SQL> INSERT INTO t VALUES('japan singapore japan chinese chinese chinese');
1 row created.
SQL> INSERT INTO t VALUES('singapore indian malaysia');
1 row created.
SQL> INSERT INTO t VALUES('french french french');
1 row created.
SQL> COMMIT;
Commit complete.
SQL> SELECT * FROM t;
COUNTRY
---------------------------------------------------------------------------
japan singapore japan chinese chinese chinese
singapore indian malaysia
french french french
- Using REGEXP_SUBSTR and REGEXP_COUNT:
We expect the output as 6
since there are 6 unique strings.
SQL> SELECT COUNT(DISTINCT(regexp_substr(t.country, '[^ ]+', 1, lines.column_value))) count
2 FROM t,
3 TABLE (CAST (MULTISET
4 (SELECT LEVEL FROM dual
5 CONNECT BY LEVEL <= regexp_count(t.country, ' ')+1
6 ) AS sys.odciNumberList ) ) lines
7 ORDER BY lines.column_value
8 /
COUNT
----------
6
There are many other methods to achieve the desired output. Let's see how:
- Using XMLTABLE
SQL> SELECT COUNT(DISTINCT(country)) COUNT
2 FROM
3 (SELECT trim(COLUMN_VALUE) country
4 FROM t,
5 xmltable(('"'
6 || REPLACE(country, ' ', '","')
7 || '"'))
8 )
9 /
COUNT
----------
6
- Using MODEL clause
SQL> WITH
2 model_param AS
3 (
4 SELECT country AS orig_str ,
5 ' '
6 || country
7 || ' ' AS mod_str ,
8 1 AS start_pos ,
9 Length(country) AS end_pos ,
10 (LENGTH(country) -
11 LENGTH(REPLACE(country, ' '))) + 1 AS element_count ,
12 0 AS element_no ,
13 ROWNUM AS rn
14 FROM t )
15 SELECT COUNT(DISTINCT(Substr(mod_str, start_pos, end_pos-start_pos))) count
16 FROM (
17 SELECT *
18 FROM model_param
19 MODEL PARTITION BY (rn, orig_str, mod_str)
20 DIMENSION BY (element_no)
21 MEASURES (start_pos, end_pos, element_count)
22 RULES ITERATE (2000)
23 UNTIL (ITERATION_NUMBER+1 = element_count[0])
24 ( start_pos[ITERATION_NUMBER+1] =
25 instr(cv(mod_str), ' ', 1, cv(element_no)) + 1,
26 end_pos[ITERATION_NUMBER+1] =
27 instr(cv(mod_str), ' ', 1, cv(element_no) + 1) )
28 )
29 WHERE element_no != 0
30 ORDER BY mod_str , element_no
31 /
COUNT
----------
6
PL / SQL word count from string
One another option would be the query below :
select word, count(1) as repeating
from
(
with t(str) as
(
select 'Hello, I like ham pizza more than mozzarella pizza' from dual
)
select regexp_replace(regexp_substr(str, '[^\ ]+', 1, level),'[^a-zA-Z]','')
as word
from t
cross join dual
connect by level <= regexp_count(str, '[^\ ]+')
)
group by word
order by repeating desc, word;
WORD REPEATING
---------- ---------
pizza 2
ham 1
Hello 1
I 1
like 1
more 1
mozzarella 1
than 1
How to count the number of occurrences of a character in an Oracle varchar value?
Here you go:
select length('123-345-566') - length(replace('123-345-566','-',null))
from dual;
Technically, if the string you want to check contains only the character you want to count, the above query will return NULL; the following query will give the correct answer in all cases:
select coalesce(length('123-345-566') - length(replace('123-345-566','-',null)), length('123-345-566'), 0)
from dual;
The final 0 in coalesce
catches the case where you're counting in an empty string (i.e. NULL, because length(NULL) = NULL in ORACLE).
Counting the number of words in a Column in Oracle SQL
The answer to this data set is below:
`select * from(
select x,count(*) as coun from (
select substr(names,
INSTR(names, ' ', -1, 1)+1) as x
from abc
union all
SELECT SUBSTR(names,
INSTR(names, ' ', 1, 1) + 1,
INSTR(names, ' ', 1, 2) - INSTR(names, ' ', 1, 1) - 1) as x
FROM abc
union all
SELECT SUBSTR(names,1,
INSTR(names, ' ',1 , 1)-1) as x
FROM abc
)
where x is not null and x not in ('1','2','3','4','5','6','7')
group by x
order by coun desc)
where rownum < 4800;'
Answer:
How do I Count the words in a string using regex
INSTR is also a viable option. By looking for the second occurrence of a space, that will indicate that the string has at least 3 words.
WITH
books
AS
(SELECT 'Tom Sawyer' title FROM DUAL
UNION ALL
SELECT 'A tale of two cities' FROM DUAL
UNION ALL
SELECT 'The Little Prince' FROM DUAL
UNION ALL
SELECT 'Don Quixote' FROM DUAL)
SELECT title
FROM books
WHERE instr(title, ' ', 1, 2) > 0;
If you do with to stick with regex, the regex expression below can be used to find books that have 3 or more words.
WITH
books
AS
(SELECT 'Tom Sawyer' title FROM DUAL
UNION ALL
SELECT 'A tale of two cities' FROM DUAL
UNION ALL
SELECT 'The Little Prince' FROM DUAL
UNION ALL
SELECT 'Don Quixote' FROM DUAL)
SELECT title
FROM books
WHERE REGEXP_LIKE (title, '(\S+\s){2,}');
(Thanks @Littlefoot for the books!)
Related Topics
Sql: Finding the Closest Lat/Lon Record on Google Bigquery
Delete Records Which Are Considered Duplicates Based on Same Value on a Column and Keep the Newest
How to Import an Excel Spreadsheet into SQL Server 2008R2 Database
SQL - Find Statement That Insert Specific Values
How to Find If a Value Exists Within a Varray
Running Sum in Access Query with Group By
Oracle Rows to Column Transformation
Using Dates from Cell or Named Range in SQL Query
How to Replace Multiple Characters in Access SQL
Insert a Select Group By:More Target Columns Than Expressions Error
Remove Ascii Extended Characters 128 Onwards (Sql)
Want a Stored Procedure for Comma Seperated String Which Is of a Column (Has 20000 Rows ) in a Table