Using SQL to Determine Word Count Stats of a Text Field


MySQL's text-handling capabilities aren't good enough for what you want. A stored function is an option, but it will probably be slow. Your best bet for processing the data within MySQL is to add a user-defined function (UDF). If you're going to build a newer version of MySQL anyway, you could also add a native function.

The "correct" way is to process the data outside the DB since DBs are for storage, not processing, and any heavy processing might put too much of a load on the DBMS. Additionally, calculating the word count outside of MySQL makes it easier to change the definition of what counts as a word. How about storing the word count in the DB and updating it when a document is changed?
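As a rough sketch of that last idea, the Python below keeps a word count next to each document and refreshes it on every write. It uses the standard library's sqlite3 in-memory database purely as a stand-in for MySQL; the `documents` table, the `save_document` helper, and the regex-based definition of a word are all assumptions made for illustration.

```python
import re
import sqlite3

# Assumption: a "word" is a run of letters, digits, or apostrophes.
# Changing this one pattern changes the definition everywhere.
WORD_RE = re.compile(r"[A-Za-z0-9']+")


def count_words(text):
    """Return the number of words in text under WORD_RE (None counts as 0)."""
    return len(WORD_RE.findall(text or ""))


def save_document(conn, doc_id, body):
    """Insert or update a document and store its word count alongside it."""
    conn.execute(
        "INSERT OR REPLACE INTO documents (id, body, word_count) VALUES (?, ?, ?)",
        (doc_id, body, count_words(body)),
    )


# sqlite3 is only a stand-in here; the same pattern applies with a MySQL driver.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (id INTEGER PRIMARY KEY, body TEXT, word_count INTEGER)"
)
save_document(conn, 1, "Mary had a little lamb")
save_document(conn, 1, "Mary had a little lamb, its fleece was white as snow")
stored = conn.execute("SELECT word_count FROM documents WHERE id = 1").fetchone()[0]
print(stored)
```

Because the count is computed once per write, reads never pay for the string processing, and redefining what a word is only means changing `WORD_RE` and recomputing the stored column.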

Example stored function:

DELIMITER $$
CREATE FUNCTION wordcount(str LONGTEXT)
    RETURNS INT
    DETERMINISTIC
    SQL SECURITY INVOKER
    NO SQL
BEGIN
    DECLARE wordCnt, idx, maxIdx INT DEFAULT 0;
    DECLARE currChar, prevChar BOOL DEFAULT 0;
    SET maxIdx = CHAR_LENGTH(str);
    SET idx = 1;
    WHILE idx <= maxIdx DO
        -- currChar is true when the character at idx is alphanumeric
        SET currChar = SUBSTRING(str, idx, 1) RLIKE '[[:alnum:]]';
        -- a word starts at each non-alphanumeric -> alphanumeric transition
        IF NOT prevChar AND currChar THEN
            SET wordCnt = wordCnt + 1;
        END IF;
        SET prevChar = currChar;
        SET idx = idx + 1;
    END WHILE;
    RETURN wordCnt;
END
$$
DELIMITER ;
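
To sanity-check the logic of the function without a MySQL server, here is a small Python port of the same character scan. It is only a sketch: restricting "alphanumeric" to ASCII letters and digits is my assumption about how `[[:alnum:]]` behaves for your data, and the sample strings are just test input.

```python
def wordcount(text):
    """Count transitions from a non-alphanumeric to an alphanumeric character."""
    word_count = 0
    prev_is_alnum = False
    for ch in text:
        # Assumption: ASCII letters and digits only, approximating [[:alnum:]].
        curr_is_alnum = ch.isascii() and ch.isalnum()
        if curr_is_alnum and not prev_is_alnum:
            word_count += 1  # a new word starts here
        prev_is_alnum = curr_is_alnum
    return word_count


for sample in ("Hello World", "Umm, sheep followed her", "don't"):
    print(sample, "->", wordcount(sample))
```

Note that "don't" comes out as two words under this definition, because the apostrophe is not alphanumeric. The stored function treats it the same way.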

MySQL - word frequency count on long textual field

My logic for this question is: extract all words and count them!

So, create a table like your stored data:

CREATE TABLE `tbltest` (
    `Rev_id` int(11) NOT NULL AUTO_INCREMENT,
    `place_id` int(11) DEFAULT NULL,
    `Stars` int(11) DEFAULT NULL,
    `Category` varchar(45) DEFAULT NULL,
    `Text` varchar(255) DEFAULT NULL,
    PRIMARY KEY (`Rev_id`),
    UNIQUE KEY `id_UNIQUE` (`Rev_id`)
) ENGINE=InnoDB AUTO_INCREMENT=4 DEFAULT CHARSET=utf8;

And create a table for the words:

CREATE TABLE `counting` (
    `word` varchar(45) NOT NULL,
    `counts` int(11) DEFAULT NULL,
    PRIMARY KEY (`word`),
    UNIQUE KEY `word_UNIQUE` (`word`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8;

Now, create the MySQL Stored Procedure for splitting sentences and counting words:

drop procedure if exists sentence_words;
delimiter #
create procedure sentence_words(IN Cat VARCHAR(45))
begin
    declare w_max int unsigned default 1;
    declare w_counter int unsigned default 0;
    declare done int unsigned default 0;
    declare sentence varchar(255) default null;
    declare cur cursor for select `text` from `tbltest` where `Category` = Cat;
    declare continue handler for not found set done=1;

    set done=0;
    open cur;
    myloop: loop
        fetch cur into sentence;
        if done = 1 then leave myloop; end if;
        -- refine sentence: turn punctuation into spaces
        set sentence = replace(replace(replace(replace(
            sentence
            ,'.',' '),'!',' '),',',' '),';',' ');
        -- collapse double spaces left behind by the replacements
        set sentence = replace(trim(sentence),'  ',' ');
        -- number of words = number of spaces + 1
        set w_max = length(sentence)-length(replace(sentence,' ',''))+1;
        -- restart the word index for every sentence
        set w_counter = 0;
        start transaction;
        while w_counter < w_max do
            insert into `counting`(counts,word) values
                (1, substring_index( substring_index(
                    sentence,' ',w_counter+1) ,' ',-1)
                )
            ON DUPLICATE KEY UPDATE counts=counts+1;
            set w_counter=w_counter+1;
        end while;
        commit;
    end loop;
    close cur;
end #
delimiter ;

Finally, call the procedure and look up the words and their counts in the `counting` table. If you need the word counts for each category separately, remember to truncate or back up the `counting` table before calling the procedure for each category.

truncate `counting`;
call sentence_words('Bar');
select * from `counting` order by counts desc; -- ? where length(word)>2
-- words | counts --
'audience', '1'
'bad', '1'
'place', '1'
'Poor', '1'
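
If you want to cross-check the procedure's result outside the database, the same split-and-count idea can be sketched in a few lines of Python. The `word_frequencies` helper and the sample rows are made up for illustration and are not data from the question.

```python
from collections import Counter


def word_frequencies(sentences):
    """Count words after turning . ! , ; into spaces, as the procedure does."""
    counts = Counter()
    for sentence in sentences:
        if sentence is None:
            continue  # the Text column is nullable
        for mark in ".!,;":
            sentence = sentence.replace(mark, " ")
        # split() with no argument also collapses runs of whitespace
        counts.update(sentence.split())
    return counts


# Made-up sample rows, not data from the original question.
reviews = ["Poor place, bad audience!", "Bad place; bad food."]
for word, n in word_frequencies(reviews).most_common():
    print(word, n)
```

One difference to keep in mind: this sketch treats `Bad` and `bad` as different words, whereas the `counting` table would most likely merge them, since the default collation for the utf8 character set is case-insensitive.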

Count words in column with SQL

SELECT LENGTH(words) - LENGTH(REPLACE(words, ' ', '')) + 1 AS words_count
FROM table_name
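
Keep in mind that this expression really counts spaces, not words. The short Python sketch below mirrors the formula and compares it with a whitespace split, so you can see where the two disagree; the function name and sample strings are my own.

```python
def space_formula(text):
    """Mirror LENGTH(x) - LENGTH(REPLACE(x, ' ', '')) + 1."""
    return len(text) - len(text.replace(" ", "")) + 1


# Built explicitly so the two consecutive spaces are easy to see.
double_spaced = "one" + " " * 2 + "two"

for sample in ("one two three", double_spaced, " padded ", ""):
    # columns: input, space-based count, count from a whitespace split
    print(repr(sample), space_formula(sample), len(sample.split()))
```

The two counts agree only when words are separated by exactly one space and there is no leading or trailing space. An empty string is reported as one word by the formula.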

SQL Query Counting Instances of a Word in a Record

There is no built-in MySQL function for counting occurrences of a substring in a string, but you can compare the length of the string with the length of the same string after your word has been replaced by an empty string, since REPLACE() replaces all occurrences.

SELECT
(CHAR_LENGTH(sentence)-CHAR_LENGTH(REPLACE(LOWER(sentence),'the','')))/CHAR_LENGTH('the')
AS occurrences
FROM yourtable;
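
The length-difference trick is easy to try out in Python before running it against a table. This is just a sketch of the same arithmetic; `count_occurrences` is a hypothetical helper and the sample sentence is made up.

```python
def count_occurrences(text, needle):
    """Mirror the SQL: (length before - length after removal) / length of needle."""
    lowered = text.lower()
    removed = lowered.replace(needle.lower(), "")
    return (len(lowered) - len(removed)) // len(needle)


print(count_occurrences("The cat and the other cat", "the"))
```

This prints 3 rather than 2, because the "the" inside "other" is counted as well. The SQL version has the same behaviour: it counts substrings, not whole words.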

mysql count word in sql syntax

Use the excellent function from this question by @otis in your query:

mysql> select * from test;
+----+------------------------------+
| id | sentence                     |
+----+------------------------------+
|  0 | Hello World                  |
|  1 | Hello World                  |
|  2 | Mary had a little lamb       |
|  3 | Her fleece was white as snow |
|  4 | Everywhere that mary went    |
|  5 | Umm, sheep followed her      |
+----+------------------------------+
6 rows in set (0.00 sec)

mysql> SELECT sentence, wordcount(sentence) as "Words" from test;
+------------------------------+-------+
| sentence                     | Words |
+------------------------------+-------+
| Hello World                  |     2 |
| Hello World                  |     2 |
| Mary had a little lamb       |     5 |
| Her fleece was white as snow |     6 |
| Everywhere that mary went    |     4 |
| Umm, sheep followed her      |     4 |
+------------------------------+-------+
6 rows in set (0.02 sec)

To make the function work, you need to execute the declaration of the function in MySQL. It is just like executing any other query:

mysql> DELIMITER $$
mysql> CREATE FUNCTION wordcount(str TEXT)
RETURNS INT
DETERMINISTIC
SQL SECURITY INVOKER
NO SQL
BEGIN
DECLARE wordCnt, idx, maxIdx INT DEFAULT 0;
DECLARE currChar, prevChar BOOL DEFAULT 0;
SET maxIdx=char_length(str);
SET idx = 1;
WHILE idx <= maxIdx DO
SET currChar=SUBSTRING(str, idx, 1) RLIKE '[[:alnum:]]';
IF NOT prevChar AND currChar THEN
SET wordCnt=wordCnt+1;
END IF;
SET prevChar=currChar;
SET idx=idx+1;
END WHILE;
RETURN wordCnt;
END
$$
Query OK, 0 rows affected (0.10 sec)

mysql> DELIMITER ;

MySQL: Selecting rows ordered by word count

Well, this will not perform very well, since the string calculations need to be performed for every row.

You can count the total number of words in a MySQL column like so: SELECT SUM(LENGTH(name) - LENGTH(REPLACE(name, ' ', '')) + 1) FROM table (provided that words are defined as "whatever is delimited by a single space"). For a per-row count, drop the SUM(): an aggregate function cannot be used in a WHERE clause, and in ORDER BY it turns the statement into an aggregate query rather than ranking individual rows.

Now, add the per-row expression to your query:

SELECT
    <fields>
FROM
    <table>
WHERE
    <condition>
ORDER BY LENGTH(<fieldWithWords>) - LENGTH(REPLACE(<fieldWithWords>, ' ', '')) + 1

Or, add it to the condition:

SELECT
    <fields>
FROM
    <table>
WHERE
    LENGTH(<fieldWithWords>) - LENGTH(REPLACE(<fieldWithWords>, ' ', '')) + 1 BETWEEN 10 AND 20
ORDER BY <something>
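
A quick way to check the ordering and filtering is to run the word-count expression against a throwaway table. The Python sketch below does that with sqlite3 from the standard library, used here only as a stand-in for MySQL (LENGTH and REPLACE behave the same way for this purpose). The `test` table and its rows are sample data, and the expression is applied per row, without an aggregate.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE test (sentence TEXT)")
conn.executemany(
    "INSERT INTO test (sentence) VALUES (?)",
    [("Mary had a little lamb",), ("Hello World",), ("Her fleece was white as snow",)],
)

# Per-row expression: no SUM(), so each row is ranked by its own word count.
WORDS = "LENGTH(sentence) - LENGTH(REPLACE(sentence, ' ', '')) + 1"
rows = conn.execute(
    f"SELECT sentence, {WORDS} AS words FROM test "
    f"WHERE {WORDS} BETWEEN 2 AND 5 ORDER BY words DESC"
).fetchall()
print(rows)
```

The six-word sentence is filtered out by the BETWEEN condition and the remaining rows come back longest first.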

search word count query in mysql

After a few hours of googling and debugging, I finally got it solved.

I used a combination of char_length and replace to achieve this.

What I ended up with is below.

select  *,(
(char_length(name) - char_length(replace(name,'sam',''))) +
(char_length(description) - char_length(replace(description,'sam','')))
) / char_length('sam') as SearchCount
from
product
order by
SearchCount desc

The query above is case-sensitive, but I have also solved it case-insensitively; see the query below.

select  *,
(
(char_length(name) - char_length(replace(LOWER(name),LOWER('Sam'),''))) +
(char_length(description) -
char_length(replace(LOWER(description),LOWER('Sam'),'')))
) / char_length('sam') as SearchCount
from
product
order by
SearchCount desc

With this query in place, all we need to do is add a WHERE clause to make it work.

Hope this helps other people too.

Thanks for the help (to everyone who answered, deleted, or commented).


