How to Get a Distinct List of Words Used in All Field Records Using MS SQL

How to get a distinct list of words used in all Field Records using MS SQL?

I do not think you can do this with a plain SELECT. The best option is to write a user-defined function that returns a table with all the words, and then run SELECT DISTINCT on it.


Disclaimer: Function dbo.Split is from http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=50648

CREATE TABLE test
(
    id int identity(1, 1) not null,
    description varchar(50) not null
)
GO

INSERT INTO test (description) VALUES ('The dog jumped over the fence')
INSERT INTO test (description) VALUES ('The giant tripped on the fence')
GO

CREATE FUNCTION dbo.Split
(
    @RowData nvarchar(2000),
    @SplitOn nvarchar(5)
)
RETURNS @RtnValue table
(
    Id int identity(1,1),
    Data nvarchar(100)
)
AS
BEGIN
    -- Delimiter length in characters; LEN() would ignore a trailing space
    DECLARE @DelimLen int
    SET @DelimLen = DATALENGTH(@SplitOn) / 2

    -- Peel off one token per iteration while a delimiter remains
    WHILE (CHARINDEX(@SplitOn, @RowData) > 0)
    BEGIN
        INSERT INTO @RtnValue (Data)
        SELECT LTRIM(RTRIM(SUBSTRING(@RowData, 1, CHARINDEX(@SplitOn, @RowData) - 1)))

        SET @RowData = SUBSTRING(@RowData, CHARINDEX(@SplitOn, @RowData) + @DelimLen, LEN(@RowData))
    END

    -- Whatever is left after the last delimiter is the final token
    INSERT INTO @RtnValue (Data)
    SELECT LTRIM(RTRIM(@RowData))

    RETURN
END
GO

CREATE FUNCTION dbo.SplitAll(@SplitOn nvarchar(5))
RETURNS @RtnValue table
(
    Id int identity(1,1),
    Data nvarchar(100)
)
AS
BEGIN
    DECLARE My_Cursor CURSOR FOR SELECT description FROM dbo.test
    DECLARE @description varchar(50)

    -- Walk every row of dbo.test and append its words to the result table
    OPEN My_Cursor
    FETCH NEXT FROM My_Cursor INTO @description
    WHILE @@FETCH_STATUS = 0
    BEGIN
        INSERT INTO @RtnValue (Data)
        SELECT Data FROM dbo.Split(@description, @SplitOn)
        FETCH NEXT FROM My_Cursor INTO @description
    END
    CLOSE My_Cursor
    DEALLOCATE My_Cursor

    RETURN
END
GO

SELECT DISTINCT Data FROM dbo.SplitAll(N' ')
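
The same split-then-deduplicate idea can be sketched outside the database. The Python below is only an illustration of the logic, not a replacement for the T-SQL: the function names mirror dbo.Split and dbo.SplitAll, and distinct_words is a hypothetical helper standing in for the final SELECT DISTINCT. Note that Python compares strings case-sensitively, whereas SQL Server's result depends on the column collation.

```python
def split(row_data, split_on):
    """Mirror dbo.Split: cut on the delimiter and trim spaces from each piece."""
    return [piece.strip(" ") for piece in row_data.split(split_on)]


def split_all(descriptions, split_on):
    """Mirror dbo.SplitAll: split every record and collect all pieces."""
    words = []
    for description in descriptions:
        words.extend(split(description, split_on))
    return words


def distinct_words(descriptions, split_on=" "):
    """Stand-in for SELECT DISTINCT: unique words, sorted for stable output."""
    return sorted(set(split_all(descriptions, split_on)))


# The two rows inserted into the test table above
rows = [
    "The dog jumped over the fence",
    "The giant tripped on the fence",
]
print(distinct_words(rows))
# ['The', 'dog', 'fence', 'giant', 'jumped', 'on', 'over', 'the', 'tripped']
```

Here 'The' and 'the' stay separate because the comparison is case-sensitive; lowercasing each word before building the set would merge them.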

How to get all distinct words of a specified minimum length from multiple columns in a MySQL table?

A shell script might be the most efficient approach:

  1. SELECT CONCAT_WS(' ', col_a, col_b, col_c) INTO OUTFILE 'x' ... to get the columns into a file
  2. tr ' ' "\n" <x -- split into one word per line
  3. awk 'length($1) >= 5' -- minimum size of 5 characters per word
  4. sort -u -- to dedup

This pipeline does not filter out stopwords, but sed or awk could handle that.

mysql -e "SELECT ... INTO OUTFILE 'x' ..." ...
tr ' ' "\n" <x | awk 'length($1) >= 5' | sort -u
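
For comparison, the filtering part of that pipeline (split into words, keep those of at least 5 characters, deduplicate and sort) might look like this in Python. This is a sketch under assumptions: distinct_long_words is a made-up helper name, the sample lines are invented, and the optional stopwords parameter is one possible way to add the stopword filtering mentioned above.

```python
def distinct_long_words(lines, min_length=5, stopwords=frozenset()):
    """Return the sorted unique words of at least min_length characters."""
    words = set()
    for line in lines:
        # Like tr ' ' "\n": break each exported line into individual words
        for word in line.split():
            # Like awk 'length($1) >= 5', plus an optional stopword check
            if len(word) >= min_length and word not in stopwords:
                words.add(word)
    # Like sort -u: deduplicated and sorted
    return sorted(words)


# Invented sample data standing in for the contents of the exported file
sample = ["quick brown foxes", "brown bears sleep deeply"]
print(distinct_long_words(sample))
# ['bears', 'brown', 'deeply', 'foxes', 'quick', 'sleep']
```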

How to get tally of unique words using only SQL?

It depends on what the SQL database looks like. You would first have to turn your 4-row "database" into a single-column table in which each row holds one word. To do that you could use something like STRING_SPLIT, with the space character as the delimiter.

SELECT value FROM STRING_SPLIT('I''m representing for them gangstas all across the world', ' ');

https://www.sqlservertutorial.net/sql-server-string-functions/sql-server-string_split-function/
This would turn it into a table where every word is a row.

Once you've set up your data table, then it's easy.

Your_table:

[Word]
I'm
representing
for
them
...
world

Then you can just write:

SELECT Word, COUNT(*) AS [Count]
FROM your_table
GROUP BY Word;

Your output would be:

Word         | Count
I'm          | 1
representing | 1
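
To try the GROUP BY tally end to end without a SQL Server instance, here is a small sketch using Python's built-in sqlite3 module. It assumes the same your_table layout with a single Word column, and it splits the text in Python because SQLite has no STRING_SPLIT. The second input line is made up purely so that some words get a count above 1.

```python
import sqlite3

lines = [
    "I'm representing for them gangstas all across the world",  # from the text above
    "the world is wide",  # made-up second row, only to produce counts above 1
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (Word TEXT)")

# One row per word, which is the shape STRING_SPLIT would give on SQL Server
conn.executemany(
    "INSERT INTO your_table (Word) VALUES (?)",
    [(word,) for line in lines for word in line.split(" ")],
)

# The tally query from above, collected into a {word: count} dictionary
tally = dict(conn.execute("SELECT Word, COUNT(*) FROM your_table GROUP BY Word"))
conn.close()

print(tally["world"], tally["the"], tally["I'm"])
# 2 2 1
```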

SQL to find the number of distinct values in a column

You can use the DISTINCT keyword within the COUNT aggregate function:

SELECT COUNT(DISTINCT column_name) AS some_alias FROM table_name

This will count only the distinct values for that column.
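
A quick way to check this behavior is to run the query against an in-memory SQLite database from Python. The table contents below are invented for illustration. The sketch also shows that COUNT(DISTINCT ...) ignores NULL values.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table_name (column_name TEXT)")

# Invented sample values: three distinct colors, one duplicate, one NULL
values = ["red", "blue", "red", None, "green"]
conn.executemany(
    "INSERT INTO table_name (column_name) VALUES (?)",
    [(v,) for v in values],
)

distinct_count = conn.execute(
    "SELECT COUNT(DISTINCT column_name) AS some_alias FROM table_name"
).fetchone()[0]
total_rows = conn.execute("SELECT COUNT(*) FROM table_name").fetchone()[0]
conn.close()

print(distinct_count, total_rows)
# 3 5
```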

Get unique values using STRING_AGG in SQL Server

Use the DISTINCT keyword in a subquery to remove duplicates before combining the results:

SELECT
    ProjectID,
    STRING_AGG(value, ',') WITHIN GROUP (ORDER BY value) AS NewField
FROM (
    SELECT DISTINCT ProjectID, newID.value
    FROM [dbo].[Data] WITH (NOLOCK)
    CROSS APPLY STRING_SPLIT([bID], ';') AS newID
    WHERE newID.value IN ('O95833', 'Q96NY7-2')
) x
GROUP BY ProjectID
ORDER BY ProjectID
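
The order of operations in that query (split, filter, deduplicate per project, then join in sorted order) can be illustrated with a short Python sketch. It models the logic only and is not how SQL Server executes it. The function name aggregate_unique and the sample rows are assumptions; the two filter values are the ones from the query.

```python
def aggregate_unique(rows, wanted, split_on=";", join_with=","):
    """Group (project_id, b_id) rows into one sorted, duplicate-free string per project."""
    per_project = {}
    for project_id, b_id in rows:
        # STRING_SPLIT step: one value per delimited piece
        for value in b_id.split(split_on):
            # WHERE ... IN step, then DISTINCT via a set per project
            if value in wanted:
                per_project.setdefault(project_id, set()).add(value)
    # STRING_AGG ... WITHIN GROUP (ORDER BY value) step
    return {
        project_id: join_with.join(sorted(values))
        for project_id, values in sorted(per_project.items())
    }


# Invented rows standing in for [dbo].[Data] (ProjectID, bID)
data = [
    (1, "O95833;Q96NY7-2;O95833"),
    (1, "O95833;P12345"),
    (2, "Q96NY7-2;Q96NY7-2"),
]
print(aggregate_unique(data, {"O95833", "Q96NY7-2"}))
# {1: 'O95833,Q96NY7-2', 2: 'Q96NY7-2'}
```

Without the deduplication step, project 1 would list O95833 three times, which is the problem the DISTINCT subquery avoids.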

How to Select Every Row Where Column Value is NOT Distinct

This can be significantly faster than the equivalent EXISTS subquery, although the difference depends on the engine and the available indexes:

SELECT [EmailAddress], [CustomerName]
FROM [Customers]
WHERE [EmailAddress] IN
    (SELECT [EmailAddress] FROM [Customers] GROUP BY [EmailAddress] HAVING COUNT(*) > 1)
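
The query can be tried against an in-memory SQLite database from Python, since SQLite accepts the square-bracket identifier quoting for compatibility. The customer rows below are invented for illustration, and this sketch says nothing about performance on SQL Server.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Customers (EmailAddress TEXT, CustomerName TEXT)")

# Invented sample rows: one address is shared by two customers
conn.executemany(
    "INSERT INTO Customers (EmailAddress, CustomerName) VALUES (?, ?)",
    [
        ("a@example.com", "Ann"),
        ("b@example.com", "Bob"),
        ("a@example.com", "Anna"),
    ],
)

# Every row whose EmailAddress occurs more than once in the table
duplicates = sorted(
    conn.execute(
        """
        SELECT [EmailAddress], [CustomerName]
        FROM [Customers]
        WHERE [EmailAddress] IN
            (SELECT [EmailAddress] FROM [Customers]
             GROUP BY [EmailAddress] HAVING COUNT(*) > 1)
        """
    ).fetchall()
)
conn.close()

print(duplicates)
# [('a@example.com', 'Ann'), ('a@example.com', 'Anna')]
```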

How to select records without duplicate on just one field in SQL?

Try this, which returns each title once together with its lowest id:

SELECT MIN(id) AS id, title
FROM tbl_countries
GROUP BY title
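
As a runnable check, the same query can be executed against an in-memory SQLite database from Python. The country rows are invented sample data. The rows are sorted in Python because the query itself has no ORDER BY.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl_countries (id INTEGER PRIMARY KEY, title TEXT)")

# Invented sample rows: 'France' appears twice with different ids
conn.executemany(
    "INSERT INTO tbl_countries (id, title) VALUES (?, ?)",
    [(1, "France"), (2, "Spain"), (3, "France")],
)

# One row per title, keeping the smallest id of each group
unique_titles = sorted(
    conn.execute(
        "SELECT MIN(id) AS id, title FROM tbl_countries GROUP BY title"
    ).fetchall()
)
conn.close()

print(unique_titles)
# [(1, 'France'), (2, 'Spain')]
```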

