How to Get a Distinct List of Words Used in All Field Records Using Ms SQL

How to get a distinct list of words used in all Field Records using MS SQL?

I do not think you can do this with a SELECT. The best chance is to write a user defined function that returns a table with all the words and then do SELECT DISTINCT on it.

Disclaimer: Function dbo.Split is from http://www.sqlteam.com/forums/topic.asp?TOPIC_ID=50648

CREATE TABLE test
(
    id int identity(1, 1) not null,
    description varchar(50) not null
)

INSERT INTO test VALUES('The dog jumped over the fence')
INSERT INTO test VALUES('The giant tripped on the fence')

CREATE FUNCTION dbo.Split
(
    @RowData nvarchar(2000),
    @SplitOn nvarchar(5)
)  
RETURNS @RtnValue table 
(
    Id int identity(1,1),
    Data nvarchar(100)
) 
AS  
BEGIN 
    Declare @Cnt int
    Set @Cnt = 1

    While (Charindex(@SplitOn,@RowData)>0)
    Begin
        Insert Into @RtnValue (data)
        Select 
            Data = ltrim(rtrim(Substring(@RowData,1,Charindex(@SplitOn,@RowData)-1)))

        Set @RowData = Substring(@RowData,Charindex(@SplitOn,@RowData)+1,len(@RowData))
        Set @Cnt = @Cnt + 1
    End

    Insert Into @RtnValue (data)
    Select Data = ltrim(rtrim(@RowData))

    Return
END

CREATE FUNCTION dbo.SplitAll(@SplitOn nvarchar(5))
RETURNS @RtnValue table
(
    Id int identity(1,1),
    Data nvarchar(100)
)
AS
BEGIN
DECLARE My_Cursor CURSOR FOR SELECT Description FROM dbo.test
DECLARE @description varchar(50)

OPEN My_Cursor
FETCH NEXT FROM My_Cursor INTO @description
WHILE @@FETCH_STATUS = 0
BEGIN
    INSERT INTO @RtnValue
    SELECT Data FROM dbo.Split(@description, @SplitOn)
   FETCH NEXT FROM My_Cursor INTO @description
END
CLOSE My_Cursor
DEALLOCATE My_Cursor

RETURN

END

SELECT DISTINCT Data FROM dbo.SplitAll(N' ')

How to get all distinct words of a specified minimum length from multiple columns in a MySQL table?

Shell script might be efficient...

SELECT CONCAT_WS(' ', col_a, col_b, col_c) INTO OUTFILE 'x' ... to get the columns into a file
tr ' ' "\n" <x -- split into one word per line
awk 'length($1) >= 5' -- minimum size of 5 characters per word
sort -u -- to dedup

There are no stopwords, but sed or awk could deal with that.

 mysql -e "SELECT ... INTO OUTFILE 'x' ..." ...
 tr ' ' "\n" <x  |  awk 'length($1) >= 5'  |  sort -u

How to get tally of unique words using only SQL?

I guess it would depend on how the SQL database would look like. You would have to first turn your 4 row "database" into a data of single column, each row representing one word. To do that you could use something like String_split, where every space would be a delimiter.

STRING_SPLIT('I'm representing for them gangstas all across the world', ' ')

https://www.sqlservertutorial.net/sql-server-string-functions/sql-server-string_split-function/
This would turn it into a table where every word is a row.

Once you've set up your data table, then it's easy.

Your_table:

[Word]
I'm
representing
for 
them
...
world

Then you can just write:

SELECT Word, count(*) 
FROM your_table
GROUP BY Word;

Your output would be:

Word   |    Count
I'm          1
representing 1

SQL to find the number of distinct values in a column

You can use the DISTINCT keyword within the COUNT aggregate function:

SELECT COUNT(DISTINCT column_name) AS some_alias FROM table_name

This will count only the distinct values for that column.

Get unique values using STRING_AGG in SQL Server

Use the DISTINCT keyword in a subquery to remove duplicates before combining the results: SQL Fiddle

SELECT 
ProjectID
,STRING_AGG(value, ',') WITHIN GROUP (ORDER BY value) AS 
NewField
from (
    select distinct ProjectId, newId.value 
    FROM [dbo].[Data] WITH(NOLOCK)  
    CROSS APPLY STRING_SPLIT([bID],';') AS newID  
    WHERE newID.value IN (   'O95833' , 'Q96NY7-2'  )  
) x
GROUP BY ProjectID
ORDER BY ProjectID

How to Select Every Row Where Column Value is NOT Distinct

This is significantly faster than the EXISTS way:

SELECT [EmailAddress], [CustomerName] FROM [Customers] WHERE [EmailAddress] IN
  (SELECT [EmailAddress] FROM [Customers] GROUP BY [EmailAddress] HAVING COUNT(*) > 1)

How to select records without duplicate on just one field in SQL?

Try this:

SELECT MIN(id) AS id, title
FROM tbl_countries
GROUP BY title

How to Get a Distinct List of Words Used in All Field Records Using Ms SQL