How to Do a Case Sensitive Group By

How to do a case sensitive GROUP BY?

You can use an case sensitive collation:

with temp as
(
select 'Test' COLLATE Latin1_General_CS_AS as name
UNION ALL
select 'TEST'
UNION ALL
select 'test'
UNION ALL
select 'tester'
UNION ALL
select 'tester'
)
SELECT name, COUNT(name)
FROM temp
group by name

SQL Case Sensitive Group By

Assign the COLLATE also on the GROUP BY clause

select accountCode COLLATE Latin1_General_CS_AS, count(OrderId)  
from <<TableName>>
group by accountCode COLLATE Latin1_General_CS_AS

Make GroupBy query case-sensitive

You need to change the collation of the column to one that's case sensitive. For example:

SELECT [name] COLLATE Latin1_General_100_CS_AS_SC;
FROM [Example].[dbo].[User]
GROUP BY [Example].[dbo].[User].[name] COLLATE Latin1_General_100_CS_AS_SC;

Case sensitive collations usually have a CS in their name.

To choose which collation to use, you can find the collation of the column with:

SELECT c.collation_name
FROM sys.columns c
INNER JOIN sys.objects o
ON o.object_id = c.object_id
AND o.type = 'U'
INNER JOIN sys.schemas s
ON s.schema_id = o.schema_id
WHERE lower(s.name) = 'dbo'
AND lower(o.name) = 'user'
AND lower(c.name) = 'name';

A good guess is to take that collation and replace the CI (for case insensitive) for a CS and use it as demonstrated above.

grouping the same value in upper case and lower case with select query

Use lower() or upper():

SELECT MIN(InputParameterName) as InputParameterName, Period,
SUM(CAST(PeriodInput AS FLOAT)) PeriodInput
FROM Table1
GROUP BY LOWER(InputParameterName), Period;

The use LOWER() in the GROUP BY defines the columns. It is not used in the SELECT so you always get a value in the data.

If you don't care about the case, then you should look into setting the collation of the column, table, or database so case is ignored in expressions involving strings.

EDIT:

If the above result returns two rows, then case is not the problem. You probably have hidden characters. the most likely are spaces, so you can try:

GROUP BY LOWER(REPLACE(InputParameterName, ' ', '')), Period;

Hopefully, the hidden spaces are no more complicated than spaces.

EDIT II:

I get it now. You want the two rows, you just want the values to be the same. Then window functions can do what you want:

SELECT MIN(InputParameterName) OVER (PARTITION BY LOWER(InputParameterName)),
Period, SUM(CAST(PeriodInput AS FLOAT)) as PeriodInput
FROM Table1
GROUP BY InputParameterName, Period;

Case-sensitive GROUP BY in an Access query

As explained in the Microsoft support article here:

The Microsoft Jet database engine is inherently case-insensitive. When joining tables, it matches lowercase "abc" to uppercase "ABC," which in most cases is desirable.

(The same is true for the newer "Access Database Engine", a.k.a. "ACE", which is used by Access 2010.)

The article goes on to describe several strategies for performing case-sensitive JOINs. Of those, the one that is most likely to work for you (where you need both your JOINs and your GROUP BY clauses to be case-sensitive) is the one they call "Hexadecimal Expansion" where you create another column in each table that will store a hexadecimal representation of the ID value. The function they suggest is...

Public Function StrToHex(X As Variant) As Variant
Dim I As Long, Temp As String
If IsNull(X) Then Exit Function
Temp = Space$(Len(X) * 2)
For I = 1 To Len(X)
Mid$(Temp, I * 2 - 1, 2) = Right$("0" & Hex$(Asc(Mid$(X, I, 1))), 2)
Next I
StrToHex = Temp
End Function

...which will work provided that your ID strings do not contain Unicode characters. That function will convert a string into the hex representation of each character, e.g.,

StrToHex("GORD") --> "474F5244"
StrToHex("gord") --> "676F7264"

Once you've populated your new "surrogate ID" fields with the corresponding hex values you can use those fields in your JOINs and GROUP BY clauses.

For more information see the above-mentioned support article:

How To Perform a Case-Sensitive JOIN Through Microsoft Jet

Group by is case sensitive in T-SQL even though db and server collations are CI

This is more of extended comment that real answer.

I believe this issue is coming from how SQL Server is attempting to evaluate case statement expression.

To prove that server is case insensetive you can run the following two statements

SELECT CASE WHEN 'Bertrand' = 'bertrand' THEN 'true' ELSE 'false' end

-

DECLARE @base TABLE
(NAME VARCHAR(1)
,value INT
)
INSERT INTO @base Values('a',0),('b',0),('B',0)

SELECT * FROM @base

SELECT name, COUNT(value) AS Cnt
FROM @base
GROUP BY NAME

results:

Sample Image

as you can see here even though letter in second row is lower case and in third row is upper case, group by clause ignores the case. Looking at execution plan there are two expression for

Expr 1007 COUNT([value])    
Expr 1004 CONVERT_IMPLICIT(int,[Expr1007],0)

now when we change it to case

SELECT CASE WHEN name = 'a' THEN 'adam' ELSE 'bertrand' END AS name, COUNT(value) AS Cnt
FROM @base
GROUP BY CASE WHEN name = 'a' THEN 'adam' ELSE 'bertrand' END

execution plan shows 3 expressions. 2 from above and new one

Expr 1004 CASE WHEN [NAME]='a' THEN 'adam' ELSE 'bertrand' END  

so at this point aggregate function is no longer evaluating value of the column name but now it evaluating value of the expression.

What i think is happening is, could be incorrect. When SQL server converts both CASE statement in SELECT and GROUP BY clause to a expression it comes up with different expression value. In this case you might as well do 'bertrand' in select and 'charlie'
in group by clause because if CASE expression is not 100% match between select and group by clause SQL Server will consider them as different Expr aka (columns) that no longer match.


Update:

To take this one step further, the following statement will also fail.

SELECT CASE WHEN name = 'a' THEN 'adam' ELSE UPPER('bertrand') END AS name
,COUNT(value) AS Cnt
FROM @base
GROUP BY CASE WHEN name = 'a' THEN 'adam' ELSE UPPER('Bertrand') END

Even wrapping the different case strings in UPPER() function, SQL Server is still unable to process it.

SQLite using GROUP BY without being case sensitive

You could try the LOWER() function to make them all lower case.

You might also want to use TRIM() to address whitespace at either end.

SELECT count(Email) as Number, Email
FROM tbl_SAP_Users
GROUP BY LOWER(TRIM(Email))
HAVING Number > 1
ORDER BY Number DESC;

pandas groupby - case sensitive issues in groups

Just as a followup to this you don't actually need to overwrite the keyword when you do the grouping. You can instead do the whole transformation in the call to groupby

grouped = df.groupby(df['Keyword'].str.lower())

So as an example you could then have:

df = pandas.DataFrame({'Keyword': ['Attorney', 'ATTORNEY', 'foo'], 'x' : [1, 2, 42]})

df.groupby(df['Keyword'].str.lower()).sum()

Which outputs:

           x
Keyword
attorney 3
foo 42

as you would expect



Related Topics



Leave a reply



Submit