Grouping But With Keeping All Non-Null Values

When using GROUP BY, aggregate functions can be applied to columns that aren't in the GROUP BY clause.

In this case I assume you want MAX, to get only a 1 or a NULL per cell.

SUM or COUNT can also be wrapped around a CASE WHEN, but those would return a total rather than a simple flag.

SELECT 
Name,
MAX(CASE WHEN Intolerance = 'Lactose' THEN 1 END) AS Lactose,
MAX(CASE WHEN Intolerance = 'Gluten' THEN 1 END) AS Gluten
FROM Table
GROUP BY Name
ORDER BY Name

Or if you don't want to see NULLs?

Then let the CASE return a varchar instead of a number.

SELECT 
Name,
MAX(CASE WHEN Intolerance = 'Lactose' THEN '1' ELSE '' END) AS Lactose,
MAX(CASE WHEN Intolerance = 'Gluten' THEN '1' ELSE '' END) AS Gluten
FROM Table
GROUP BY Name
ORDER BY Name
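As a quick illustration, here is the MAX(CASE ...) pivot run against SQLite via Python's sqlite3; the intolerances table name and its data are invented stand-ins for the Table above:

```python
import sqlite3

# Hypothetical table and data standing in for "Table" above.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE intolerances (Name TEXT, Intolerance TEXT)")
conn.executemany(
    "INSERT INTO intolerances VALUES (?, ?)",
    [("Ann", "Lactose"), ("Ann", "Gluten"), ("Bob", "Gluten")],
)

# MAX over a CASE with no ELSE: each cell is 1 if a matching row exists, else NULL.
rows = conn.execute("""
    SELECT Name,
           MAX(CASE WHEN Intolerance = 'Lactose' THEN 1 END) AS Lactose,
           MAX(CASE WHEN Intolerance = 'Gluten'  THEN 1 END) AS Gluten
    FROM intolerances
    GROUP BY Name
    ORDER BY Name
""").fetchall()
print(rows)  # [('Ann', 1, 1), ('Bob', None, 1)]
```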

How do I group by all non-null values and do not group null values

Try this:

SELECT itemname, SUM(whatever)
FROM tab
WHERE itemname IS NOT NULL
GROUP BY itemname

UNION ALL

SELECT itemname, whatever
FROM tab
WHERE itemname IS NULL
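A minimal sketch of this UNION ALL pattern, run against SQLite through Python's sqlite3 (the tab table and its data are invented for the demo): the non-NULL itemnames are aggregated, while each NULL row passes through untouched.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tab (itemname TEXT, whatever INTEGER)")
conn.executemany(
    "INSERT INTO tab VALUES (?, ?)",
    [("a", 1), ("a", 2), (None, 5), (None, 7)],
)

# Aggregate the non-NULL names, then append the NULL rows as-is.
rows = conn.execute("""
    SELECT itemname, SUM(whatever) FROM tab
    WHERE itemname IS NOT NULL
    GROUP BY itemname
    UNION ALL
    SELECT itemname, whatever FROM tab
    WHERE itemname IS NULL
""").fetchall()
print(rows)  # three rows: ('a', 3) plus the two NULL rows, kept separate
```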

group by and select non null value if present

From your sample data I think that you don't need d in the GROUP BY clause.

So get its max:

select 
a, b, c,
max(d) d,
count(distinct e) as something
from tableX
where f between '2019-07-01 00:00:00' and '2019-07-01 23:59:59.999'
group by a, b, c
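Since d appears functionally dependent on (a, b, c) in the sample data, MAX(d) simply picks out the non-NULL value. A sketch with SQLite and invented data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tableX (a TEXT, b TEXT, c TEXT, d TEXT, e TEXT, f TEXT)")
conn.executemany(
    "INSERT INTO tableX VALUES (?, ?, ?, ?, ?, ?)",
    [
        ("x", "y", "z", None, "e1", "2019-07-01 10:00:00"),
        ("x", "y", "z", "D1", "e2", "2019-07-01 11:00:00"),
    ],
)

# MAX(d) ignores NULLs, so the group keeps the one non-NULL d.
rows = conn.execute("""
    SELECT a, b, c, MAX(d) AS d, COUNT(DISTINCT e) AS something
    FROM tableX
    WHERE f BETWEEN '2019-07-01 00:00:00' AND '2019-07-01 23:59:59.999'
    GROUP BY a, b, c
""").fetchall()
print(rows)  # [('x', 'y', 'z', 'D1', 2)]
```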

Group by Only when no NULL Values are present on another Column

Put the condition in the HAVING clause:

select v.id, v.title, v.description, v.UserId, v.createdAt, v.updatedAt, min(usercerid) usercerid
from ViewName v
group by v.id, v.title, v.description, v.UserId, v.createdAt, v.updatedAt
having sum(v.usercerid is null) = 0

You must group by all the columns that you select.

I used min(usercerid) as the output column, although it's not obvious that you even want it in the results. If you don't need it, remove it.
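Note that `sum(v.usercerid is null)` relies on booleans evaluating to 0/1, which MySQL and SQLite both do; in standard SQL you would write `HAVING COUNT(*) = COUNT(usercerid)` instead. A trimmed-down sketch in SQLite (fewer columns than the original view, data invented):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE ViewName (id INT, title TEXT, usercerid INT)")
conn.executemany(
    "INSERT INTO ViewName VALUES (?, ?, ?)",
    [
        (1, "A", 10), (1, "A", 11),    # no NULL usercerid -> group kept
        (2, "B", 20), (2, "B", None),  # contains a NULL   -> group dropped
    ],
)

# SUM(usercerid IS NULL) counts the NULLs per group; 0 means none.
rows = conn.execute("""
    SELECT id, title, MIN(usercerid) AS usercerid
    FROM ViewName
    GROUP BY id, title
    HAVING SUM(usercerid IS NULL) = 0
""").fetchall()
print(rows)  # [(1, 'A', 10)]
```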

GROUP BY not NULL values

Turns out, I can just put the NULL check in the GROUP BY clause:

SELECT 
any(Y) AS Y,
any(X) AS X
FROM my_table
GROUP BY COALESCE(Y, CAST(reflect("java.util.UUID", "randomUUID") AS STRING));

My version of Hive doesn't support IFNULL(), so COALESCE() is a good alternative. It also doesn't support UUID(), so I called reflect() to get a unique id.

group by not-null values

I think the following does what you want:

SELECT *, (To_days(date_expires)-TO_DAYS(NOW())) as dayDiff, COUNT(id) AS change_count
FROM mytable
GROUP BY (case when source_id is null then id else source_id end)
HAVING dayDiff < 4
ORDER BY (case when source_id is null then 1 else 0 end), date_created DESC

It does a conditional GROUP BY, so the NULL source_ids are not grouped together. It then puts them last using logic in the ORDER BY.

I didn't understand what you meant by last occurrence. Now I think I do:

SELECT coalesce(s.id, mytable.id) as id,
max(case when s.maxid is not null and s.maxid = mytable.id then mytable.name
when s.maxid is null then NULL
else mytable.name
end) as name,
(To_days(date_expires)-TO_DAYS(NOW())) as dayDiff, COUNT(id) AS change_count
FROM mytable left outer join
(select source_id, MAX(id) as maxid
from mytable
where source_id is not null
group by source_id
) s
on mytable.id = s.maxid
GROUP BY (case when source_id is null then id else source_id end)
HAVING dayDiff < 4
ORDER BY (case when source_id is null then 1 else 0 end), date_created DESC

This joins in the information from the latest record (based on highest id).
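The core of both queries is the conditional GROUP BY key, effectively COALESCE(source_id, id), which lets every NULL-source row form its own group. A stripped-down sketch of just that part, in SQLite with invented data (the dayDiff/HAVING logic is omitted):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE mytable (id INT, source_id INT)")
conn.executemany(
    "INSERT INTO mytable VALUES (?, ?)",
    [(1, None), (2, 1), (3, 1), (4, None)],
)

# Rows with a source_id collapse into that source's group;
# each NULL-source row keys its own group by its own id.
rows = conn.execute("""
    SELECT (CASE WHEN source_id IS NULL THEN id ELSE source_id END) AS grp,
           COUNT(id) AS change_count
    FROM mytable
    GROUP BY (CASE WHEN source_id IS NULL THEN id ELSE source_id END)
    ORDER BY grp
""").fetchall()
print(rows)  # [(1, 3), (4, 1)]
```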

GROUP BY - do not group NULL

Perhaps you should add something to the NULL columns to make them unique and group on that. I was looking for some sort of sequence to use instead of UUID(), but this might work just as well.

SELECT `table1`.*, 
IFNULL(ancestor, UUID()) as unq_ancestor,
GROUP_CONCAT(id SEPARATOR ',') AS `children_ids`
FROM `table1`
WHERE (enabled = 1)
GROUP BY unq_ancestor
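The same trick can be sketched in SQLite via Python's sqlite3; since SQLite has no UUID(), hex(randomblob(16)) stands in as the unique placeholder (table and data invented for the demo):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE table1 (id INT, ancestor INT, enabled INT)")
conn.executemany(
    "INSERT INTO table1 VALUES (?, ?, ?)",
    [
        (1, None, 1), (2, None, 1),  # NULL ancestor -> unique key, own group
        (3, 1, 1), (4, 1, 1),        # shared ancestor -> grouped together
    ],
)

# hex(randomblob(16)) substitutes for MySQL's UUID() as a unique value.
rows = conn.execute("""
    SELECT IFNULL(ancestor, hex(randomblob(16))) AS unq_ancestor,
           GROUP_CONCAT(id) AS children_ids
    FROM table1
    WHERE enabled = 1
    GROUP BY unq_ancestor
""").fetchall()
print(len(rows))  # 3: two singleton NULL-ancestor groups plus the ancestor-1 pair
```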

MySQL get first non null value after group by

Try using MAX, like this:

SELECT
    email,
    MAX(`name`)
FROM
(
    SELECT
        email,
        `name`
    FROM
        multiple_tables_and_unions
) AS emails
GROUP BY email
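Since MAX ignores NULLs, this returns a non-NULL name per email whenever one exists (the alphabetically last one, if several). A sketch with SQLite and invented data; the inner subquery above is only a placeholder for the asker's unioned tables, so it is dropped here:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE multiple_tables_and_unions (email TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO multiple_tables_and_unions VALUES (?, ?)",
    [("a@x.com", None), ("a@x.com", "Alice"), ("b@x.com", None)],
)

# MAX skips NULLs, so any non-NULL name wins over NULL.
rows = conn.execute("""
    SELECT email, MAX(name) FROM multiple_tables_and_unions
    GROUP BY email
    ORDER BY email
""").fetchall()
print(rows)  # [('a@x.com', 'Alice'), ('b@x.com', None)]
```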

Pandas Grouping by Id and getting non-NaN values

This should do what you want:

df.groupby('salesforce_id').first().reset_index(drop=True)

That will collapse each group into a single row, keeping the first non-NaN value in each column (unless a column has no non-NaN values in that group, in which case the merged value is NaN).
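A small illustration, assuming a frame with a salesforce_id column and made-up partial rows (requires pandas):

```python
import numpy as np
import pandas as pd

# Invented data: each salesforce_id is split across partial rows.
df = pd.DataFrame({
    "salesforce_id": [1, 1, 2, 2],
    "phone": ["555-0100", np.nan, np.nan, np.nan],
    "city": [np.nan, "Oslo", "Bergen", np.nan],
})

# groupby(...).first() takes the first non-NaN value per column per group.
merged = df.groupby("salesforce_id").first().reset_index(drop=True)
print(merged)
# Row 0 (id 1): phone '555-0100', city 'Oslo' -- merged from two rows.
# Row 1 (id 2): phone NaN (no non-NaN available), city 'Bergen'.
```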


