How to Check If Value Exists in Each Group (After Group By)

Try this query:

SELECT uid
FROM subscribes
GROUP BY uid
HAVING COUNT(*) > 2
   AND MAX(CASE subscription_type WHEN 'type1' THEN 1 ELSE 0 END) = 0
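As a quick sanity check, the query above can be run against SQLite from Python. The sample data here is hypothetical: uid 1 has more than two rows but includes 'type1', while uid 2 has more than two rows and none of them is 'type1'.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE subscribes (uid INTEGER, subscription_type TEXT)")
conn.executemany(
    "INSERT INTO subscribes VALUES (?, ?)",
    [(1, 'type1'), (1, 'type2'), (1, 'type3'),
     (2, 'type2'), (2, 'type3'), (2, 'type4')],
)

rows = conn.execute("""
    SELECT uid
    FROM subscribes
    GROUP BY uid
    HAVING COUNT(*) > 2
       AND MAX(CASE subscription_type WHEN 'type1' THEN 1 ELSE 0 END) = 0
""").fetchall()
print(rows)  # only uid 2 qualifies: it has > 2 rows and no 'type1'
```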

Check whether value exists in column for each group

You can filter SQL groups with the HAVING clause. For example, you can group your table by users and their activity, and then filter it to contain only those that have completed the tutorial:

SELECT user FROM tbl
GROUP BY user, activity
HAVING activity = 'completed_tutorial';

EDIT: After the OP edited their question, this is my new answer. Here, I assume that your table has a date field.

SELECT *, COALESCE(date >= (
    SELECT date FROM tbl WHERE activity = 'completed_tutorial'
    AND user = outertbl.user
), FALSE)
FROM tbl AS outertbl
ORDER BY date

Note that such a query is essentially O(N²) when unoptimised, so I would instead recommend fetching the data from the database and processing it in your program.
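If you take that advice, a minimal pandas sketch of the in-program approach might look like this (the column names user, activity, and date and the sample rows are hypothetical):

```python
import pandas as pd

# Hypothetical sample data mirroring the table in the query above.
tbl = pd.DataFrame({
    "user": ["a", "a", "a", "b"],
    "activity": ["signup", "completed_tutorial", "purchase", "signup"],
    "date": pd.to_datetime(["2023-01-01", "2023-01-02", "2023-01-03", "2023-01-01"]),
})

# Earliest tutorial-completion date per user (users who never completed get NaT).
completed_at = (
    tbl[tbl["activity"] == "completed_tutorial"]
    .groupby("user")["date"].min()
)

# Flag rows on or after the user's completion date; NaT comparisons are False.
tbl["after_tutorial"] = tbl["date"] >= tbl["user"].map(completed_at)
print(tbl.sort_values("date"))
```

This is a single pass over the frame plus one groupby, avoiding the correlated subquery's quadratic behaviour.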

Check if a value exists using multiple conditions within group in pandas

Use groupby on the Group column, then transform with a lambda function:

g = df.groupby('Group')
df['Expected_Output'] = (
    g['Value1'].transform(lambda x: x.eq(7).any())
    & g['Value2'].transform(lambda x: x.eq(9).any())
)

Or use groupby, apply, and merge with the parameter how='left':

df.merge(
    df.groupby('Group')
      .apply(lambda x: x['Value1'].eq(7).any() & x['Value2'].eq(9).any())
      .reset_index(),
    how='left'
).rename(columns={0: 'Expected_Output'})

Or use groupby, apply, and map:

df['Expected_Output'] = df['Group'].map(
    df.groupby('Group').apply(lambda x: x['Value1'].eq(7).any() & x['Value2'].eq(9).any())
)

print(df)
   Group  Value1  Value2  Expected_Output
0      1       3       9             True
1      1       7       6             True
2      1       9       7             True
3      2       3       8            False
4      2       8       5            False
5      2       7       6            False
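For reference, here is a self-contained version of the first (transform-based) approach, reconstructing the sample frame from the output above:

```python
import pandas as pd

# Sample frame reconstructed from the printed output above.
df = pd.DataFrame({
    "Group":  [1, 1, 1, 2, 2, 2],
    "Value1": [3, 7, 9, 3, 8, 7],
    "Value2": [9, 6, 7, 8, 5, 6],
})

g = df.groupby("Group")
# True for every row of a group that contains both a 7 in Value1 and a 9 in Value2.
df["Expected_Output"] = (
    g["Value1"].transform(lambda x: x.eq(7).any())
    & g["Value2"].transform(lambda x: x.eq(9).any())
)
print(df)
```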

SQL check if group contains certain values of given column (Oracle)

This could be a way:

/* input data */ 
with yourTable(log_id , request_id , status_id) as (
select 1 , 2 , 5 from dual union all
select 2 , 2 , 10 from dual union all
select 3 , 2 , 20 from dual union all
select 4 , 3 , 10 from dual union all
select 5 , 3 , 20 from dual
)
/* query */
select request_id
from yourTable
group by request_id
having count( distinct case when status_id in (5,10) then status_id end) = 2

How it works:

select request_id, 
case when status_id in (5,10) then status_id end as checkColumn
from yourTable

gives

REQUEST_ID CHECKCOLUMN
---------- -----------
         2           5
         2          10
         2
         3          10
         3

So the condition count(distinct ...) = 2 does the work.
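The same pattern can be tried outside Oracle; here is a sketch adapted to SQLite via Python (from dual is Oracle-specific, so the sample rows are inserted directly):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE yourTable (log_id INTEGER, request_id INTEGER, status_id INTEGER)"
)
# Same sample data as the Oracle WITH clause above.
conn.executemany(
    "INSERT INTO yourTable VALUES (?, ?, ?)",
    [(1, 2, 5), (2, 2, 10), (3, 2, 20), (4, 3, 10), (5, 3, 20)],
)

rows = conn.execute("""
    SELECT request_id
    FROM yourTable
    GROUP BY request_id
    HAVING COUNT(DISTINCT CASE WHEN status_id IN (5, 10) THEN status_id END) = 2
""").fetchall()
print(rows)  # only request_id 2 has both status 5 and status 10
```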

Check if values in a groupby exists in a dataframe

Try:

idx = orders.groupby('user_id')['order_timestamp'].idxmin()
orders.loc[idx]

Output:

   id  user_id order_timestamp
2   3       10      2018-02-01
4   5       11      2019-01-01
7   8       12      2018-05-01

And, to get the remaining (non-first) orders:

non_first_orders = orders[~orders.index.isin(idx)]
non_first_orders

Output:

   id  user_id order_timestamp
0   1       10      2019-01-01
1   2       10      2019-01-10
3   4       11      2019-10-02
5   6       11      2019-03-01
6   7       12      2019-06-09
8   9       12      2019-03-12
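Put together, a runnable sketch with a hypothetical orders frame reconstructed from the outputs above:

```python
import pandas as pd

# Orders frame reconstructed from the printed outputs above.
orders = pd.DataFrame({
    "id": [1, 2, 3, 4, 5, 6, 7, 8, 9],
    "user_id": [10, 10, 10, 11, 11, 11, 12, 12, 12],
    "order_timestamp": pd.to_datetime([
        "2019-01-01", "2019-01-10", "2018-02-01",
        "2019-10-02", "2019-01-01", "2019-03-01",
        "2019-06-09", "2018-05-01", "2019-03-12",
    ]),
})

# Index label of each user's earliest order.
idx = orders.groupby("user_id")["order_timestamp"].idxmin()
first_orders = orders.loc[idx]
non_first_orders = orders[~orders.index.isin(idx)]
print(first_orders)
print(non_first_orders)
```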

Check whether value exists in specific group of rows in a dataframe

Here is a solution using the tidyverse package; alternatively, you can load just the dplyr and tidyr packages.

The idea is to fill in the NA with the closest non-NA value in the from and to column. After that, use action_result == "CallConnected" to check if there are items matching "CallConnected", group by from and to, and summarize with sum to count the total matching records.

library(tidyverse)

df2 <- df %>%
  fill(from) %>%
  fill(to) %>%
  mutate(CallConnected = action_result == "CallConnected") %>%
  group_by(from, to) %>%
  summarize(CallConnected = sum(CallConnected)) %>%
  ungroup()
df2
# # A tibble: 2 x 3
#   from           to             CallConnected
#   <chr>          <chr>                  <int>
# 1 (192) 242-2345 (900) 301-3451             0
# 2 (832) 345-3168 (900) 234-1231             0

Update

If duplicates are a concern, we can use rleid from the data.table package to create an ID after the fill step. Below is an example.

library(tidyverse)
library(data.table)

# Create an example with duplication
df_dup <- bind_rows(df, df %>% slice(1:5))

df_dup2 <- df_dup %>%
  fill(from) %>%
  fill(to) %>%
  mutate(ID = rleid(from, to)) %>%
  mutate(CallConnected = action_result == "CallConnected") %>%
  group_by(ID, from, to) %>%
  summarize(CallConnected = sum(CallConnected)) %>%
  ungroup()
df_dup2
# # A tibble: 3 x 4
#      ID from           to             CallConnected
#   <int> <chr>          <chr>                  <int>
# 1     1 (192) 242-2345 (900) 301-3451             0
# 2     2 (832) 345-3168 (900) 234-1231             0
# 3     3 (192) 242-2345 (900) 301-3451             0
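For comparison, the same fill-then-count idea can be sketched in pandas, with a small hypothetical call log; ffill plays the role of tidyr's fill:

```python
import pandas as pd

# Hypothetical call log: 'from'/'to' are only present on the first row of each call.
df = pd.DataFrame({
    "from": ["(192) 242-2345", None, None, "(832) 345-3168", None],
    "to":   ["(900) 301-3451", None, None, "(900) 234-1231", None],
    "action_result": ["Dialing", "Ringing", "CallConnected", "Dialing", "CallFailed"],
})

# Fill the NA values down, then count matching rows per (from, to) group.
df[["from", "to"]] = df[["from", "to"]].ffill()
summary = (
    df.assign(CallConnected=df["action_result"].eq("CallConnected"))
    .groupby(["from", "to"], sort=False)["CallConnected"]
    .sum()
    .reset_index()
)
print(summary)  # first call has one connected row, second has none
```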

Pandas check if a value exists using multiple conditions within group and count value if true

Try:

df['Count Match'] = df['Match'].astype(int).groupby(df['id']).transform('sum')
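A minimal sketch with hypothetical data showing what that line computes:

```python
import pandas as pd

# Hypothetical frame with a boolean Match column per row.
df = pd.DataFrame({
    "id":    [1, 1, 1, 2, 2],
    "Match": [True, False, True, False, False],
})

# Per-id count of matching rows, broadcast back to every row of the group.
df["Count Match"] = df["Match"].astype(int).groupby(df["id"]).transform("sum")
print(df)
```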

Check whether a value is found within a group in a PySpark dataframe

Use max window function:

df.selectExpr("*", "max(fruit = 'pear') over (partition by id) as flag").show()

+---+------+-----+
| id| fruit| flag|
+---+------+-----+
| c|carrot|false|
| c| apple|false|
| b| pear| true|
| a| apple| true|
| a| pear| true|
+---+------+-----+

If you need to check multiple fruits, you can use the in operator. For instance, to check carrot and apple:

df.selectExpr("*", "max(fruit in ('carrot', 'apple')) over (partition by id) as flag").show()
+---+------+-----+
| id| fruit| flag|
+---+------+-----+
| c|carrot| true|
| c| apple| true|
| b| pear|false|
| a| apple| true|
| a| pear| true|
+---+------+-----+

If you prefer python syntax:

from pyspark.sql.window import Window
import pyspark.sql.functions as f

df.select(
    "*",
    f.max(
        f.col('fruit').isin(['carrot', 'apple'])
    ).over(Window.partitionBy('id')).alias('flag')
).show()
+---+------+-----+
| id| fruit| flag|
+---+------+-----+
| c|carrot| true|
| c| apple| true|
| b| pear|false|
| a| apple| true|
| a| pear| true|
+---+------+-----+
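Since running the PySpark snippets requires a Spark session, here is an equivalent sketch of the same windowed max in pandas, using the sample data shown above:

```python
import pandas as pd

# Same sample data as the PySpark example above.
df = pd.DataFrame({
    "id":    ["c", "c", "b", "a", "a"],
    "fruit": ["carrot", "apple", "pear", "apple", "pear"],
})

# groupby().transform("max") mimics max(...) over (partition by id):
# True for every row of a group that contains at least one match.
df["flag"] = df["fruit"].isin(["carrot", "apple"]).groupby(df["id"]).transform("max")
print(df)
```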

How to check that a value exists in the group, and the other isn't?

I would put both conditions in the HAVING clause, in combination with a CASE WHEN construct:

select num
from table
group by num
having max(case text
             when 'A' then 1
             when 'B' then 2
           end) = 1

Or, put another way:

select num
from table
group by num
having max(case when text = 'A' then 1 else 0 end) = 1
   and max(case when text = 'B' then 1 else 0 end) = 0;

This method could be more efficient than a query where the table is selected twice (via a subquery or join).
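Both variants can be sanity-checked against SQLite from Python. The table is named t here, since table is a reserved word in most dialects, and the sample data is hypothetical: num 1 has 'A' only, num 2 has both 'A' and 'B', num 3 has 'B' only.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (num INTEGER, text TEXT)")
conn.executemany(
    "INSERT INTO t VALUES (?, ?)",
    [(1, 'A'), (2, 'A'), (2, 'B'), (3, 'B')],
)

rows = conn.execute("""
    SELECT num
    FROM t
    GROUP BY num
    HAVING MAX(CASE WHEN text = 'A' THEN 1 ELSE 0 END) = 1
       AND MAX(CASE WHEN text = 'B' THEN 1 ELSE 0 END) = 0
""").fetchall()
print(rows)  # only num 1 has 'A' without 'B'
```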


