How to check if value exists in each group (after group by)
Try this query:
SELECT uid
FROM subscribes
GROUP BY uid
HAVING COUNT(*) > 2
AND max( CASE "subscription_type" WHEN 'type1' THEN 1 ELSE 0 END ) = 0
Check whether value exists in column for each group
You can filter SQL groups with the HAVING
clause. For example, you can group your table by users and their activity, and then filter it to contain only those that have completed the tutorial:
SELECT user FROM tbl
GROUP BY user, activity
HAVING activity = 'completed_tutorial';
EDIT: After OP has edited their question, this is my new answer. Here, I assume that your table has a date field.
SELECT *, COALESCE(date >= (
SELECT date FROM tbl WHERE activity = 'completed_tutorial'
AND user = outertbl.user
), FALSE)
FROM tbl AS outertbl
ORDER BY date
Notice, that such query is essentially N² when unoptimised, so I would recommend instead to just get the data from the database and then process it in your program.
Check if a value exists using multiple conditions within group in pandas
Use groupby
on Group
column and then use transform
and lambda function
as:
g = df.groupby('Group')
df['Expected'] = (g['Value1'].transform(lambda x: x.eq(7).any()))&(g['Value2'].transform(lambda x: x.eq(9).any()))
Or using groupby
, apply
and merge
using parameter how='left'
as:
df.merge(df.groupby('Group').apply(lambda x: x['Value1'].eq(7).any()&x['Value2'].eq(9).any()).reset_index(),how='left').rename(columns={0:'Expected_Output'})
Or using groupby
, apply
and map
as:
df['Expected_Output'] = df['Group'].map(df.groupby('Group').apply(lambda x: x['Value1'].eq(7).any()&x['Value2'].eq(9).any()))
print(df)
Group Value1 Value2 Expected_Output
0 1 3 9 True
1 1 7 6 True
2 1 9 7 True
3 2 3 8 False
4 2 8 5 False
5 2 7 6 False
SQL check if group containes certain values of given column (ORACLE)
This could be a way:
/* input data */
with yourTable(log_id , request_id , status_id) as (
select 1 , 2 , 5 from dual union all
select 2 , 2 , 10 from dual union all
select 3 , 2 , 20 from dual union all
select 4 , 3 , 10 from dual union all
select 5 , 3 , 20 from dual
)
/* query */
select request_id
from yourTable
group by request_id
having count( distinct case when status_id in (5,10) then status_id end) = 2
How it works:
select request_id,
case when status_id in (5,10) then status_id end as checkColumn
from yourTable
gives
REQUEST_ID CHECKCOLUMN
---------- -----------
2 5
2 10
2
3 10
3
So the condition count (distinct ...) = 2
does the work
Check if values in a groupby exists in a dataframe
Try:
idx = orders.groupby('user_id')['order_timestamp'].idxmin()
orders.loc[idx]
Output:
id user_id order_timestamp
2 3 10 2018-02-01
4 5 11 2019-01-01
7 8 12 2018-05-01
And,
non_first_orders = orders[~orders.index.isin(idx)]
non_first_orders
Output:
id user_id order_timestamp
0 1 10 2019-01-01
1 2 10 2019-01-10
3 4 11 2019-10-02
5 6 11 2019-03-01
6 7 12 2019-06-09
8 9 12 2019-03-12
Check whether value exists in specific group of rows in a dataframe
A solution using the tidyverse
package, or you can just load the dplyr
and the tidyr
package to achieve this.
The idea is to fill in the NA
with the closest non-NA value in the from
and to
column. After that, use action_result == "CallConnected"
to check if there are items matching "CallConnected"
, group by from
and to
, and summarize
with sum
to count the total matching records.
library(tidyverse)
df2 <- df %>%
fill(from) %>%
fill(to) %>%
mutate(CallConnected = action_result == "CallConnected") %>%
group_by(from, to) %>%
summarize(CallConnected = sum(CallConnected)) %>%
ungroup()
df2
# # A tibble: 2 x 3
# from to CallConnected
# <chr> <chr> <int>
# 1 (192) 242-2345 (900) 301-3451 0
# 2 (832) 345-3168 (900) 234-1231 0
Update
If duplicates are a concern, we can use rleid
from the data.table
package to create ID after the fill
function. Below is an example.
library(tidyverse)
library(data.table)
# Create an example with duplication
df_dup <- bind_rows(df, df %>% slice(1:5))
df_dup2 <- df_dup %>%
fill(from) %>%
fill(to) %>%
mutate(ID = rleid(from, to)) %>%
mutate(CallConnected = action_result == "CallConnected") %>%
group_by(ID, from, to) %>%
summarize(CallConnected = sum(CallConnected)) %>%
ungroup()
df_dup2
# # A tibble: 3 x 4
# ID from to CallConnected
# <int> <chr> <chr> <int>
# 1 1 (192) 242-2345 (900) 301-3451 0
# 2 2 (832) 345-3168 (900) 234-1231 0
# 3 3 (192) 242-2345 (900) 301-3451 0
Pandas check if a value exists using multiple conditions within group and count value if true
Is it:
df['Count Match'] = df['Match'].astype(int).groupby(df['id']).transform('sum')
Check whether a value is found within a group in a PySpark dataframe
Use max
window function:
df.selectExpr("*", "max(fruit = 'pear') over (partition by id) as flag").show()
+---+------+-----+
| id| fruit| flag|
+---+------+-----+
| c|carrot|false|
| c| apple|false|
| b| pear| true|
| a| apple| true|
| a| pear| true|
+---+------+-----+
If you need to check multiple fruits, you can use in
operator. For instance to check carrot
and apple
:
df.selectExpr("*", "max(fruit in ('carrot', 'apple')) over (partition by id) as flag").show()
+---+------+-----+
| id| fruit| flag|
+---+------+-----+
| c|carrot| true|
| c| apple| true|
| b| pear|false|
| a| apple| true|
| a| pear| true|
+---+------+-----+
If you prefer python syntax:
from pyspark.sql.window import Window
import pyspark.sql.functions as f
df.select("*",
f.max(
f.col('fruit').isin(['carrot', 'apple'])
).over(Window.partitionBy('id')).alias('flag')
).show()
+---+------+-----+
| id| fruit| flag|
+---+------+-----+
| c|carrot| true|
| c| apple| true|
| b| pear|false|
| a| apple| true|
| a| pear| true|
+---+------+-----+
How to check that a value exists in the group, and the other isn't?
I would put both conditions in the HAVING
clause, in combination with a CASE WHEN
construct:
select num
from table
group by num
having max(case text
when 'A' then 1
when 'B' then 2
end) = 1
Or put in another way:
select num
from table
group by num
having max(case when text='A' then 1 else 0 end) = 1
and max(case when text='B' then 1 else 0 end) = 0;
This method could be more efficient than a query where the table is selected twice (subquery or join)
Related Topics
Convert SQL Server Result Set into String
Postgresql Insert from Select Returning Id
How to Set a Column Value to Null in SQL Server Management Studio
Connect to SQL via Windows Authentication Over Vpn
MySQL "Create Table If Not Exists" -> Error 1050
Strange Postgresql "Value Too Long for Type Character Varying(500)"
How to Find the Size of an Array in Postgresql
How to Copy Data from One Table to Another (Where Both Have Other Fields Too That Are Not in Common)
How to Determine Which Columns Are Shared Between Two Tables
Select All Threads and Order by the Latest One
Postgres: How to Do Composite Keys
How to Import a SQL File into a Rails Database
SQL Query in Spark/Scala Size Exceeds Integer.Max_Value
How to Use a Case Statement in a SQL from Clause
What Is the Easiest Way to Update an Image Field with the Content of a File