SQL - Unequal Left Join Bigquery

SQL - Unequal left join BigQuery

Below is for BigQuery Standard SQL and mostly reproduce logic in your query with exception of not including days where no activity at all is found


#standardSQL
SELECT
daily_use.user_id
, wd.date AS DATE
, MIN(DATE_DIFF(wd.date, daily_use.activity_date, DAY)) AS days_since_last_action
FROM dates AS wd
CROSS JOIN daily_use
WHERE wd.date BETWEEN
daily_use.activity_date AND DATE_ADD(daily_use.activity_date, INTERVAL 30 DAY)
GROUP BY 1,2
-- ORDER BY 1,2

if for whatever reason you still need to exactly reproduce your logic - you can embrace above with final left join as below:

#standardSQL
SELECT *
FROM dates AS wd
LEFT JOIN (
SELECT
daily_use.user_id
, wd.date AS date
, MIN(DATE_DIFF(wd.date, daily_use.activity_date, DAY)) AS days_since_last_action
FROM dates AS wd
CROSS JOIN daily_use
WHERE wd.date BETWEEN
daily_use.activity_date AND DATE_ADD(daily_use.activity_date, INTERVAL 30 DAY)
GROUP BY 1,2
) AS daily_use
USING (date)
-- ORDER BY 1,2

Why does BigQuery complain about a left join error when I introduce a subquery?

Besides @rtenha comment. This is a known issue with Bigquery and can be tracked in here

Also, in these situation is suggested to change <= or >= to =.

Additionally, taking this query as example:

SELECT
*,
(
SELECT
MIN(chosen)
FROM
chosen_t
WHERE
chosen >= requested) AS rounded_v FROM
tots)

A workaround suggested by the BigQuery Team, is adding fake equality join predicate as this:

SELECT
*,
(
SELECT
MIN(chosen)
FROM
chosen_t
WHERE
chosen >= requested and 0 * chosen = 0 * requested) AS rounded_v

How to do a left outer join on inequality?

What you need to do is use a cross join and then add your join condition in where clause as shown below:

with weekly_periods as(
select
ticket_id,
start_time_in_minutes_from_week,
raw_delta_in_minutes,
week_number,
greatest(0, start_time_in_minutes_from_week - week_number * (7 * 24 * 60)) as ticket_week_start_time,
least(start_time_in_minutes_from_week + raw_delta_in_minutes - week_number * (7 * 24 * 60),(7 * 24 * 60)) as ticket_week_end_time
from
ticket_solved_time,
unnest(generate_array(0, floor((start_time_in_minutes_from_week + raw_delta_in_minutes) / (7 * 24 * 60)), 1)) as week_number,
intercepted_periods as(
select
ticket_id,
week_number,
ticket_week_start_time,
ticket_week_end_time,
schedule.start_time as schedule_start_time,
schedule.end_time as schedule_end_time,
least(ticket_week_end_time, schedule.end_time) - greatest(ticket_week_start_time, schedule.start_time) as scheduled_minutes
from
weekly_periods
Cross join
schedule
Where ticket_week_start_time <= schedule.end_time
and ticket_week_end_time >= schedule.start_time


Related Topics



Leave a reply



Submit