Use Google Bigquery to Build Histogram Graph

Use google bigquery to build histogram graph

See the 2019 update, with #standardSQL --Fh

The subquery idea works, as does "CASE WHEN" and then doing a group by:

SELECT COUNT(field1), bucket 
FROM (
    SELECT field1, CASE WHEN age >=  0 AND age < 10 THEN 1
                        WHEN age >= 10 AND age < 20 THEN 2
                        WHEN age >= 20 AND age < 30 THEN 3
                        ...
                        ELSE -1 END as bucket
    FROM table1) 
GROUP BY bucket

Alternately, if the buckets are regular -- you could just divide and cast to an integer:

SELECT COUNT(field1), bucket 
FROM (
    SELECT field1, INTEGER(age / 10) as bucket FROM table1)
GROUP BY bucket

Get a histogram of a query Bigquery

I think you want two levels of aggregation:

SELECT total_transaction, COUNT(*)
FROM (SELECT customer_no, COUNT(*) AS total_transaction
      FROM [bi-dwhdev-01:source.daily_order]
      WHERE DATE(order_time) >=  '2018-04-01' AND DATE(order_time) <=  '2018-04-10'
      GROUP BY customer_no
     ) c
GROUP BY total_transaction
ORDER BY total_transaction DESC;

BigQuery: Count elements in APPROX_QUNATILES

The title of the question says "Count elements in APPROX_QUANTILES", and I'm going to answer that. As your ultimate goal is to build a histogram, please see this question.

To count the number of elements in each bucket, we can do something like:

WITH data AS ( 
    SELECT *, ActualElapsedTime datapoint
    FROM `fh-bigquery.flights.ontime_201903`
    WHERE FlightDate_year = "2018-01-01" 
    AND Origin = 'SFO' AND Dest = 'JFK'
  )
  , quantiles AS (
  SELECT *, IFNULL(LEAD(bucket_start) OVER(ORDER BY bucket_i) , 0100000) bucket_end
  FROM UNNEST((
    SELECT APPROX_QUANTILES(datapoint, 10)
    FROM data
  )) bucket_start WITH OFFSET bucket_i
) 

SELECT COUNT(*) count, bucket_i
  , ANY_VALUE(STRUCT(bucket_start, bucket_end)) b, MIN(datapoint) min,  MAX(datapoint) max
FROM data
JOIN quantiles
ON data.datapoint >= bucket_start AND data.datapoint < bucket_end
GROUP BY bucket_i
ORDER BY bucket_i

Sample Image

Visualized, we get something like:

Sample Image

Which tells us:

Don't use APPROX_QUANTILES to build a histogram, because each bucket will end up having about the same amount of elements. That's the goal of a quantile.
APPROX_QUANTILES is very "APPROX". As you can see each quantile didn't end up with the same amount of elements.
It takes between ~305 and ~357 minutes to fly from SFO to JFK.

Calculating and displaying customer lifetime value histogram with BigQuery and Data Studio

I was able to do a similar reproduction to what you describe but it's not straightforward so I'll try to detail everything. The main idea is to have two data sources from the same table: one contains customer_id and product_id so that we can filter it while the other one contains customer_id and the already calculated amount_bucket field. This way we can join it (blend data) on customer_id and filter according to product_id which won't change the amount_bucket calculations.

I used the following script to create some data in BigQuery:

CREATE OR REPLACE TABLE data_studio.histogram
(
  customer_id STRING,
  product_id STRING,
  amount INT64
);

INSERT INTO data_studio.histogram (customer_id, product_id, amount)
VALUES ('John', 'Game', 60),
       ('John', 'TV', 800),
       ('John', 'Console', 300),
       ('Paul', 'Sofa', 1200),
       ('George', 'TV', 750),
       ('Ringo', 'Movie', 20),
       ('Ringo', 'Console', 250)
;

Then I connect directly to the BigQuery table and get the following fields. Data source is called histogram:

Sample Image

We add our second data source (BigQuery) using a custom query:

SELECT
  customer_id,
  CASE
    WHEN SUM(amount) < 500 THEN '0-500'
    WHEN SUM(amount) < 1000 THEN '500-1000'
    WHEN SUM(amount) < 1500 THEN '1000-1500'
    ELSE '1500+'
END
  AS amount_bucket
FROM
  data_studio.histogram
GROUP BY
  customer_id

With only the latter we could already do a basic histogram with the following configuration:

Sample Image

Dimension is amount_bucket, metric is Record count. I made a bucket_order custom field to sort it as lexicographically '1000-1500' comes before '500-1000':

CASE 
  WHEN amount_bucket = '0-500' THEN 0
  WHEN amount_bucket = '500-1000' THEN 1
  WHEN amount_bucket = '1000-1500' THEN 2
  ELSE 3
END

Now we add the product_id filter on top and a new chart with the following configuration:

Sample Image

Note that metric is CTD (Count Distinct) of customer_id and the Blended data data source is implemented as:

Sample Image

An example where I filter by TV so only George and John appear but the other products are still counted for the total amount calculation:

Sample Image

I hope it works for you.

Create ranges based on the data

Below is for BigQuery Standard SQL

#standardSQL
WITH price_ranges AS (
  SELECT '0-10' price_range UNION ALL 
  SELECT '11-20' UNION ALL 
  SELECT '21-30' UNION ALL 
  SELECT '30-40' UNION ALL 
  SELECT '40-50' 
)
SELECT price_range, COUNT(1) number_sold
FROM `project.dataset.table`
JOIN price_ranges 
ON CAST(price_sold AS INT64)
  BETWEEN CAST(SPLIT(price_range, '-')[OFFSET(0)] AS INT64) 
  AND CAST(SPLIT(price_range, '-')[OFFSET(1)] AS INT64)
GROUP BY price_range
-- ORDER BY price_range

If to apply to sample data from your question - result is

Row price_range number_sold  
1   0-10        1    
2   11-20       2    
3   30-40       1    
4   40-50       2

Use Google Bigquery to Build Histogram Graph