Fewest Number of Buckets to Bag Elements in BigQuery

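Consider the solution below. You need to provide accurate data in the matrix CTE and adjust the buckets_elements CTE so that it lists every bucket column in matrix. The remaining CTEs and the final query will do the work for you: they enumerate every non-empty combination of buckets, count the distinct elements each combination covers, and keep the combinations with the highest coverage and the fewest buckets.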

with matrix as (
select "element-x" as element, 1 as bucketa, 1 as bucketb, 1 as bucketc, 0 as bucketd, 0 as buckete union all
select "element-y", 0, 0, 1, 0, 0 union all
select "element-z", 1, 0, 1, 0, 0 union all
select "element-p", 0, 0, 1, 0, 0 union all
select "element-q", 1, 0, 0, 1, 0 union all
select "element-r", 0, 1, 0, 1, 1
), buckets_elements as (
select array[struct(a), struct(b), struct(c), struct(d), struct(e)] buckets
from (
select
array_agg(if(bucketa = 1, element, null) ignore nulls) a,
array_agg(if(bucketb = 1, element, null) ignore nulls) b,
array_agg(if(bucketc = 1, element, null) ignore nulls) c,
array_agg(if(bucketd = 1, element, null) ignore nulls) d,
array_agg(if(buckete = 1, element, null) ignore nulls) e
from matrix
)
), columns_names as (
select
regexp_extract_all(to_json_string((select as struct * except(element) from unnest([t]))), r'"([^"]+)"') cols
from matrix t limit 1
), columns_index as (
select generate_array(0, array_length(cols) - 1) as arr
from columns_names
), buckets_combinations as (
select
(select array_agg(
case when n & (1<<pos) <> 0 then arr[offset(pos)] end
ignore nulls)
from unnest(generate_array(0, array_length(arr) - 1)) pos
) as combo
from columns_index cross join
unnest(generate_array(1, cast(power(2, array_length(arr)) - 1 as int64))) n
)
select
array(select cols[offset(i)] from columns_names, unnest(combo) i) winners
from (
select combo,
rank() over(order by (select count(distinct el) from unnest(val) v, unnest(v.a) el) desc, array_length(combo)) as rnk
from (
select any_value(c).combo, array_agg(buckets[offset(i)]) val
from buckets_combinations c, unnest(combo) i, buckets_elements b
group by format('%t', c)
)
)
where rnk = 1

with output

[Image: query output]
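
As a cross-check of the logic, the same brute-force search can be sketched outside BigQuery. The Python below is only an illustration, not part of the original answer: the BUCKETS dictionary is transcribed by hand from the matrix CTE, and the function name fewest_buckets is made up for this sketch.

```python
# Bucket membership transcribed from the matrix CTE (1 = element fits the bucket).
BUCKETS = {
    "bucketa": {"element-x", "element-z", "element-q"},
    "bucketb": {"element-x", "element-r"},
    "bucketc": {"element-x", "element-y", "element-z", "element-p"},
    "bucketd": {"element-q", "element-r"},
    "buckete": {"element-r"},
}


def fewest_buckets(buckets):
    """Return every bucket combination with maximal coverage and minimal size."""
    names = sorted(buckets)
    scored = []
    # Each integer 1 .. 2^n - 1 encodes one non-empty subset of buckets,
    # mirroring the buckets_combinations CTE.
    for mask in range(1, 1 << len(names)):
        combo = [names[pos] for pos in range(len(names)) if mask & (1 << pos)]
        covered = set().union(*(buckets[name] for name in combo))
        # Negative coverage so that "most elements, then fewest buckets" sorts first.
        scored.append((-len(covered), len(combo), combo))
    best = min(score[:2] for score in scored)
    return [combo for cov, size, combo in scored if (cov, size) == best]


winners = fewest_buckets(BUCKETS)
print(winners)  # → [['bucketc', 'bucketd']]
```

Like the SQL, this checks all 2^n - 1 combinations, so it is only practical for a small number of buckets.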

Fewest buckets to fit in the elements

The query below should work:

with buckets_elements as ( 
select array[struct(a), struct(b), struct(c), struct(d), struct(e)] buckets
from (
select
array_agg(if(bucket = 'bucketa' and eligibilty = 1, element, null) ignore nulls) a,
array_agg(if(bucket = 'bucketb' and eligibilty = 1, element, null) ignore nulls) b,
array_agg(if(bucket = 'bucketc' and eligibilty = 1, element, null) ignore nulls) c,
array_agg(if(bucket = 'bucketd' and eligibilty = 1, element, null) ignore nulls) d,
array_agg(if(bucket = 'buckete' and eligibilty = 1, element, null) ignore nulls) e
from matrix
)
), columns_names as (
select array_agg(bucket order by bucket) cols
from (select distinct bucket from matrix)
), columns_index as (
select generate_array(0, array_length(cols) - 1) as arr
from columns_names
), buckets_combinations as (
select
(select array_agg(
case when n & (1<<pos) <> 0 then arr[offset(pos)] end
ignore nulls)
from unnest(generate_array(0, array_length(arr) - 1)) pos
) as combo
from columns_index cross join
unnest(generate_array(1, cast(power(2, array_length(arr)) - 1 as int64))) n
)
select
array(select cols[offset(i)] from columns_names, unnest(combo) i) winners
from (
select combo,
rank() over(order by (select count(distinct el) from unnest(val) v, unnest(v.a) el) desc, array_length(combo)) as rnk
from (
select any_value(c).combo, array_agg(buckets[offset(i)]) val
from buckets_combinations c, unnest(combo) i, buckets_elements b
group by format('%t', c)
)
)
where rnk = 1

If applied to the sample data in your question, the output is

[Image: query output]

Note: I simply reused the answer to the previous question and adjusted only the buckets_elements and columns_names CTEs to reflect the new schema. Everything else is exactly the same :o)

Generate all subsets of an array in BigQuery

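One approach to this problem is to generate all the integers between 1 and 2^n - 1, where n is the number of elements in the array. The bit pattern of each integer then represents one combination: bit pos is set exactly when the element at offset pos belongs to the subset.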

You can use bit comparisons to extract the combos:

with ar as (
select [1,2,3,4] as ar
)
select n,
(select array_agg(case when n & (1<<pos) <> 0
then ar.ar[offset(pos)]
end ignore nulls)
from ar cross join
unnest(generate_array(0, x.cnt - 1)) pos
) as combo
from ar cross join
(select count(*) as cnt
from ar cross join
unnest(ar.ar) x
) x cross join
unnest(generate_array(1, cast(power(2, x.cnt) - 1 as int64))) n
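
The bit-pattern idea is not specific to SQL. Below is a minimal Python sketch of the same enumeration; the function name all_subsets is hypothetical and chosen only for this illustration.

```python
def all_subsets(items):
    """List every non-empty subset of items using integers as bit masks."""
    n = len(items)
    subsets = []
    # mask runs over 1 .. 2^n - 1; bit `pos` decides whether items[pos] is included.
    for mask in range(1, 1 << n):
        subsets.append([items[pos] for pos in range(n) if mask & (1 << pos)])
    return subsets


subsets = all_subsets([1, 2, 3, 4])
print(len(subsets))  # → 15
print(subsets[:3])   # → [[1], [2], [1, 2]]
```

Starting the range at 1 skips the empty subset, just as generate_array(1, ...) does in the query.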

Finding X in SQL (BigQuery)

Below is for BigQuery Standard SQL

#standardSQL
WITH puzzle AS (
SELECT 'x1' x, 60 weight, 2620 target UNION ALL
SELECT 'x2', 226, 2620 UNION ALL
SELECT 'x3', 400, 2620 UNION ALL
SELECT 'x4', 554, 2620 UNION ALL
SELECT 'x5', 469, 2620 UNION ALL
SELECT 'x6', 278, 2620
), numbers AS (
SELECT num FROM (
SELECT DIV(ANY_VALUE(target), MIN(weight)) max_num
FROM puzzle
), UNNEST(GENERATE_ARRAY(1, max_num)) num
)
SELECT x1.num x1, x2.num x2, x3.num x3, x4.num x4, x5.num x5, x6.num x6,
(SELECT weight FROM puzzle WHERE x = 'x1') * x1.num +
(SELECT weight FROM puzzle WHERE x = 'x2') * x2.num +
(SELECT weight FROM puzzle WHERE x = 'x3') * x3.num +
(SELECT weight FROM puzzle WHERE x = 'x4') * x4.num +
(SELECT weight FROM puzzle WHERE x = 'x5') * x5.num +
(SELECT weight FROM puzzle WHERE x = 'x6') * x6.num AS result
FROM puzzle z,
numbers x1,
numbers x2,
numbers x3,
numbers x4,
numbers x5,
numbers x6
WHERE x1.num >= x2.num
AND x2.num >= x3.num
AND x3.num >= x4.num
AND x4.num >= x5.num
AND x5.num >= x6.num
ORDER BY ABS(target - result)
LIMIT 1

The output is

Row  x1  x2  x3  x4  x5  x6  result
1    4   3   1   1   1   1   2619

Note: the above approach can be adapted relatively easily to a dynamic number of variables.
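
To illustrate what a version with a dynamic number of variables might look like, here is a hedged Python sketch of the same exhaustive search. It is not the author's code: the function name closest_counts and the small two-weight example are assumptions made for this illustration. It keeps the query's constraints (every count is at least 1, counts are non-increasing, and the upper bound is target divided by the smallest weight).

```python
from itertools import combinations_with_replacement


def closest_counts(weights, target):
    """Find counts x1 >= x2 >= ... >= 1 whose weighted sum is closest to target."""
    max_num = target // min(weights)  # same bound as DIV(target, MIN(weight))
    best = None
    # combinations_with_replacement yields non-decreasing tuples; reversing each
    # one gives the non-increasing order required by the WHERE clause.
    for combo in combinations_with_replacement(range(1, max_num + 1), len(weights)):
        counts = combo[::-1]
        result = sum(w * c for w, c in zip(weights, counts))
        if best is None or abs(target - result) < abs(target - best[1]):
            best = (counts, result)
    return best


# Verify the arithmetic of the row reported above against the puzzle weights.
weights = [60, 226, 400, 554, 469, 278]
reported_sum = sum(w * c for w, c in zip(weights, (4, 3, 1, 1, 1, 1)))
print(reported_sum)  # → 2619

# Small made-up instance, so the search finishes instantly.
small = closest_counts([2, 3], 7)
print(small)  # → ((2, 1), 7)
```

Running closest_counts on the full six-weight puzzle visits roughly twelve million combinations, so expect it to take a while in pure Python.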

Dynamic columns in BigQuery

It is hard to answer this particular question without the whole context, so I am using your previous question, "Fewest buckets to fit in the elements", as that context. To make it easier, I also refactored the initial CTE-based solution and split it into separate temp tables. buckets_elements becomes one of those tables and can easily be created dynamically.

Finally, consider the script below to make your code fully dynamic:

create temp table columns_names as 
select array_agg(bucket order by bucket) cols
from (select distinct bucket from matrix)
;

execute immediate (
select '''
create temp table buckets_elements as
select array[''' || string_agg('''struct(col''' || offset || ''')''') || '''] buckets
from (
select ''' || string_agg('''
array_agg(if(bucket = "''' || col || '''" and eligibilty = 1, element, null) ignore nulls) col''' || offset , ', ') || '''
from matrix
);
'''
from columns_names, unnest(cols) col with offset
);

create temp table columns_index as
select generate_array(0, array_length(cols) - 1) as arr
from columns_names
;

create temp table buckets_combinations as
select
(select array_agg(
case when n & (1<<pos) <> 0 then arr[offset(pos)] end
ignore nulls)
from unnest(generate_array(0, array_length(arr) - 1)) pos
) as combo
from columns_index cross join
unnest(generate_array(1, cast(power(2, array_length(arr)) - 1 as int64))) n
;

create temp table temp1 as
select any_value(c).combo, array_agg(buckets[offset(i)]) val
from buckets_combinations c, unnest(combo) i, buckets_elements b
group by format('%t', c)
;

create temp table temp2 as
select combo,
rank() over(order by (select count(distinct el) from unnest(val) v, unnest(v.col0) el) desc, array_length(combo)) as rnk
from temp1
;

select array_agg(cols[offset(i)]) winners
from temp2,columns_names, unnest(combo) i
where rnk = 1
group by format('%t', combo)

As you can see, the code references neither the bucket names nor their count, so it is fully dynamic.

If applied to the sample data in your question, the output is

[Image: query output]
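
To make the execute immediate step easier to follow, the Python sketch below assembles roughly the same statement text from a list of bucket names. It is only an approximation for reading purposes: the function name build_buckets_elements_sql is hypothetical, the whitespace differs from what BigQuery would generate, and the bucket names are assumed to be trusted values because they are spliced directly into the SQL.

```python
def build_buckets_elements_sql(cols):
    """Build the CREATE TEMP TABLE statement for buckets_elements as text."""
    # One struct(colN) per bucket, in the same order as the cols array.
    structs = ",".join(f"struct(col{i})" for i in range(len(cols)))
    # One array_agg per bucket, aliased col0, col1, ...
    aggs = ",\n".join(
        f'    array_agg(if(bucket = "{col}" and eligibilty = 1, element, null)'
        f" ignore nulls) col{i}"
        for i, col in enumerate(cols)
    )
    return (
        "create temp table buckets_elements as\n"
        f"select array[{structs}] buckets\n"
        "from (\n"
        "  select\n"
        f"{aggs}\n"
        "  from matrix\n"
        ");"
    )


sql = build_buckets_elements_sql(["bucketa", "bucketb"])
print(sql)
```

The first struct field is always named col0, which is why the ranking step can refer to v.col0 no matter how many buckets exist.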

How can I run a SQL Query with a list of String values using the WHERE [columnname] IN [values] format in ASP.NET?

Create your base SQL statement as a format string, add one parameter placeholder per value dynamically, and then set the values in a loop.

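// Needs: using System; using System.Data; using System.Linq;
// `connection` is assumed to be an open DbConnection for your provider.
string[] productCodes = { "ABC", "DEF", "GHI", "JKL" };
string sqlFormat = "SELECT PRODUCTNAME FROM PRODUCT WHERE PRODUCTCODE IN ({0})";
// One named placeholder per value: @id0, @id1, ...
var @params = productCodes.Select((id, index) => String.Format("@id{0}", index)).ToArray();
var sql = String.Format(sqlFormat, string.Join(",", @params));

using (var cmd = connection.CreateCommand())
{
    cmd.CommandText = sql;
    for (int i = 0; i < productCodes.Length; i++)
    {
        var p = cmd.CreateParameter();
        p.ParameterName = @params[i];
        p.DbType = DbType.String;
        p.Value = productCodes[i];
        cmd.Parameters.Add(p);
    }
    // execute query, e.g. cmd.ExecuteReader()
}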
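
The pattern of one placeholder per value is not tied to ASP.NET. As a runnable illustration, here is a small sketch using Python's built-in sqlite3 module. The in-memory table, the sample product names, and the function name products_by_codes are all made up for this example.

```python
import sqlite3


def products_by_codes(conn, codes):
    """Return product names whose code is in `codes`, using bound parameters."""
    # One "?" placeholder per value; the values are bound, never concatenated.
    placeholders = ",".join("?" for _ in codes)
    sql = (
        "SELECT PRODUCTNAME FROM PRODUCT "
        f"WHERE PRODUCTCODE IN ({placeholders}) ORDER BY PRODUCTNAME"
    )
    return [row[0] for row in conn.execute(sql, list(codes))]


# Hypothetical sample data, only to make the sketch self-contained.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE PRODUCT (PRODUCTCODE TEXT, PRODUCTNAME TEXT)")
conn.executemany(
    "INSERT INTO PRODUCT VALUES (?, ?)",
    [("ABC", "Anvil"), ("DEF", "Drill"), ("XYZ", "Zipper")],
)

names = products_by_codes(conn, ["ABC", "DEF", "GHI", "JKL"])
print(names)  # → ['Anvil', 'Drill']
```

Only the placeholder list is formatted into the statement; the values themselves always travel as parameters, which is what protects the query from SQL injection.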

