Athena Presto - Multiple Columns from Long to Wide

You can use window functions and conditional aggregation. This requires that you know in advance the possible letters and the maximum number of rows per id/letter tuple:

select
    id,
    max(case when letter = 'a' and rn = 1 then value end) a_1,
    max(case when letter = 'a' and rn = 2 then value end) a_2,
    max(case when letter = 'a' and rn = 3 then value end) a_3,
    max(case when letter = 'b' and rn = 1 then value end) b_1,
    max(case when letter = 'b' and rn = 2 then value end) b_2,
    max(case when letter = 'b' and rn = 3 then value end) b_3,
    max(case when letter = 'c' and rn = 1 then value end) c_1,
    max(case when letter = 'c' and rn = 2 then value end) c_2,
    max(case when letter = 'c' and rn = 3 then value end) c_3
from (
    select
        t.*,
        row_number() over(partition by id, letter order by number) rn
    from mytable t
) t
group by id

Actually, if the numbers are always 1, 2, 3, then you don't even need the window function:

select
    id,
    max(case when letter = 'a' and number = 1 then value end) a_1,
    max(case when letter = 'a' and number = 2 then value end) a_2,
    max(case when letter = 'a' and number = 3 then value end) a_3,
    max(case when letter = 'b' and number = 1 then value end) b_1,
    max(case when letter = 'b' and number = 2 then value end) b_2,
    max(case when letter = 'b' and number = 3 then value end) b_3,
    max(case when letter = 'c' and number = 1 then value end) c_1,
    max(case when letter = 'c' and number = 2 then value end) c_2,
    max(case when letter = 'c' and number = 3 then value end) c_3
from mytable t
group by id
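
Writing one expression per letter/rank pair by hand gets tedious as the lists grow. Since the letters and the maximum rank have to be known up front anyway, the query text can be generated. The following is a minimal sketch, not a definitive implementation: the helper name build_pivot_sql and its parameters are hypothetical, and it only builds a string that you would then submit to Athena yourself.

```python
def build_pivot_sql(table, letters, max_rank):
    """Return a conditional-aggregation pivot query as a string.

    Hypothetical helper: `table`, `letters` and `max_rank` are assumptions
    supplied by the caller, mirroring the hand-written query above.
    """
    columns = []
    for letter in letters:
        # Letters end up inside the SQL text, so accept only simple names.
        if not letter.isidentifier():
            raise ValueError(f"unsafe letter: {letter!r}")
        for rank in range(1, max_rank + 1):
            columns.append(
                f"    max(case when letter = '{letter}' and rn = {rank} "
                f"then value end) as {letter}_{rank}"
            )
    select_list = ",\n".join(columns)
    return (
        "select\n"
        "    id,\n"
        f"{select_list}\n"
        "from (\n"
        "    select\n"
        "        t.*,\n"
        "        row_number() over(partition by id, letter order by number) as rn\n"
        f"    from {table} t\n"
        ") t\n"
        "group by id"
    )


sql = build_pivot_sql("mytable", ["a", "b", "c"], 3)
print(sql)
```

The column list is still fixed at the moment the query is submitted; the script only saves typing.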

How to compare a column against every other column in a SQL query with a wide table?

I know this is probably not the answer you are looking for, but this does not seem like something that should be done via Athena/SQL/Presto. Needing thousands of custom columns is a big red flag.

This sounds more like a job for Spark, which can be run in AWS Glue as an ETL job.

Since your data is already in Athena, it should already be cataloged in Glue, and you can use a GlueContext in Spark to load your data frame directly from that data source.

Spark jobs can be written in Python (via PySpark) or Scala. Creating these coefficient columns in a code loop and then writing them out to another file shouldn't require a very complicated script.

Assuming you are unfamiliar with most of this, it may be good to go through this example/tutorial:
https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-samples-legislators.html

athena - dynamically pivot rows to column

No, there is no way to write a query that returns a different number of columns depending on the data. The columns must be known before query execution starts.
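
A common workaround is to split the work into two queries issued from client code: one to discover the distinct values, and a second, generated one whose column list is fixed by then. The sketch below illustrates the idea under stated assumptions: it uses an in-memory SQLite database as a stand-in for Athena so that it can run locally, and the table layout (id, attr, value) and sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for Athena here; the data is made up.
conn = sqlite3.connect(":memory:")
conn.execute("create table mytable (id integer, attr text, value integer)")
conn.executemany(
    "insert into mytable values (?, ?, ?)",
    [(1, "x", 10), (1, "y", 20), (2, "x", 30), (2, "z", 40)],
)

# Step 1: discover which values will become columns.
keys = [row[0] for row in conn.execute(
    "select distinct attr from mytable order by attr"
)]

# The values are pasted into the SQL text, so accept only simple names.
for key in keys:
    if not key.isidentifier():
        raise ValueError(f"unsafe column name: {key!r}")

# Step 2: build a query whose columns are known before it executes.
select_list = ", ".join(
    f"max(case when attr = '{key}' then value end) as {key}" for key in keys
)
pivot_sql = f"select id, {select_list} from mytable group by id order by id"

rows = conn.execute(pivot_sql).fetchall()
print(keys)
print(rows)
```

Each individual query still has a fixed set of columns; only the client code adapts to the data.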

SQL: Convert a Wide Table to Narrow Table

If your database supports lateral join and the values() row constructor, then you can do:

select x.user_id, x.currency
from mytable t
cross join lateral (
    values
        (user_id, 'USD', usd),
        (user_id, 'EUR', eur),
        (user_id, 'CAD', cad)
) x(user_id, currency, val)
where x.val = 1

Some databases implement the lateral join with cross apply instead of cross join lateral.

A more portable approach is union all. This is less efficient since it requires multiple table scans:

select user_id, 'USD' currency from mytable where usd = 1
union all select user_id, 'EUR' from mytable where eur = 1
union all select user_id, 'CAD' from mytable where cad = 1
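
To see what the union all version produces, here is a small runnable sketch. It assumes an in-memory SQLite database as a local stand-in for your actual engine, and the sample rows are made up for illustration; an order by is added only to make the output deterministic.

```python
import sqlite3

# In-memory SQLite stands in for the real database; the rows are made up.
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table mytable (user_id integer, usd integer, eur integer, cad integer)"
)
conn.executemany(
    "insert into mytable values (?, ?, ?, ?)",
    [(1, 1, 0, 1), (2, 0, 1, 0)],
)

# One branch per wide column; each branch scans the table once.
unpivot_sql = """
select user_id, 'USD' as currency from mytable where usd = 1
union all select user_id, 'EUR' from mytable where eur = 1
union all select user_id, 'CAD' from mytable where cad = 1
order by user_id, currency
"""

narrow_rows = conn.execute(unpivot_sql).fetchall()
print(narrow_rows)
```

User 1 has flags set for USD and CAD and therefore yields two narrow rows, while user 2 yields one.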

How can I unnest a JSON field keeping them on the same records?

If you have a predefined list of currencies (and a valid JSON array), you can unnest, then use conditional aggregation:

select
    t.product,
    max(case when x.obj.validInRegion = 'Netherlands' then x.obj.priceCurrency end) currencyNL,
    max(case when x.obj.validInRegion = 'Netherlands' then x.obj.price end) priceNL,
    max(case when x.obj.validInRegion = 'Great Britain' then x.obj.priceCurrency end) currencyGB,
    max(case when x.obj.validInRegion = 'Great Britain' then x.obj.price end) priceGB,
    ...
from mytable t
cross join unnest(t.js_column) as x(obj)
group by t.product
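
To make the intended result concrete, the same reshaping can be sketched in plain Python. This is only an illustration of the output shape, not a replacement for the query: the field names come from the query above, while the sample JSON values, the REGION_SUFFIX mapping and the pivot_prices helper are assumptions made up for this example.

```python
import json

# Hypothetical sample payload; only the field names come from the query above.
js_column = json.dumps([
    {"validInRegion": "Netherlands", "priceCurrency": "EUR", "price": 9.99},
    {"validInRegion": "Great Britain", "priceCurrency": "GBP", "price": 8.49},
])

# The regions must be known in advance, just like the columns in the SQL.
REGION_SUFFIX = {"Netherlands": "NL", "Great Britain": "GB"}


def pivot_prices(product, js):
    """Flatten one product's JSON array into a single wide record."""
    row = {"product": product}
    # Pre-fill every expected column so missing regions come out as None.
    for suffix in REGION_SUFFIX.values():
        row[f"currency{suffix}"] = None
        row[f"price{suffix}"] = None
    for obj in json.loads(js):
        suffix = REGION_SUFFIX.get(obj.get("validInRegion"))
        if suffix is None:
            continue  # region not in the predefined list, so it is ignored
        row[f"currency{suffix}"] = obj.get("priceCurrency")
        row[f"price{suffix}"] = obj.get("price")
    return row


record = pivot_prices("widget", js_column)
print(record)
```

As in the SQL version, a region that is absent from the array leaves its columns empty, and a region outside the predefined list is dropped.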

