athena presto - multiple columns from long to wide
You can use window functions and conditional aggregation. This requires that you know in advance the possible letters, and the maximum rows per id/letter tuple:
select
    id,
    max(case when letter = 'a' and rn = 1 then value end) a_1,
    max(case when letter = 'a' and rn = 2 then value end) a_2,
    max(case when letter = 'a' and rn = 3 then value end) a_3,
    max(case when letter = 'b' and rn = 1 then value end) b_1,
    max(case when letter = 'b' and rn = 2 then value end) b_2,
    max(case when letter = 'b' and rn = 3 then value end) b_3,
    max(case when letter = 'c' and rn = 1 then value end) c_1,
    max(case when letter = 'c' and rn = 2 then value end) c_2,
    max(case when letter = 'c' and rn = 3 then value end) c_3
from (
    select
        t.*,
        row_number() over(partition by id, letter order by number) rn
    from mytable t
) t
group by id
Actually, if the numbers are always 1, 2, 3, then you don't even need the window function:
select
    id,
    max(case when letter = 'a' and number = 1 then value end) a_1,
    max(case when letter = 'a' and number = 2 then value end) a_2,
    max(case when letter = 'a' and number = 3 then value end) a_3,
    max(case when letter = 'b' and number = 1 then value end) b_1,
    max(case when letter = 'b' and number = 2 then value end) b_2,
    max(case when letter = 'b' and number = 3 then value end) b_3,
    max(case when letter = 'c' and number = 1 then value end) c_1,
    max(case when letter = 'c' and number = 2 then value end) c_2,
    max(case when letter = 'c' and number = 3 then value end) c_3
from mytable t
group by id
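To try the conditional-aggregation pattern on a small dataset, here is a minimal Python sketch that runs a shortened version of the query above against an in-memory SQLite database. SQLite stands in for Athena only because it is easy to run locally; the sample rows are invented for illustration, and only four of the nine columns are selected to keep it short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table mytable (id integer, letter text, number integer, value integer)"
)
# Hypothetical sample rows: (id, letter, number, value)
rows = [
    (1, "a", 1, 10), (1, "a", 2, 20), (1, "b", 1, 30),
    (2, "a", 1, 40), (2, "c", 3, 50),
]
conn.executemany("insert into mytable values (?, ?, ?, ?)", rows)

# Each max(case ...) picks the single value matching one letter/number pair;
# combinations with no matching row come back as NULL (None in Python).
query = """
select
    id,
    max(case when letter = 'a' and number = 1 then value end) a_1,
    max(case when letter = 'a' and number = 2 then value end) a_2,
    max(case when letter = 'b' and number = 1 then value end) b_1,
    max(case when letter = 'c' and number = 3 then value end) c_3
from mytable t
group by id
order by id
"""
result = conn.execute(query).fetchall()
for row in result:
    print(row)
```

With these rows, id 1 gets values for a_1, a_2 and b_1, id 2 gets values for a_1 and c_3, and every other cell is empty.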
How to compare a column against every other column in a SQL query with a wide table?
I know this is probably not the answer you are looking for, but this does not seem like something that should be done via Athena/SQL/Presto. Needing thousands of custom columns is a big red flag.
This sounds more like a job for Spark, which could be run in AWS Glue as an ETL job.
Since your data is already in Athena, it should already be cataloged in Glue, and you can use a GlueContext in Spark to load your data frame directly from that data source.
Spark jobs can be done in Python (via pyspark) or Scala. Creating these coefficient columns via a code loop and then writing them out to another file shouldn't be a very complicated script.
Assuming you are unfamiliar with most of this, it may be good to go through this example/tutorial:
https://docs.aws.amazon.com/glue/latest/dg/aws-glue-programming-python-samples-legislators.html
athena - dynamically pivot rows to column
No, there is no way to write a query that returns a different number of columns depending on the data. The columns must be known before query execution starts.
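A common workaround is to do the pivot in two steps from client code: first fetch the distinct values, then build the query text with one column per value. Below is a minimal Python sketch of the second step. The helper name `build_pivot_query` and the table and column names passed to it are hypothetical, and the list of pivot values is assumed to have been fetched already by a separate query.

```python
def build_pivot_query(table, key, pivot_col, value_col, pivot_values):
    """Return a conditional-aggregation query with one output column per pivot value."""
    columns = []
    for v in pivot_values:
        literal = v.replace("'", "''")            # escape single quotes in the SQL literal
        alias = '"' + v.replace('"', '""') + '"'  # quote the alias as an identifier
        columns.append(
            f"max(case when {pivot_col} = '{literal}' then {value_col} end) as {alias}"
        )
    select_list = ",\n    ".join([key] + columns)
    return f"select\n    {select_list}\nfrom {table}\ngroup by {key}"


# Hypothetical inputs: in practice pivot_values would come from something like
# "select distinct letter from mytable".
query = build_pivot_query("mytable", "id", "letter", "value", ["a", "b", "c"])
print(query)
```

The generated text is an ordinary fixed-column query, so the engine still sees all columns before execution starts; the dynamic part lives entirely in the client.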
SQL: Convert a Wide Table to Narrow Table
If your database supports lateral join and the values() row constructor, then you can do:
select x.user_id, x.currency
from mytable t
cross join lateral (
    values
        (user_id, 'USD', usd),
        (user_id, 'EUR', eur),
        (user_id, 'CAD', cad)
) x(user_id, currency, val)
where x.val = 1
Some databases implement the lateral join with cross apply instead of cross join lateral.
A more portable approach is union all. This is less efficient since it requires multiple table scans:
select user_id, 'USD' currency from mytable where usd = 1
union all select user_id, 'EUR' from mytable where eur = 1
union all select user_id, 'CAD' from mytable where cad = 1
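The union all variant can be tried locally with a minimal Python sketch against an in-memory SQLite database. SQLite is used here only as a convenient stand-in, the sample rows are invented, and an `order by` is added so the output order is deterministic.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "create table mytable (user_id integer, usd integer, eur integer, cad integer)"
)
# Hypothetical wide rows: one 0/1 flag per currency column
conn.executemany(
    "insert into mytable values (?, ?, ?, ?)",
    [(1, 1, 0, 1), (2, 0, 1, 0), (3, 0, 0, 0)],
)

# One branch per currency column; each branch scans the table once.
query = """
select user_id, 'USD' as currency from mytable where usd = 1
union all select user_id, 'EUR' from mytable where eur = 1
union all select user_id, 'CAD' from mytable where cad = 1
order by user_id, currency
"""
narrow = conn.execute(query).fetchall()
print(narrow)
```

User 3 has no flag set and therefore produces no rows in the narrow result.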
How can I unnest a JSON field keeping them on the same records?
If you have a predefined list of currencies (and a valid JSON array), you can unnest, then use conditional aggregation:
select
    t.product,
    max(case when x.obj.validInRegion = 'Netherlands' then x.obj.priceCurrency end) currencyNL,
    max(case when x.obj.validInRegion = 'Netherlands' then x.obj.price end) priceNL,
    max(case when x.obj.validInRegion = 'Great Britain' then x.obj.priceCurrency end) currencyGB,
    max(case when x.obj.validInRegion = 'Great Britain' then x.obj.price end) priceGB,
    ...
from mytable t
cross join unnest(t.js_column) as x(obj)
group by t.product
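The same unnest-then-pivot logic can be sketched in plain Python to make the two steps explicit. The sample product and prices are invented, and the sketch assumes at most one object per region for each product, so simply assigning the value plays the role that `max` plays in the query.

```python
import json

# Hypothetical rows: (product, JSON array of price objects)
rows = [
    ("widget", json.dumps([
        {"validInRegion": "Netherlands", "priceCurrency": "EUR", "price": 10},
        {"validInRegion": "Great Britain", "priceCurrency": "GBP", "price": 9},
    ])),
]
# Known in advance: region name -> suffix of the output columns
regions = {"Netherlands": "NL", "Great Britain": "GB"}

wide = {}
for product, js in rows:
    record = wide.setdefault(product, {})  # one output record per product (the group by)
    for obj in json.loads(js):             # the unnest step: one object per array element
        suffix = regions.get(obj["validInRegion"])
        if suffix is None:
            continue                       # regions outside the predefined list are ignored
        record["currency" + suffix] = obj["priceCurrency"]
        record["price" + suffix] = obj["price"]

print(wide)
```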