Pivot Table and Concatenate Columns

Pivot and concatenate Power Query

Steps:

1- Group by the columns and use the All rows operation

Sample Image

2- Add a custom column refering to the AllRows column of the previous step and the column you'd like to concatenate

Sample Image

3- Use the Extract values on the custom column

Sample Image

4- Remove other columns

Sample Image

M code:

let
Source = Excel.CurrentWorkbook(){[Name="Table1"]}[Content],
ChangeType = Table.TransformColumnTypes(Source,{{"DOC NUM", type text}, {"TRAN", type text}, {"TYPE", type text}, {"ACCS", type text}, {"Label", type text}}),
GroupAll = Table.Group(ChangeType, {"DOC NUM", "TRAN", "TYPE"}, {{"AllRows", each _, type table [DOC NUM=nullable text, #"TRAN"=nullable text, #"TYPE"=nullable text, #"ACCS"=nullable text, #"Amount"=number, #"Label"=nullable text]}}),
AddCustom = Table.AddColumn(GroupAll, "Custom", each [AllRows][Label]),
ExpandCol = Table.TransformColumns(AddCustom, {"Custom", each Text.Combine(List.Transform(_, Text.From)), type text}),
RemoveOtherCols = Table.SelectColumns(ExpandCol,{"DOC NUM", "TRAN", "TYPE", "Custom"})
in
RemoveOtherCols

Let me know if it works

pandas pivot data frame, make values of one column as a column and concatenate values of another column

You can use .pivot_table() with aggfunc (aggregate function) to join the values of column type with |, as follows:

df_out = (df.pivot_table(index=['id', 'name'], columns='term_type', values='term', aggfunc='|'.join)
.rename_axis(columns=None)
).reset_index()

Result:

print(df_out)

id name bio hist
0 1 alpha NaN delta9|delta10
1 2 bravo alpha1 delta1

Concatenate columns in Pivot View

I would skip the PIVOT altogether and just use FOR XML_PATH, like this:

DECLARE @DynamicPivotQuery AS NVARCHAR(MAX)

--Get distinct values of the PIVOT Column
SELECT @DynamicPivotQuery= ISNULL(@DynamicPivotQuery + ',','select mdcode,')
+ 'stuff((select '',''+actual_date from tpc where mdcode=t.mdcode and act_desc = ''' + ACT_DESC + ''' for xml path(''''),type).value(''.'',''varchar(max)''),1,1,'''') '
+ QUOTENAME(ACT_DESC)
FROM (SELECT DISTINCT ACT_DESC FROM tpc) AS des

select @DynamicPivotQuery = @DynamicPivotQuery + 'from tpc t group by mdcode'
EXEC sp_executesql @DynamicPivotQuery

The dynamic query generates a query like this:

select mdcode,
stuff(
(
select ','+actual_date
from tpc where mdcode=t.mdcode and act_desc = 'sample1'
for xml path(''),type
).value('.','varchar(max)')
,1,1,'') sample1,
stuff(
(
select ','+actual_date
from tpc where mdcode=t.mdcode and act_desc = 'sample2'
for xml path(''),type
).value('.','varchar(max)')
,1,1,'') sample2
from tpc t
group by mdcode;

The SQL Fiddle demonstrates the static query and the dynamic one

Pivot Table and Concatenate Columns

SQL Server 2005 offers a very useful PIVOT and UNPIVOT operator which allow you to make this code maintenance-free using PIVOT and some code generation/dynamic SQL

/*
CREATE TABLE [dbo].[stackoverflow_159456](
[ID] [int] NOT NULL,
[TYPE] [char](1) NOT NULL,
[SUBTYPE] [char](1) NOT NULL,
[COUNT] [int] NOT NULL,
[MONTH] [datetime] NOT NULL
) ON [PRIMARY]
*/

DECLARE @sql AS varchar(max)
DECLARE @pivot_list AS varchar(max) -- Leave NULL for COALESCE technique
DECLARE @select_list AS varchar(max) -- Leave NULL for COALESCE technique

SELECT @pivot_list = COALESCE(@pivot_list + ', ', '') + '[' + PIVOT_CODE + ']'
,@select_list = COALESCE(@select_list + ', ', '') + 'ISNULL([' + PIVOT_CODE + '], 0) AS [' + PIVOT_CODE + ']'
FROM (
SELECT DISTINCT [TYPE] + '_' + SUBTYPE AS PIVOT_CODE
FROM stackoverflow_159456
) AS PIVOT_CODES

SET @sql = '
;WITH p AS (
SELECT ID, [MONTH], [TYPE] + ''_'' + SUBTYPE AS PIVOT_CODE, SUM([COUNT]) AS [COUNT]
FROM stackoverflow_159456
GROUP BY ID, [MONTH], [TYPE] + ''_'' + SUBTYPE
)
SELECT ID, [MONTH], ' + @select_list + '
FROM p
PIVOT (
SUM([COUNT])
FOR PIVOT_CODE IN (
' + @pivot_list + '
)
) AS pvt
'

EXEC (@sql)

Pandas pivot_table-like output, vertically concatenating multiple column values from group

This isn't a pivot table, but a funky groupby, or maybe two separate groupby's.

(I wouldn't try to use .agg() either because you want to concatenate the other columns in-order, all together, but .agg() is really pedantic about forcing you to define an individual aggregate function for each column, which here would be a pain.)

Taking the last 'y' value in a group is easy:

df.groupby('id').agg({'y': lambda s: s.iloc[-1]})

# where we don't use .tail() to avoid the current bug on a series which throws "ValueError: Must produce aggregated value"

Now to vertically concatenate the rows in the group consecutively, for all the other columns:

  • we actually don't even need pd.concat([...], axis=1) like I thought we would

  • we can apply this solution inside the df.groupby('id').apply(lambda g: g.drop(columns=['id','y']).values.flatten())

  • first, explicitly specify which columns you do want included:

    df[['id','val1', 'val2', 'val3']].groupby('id').apply(lambda g: g.values.flatten())

id
1 [3, 1, 2, 1, 2, 4, 4, 2, 6]
2 [3, 1, 4, 2, 2, 2, 4, 2, 4]
3 [3, 3, 4, 3, 2, 4, 6, 3, 3]

or if you prefer, you can move the .drop('y') to the front:

df.drop(columns='y').groupby('id').apply(lambda g: g.values.flatten()

We can't legally concatenate to have duplicate column names in the output as @DocZerø pointed out, your example is illegal pandas syntax. You need to figure out how you want to add a prefix/suffix/other name-mangling to the column names.

Minor note: pandas .values accessor is discouraged and will in future be deprecated, we're supposed to start using to_numpy() or .array.

How to combine columns in pandas pivot table?

DataFrame.reorder_levels will make it easy for you.

Here is some sample data:

import numpy as np
import pandas as pd


index = pd.Index(["asp", "chemscore", "goldscore", "plp"], name="dock_func")
columns = pd.MultiIndex.from_product(
[index, pd.Index(["best", "fisrt"], name="tag"), ("mean", "std")]
)

df = pd.DataFrame(
np.random.random(size=(4, 16)),
index=index,
columns=columns,
).round(1)

df looks like:

dock_func  asp                 chemscore                 goldscore                  plp
tag best fisrt best fisrt best fisrt best fisrt
mean std mean std mean std mean std mean std mean std mean std mean std
dock_func
asp 0.5 0.6 0.4 0.2 0.7 0.7 0.8 0.1 0.2 0.5 0.6 0.7 0.5 0.2 0.2 0.7
chemscore 0.0 0.7 0.9 0.2 0.3 0.3 0.4 0.8 0.3 0.4 0.2 0.8 0.5 0.5 0.4 0.2
goldscore 0.5 0.7 0.8 0.0 0.2 0.8 0.1 0.2 0.6 0.1 0.4 0.2 0.8 0.2 0.8 0.3
plp 1.0 0.6 0.6 0.8 0.8 0.6 0.3 1.0 0.7 0.2 0.8 0.2 0.2 0.2 0.7 0.2

Then just run the following:

df = df.reorder_levels([2, 0, 1], axis=1).astype(str)
df = df["mean"] + "(" + df["std"] + ")"

and df is:

dock_func       asp           chemscore           goldscore                 plp
tag best fisrt best fisrt best fisrt best fisrt
dock_func
asp 0.5(0.6) 0.4(0.2) 0.7(0.7) 0.8(0.1) 0.2(0.5) 0.6(0.7) 0.5(0.2) 0.2(0.7)
chemscore 0.0(0.7) 0.9(0.2) 0.3(0.3) 0.4(0.8) 0.3(0.4) 0.2(0.8) 0.5(0.5) 0.4(0.2)
goldscore 0.5(0.7) 0.8(0.0) 0.2(0.8) 0.1(0.2) 0.6(0.1) 0.4(0.2) 0.8(0.2) 0.8(0.3)
plp 1.0(0.6) 0.6(0.8) 0.8(0.6) 0.3(1.0) 0.7(0.2) 0.8(0.2) 0.2(0.2) 0.7(0.2)

Pivot with Column Concatenation in SQL Server

Just use STUFF with XML to Display Data in Single Row (No Need to PIVOT):

SELECT P.PROJECT_NAME,
P.PROJECT_TYPE,
[EXEC_TYPE] = STUFF(
(
SELECT
'-'+PROJECT_TYPE
FROM PROJECT_DETAILS
WHERE PROJECT_NAME = P.PROJECT_NAME FOR XML PATH('')
), 1, 1, ''),
P.TOTAL_HOURS
FROM PROJECT_DETAILS P;

Output :

PROEJCT_NAME    PROJECT_TYPE    EXEC_TYPE   TOTAL_HRS
ProjectA AU AU-SGP-NZ 100
ProjectA SGP AU-SGP-NZ 50
ProjectA NZ AU-SGP-NZ 75
ProjectB US US-CAN 200
ProjectB CAN US-CAN 100
ProjectC JP JP 120
ProjectD IND IND-CH 100
ProjectD CH IND-CH 80
ProjectE RSA RSA-KEN 90
ProjectE KEN RSA-KEN 30


Related Topics



Leave a reply



Submit