Unpivot Multiple Columns With Same Name in Pandas Dataframe

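Try groupby on the column labels with axis=1 (note that axis=1 in groupby is deprecated since pandas 2.1, so this suits older versions):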

df.groupby(df.columns.values, axis=1).agg(lambda x: x.values.tolist()).sum().apply(pd.Series).T.sort_values('pp')
Out[320]:
b pp
0 0.001464 5.0
2 0.001459 5.0
1 0.001853 6.0
3 0.001843 6.0
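
On pandas versions where groupby(..., axis=1) is deprecated, the same stacking of equally named columns can be sketched without it. The input frame below is an assumption rebuilt from the output shown above, and the names labels, occurrence, pieces and long_df are only illustrative.

```python
import pandas as pd

# Assumed input, reconstructed from the printed output: two 'pp'/'b' pairs
df = pd.DataFrame(
    [[5, 0.001464, 6, 0.001853],
     [5, 0.001459, 6, 0.001843]],
    columns=['pp', 'b', 'pp', 'b'],
)

# Number each repeated label: 0 for its first occurrence, 1 for the second, ...
labels = pd.Series(df.columns)
occurrence = labels.groupby(labels).cumcount().to_numpy()

# Select the k-th occurrence of every label by position, then stack the pieces
pieces = [df.loc[:, occurrence == k] for k in sorted(set(occurrence))]
long_df = pd.concat(pieces, ignore_index=True)

print(long_df)
```

Each piece has unique column names, so concat can line them up by label; the rows come out grouped by occurrence rather than sorted by 'pp'.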

A fun way with wide_to_long

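# append an occurrence counter to each repeated name: pp0, b0, pp1, b1
s = pd.Series(df.columns)
df.columns = df.columns + s.groupby(s).cumcount().astype(str)

# raw string avoids an invalid-escape warning for \d
pd.wide_to_long(df.reset_index(), stubnames=['pp', 'b'], i='index', j='drop', suffix=r'\d+')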
Out[342]:
pp b
index drop
0 0 5 0.001464
1 0 5 0.001459
0 1 6 0.001853
1 1 6 0.001843

Pandas: How to (cleanly) unpivot two columns with same category?

Use wide_to_long:

import numpy as np
import pandas as pd

np.random.seed(123)
df_orig = pd.DataFrame(data=np.random.randint(255, size=(4, 5)),
                       columns=['accuracy', 'time_a', 'time_b', 'memory_a', 'memory_b'])


# suffix=r'\w+' lets the letters after the separator (a, b) become the category
df = (pd.wide_to_long(df_orig.reset_index(),
                      stubnames=['time', 'memory'],
                      i='index',
                      j='category',
                      sep='_',
                      suffix=r'\w+')
        .reset_index(level=1)
        .reset_index(drop=True)
        .rename_axis(None))
print (df)
category accuracy time memory
0 a 254 109 66
1 a 98 230 83
2 a 123 57 225
3 a 113 126 73
4 b 254 126 220
5 b 98 17 106
6 b 123 214 96
7 b 113 47 32

Unpivot multiple variables Pandas Dataframe

You are close; you need suffix=r'\w+' to match non-integer suffixes (the default only matches digits):

new_df = (pd.wide_to_long(df, ['price', 'vol', 'flag'],
                          i=['time', 'prod'],
                          j='State',
                          sep='_',
                          suffix=r'\w+')
            .reset_index())

print (new_df)
time prod State price vol flag
0 t1 A qld 4 11 1
1 t1 A nsw 7 73 0
2 t1 A vic 9 95 1
3 t2 B qld 3 43 1
4 t2 B nsw 4 44 1
5 t2 B vic 4 34 1
6 t3 C qld 6 232 1
7 t3 C nsw 7 657 0
8 t3 C vic 6 666 1
9 t4 D qld 3 234 1
10 t4 D nsw 3 53 1
11 t4 D vic 23 273 0
12 t5 E qld 8 42 0
13 t5 E nsw 5 785 0
14 t5 E vic 7 87 1
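
To see why the suffix matters, here is a small sketch. It checks the two patterns against a state name with the re module and then runs the corrected call on a hypothetical two-state subset of the data (the frame below is made up for illustration, not the asker's full input).

```python
import re
import pandas as pd

# The default suffix r'\d+' only accepts digits; r'\w+' also accepts letters
print(bool(re.fullmatch(r'\d+', 'qld')))  # False
print(bool(re.fullmatch(r'\w+', 'qld')))  # True

# Hypothetical two-state subset, same column naming scheme as above
df = pd.DataFrame({
    'time': ['t1', 't2'],
    'prod': ['A', 'B'],
    'price_qld': [4, 3], 'price_nsw': [7, 4],
    'vol_qld': [11, 43], 'vol_nsw': [73, 44],
})

new_df = pd.wide_to_long(df, ['price', 'vol'], i=['time', 'prod'],
                         j='State', sep='_', suffix=r'\w+').reset_index()
print(new_df)
```

With r'\w+' the part after the separator is taken as the State value, so each input row expands to one row per state.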

Another approach:

# move all columns without a separator into the index
new_df = df.set_index(['time', 'prod'])
# split the remaining column names on the separator to get a MultiIndex
new_df.columns = new_df.columns.str.split('_', expand=True)
# reshape with stack; the stacked level is unnamed, so it comes back as level_2
new_df = new_df.stack().reset_index().rename(columns={'level_2': 'state'})

print (new_df)
time prod state flag price vol
0 t1 A nsw 0 7 73
1 t1 A qld 1 4 11
2 t1 A vic 1 9 95
3 t2 B nsw 1 4 44
4 t2 B qld 1 3 43
5 t2 B vic 1 4 34
6 t3 C nsw 0 7 657
7 t3 C qld 1 6 232
8 t3 C vic 1 6 666
9 t4 D nsw 1 3 53
10 t4 D qld 1 3 234
11 t4 D vic 0 23 273
12 t5 E nsw 0 5 785
13 t5 E qld 0 8 42
14 t5 E vic 1 7 87

Unpivot df columns to multiple columns and rows

You can melt it and then split the variable column on _:

long_df = pd.melt(df, id_vars=['Country', 'Industry'])
long_df[['Year', 'Group_Type', 'Tags']] = long_df.variable.str.split('_', expand=True)

long_df.drop('variable', axis=1)
# Country Industry value Year Group_Type Tags
#0 US AB 0.00 2011 0-9 AF
#1 US AC 12.34 2011 0-9 AF
#2 UK AB 1.00 2011 0-9 AF
#3 UK AC 12.00 2011 0-9 AF
#4 US AB 0.00 2011 0-9 AP
#5 US AC 12.40 2011 0-9 AP
#6 UK AB 2.00 2011 0-9 AP
#7 UK AC 5.00 2011 0-9 AP
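
For a version that runs on its own, here is a minimal sketch. The wide frame is an assumption rebuilt from the output above (column names of the form Year_GroupType_Tag); the final integer cast of Year is an optional extra that is not part of the answer.

```python
import pandas as pd

# Assumed wide input, reconstructed from the printed result
df = pd.DataFrame({
    'Country': ['US', 'US', 'UK', 'UK'],
    'Industry': ['AB', 'AC', 'AB', 'AC'],
    '2011_0-9_AF': [0.00, 12.34, 1.00, 12.00],
    '2011_0-9_AP': [0.00, 12.40, 2.00, 5.00],
})

# One row per (Country, Industry, original column name)
long_df = pd.melt(df, id_vars=['Country', 'Industry'])

# Split '2011_0-9_AF' into its three parts and keep them as new columns
long_df[['Year', 'Group_Type', 'Tags']] = long_df['variable'].str.split('_', expand=True)
long_df = long_df.drop(columns='variable')

# Optional: the split yields strings, so cast Year if numbers are wanted
long_df['Year'] = long_df['Year'].astype(int)

print(long_df)
```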

Python Pandas pivot one column and unpivot X columns

Try like this:

df.set_index(['Name', 'Food']).stack().unstack('Food')

Food burger hot dog pizza
Name
Bob 1/1/2018 2.0 1.5 0.0
2/1/2018 0.0 0.0 1.0
3/1/2018 2.0 1.5 0.0
4/1/2018 2.0 1.5 0.0
Mike 1/1/2018 0.0 0.0 1.0
2/1/2018 3.0 0.0 0.0
3/1/2018 0.0 0.0 1.0
4/1/2018 0.0 0.0 1.0

If formatting is an issue, just reset the index and then rename your columns to appropriate names:

df.set_index(['Name', 'Food']).stack().unstack('Food').reset_index().rename(columns={'level_1':'date'})

Food Name date burger hot dog pizza
0 Bob 1/1/2018 2.0 1.5 0.0
1 Bob 2/1/2018 0.0 0.0 1.0
2 Bob 3/1/2018 2.0 1.5 0.0
3 Bob 4/1/2018 2.0 1.5 0.0
4 Mike 1/1/2018 0.0 0.0 1.0
5 Mike 2/1/2018 3.0 0.0 0.0
6 Mike 3/1/2018 0.0 0.0 1.0
7 Mike 4/1/2018 0.0 0.0 1.0
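
A variation, sketched on hypothetical data: naming the column axis before stacking avoids the level_1 rename, and fill_value in unstack puts 0 where a Name has no row for a Food. The frame below is invented for illustration and is not the asker's input.

```python
import pandas as pd

# Hypothetical input: one row per (Name, Food), one column per date
df = pd.DataFrame({
    'Name': ['Bob', 'Bob', 'Mike'],
    'Food': ['burger', 'hot dog', 'pizza'],
    '1/1/2018': [2.0, 1.5, 1.0],
    '2/1/2018': [0.0, 0.0, 1.0],
})

out = (df.set_index(['Name', 'Food'])
         .rename_axis(columns='date')      # name the column axis before stacking
         .stack()                          # dates move into the row index
         .unstack('Food', fill_value=0)    # foods become columns, gaps become 0
         .reset_index()
         .rename_axis(columns=None))       # drop the leftover 'Food' axis label

print(out)
```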

Unpivot pandas DataFrame partly

If Qty and Value should end up as separate columns, first move the other columns into the index, then split the remaining column names on the last space with str.rsplit(n=1, expand=True) to get a MultiIndex in the columns, and finally reshape with DataFrame.stack:

df = df.set_index(['Items','Description'])
df.columns = df.columns.str.rsplit(n=1, expand=True)
df = df.rename_axis(('Store number',None), axis=1).stack(0).reset_index()
print (df)
Items Description Store number Qty Value
0 item 1 Some item name Store 1 5 120
1 item 1 Some item name Store 2 7 240
2 item 2 Some other item Store 1 9 1234
3 item 2 Some other item Store 2 12 98
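
The steps can be tried end to end with the sketch below. The wide frame is an assumption rebuilt from the printed result (columns such as 'Store 1 Qty'), and the extra print only shows what the split on the last space produces.

```python
import pandas as pd

# Assumed wide layout, reconstructed from the output above
df = pd.DataFrame({
    'Items': ['item 1', 'item 2'],
    'Description': ['Some item name', 'Some other item'],
    'Store 1 Qty': [5, 9], 'Store 1 Value': [120, 1234],
    'Store 2 Qty': [7, 12], 'Store 2 Value': [240, 98],
})

wide = df.set_index(['Items', 'Description'])

# Splitting once from the right separates the store from the measure
print(wide.columns.str.rsplit(n=1).tolist())

# expand=True turns the pairs into a two-level column index
wide.columns = wide.columns.str.rsplit(n=1, expand=True)

# Stack the store level (level 0) into rows; Qty and Value stay as columns
out = (wide.rename_axis(['Store number', None], axis=1)
           .stack(0)
           .reset_index())

print(out)
```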

Convert/unpivot multiple columns to rows in Python Dataframe

Use wide_to_long, but first rename the cor_id columns by repeating their last character, so that their numeric suffix lines up with the first mail column of the same group:

df = df.rename(columns=lambda x: x + x[-1] if x.startswith('cor_id') else x)
df = pd.wide_to_long(df, ['cor_id', 'mail'], i='id', j='i')
df['cor_id'] = df['cor_id'].ffill()
df = df.reset_index(level=1, drop=True).reset_index()

An alternative is to append 0 instead and then remove the rows with missing mail values with dropna:

df = df.rename(columns=lambda x: x + '0' if x.startswith('cor_id') else x)
df = pd.wide_to_long(df, ['cor_id', 'mail'], i='id', j='i')
df['cor_id'] = df['cor_id'].ffill()
df = df.dropna(subset=['mail']).reset_index(level=1, drop=True).reset_index()

print (df)
id cor_id mail
0 1 1.0 a@123
1 1 1.0 b@234
2 1 1.0 c@123
3 1 1.0 a@def
4 1 2.0 b@fgh
5 1 2.0 s@wer
6 1 2.0 b@ert
7 1 3.0 e@rty
8 1 3.0 c@asd
9 2 4.0 e@234
10 2 4.0 e@234
11 2 4.0 e@qwe
12 2 4.0 e@dfe
13 2 9.0 f@jfg
14 2 9.0 e@wer
15 2 9.0 g@wer
16 2 10.0 e@ert
17 2 10.0 r@ert

EDIT: If there are several columns like cor_id, pass their prefixes as a tuple to startswith and forward fill all of them by selecting a list of columns before calling ffill:

df = df.rename(columns=lambda x: x + '0' if x.startswith(('cor_id','ad')) else x)
df = pd.wide_to_long(df, ['cor_id', 'ad','mail'], i='id', j='i')
df[['cor_id','ad']] = df[['cor_id','ad']].ffill()
df = df.dropna(subset=['mail']).reset_index(level=1, drop=True).reset_index()
print (df)
id cor_id ad mail
0 1 1.0 23.0 a@123
1 1 1.0 23.0 b@234
2 1 1.0 23.0 c@123
3 1 2.0 24.0 a@def
4 1 2.0 24.0 b@fgh
5 1 2.0 24.0 c@asd
6 1 3.0 25.0 s@wer
7 1 3.0 25.0 b@ert
8 1 3.0 25.0 e@rty
9 2 4.0 33.0 e@234
10 2 4.0 33.0 e@234
11 2 4.0 33.0 e@qwe
12 2 9.0 34.0 e@dfe
13 2 9.0 34.0 f@jfg
14 2 9.0 34.0 r@ert
15 2 10.0 35.0 e@wer
16 2 10.0 35.0 g@wer
17 2 10.0 35.0 e@ert
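
The forward fill only gives the right result if each cor_id row sits directly before its mail rows. A more defensive sketch of the '0'-suffix variant sorts the index first and fills within each id. The input below is hypothetical (the question's data is not shown here), and the explicit sort_index and per-id groupby are additions, not part of the original answer.

```python
import pandas as pd

# Hypothetical input: each cor_idN column owns the mailN* columns after it
df = pd.DataFrame({
    'id': [1, 2],
    'cor_id1': [1, 4],
    'mail11': ['a@123', 'e@234'],
    'mail12': ['b@234', 'e@qwe'],
    'cor_id2': [2, 9],
    'mail21': ['a@def', 'f@jfg'],
})

# cor_id1 -> cor_id10, so its suffix sorts just before mail11, mail12
df = df.rename(columns=lambda x: x + '0' if x.startswith('cor_id') else x)

# Sort by (id, i) so every cor_id row precedes the mail rows it belongs to
long_df = pd.wide_to_long(df, ['cor_id', 'mail'], i='id', j='i').sort_index()

# Fill cor_id downwards, but never across different ids
long_df['cor_id'] = long_df.groupby(level='id')['cor_id'].ffill()

# Rows that only carried a cor_id have no mail and can go
long_df = (long_df.dropna(subset=['mail'])
                  .reset_index(level='i', drop=True)
                  .reset_index())

print(long_df)
```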

unpivot columns into multiple columns and values in scala dataframe

I found a solution: unpivot each group of columns separately, build a new key column in each result, and join on that key. The code is below.

If anyone has a better approach, please share it as well.

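import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._
// needed for toDF and the $"..." syntax; assumes a SparkSession named spark (as in spark-shell)
import spark.implicits._

val df = Seq(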
(1,"2022-02-01",0,5,10,15,20,25,30),
(1,"2022-02-02",0,5,10,15,20,25,30),
(2,"2022-02-01",0,5,10,15,20,25,30),
(2,"2022-02-02",0,5,10,15,20,25,30)
).toDF("ID","DATE","TYPE","SIG_A","SIG_B","SIG_C","SIG_AA","SIG_BB","SIG_AAA")
val df01 = df.withColumn("DATE", date_format(col("DATE"),"dd-MM-yyyy"))

val fullSig = List[String]("SIG_A","SIG_B","SIG_C","SIG_AA","SIG_BB","SIG_AAA")
val fullSigList = fullSig.map(name => struct(lit(name) as "sig_name", col(name) as "sig_value"))

val sig01 = List[String]("SIG_A","SIG_B","SIG_C")
val sig01List = sig01.map(name => struct(lit(name) as "sig_name", col(name) as "sig_value"))

val sig02 = List[String]("SIG_AA","SIG_BB")
val sig02List = sig02.map(name => struct(lit(name) as "sig_name", col(name) as "sig_value"))

val sig03 = List[String]("SIG_AAA")
val sig03List = sig03.map(name => struct(lit(name) as "sig_name", col(name) as "sig_value"))


val unpiv01 = df01
.select($"ID",$"DATE",$"TYPE",explode(array(sig01List:_*))as "feature")
.select($"ID",$"DATE",$"TYPE",$"feature.sig_name" as "feature1_name", $"feature.sig_value" as "feature1_value")
val unpiv001 = unpiv01
.withColumn("row_num", row_number.over(Window.partitionBy("ID","DATE").orderBy("feature1_name")))
.withColumn("joinkey", concat(col("ID"),lit("-"),col("DATE"),lit("-"),col("row_num")))
.drop("row_num")

val unpiv02 = df01
.select($"ID",$"DATE",$"TYPE",explode(array(sig02List:_*))as "feature")
.select($"ID",$"DATE",$"TYPE",$"feature.sig_name" as "feature2_name", $"feature.sig_value" as "feature2_value")
val unpiv002 = unpiv02
.withColumn("row_num", row_number.over(Window.partitionBy("ID","DATE").orderBy("feature2_name")))
.withColumn("joinkey", concat(col("ID"),lit("-"),col("DATE"),lit("-"),col("row_num")))
.drop("row_num")

val unpiv03 = df01
.select($"ID",$"DATE",$"TYPE",explode(array(sig03List:_*))as "feature")
.select($"ID",$"DATE",$"TYPE",$"feature.sig_name" as "feature3_name", $"feature.sig_value" as "feature3_value")
val unpiv003= unpiv03
.withColumn("row_num", row_number.over(Window.partitionBy("ID","DATE").orderBy("feature3_name")))
.withColumn("joinkey", concat(col("ID"),lit("-"),col("DATE"),lit("-"),col("row_num")))
.drop("row_num")

val joineddf = unpiv001.as("x1")
.join(unpiv002.as("x2"),$"x1.joinkey" === $"x2.joinkey", "outer")
.join(unpiv003.as("x3"),$"x1.joinkey" === $"x3.joinkey", "outer")
.select($"x1.ID" as "ID",$"x1.DATE" as "DATE",$"x1.TYPE" as "TYPE",$"feature1_name",$"feature1_value",$"feature2_name",$"feature2_value",$"feature3_name",$"feature3_value")

INPUT DATAFRAME:

+---+----------+----+-----+-----+-----+------+------+-------+
| ID| DATE|TYPE|SIG_A|SIG_B|SIG_C|SIG_AA|SIG_BB|SIG_AAA|
+---+----------+----+-----+-----+-----+------+------+-------+
| 1|01-02-2022| 0| 5| 10| 15| 20| 25| 30|
| 1|02-02-2022| 0| 5| 10| 15| 20| 25| 30|
| 2|01-02-2022| 0| 5| 10| 15| 20| 25| 30|
| 2|02-02-2022| 0| 5| 10| 15| 20| 25| 30|
+---+----------+----+-----+-----+-----+------+------+-------+

OUTPUT DATAFRAME:

+---+----------+----+-------------+--------------+-------------+--------------+-------------+--------------+
| ID| DATE|TYPE|feature1_name|feature1_value|feature2_name|feature2_value|feature3_name|feature3_value|
+---+----------+----+-------------+--------------+-------------+--------------+-------------+--------------+
| 1|01-02-2022| 0| SIG_A| 5| SIG_AA| 20| SIG_AAA| 30|
| 1|01-02-2022| 0| SIG_B| 10| SIG_BB| 25| null| null|
| 1|01-02-2022| 0| SIG_C| 15| null| null| null| null|
| 1|02-02-2022| 0| SIG_A| 5| SIG_AA| 20| SIG_AAA| 30|
| 1|02-02-2022| 0| SIG_B| 10| SIG_BB| 25| null| null|
| 1|02-02-2022| 0| SIG_C| 15| null| null| null| null|
| 2|01-02-2022| 0| SIG_A| 5| SIG_AA| 20| SIG_AAA| 30|
| 2|01-02-2022| 0| SIG_B| 10| SIG_BB| 25| null| null|
| 2|01-02-2022| 0| SIG_C| 15| null| null| null| null|
| 2|02-02-2022| 0| SIG_A| 5| SIG_AA| 20| SIG_AAA| 30|
| 2|02-02-2022| 0| SIG_B| 10| SIG_BB| 25| null| null|
| 2|02-02-2022| 0| SIG_C| 15| null| null| null| null|
+---+----------+----+-------------+--------------+-------------+--------------+-------------+--------------+


