Unpivot multiple columns with same name in pandas dataframe
Try groupby with axis=1:
df.groupby(df.columns.values, axis=1).agg(lambda x: x.values.tolist()).sum().apply(pd.Series).T.sort_values('pp')
Out[320]:
b pp
0 0.001464 5.0
2 0.001459 5.0
1 0.001853 6.0
3 0.001843 6.0
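Grouping with axis=1 is deprecated as of pandas 2.1, so a variant that avoids it may be useful. The following is only a minimal sketch and not the answerer's method: it assumes a small hypothetical frame in which pp and b each appear twice (values borrowed from the output above) and flattens every group of same-named columns with NumPy.

```python
import pandas as pd

# Hypothetical input: the names 'pp' and 'b' each appear twice.
df = pd.DataFrame(
    [[5, 0.001464, 6, 0.001853],
     [5, 0.001459, 6, 0.001843]],
    columns=['pp', 'b', 'pp', 'b'],
)

# For every distinct name, select all columns carrying that name
# (the boolean mask always yields a 2-D block) and flatten it row by row.
out = pd.DataFrame({
    name: df.loc[:, df.columns == name].to_numpy().ravel()
    for name in df.columns.unique()
})

# A stable sort keeps the original relative order within each 'pp' value.
out = out.sort_values('pp', kind='stable')
print(out)
```

The boolean mask is used instead of df['pp'] so that the selection stays two-dimensional even when a name occurs only once.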
A fun way with wide_to_long
s=pd.Series(df.columns)
df.columns=df.columns+s.groupby(s).cumcount().astype(str)
pd.wide_to_long(df.reset_index(), stubnames=['pp', 'b'], i='index', j='drop', suffix=r'\d+')
Out[342]:
pp b
index drop
0 0 5 0.001464
1 0 5 0.001459
0 1 6 0.001853
1 1 6 0.001843
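For anyone who wants to run the wide_to_long approach end to end, here is a self-contained sketch. The input frame is hypothetical (reconstructed from the output shown), and the renaming step is written as a list comprehension, which is assumed to be equivalent to the Index + Series concatenation used above.

```python
import pandas as pd

# Hypothetical input with repeated column names.
df = pd.DataFrame(
    [[5, 0.001464, 6, 0.001853],
     [5, 0.001459, 6, 0.001843]],
    columns=['pp', 'b', 'pp', 'b'],
)

# Number each occurrence of a name: pp0, b0, pp1, b1.
s = pd.Series(df.columns)
counts = s.groupby(s).cumcount()
df.columns = [f"{name}{n}" for name, n in zip(s, counts)]

# The numeric suffix becomes the 'drop' index level, the stubs become columns.
long_df = pd.wide_to_long(
    df.reset_index(),
    stubnames=['pp', 'b'],
    i='index',
    j='drop',
    suffix=r'\d+',
)
print(long_df)
```

The helper level is called drop because it only exists to make the names unique; long_df.reset_index(drop=True) would discard both index levels afterwards.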
Pandas: How to (cleanly) unpivot two columns with same category?
Use wide_to_long:
import numpy as np
import pandas as pd

np.random.seed(123)
df_orig = pd.DataFrame(data=np.random.randint(255, size=(4, 5)),
                       columns=['accuracy', 'time_a', 'time_b', 'memory_a', 'memory_b'])
# the part after '_' becomes the 'category' level; r'\w+' matches letter suffixes
df = (pd.wide_to_long(df_orig.reset_index(),
                      stubnames=['time', 'memory'],
                      i='index',
                      j='category',
                      sep='_',
                      suffix=r'\w+')
        .reset_index(level=1)
        .reset_index(drop=True)
        .rename_axis(None))
print(df)
category accuracy time memory
0 a 254 109 66
1 a 98 230 83
2 a 123 57 225
3 a 113 126 73
4 b 254 126 220
5 b 98 17 106
6 b 123 214 96
7 b 113 47 32
Unpivot multiple variables Pandas Dataframe
You are close; you need suffix=r'\w+' to match non-integer suffixes:
new_df = (pd.wide_to_long(df, ['price', 'vol', 'flag'],
                          i=['time', 'prod'],
                          j='State',
                          sep='_',
                          suffix=r'\w+')
            .reset_index())
print(new_df)
time prod State price vol flag
0 t1 A qld 4 11 1
1 t1 A nsw 7 73 0
2 t1 A vic 9 95 1
3 t2 B qld 3 43 1
4 t2 B nsw 4 44 1
5 t2 B vic 4 34 1
6 t3 C qld 6 232 1
7 t3 C nsw 7 657 0
8 t3 C vic 6 666 1
9 t4 D qld 3 234 1
10 t4 D nsw 3 53 1
11 t4 D vic 23 273 0
12 t5 E qld 8 42 0
13 t5 E nsw 5 785 0
14 t5 E vic 7 87 1
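The question's input frame is not shown here, so the following is a minimal runnable sketch on hypothetical data with only two stubs and two states. It illustrates how sep='_' and suffix=r'\w+' split names such as price_qld into the stub price and the State value qld.

```python
import pandas as pd

# Hypothetical wide input: <stub>_<state> columns.
df = pd.DataFrame({
    'time': ['t1', 't2'],
    'prod': ['A', 'B'],
    'price_qld': [4, 3],
    'price_nsw': [7, 4],
    'vol_qld': [11, 43],
    'vol_nsw': [73, 44],
})

new_df = (pd.wide_to_long(df, ['price', 'vol'],
                          i=['time', 'prod'],
                          j='State',
                          sep='_',
                          suffix=r'\w+')   # default r'\d+' would only match digits
            .reset_index())

# Sort only to make the printed order predictable.
rows = new_df.sort_values(['time', 'State']).reset_index(drop=True)
print(rows)
```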
Another approach:
# move the columns without a separator into the index (a MultiIndex)
new_df = df.set_index(['time', 'prod'])
# split the remaining column names by the separator into a MultiIndex
new_df.columns = new_df.columns.str.split('_', expand=True)
# reshape with stack; the unnamed stacked level comes back as 'level_2'
new_df = new_df.stack().reset_index().rename(columns={'level_2': 'state'})
print(new_df)
time prod state flag price vol
0 t1 A nsw 0 7 73
1 t1 A qld 1 4 11
2 t1 A vic 1 9 95
3 t2 B nsw 1 4 44
4 t2 B qld 1 3 43
5 t2 B vic 1 4 34
6 t3 C nsw 0 7 657
7 t3 C qld 1 6 232
8 t3 C vic 1 6 666
9 t4 D nsw 1 3 53
10 t4 D qld 1 3 234
11 t4 D vic 0 23 273
12 t5 E nsw 0 5 785
13 t5 E qld 0 8 42
14 t5 E vic 1 7 87
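The same three steps can be tried on a small frame. This is a sketch on hypothetical data (two stubs, two states), not the asker's real input. Newer pandas versions may emit a FutureWarning for stack, which does not affect the result here.

```python
import pandas as pd

# Hypothetical wide input: <stub>_<state> columns.
df = pd.DataFrame({
    'time': ['t1', 't2'],
    'prod': ['A', 'B'],
    'price_qld': [4, 3],
    'price_nsw': [7, 4],
    'vol_qld': [11, 43],
    'vol_nsw': [73, 44],
})

wide = df.set_index(['time', 'prod'])
# 'price_qld' -> ('price', 'qld'): two-level column MultiIndex
wide.columns = wide.columns.str.split('_', expand=True)
# stack() moves the last column level (the state) into the row index
long_df = (wide.stack()
               .reset_index()
               .rename(columns={'level_2': 'state'}))

rows = long_df.sort_values(['time', 'state']).reset_index(drop=True)
print(rows)
```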
Unpivot df columns to multiple columns and rows
You can melt it and then split the variable column by _:
long_df = pd.melt(df, id_vars=['Country', 'Industry'])
long_df[['Year', 'Group_Type', 'Tags']] = long_df.variable.str.split('_', expand=True)
long_df.drop('variable', axis=1)
# Country Industry value Year Group_Type Tags
#0 US AB 0.00 2011 0-9 AF
#1 US AC 12.34 2011 0-9 AF
#2 UK AB 1.00 2011 0-9 AF
#3 UK AC 12.00 2011 0-9 AF
#4 US AB 0.00 2011 0-9 AP
#5 US AC 12.40 2011 0-9 AP
#6 UK AB 2.00 2011 0-9 AP
#7 UK AC 5.00 2011 0-9 AP
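As a runnable illustration, here is a minimal sketch on a hypothetical two-row frame whose value columns follow the Year_GroupType_Tag naming implied by the output above. The column names are assumptions, since the original input is not shown.

```python
import pandas as pd

# Hypothetical wide input; value columns are named <Year>_<Group_Type>_<Tags>.
df = pd.DataFrame({
    'Country': ['US', 'UK'],
    'Industry': ['AB', 'AB'],
    '2011_0-9_AF': [0.0, 1.0],
    '2011_0-9_AP': [0.0, 2.0],
})

# melt keeps the id columns and stacks everything else into variable/value
long_df = pd.melt(df, id_vars=['Country', 'Industry'])

# split 'variable' on '_' into three new columns
long_df[['Year', 'Group_Type', 'Tags']] = long_df['variable'].str.split('_', expand=True)

# drop returns a new frame, so the result has to be assigned to keep it
long_df = long_df.drop(columns='variable')
print(long_df)
```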
Python Pandas pivot one column and unpivot X columns
Try like this:
df.set_index(['Name', 'Food']).stack().unstack('Food')
Food burger hot dog pizza
Name
Bob 1/1/2018 2.0 1.5 0.0
2/1/2018 0.0 0.0 1.0
3/1/2018 2.0 1.5 0.0
4/1/2018 2.0 1.5 0.0
Mike 1/1/2018 0.0 0.0 1.0
2/1/2018 3.0 0.0 0.0
3/1/2018 0.0 0.0 1.0
4/1/2018 0.0 0.0 1.0
If formatting is an issue, just reset the index and then rename your columns to appropriate names:
df.set_index(['Name', 'Food']).stack().unstack('Food').reset_index().rename(columns={'level_1':'date'})
Food Name date burger hot dog pizza
0 Bob 1/1/2018 2.0 1.5 0.0
1 Bob 2/1/2018 0.0 0.0 1.0
2 Bob 3/1/2018 2.0 1.5 0.0
3 Bob 4/1/2018 2.0 1.5 0.0
4 Mike 1/1/2018 0.0 0.0 1.0
5 Mike 2/1/2018 3.0 0.0 0.0
6 Mike 3/1/2018 0.0 0.0 1.0
7 Mike 4/1/2018 0.0 0.0 1.0
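A self-contained sketch of the same chain, on a hypothetical frame with two date columns and two foods per person (the real input is not shown in this answer):

```python
import pandas as pd

# Hypothetical input: one row per (Name, Food), one column per date.
df = pd.DataFrame({
    'Name': ['Bob', 'Bob', 'Mike', 'Mike'],
    'Food': ['burger', 'pizza', 'burger', 'pizza'],
    '1/1/2018': [2.0, 0.0, 0.0, 1.0],
    '2/1/2018': [0.0, 1.0, 3.0, 0.0],
})

out = (df.set_index(['Name', 'Food'])
         .stack()            # date columns -> unnamed innermost index level
         .unstack('Food')    # Food values -> columns
         .reset_index()
         .rename(columns={'level_1': 'date'}))
print(out)
```

If some Name and Food combinations were missing, unstack would leave NaN in those cells; passing fill_value=0 to unstack is one way to fill them.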
Unpivot pandas DataFrame partly
If you need Qty and Value as separate columns, first move the leading columns into the index with set_index, then use str.rsplit on the last space to turn the remaining column names into a MultiIndex, and finally reshape with DataFrame.stack:
df = df.set_index(['Items','Description'])
df.columns = df.columns.str.rsplit(n=1, expand=True)
df = df.rename_axis(('Store number',None), axis=1).stack(0).reset_index()
print (df)
Items Description Store number Qty Value
0 item 1 Some item name Store 1 5 120
1 item 1 Some item name Store 2 7 240
2 item 2 Some other item Store 1 9 1234
3 item 2 Some other item Store 2 12 98
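To make the steps reproducible, here is a sketch with a hypothetical input frame rebuilt from the output above. It assumes column names of the form "Store 1 Qty", which is what splitting on the last space implies.

```python
import pandas as pd

# Hypothetical input: '<store> <measure>' column names.
df = pd.DataFrame({
    'Items': ['item 1', 'item 2'],
    'Description': ['Some item name', 'Some other item'],
    'Store 1 Qty': [5, 9],
    'Store 1 Value': [120, 1234],
    'Store 2 Qty': [7, 12],
    'Store 2 Value': [240, 98],
})

wide = df.set_index(['Items', 'Description'])
# split on the last whitespace only: 'Store 1 Qty' -> ('Store 1', 'Qty')
wide.columns = wide.columns.str.rsplit(n=1, expand=True)
# name the store level, then move it from the columns into the rows
long_df = (wide.rename_axis(['Store number', None], axis=1)
               .stack(0)
               .reset_index())

rows = long_df.sort_values(['Items', 'Store number']).reset_index(drop=True)
print(rows)
```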
Convert/unpivot multiple columns to rows in Python Dataframe
Use wide_to_long, but first it is necessary to rename the cor_id columns by appending their last digit once more:
df = df.rename(columns=lambda x: x + x[-1] if x.startswith('cor_id') else x)
df = pd.wide_to_long(df, ['cor_id', 'mail'], i='id', j='i')
df['cor_id'] = df['cor_id'].ffill()
df = df.reset_index(level=1, drop=True).reset_index()
An alternative is to append 0 instead and then remove the rows with missing mail values using dropna:
df = df.rename(columns=lambda x: x + '0' if x.startswith('cor_id') else x)
df = pd.wide_to_long(df, ['cor_id', 'mail'], i='id', j='i')
df['cor_id'] = df['cor_id'].ffill()
df = df.dropna(subset=['mail']).reset_index(level=1, drop=True).reset_index()
print (df)
id cor_id mail
0 1 1.0 a@123
1 1 1.0 b@234
2 1 1.0 c@123
3 1 1.0 a@def
4 1 2.0 b@fgh
5 1 2.0 s@wer
6 1 2.0 b@ert
7 1 3.0 e@rty
8 1 3.0 c@asd
9 2 4.0 e@234
10 2 4.0 e@234
11 2 4.0 e@qwe
12 2 4.0 e@dfe
13 2 9.0 f@jfg
14 2 9.0 e@wer
15 2 9.0 g@wer
16 2 10.0 e@ert
17 2 10.0 r@ert
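The forward fill only gives the right result if each cor_id row sits directly above its mail rows. The sketch below, on hypothetical data with assumed column names cor_id1, mail11, mail12 and so on, adds an explicit sort_index before ffill as a safeguard. That extra step is an assumption of this sketch and not part of the answer above.

```python
import pandas as pd

# Hypothetical input: cor_id<k> belongs to the mail<k><n> columns.
df = pd.DataFrame({
    'id': [1, 2],
    'cor_id1': [1, 4],
    'mail11': ['a@123', 'e@234'],
    'mail12': ['b@234', 'e@qwe'],
    'cor_id2': [2, 9],
    'mail21': ['b@fgh', 'f@jfg'],
    'mail22': ['s@wer', 'e@wer'],
})

# cor_id1 -> cor_id10, so its suffix sorts just before mail11, mail12
df = df.rename(columns=lambda x: x + '0' if x.startswith('cor_id') else x)
long_df = pd.wide_to_long(df, ['cor_id', 'mail'], i='id', j='i')

# Safeguard (assumption): order rows by (id, suffix) before filling.
long_df = long_df.sort_index()
long_df['cor_id'] = long_df['cor_id'].ffill()

# Rows that only carried a cor_id have no mail and are dropped.
long_df = (long_df.dropna(subset=['mail'])
                  .reset_index(level=1, drop=True)
                  .reset_index())
print(long_df)
```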
EDIT: If there are multiple columns like cor_id, add their prefixes to the tuple tested by startswith, and forward fill all of them at once by selecting the columns with a list before calling ffill:
df = df.rename(columns=lambda x: x + '0' if x.startswith(('cor_id','ad')) else x)
df = pd.wide_to_long(df, ['cor_id', 'ad','mail'], i='id', j='i')
df[['cor_id','ad']] = df[['cor_id','ad']].ffill()
df = df.dropna(subset=['mail']).reset_index(level=1, drop=True).reset_index()
print (df)
id cor_id ad mail
0 1 1.0 23.0 a@123
1 1 1.0 23.0 b@234
2 1 1.0 23.0 c@123
3 1 2.0 24.0 a@def
4 1 2.0 24.0 b@fgh
5 1 2.0 24.0 c@asd
6 1 3.0 25.0 s@wer
7 1 3.0 25.0 b@ert
8 1 3.0 25.0 e@rty
9 2 4.0 33.0 e@234
10 2 4.0 33.0 e@234
11 2 4.0 33.0 e@qwe
12 2 9.0 34.0 e@dfe
13 2 9.0 34.0 f@jfg
14 2 9.0 34.0 r@ert
15 2 10.0 35.0 e@wer
16 2 10.0 35.0 g@wer
17 2 10.0 35.0 e@ert
unpivot columns into multiple columns and values in scala dataframe
I found a solution that creates a new key column to join on; it is provided below.
If anyone has a better approach, please share it as well.
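import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._
import spark.implicits._ // assumes a SparkSession named spark, as in spark-shell
val df = Seq(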
(1,"2022-02-01",0,5,10,15,20,25,30),
(1,"2022-02-02",0,5,10,15,20,25,30),
(2,"2022-02-01",0,5,10,15,20,25,30),
(2,"2022-02-02",0,5,10,15,20,25,30)
).toDF("ID","DATE","TYPE","SIG_A","SIG_B","SIG_C","SIG_AA","SIG_BB","SIG_AAA")
val df01 = df.withColumn("DATE", date_format(col("DATE"),"dd-MM-yyyy"))
val fullSig = List[String]("SIG_A","SIG_B","SIG_C","SIG_AA","SIG_BB","SIG_AAA")
val fullSigList = fullSig.map(name => struct(lit(name) as "sig_name", col(name) as "sig_value"))
val sig01 = List[String]("SIG_A","SIG_B","SIG_C")
val sig01List = sig01.map(name => struct(lit(name) as "sig_name", col(name) as "sig_value"))
val sig02 = List[String]("SIG_AA","SIG_BB")
val sig02List = sig02.map(name => struct(lit(name) as "sig_name", col(name) as "sig_value"))
val sig03 = List[String]("SIG_AAA")
val sig03List = sig03.map(name => struct(lit(name) as "sig_name", col(name) as "sig_value"))
val unpiv01 = df01
.select($"ID",$"DATE",$"TYPE",explode(array(sig01List:_*))as "feature")
.select($"ID",$"DATE",$"TYPE",$"feature.sig_name" as "feature1_name", $"feature.sig_value" as "feature1_value")
val unpiv001 = unpiv01
.withColumn("row_num", row_number.over(Window.partitionBy("ID","DATE").orderBy("feature1_name")))
.withColumn("joinkey", concat(col("ID"),lit("-"),col("DATE"),lit("-"),col("row_num")))
.drop("row_num")
val unpiv02 = df01
.select($"ID",$"DATE",$"TYPE",explode(array(sig02List:_*))as "feature")
.select($"ID",$"DATE",$"TYPE",$"feature.sig_name" as "feature2_name", $"feature.sig_value" as "feature2_value")
val unpiv002 = unpiv02
.withColumn("row_num", row_number.over(Window.partitionBy("ID","DATE").orderBy("feature2_name")))
.withColumn("joinkey", concat(col("ID"),lit("-"),col("DATE"),lit("-"),col("row_num")))
.drop("row_num")
val unpiv03 = df01
.select($"ID",$"DATE",$"TYPE",explode(array(sig03List:_*))as "feature")
.select($"ID",$"DATE",$"TYPE",$"feature.sig_name" as "feature3_name", $"feature.sig_value" as "feature3_value")
val unpiv003= unpiv03
.withColumn("row_num", row_number.over(Window.partitionBy("ID","DATE").orderBy("feature3_name")))
.withColumn("joinkey", concat(col("ID"),lit("-"),col("DATE"),lit("-"),col("row_num")))
.drop("row_num")
val joineddf = unpiv001.as("x1")
.join(unpiv002.as("x2"),$"x1.joinkey" === $"x2.joinkey", "outer")
.join(unpiv003.as("x3"),$"x1.joinkey" === $"x3.joinkey", "outer")
.select($"x1.ID" as "ID",$"x1.DATE" as "DATE",$"x1.TYPE" as "TYPE",$"feature1_name",$"feature1_value",$"feature2_name",$"feature2_value",$"feature3_name",$"feature3_value")
INPUT DATAFRAME:
+---+----------+----+-----+-----+-----+------+------+-------+
| ID| DATE|TYPE|SIG_A|SIG_B|SIG_C|SIG_AA|SIG_BB|SIG_AAA|
+---+----------+----+-----+-----+-----+------+------+-------+
| 1|01-02-2022| 0| 5| 10| 15| 20| 25| 30|
| 1|02-02-2022| 0| 5| 10| 15| 20| 25| 30|
| 2|01-02-2022| 0| 5| 10| 15| 20| 25| 30|
| 2|02-02-2022| 0| 5| 10| 15| 20| 25| 30|
+---+----------+----+-----+-----+-----+------+------+-------+
OUTPUT DATAFRAME:
+---+----------+----+-------------+--------------+-------------+--------------+-------------+--------------+
| ID| DATE|TYPE|feature1_name|feature1_value|feature2_name|feature2_value|feature3_name|feature3_value|
+---+----------+----+-------------+--------------+-------------+--------------+-------------+--------------+
| 1|01-02-2022| 0| SIG_A| 5| SIG_AA| 20| SIG_AAA| 30|
| 1|01-02-2022| 0| SIG_B| 10| SIG_BB| 25| null| null|
| 1|01-02-2022| 0| SIG_C| 15| null| null| null| null|
| 1|02-02-2022| 0| SIG_A| 5| SIG_AA| 20| SIG_AAA| 30|
| 1|02-02-2022| 0| SIG_B| 10| SIG_BB| 25| null| null|
| 1|02-02-2022| 0| SIG_C| 15| null| null| null| null|
| 2|01-02-2022| 0| SIG_A| 5| SIG_AA| 20| SIG_AAA| 30|
| 2|01-02-2022| 0| SIG_B| 10| SIG_BB| 25| null| null|
| 2|01-02-2022| 0| SIG_C| 15| null| null| null| null|
| 2|02-02-2022| 0| SIG_A| 5| SIG_AA| 20| SIG_AAA| 30|
| 2|02-02-2022| 0| SIG_B| 10| SIG_BB| 25| null| null|
| 2|02-02-2022| 0| SIG_C| 15| null| null| null| null|
+---+----------+----+-------------+--------------+-------------+--------------+-------------+--------------+