Unpivot Multiple Columns With Same Name in Pandas Dataframe

Unpivot multiple columns with same name in pandas dataframe

Try groupby with axis=1 (note that axis=1 in DataFrame.groupby is deprecated as of pandas 2.1):

df.groupby(df.columns.values, axis=1).agg(lambda x: x.values.tolist()).sum().apply(pd.Series).T.sort_values('pp')
          b   pp
0  0.001464  5.0
2  0.001459  5.0
1  0.001853  6.0
3  0.001843  6.0
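
Because axis=1 in DataFrame.groupby is deprecated in recent pandas, here is a sketch that avoids it. It numbers each occurrence of a repeated name, slices out one block of columns per occurrence and stacks the blocks with pd.concat. The input frame is hypothetical, reconstructed from the output above. The sketch also assumes every name repeats the same number of times. If that does not hold, concat fills the gaps with NaN.

```python
import pandas as pd

# Hypothetical input reconstructed from the output above: 'pp' and 'b' each appear twice
df = pd.DataFrame([[5, 0.001464, 6, 0.001853],
                   [5, 0.001459, 6, 0.001843]],
                  columns=['pp', 'b', 'pp', 'b'])

# Occurrence number of each column name: pp -> 0, b -> 0, pp -> 1, b -> 1
names = pd.Series(df.columns)
occurrence = names.groupby(names).cumcount().to_numpy()

# One block per occurrence, each with unique column names, stacked vertically
parts = [df.iloc[:, occurrence == k] for k in range(occurrence.max() + 1)]
long_df = pd.concat(parts, ignore_index=True)
print(long_df)
```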

A fun way with wide_to_long

s=pd.Series(df.columns)
df.columns=df.columns+s.groupby(s).cumcount().astype(str)

pd.wide_to_long(df.reset_index(),stubnames=['pp','b'],i='index',j='drop',suffix=r'\d+')
            pp         b
index drop              
0     0      5  0.001464
1     0      5  0.001459
0     1      6  0.001853
1     1      6  0.001843
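
Here is a self-contained version of the wide_to_long trick. It uses the same hypothetical input, reconstructed from the output above. The final reset_index turns the (index, occurrence) MultiIndex back into ordinary columns. The name occurrence for j is my own choice.

```python
import pandas as pd

# Hypothetical input reconstructed from the output above
df = pd.DataFrame([[5, 0.001464, 6, 0.001853],
                   [5, 0.001459, 6, 0.001843]],
                  columns=['pp', 'b', 'pp', 'b'])

# Make the names unique by appending an occurrence counter: pp0, b0, pp1, b1
names = pd.Series(df.columns)
df.columns = names + names.groupby(names).cumcount().astype(str)

long_df = (pd.wide_to_long(df.reset_index(), stubnames=['pp', 'b'],
                           i='index', j='occurrence', suffix=r'\d+')
             .reset_index())   # 'index' and 'occurrence' become regular columns
print(long_df)
```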

Pandas: How to (cleanly) unpivot two columns with same category?

Use wide_to_long:

import numpy as np
import pandas as pd

np.random.seed(123)
df_orig = pd.DataFrame(data=np.random.randint(255, size=(4,5)),
                       columns=['accuracy','time_a','time_b','memory_a', 'memory_b'])


df = (pd.wide_to_long(df_orig.reset_index(), 
                     stubnames=['time','memory'],
                     i='index',
                     j='category',
                     sep='_',
                     suffix=r'\w+')
          .reset_index(level=1)
          .reset_index(drop=True)
          .rename_axis(None))
print (df)
  category  accuracy  time  memory
0        a       254   109      66
1        a        98   230      83
2        a       123    57     225
3        a       113   126      73
4        b       254   126     220
5        b        98    17     106
6        b       123   214      96
7        b       113    47      32
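
This version runs the same wide_to_long call on a small fixed frame, so the reshape can be checked without depending on the random seed. The values are made up. Only the column layout follows the question.

```python
import pandas as pd

# Made-up values; only the column layout matches the question
df_orig = pd.DataFrame({'accuracy': [90, 80],
                        'time_a': [1, 2], 'time_b': [3, 4],
                        'memory_a': [10, 20], 'memory_b': [30, 40]})

df = (pd.wide_to_long(df_orig.reset_index(),
                      stubnames=['time', 'memory'],
                      i='index', j='category',
                      sep='_', suffix=r'\w+')
        .reset_index(level='category')   # move 'category' from the index into a column
        .reset_index(drop=True))         # drop the helper 'index' level
print(df)
```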

Unpivot multiple variables Pandas Dataframe

You are close. You need suffix=r'\w+' to match non-integer suffixes such as qld. The default suffix is r'\d+', which only matches digits:

new_df = (pd.wide_to_long(df, ['price', 'vol', 'flag'],
                         i=['time', 'prod'],
                         j='State', 
                         sep='_', 
                         suffix=r'\w+')
             .reset_index())
    
print (new_df)
   time prod State  price  vol  flag
0    t1    A   qld      4   11     1
1    t1    A   nsw      7   73     0
2    t1    A   vic      9   95     1
3    t2    B   qld      3   43     1
4    t2    B   nsw      4   44     1
5    t2    B   vic      4   34     1
6    t3    C   qld      6  232     1
7    t3    C   nsw      7  657     0
8    t3    C   vic      6  666     1
9    t4    D   qld      3  234     1
10   t4    D   nsw      3   53     1
11   t4    D   vic     23  273     0
12   t5    E   qld      8   42     0
13   t5    E   nsw      5  785     0
14   t5    E   vic      7   87     1
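
Why the suffix matters: as far as I understand the pandas implementation, wide_to_long keeps a column for a stub only if the whole name matches ^stub + sep + suffix$. The helper below is a hypothetical stand-in that mirrors that rule with the re module. It is not a pandas API.

```python
import re

def matches_stub(column, stub, sep='_', suffix=r'\d+'):
    # Hypothetical helper mirroring the name rule wide_to_long applies per stub
    pattern = rf'^{re.escape(stub)}{re.escape(sep)}{suffix}$'
    return re.match(pattern, column) is not None

columns = ['time', 'prod', 'price_qld', 'price_nsw', 'vol_qld', 'flag_vic']
print([c for c in columns if matches_stub(c, 'price')])                 # []
print([c for c in columns if matches_stub(c, 'price', suffix=r'\w+')])  # ['price_qld', 'price_nsw']
```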

Another approach:

#move all columns without a separator into the index (MultiIndex)
new_df = df.set_index(['time', 'prod'])
#split columns by separator
new_df.columns = new_df.columns.str.split('_', expand=True)
#reshape by stack
new_df = new_df.stack().reset_index().rename(columns={'level_2':'state'})
    
print (new_df)
   time prod state  flag  price  vol
0    t1    A   nsw     0      7   73
1    t1    A   qld     1      4   11
2    t1    A   vic     1      9   95
3    t2    B   nsw     1      4   44
4    t2    B   qld     1      3   43
5    t2    B   vic     1      4   34
6    t3    C   nsw     0      7  657
7    t3    C   qld     1      6  232
8    t3    C   vic     1      6  666
9    t4    D   nsw     1      3   53
10   t4    D   qld     1      3  234
11   t4    D   vic     0     23  273
12   t5    E   nsw     0      5  785
13   t5    E   qld     0      8   42
14   t5    E   vic     1      7   87
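
The following is a runnable sketch of the split-and-stack approach. It uses a subset of the data (t1 and t2), reconstructed from the output above. The column order of the original frame is an assumption. Naming the split level state up front with set_names means level_2 does not need renaming afterwards.

```python
import pandas as pd

# Subset of the data reconstructed from the output above; column order is assumed
df = pd.DataFrame({'time': ['t1', 't2'], 'prod': ['A', 'B'],
                   'price_qld': [4, 3], 'price_nsw': [7, 4], 'price_vic': [9, 4],
                   'vol_qld': [11, 43], 'vol_nsw': [73, 44], 'vol_vic': [95, 34],
                   'flag_qld': [1, 1], 'flag_nsw': [0, 1], 'flag_vic': [1, 1]})

new_df = df.set_index(['time', 'prod'])
# 'price_qld' -> ('price', 'qld'); name the second level so stack() keeps the name
new_df.columns = new_df.columns.str.split('_', expand=True).set_names([None, 'state'])
new_df = new_df.stack('state').reset_index()
print(new_df)
```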

Unpivot df columns to multiple columns and rows

You can melt the frame and then split the variable column on _:

long_df = pd.melt(df, id_vars=['Country', 'Industry'])
long_df[['Year', 'Group_Type', 'Tags']] = long_df.variable.str.split('_', expand=True)

long_df = long_df.drop(columns='variable')
print(long_df)
#  Country Industry  value  Year Group_Type Tags
#0      US       AB   0.00  2011        0-9   AF
#1      US       AC  12.34  2011        0-9   AF
#2      UK       AB   1.00  2011        0-9   AF
#3      UK       AC  12.00  2011        0-9   AF
#4      US       AB   0.00  2011        0-9   AP
#5      US       AC  12.40  2011        0-9   AP
#6      UK       AB   2.00  2011        0-9   AP
#7      UK       AC   5.00  2011        0-9   AP
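
Here is a self-contained sketch of the melt-and-split approach. The wide column names (2011_0-9_AF, 2011_0-9_AP) are an assumption, inferred from the Year, Group_Type and Tags values in the output. Passing n=2 caps the split at three parts.

```python
import pandas as pd

# Hypothetical wide input; column names inferred from the output above
df = pd.DataFrame({'Country': ['US', 'US', 'UK', 'UK'],
                   'Industry': ['AB', 'AC', 'AB', 'AC'],
                   '2011_0-9_AF': [0.00, 12.34, 1.00, 12.00],
                   '2011_0-9_AP': [0.00, 12.40, 2.00, 5.00]})

long_df = df.melt(id_vars=['Country', 'Industry'])
# Split each former column name into its three parts (at most two splits)
long_df[['Year', 'Group_Type', 'Tags']] = long_df['variable'].str.split('_', n=2, expand=True)
long_df = long_df.drop(columns='variable')
print(long_df)
```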

Python Pandas pivot one column and unpivot X columns

Try this:

df.set_index(['Name', 'Food']).stack().unstack('Food')

Food           burger  hot dog  pizza
Name                                 
Bob  1/1/2018     2.0      1.5    0.0
     2/1/2018     0.0      0.0    1.0
     3/1/2018     2.0      1.5    0.0
     4/1/2018     2.0      1.5    0.0
Mike 1/1/2018     0.0      0.0    1.0
     2/1/2018     3.0      0.0    0.0
     3/1/2018     0.0      0.0    1.0
     4/1/2018     0.0      0.0    1.0

If formatting is an issue, just reset the index and then rename your columns to appropriate names:

df.set_index(['Name', 'Food']).stack().unstack('Food').reset_index().rename(columns={'level_1':'date'})

Food  Name      date  burger  hot dog  pizza
0      Bob  1/1/2018     2.0      1.5    0.0
1      Bob  2/1/2018     0.0      0.0    1.0
2      Bob  3/1/2018     2.0      1.5    0.0
3      Bob  4/1/2018     2.0      1.5    0.0
4     Mike  1/1/2018     0.0      0.0    1.0
5     Mike  2/1/2018     3.0      0.0    0.0
6     Mike  3/1/2018     0.0      0.0    1.0
7     Mike  4/1/2018     0.0      0.0    1.0
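
This self-contained sketch uses input reconstructed from the output above. It does not rename level_1 afterwards. Instead, it uses rename_axis to name the column axis date before stacking, so the stacked level already carries that name.

```python
import pandas as pd

# Input reconstructed from the output above
df = pd.DataFrame({
    'Name': ['Bob', 'Bob', 'Bob', 'Mike', 'Mike', 'Mike'],
    'Food': ['burger', 'hot dog', 'pizza', 'burger', 'hot dog', 'pizza'],
    '1/1/2018': [2, 1.5, 0, 0, 0, 1],
    '2/1/2018': [0, 0, 1, 3, 0, 0],
    '3/1/2018': [2, 1.5, 0, 0, 0, 1],
    '4/1/2018': [2, 1.5, 0, 0, 0, 1],
})

result = (df.set_index(['Name', 'Food'])
            .rename_axis(columns='date')   # stacked level will be called 'date'
            .stack()
            .unstack('Food')
            .reset_index()
            .rename_axis(columns=None))    # drop the leftover 'Food' axis name
print(result)
```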

Unpivot pandas DataFrame partly

If you need Qty and Value in separate columns, first move the non-store columns into a MultiIndex. Then use str.rsplit to split the column names at the last space into a column MultiIndex, and finally reshape with DataFrame.stack:

df = df.set_index(['Items','Description'])
df.columns = df.columns.str.rsplit(n=1, expand=True)
df = df.rename_axis(('Store number',None), axis=1).stack(0).reset_index()
print (df)
    Items      Description Store number  Qty  Value
0  item 1   Some item name      Store 1    5    120
1  item 1   Some item name      Store 2    7    240
2  item 2  Some other item      Store 1    9   1234
3  item 2  Some other item      Store 2   12     98
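
This variant avoids MultiIndex columns. It melts to long form, splits each column name at the last space, then uses pivot_table to spread Qty and Value back into columns. The input frame is reconstructed from the output above, and its column names (such as Store 1 Qty) are an assumption.

```python
import pandas as pd

# Hypothetical input reconstructed from the output above
df = pd.DataFrame({
    'Items': ['item 1', 'item 2'],
    'Description': ['Some item name', 'Some other item'],
    'Store 1 Qty': [5, 9], 'Store 1 Value': [120, 1234],
    'Store 2 Qty': [7, 12], 'Store 2 Value': [240, 98],
})

long_df = df.melt(id_vars=['Items', 'Description'], var_name='column')
# 'Store 1 Qty' -> 'Store 1', 'Qty' (split only at the last space)
long_df[['Store number', 'measure']] = long_df['column'].str.rsplit(n=1, expand=True)

result = (long_df.pivot_table(index=['Items', 'Description', 'Store number'],
                              columns='measure', values='value', aggfunc='first')
                 .reset_index()
                 .rename_axis(columns=None))
print(result)
```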

Convert/unpivot multiple columns to rows in Python Dataframe

Use wide_to_long, but first rename the cor_id columns by appending their last digit again (so a column named cor_id1 would become cor_id11):

df = df.rename(columns=lambda x: x + x[-1] if x.startswith('cor_id') else x)
df = pd.wide_to_long(df, ['cor_id', 'mail'], i='id', j='i')
df['cor_id'] = df['cor_id'].ffill()
df = df.reset_index(level=1, drop=True).reset_index()

Alternatively, append 0 to the cor_id column names and then remove the rows with missing mail values using dropna:

df = df.rename(columns=lambda x: x + '0' if x.startswith('cor_id') else x)
df = pd.wide_to_long(df, ['cor_id', 'mail'], i='id', j='i')
df['cor_id'] = df['cor_id'].ffill()
df = df.dropna(subset=['mail']).reset_index(level=1, drop=True).reset_index()

print (df)
    id  cor_id   mail
0    1     1.0  a@123
1    1     1.0  b@234
2    1     1.0  c@123
3    1     1.0  a@def
4    1     2.0  b@fgh
5    1     2.0  s@wer
6    1     2.0  b@ert
7    1     3.0  e@rty
8    1     3.0  c@asd
9    2     4.0  e@234
10   2     4.0  e@234
11   2     4.0  e@qwe
12   2     4.0  e@dfe
13   2     9.0  f@jfg
14   2     9.0  e@wer
15   2     9.0  g@wer
16   2    10.0  e@ert
17   2    10.0  r@ert

EDIT: If there are multiple columns like cor_id, add their prefixes to the tuple passed to startswith, and forward-fill all of them at once by passing a list of columns to ffill:

df = df.rename(columns=lambda x: x + '0' if x.startswith(('cor_id','ad')) else x)
df = pd.wide_to_long(df, ['cor_id', 'ad','mail'], i='id', j='i')
df[['cor_id','ad']] = df[['cor_id','ad']].ffill()
df = df.dropna(subset=['mail']).reset_index(level=1, drop=True).reset_index()
print (df)
    id  cor_id    ad   mail
0    1     1.0  23.0  a@123
1    1     1.0  23.0  b@234
2    1     1.0  23.0  c@123
3    1     2.0  24.0  a@def
4    1     2.0  24.0  b@fgh
5    1     2.0  24.0  c@asd
6    1     3.0  25.0  s@wer
7    1     3.0  25.0  b@ert
8    1     3.0  25.0  e@rty
9    2     4.0  33.0  e@234
10   2     4.0  33.0  e@234
11   2     4.0  33.0  e@qwe
12   2     9.0  34.0  e@dfe
13   2     9.0  34.0  f@jfg
14   2     9.0  34.0  r@ert
15   2    10.0  35.0  e@wer
16   2    10.0  35.0  g@wer
17   2    10.0  35.0  e@ert

Unpivot columns into multiple columns and values in Scala DataFrame

I found a solution by creating a new key column to join on. The code is below.

If anyone has a better approach, please share it as well.

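import org.apache.spark.sql.expressions.Window
import org.apache.spark.sql.functions._
import spark.implicits._ // assumes an active SparkSession named spark, as in spark-shell

val df = Seq(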
    (1,"2022-02-01",0,5,10,15,20,25,30),
    (1,"2022-02-02",0,5,10,15,20,25,30),
    (2,"2022-02-01",0,5,10,15,20,25,30),
    (2,"2022-02-02",0,5,10,15,20,25,30)
  ).toDF("ID","DATE","TYPE","SIG_A","SIG_B","SIG_C","SIG_AA","SIG_BB","SIG_AAA")
val df01 = df.withColumn("DATE", date_format(col("DATE"),"dd-MM-yyyy"))

val fullSig = List[String]("SIG_A","SIG_B","SIG_C","SIG_AA","SIG_BB","SIG_AAA")
val fullSigList = fullSig.map(name => struct(lit(name) as "sig_name", col(name) as "sig_value"))

val sig01 = List[String]("SIG_A","SIG_B","SIG_C")
val sig01List = sig01.map(name => struct(lit(name) as "sig_name", col(name) as "sig_value"))

val sig02 = List[String]("SIG_AA","SIG_BB")
val sig02List = sig02.map(name => struct(lit(name) as "sig_name", col(name) as "sig_value"))

val sig03 = List[String]("SIG_AAA")
val sig03List = sig03.map(name => struct(lit(name) as "sig_name", col(name) as "sig_value"))


val unpiv01 = df01
      .select($"ID",$"DATE",$"TYPE",explode(array(sig01List:_*))as "feature")
      .select($"ID",$"DATE",$"TYPE",$"feature.sig_name" as "feature1_name", $"feature.sig_value" as "feature1_value")
val unpiv001 = unpiv01
      .withColumn("row_num", row_number().over(Window.partitionBy("ID","DATE").orderBy("feature1_name")))
      .withColumn("joinkey", concat(col("ID"),lit("-"),col("DATE"),lit("-"),col("row_num")))
      .drop("row_num")

val unpiv02 = df01
      .select($"ID",$"DATE",$"TYPE",explode(array(sig02List:_*))as "feature")
      .select($"ID",$"DATE",$"TYPE",$"feature.sig_name" as "feature2_name", $"feature.sig_value" as "feature2_value")
val unpiv002 = unpiv02
      .withColumn("row_num", row_number().over(Window.partitionBy("ID","DATE").orderBy("feature2_name")))
      .withColumn("joinkey", concat(col("ID"),lit("-"),col("DATE"),lit("-"),col("row_num")))
      .drop("row_num")

val unpiv03 = df01
      .select($"ID",$"DATE",$"TYPE",explode(array(sig03List:_*))as "feature")
      .select($"ID",$"DATE",$"TYPE",$"feature.sig_name" as "feature3_name", $"feature.sig_value" as "feature3_value")
val unpiv003= unpiv03
      .withColumn("row_num", row_number().over(Window.partitionBy("ID","DATE").orderBy("feature3_name")))
      .withColumn("joinkey", concat(col("ID"),lit("-"),col("DATE"),lit("-"),col("row_num")))
      .drop("row_num")

val joineddf = unpiv001.as("x1")
    .join(unpiv002.as("x2"),$"x1.joinkey" === $"x2.joinkey", "outer")
    .join(unpiv003.as("x3"),$"x1.joinkey" === $"x3.joinkey", "outer")
    .select($"x1.ID" as "ID",$"x1.DATE" as "DATE",$"x1.TYPE" as "TYPE",$"feature1_name",$"feature1_value",$"feature2_name",$"feature2_value",$"feature3_name",$"feature3_value")

INPUT DATAFRAME:

+---+----------+----+-----+-----+-----+------+------+-------+
| ID|      DATE|TYPE|SIG_A|SIG_B|SIG_C|SIG_AA|SIG_BB|SIG_AAA|
+---+----------+----+-----+-----+-----+------+------+-------+
|  1|01-02-2022|   0|    5|   10|   15|    20|    25|     30|
|  1|02-02-2022|   0|    5|   10|   15|    20|    25|     30|
|  2|01-02-2022|   0|    5|   10|   15|    20|    25|     30|
|  2|02-02-2022|   0|    5|   10|   15|    20|    25|     30|
+---+----------+----+-----+-----+-----+------+------+-------+

OUTPUT DATAFRAME:

+---+----------+----+-------------+--------------+-------------+--------------+-------------+--------------+
| ID|      DATE|TYPE|feature1_name|feature1_value|feature2_name|feature2_value|feature3_name|feature3_value|
+---+----------+----+-------------+--------------+-------------+--------------+-------------+--------------+
|  1|01-02-2022|   0|        SIG_A|             5|       SIG_AA|            20|      SIG_AAA|            30|
|  1|01-02-2022|   0|        SIG_B|            10|       SIG_BB|            25|         null|          null|
|  1|01-02-2022|   0|        SIG_C|            15|         null|          null|         null|          null|
|  1|02-02-2022|   0|        SIG_A|             5|       SIG_AA|            20|      SIG_AAA|            30|
|  1|02-02-2022|   0|        SIG_B|            10|       SIG_BB|            25|         null|          null|
|  1|02-02-2022|   0|        SIG_C|            15|         null|          null|         null|          null|
|  2|01-02-2022|   0|        SIG_A|             5|       SIG_AA|            20|      SIG_AAA|            30|
|  2|01-02-2022|   0|        SIG_B|            10|       SIG_BB|            25|         null|          null|
|  2|01-02-2022|   0|        SIG_C|            15|         null|          null|         null|          null|
|  2|02-02-2022|   0|        SIG_A|             5|       SIG_AA|            20|      SIG_AAA|            30|
|  2|02-02-2022|   0|        SIG_B|            10|       SIG_BB|            25|         null|          null|
|  2|02-02-2022|   0|        SIG_C|            15|         null|          null|         null|          null|
+---+----------+----+-------------+--------------+-------------+--------------+-------------+--------------+
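
For comparison, here is a pandas sketch of the same reshape. It is a translation, not the poster's Spark code. It melts each signal group and numbers the rows inside each (ID, DATE, TYPE) group by signal name, which replaces the row_number joinkey. It then chains outer merges on that number. It assumes TYPE is constant within each (ID, DATE) and uses the INPUT DATAFRAME values above.

```python
import pandas as pd

# Values from the INPUT DATAFRAME table above
df = pd.DataFrame({
    'ID': [1, 1, 2, 2],
    'DATE': ['01-02-2022', '02-02-2022', '01-02-2022', '02-02-2022'],
    'TYPE': [0, 0, 0, 0],
    'SIG_A': 5, 'SIG_B': 10, 'SIG_C': 15,
    'SIG_AA': 20, 'SIG_BB': 25, 'SIG_AAA': 30,
})

keys = ['ID', 'DATE', 'TYPE']  # assumes TYPE is constant within each (ID, DATE)
groups = [['SIG_A', 'SIG_B', 'SIG_C'], ['SIG_AA', 'SIG_BB'], ['SIG_AAA']]

result = None
for n, cols in enumerate(groups, start=1):
    name_col, value_col = f'feature{n}_name', f'feature{n}_value'
    part = (df.melt(id_vars=keys, value_vars=cols,
                    var_name=name_col, value_name=value_col)
              .sort_values(keys + [name_col]))
    # Position within each key group, ordered by signal name (like row_number())
    part['row_num'] = part.groupby(keys).cumcount()
    if result is None:
        result = part
    else:
        result = result.merge(part, on=keys + ['row_num'], how='outer')

result = (result.sort_values(keys + ['row_num'])
                .drop(columns='row_num')
                .reset_index(drop=True))
print(result)
```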
