How to Stack Only Some Columns in a Data Frame

Pandas, stack some columns, unstack some others

Perhaps you're looking for stack instead of melt:

(df.set_index(['GEOG_CODE', 'COUNTRY', 'Group1'])
.stack()
.unstack(-2)
.ffill(axis=1)
.bfill(axis=1, downcast='infer')
.add_prefix('low_')
.reset_index()
.rename({'level_2': 'year'}, axis=1))

Group1 GEOG_CODE COUNTRY year low_A low_B
0 123 England 2016 0 0
1 123 England 2017 0 0
2 123 England 2018 0 0
3 234 England 2016 1 1
4 234 England 2017 1 1
5 234 England 2018 1 1
6 567 England 2016 2 2
7 567 England 2017 2 2
8 567 England 2018 2 2
9 901 England 2016 3 3
10 901 England 2017 3 3
11 901 England 2018 3 3

How do I stack only some columns in a data frame?

Using reshape2 package

dat <- read.table(text = 'A  B  C  V1  V2  V3
1 1 1 x y z
1 1 2 a b c',header= T)
library(reshape2)
melt(dat,id.vars = c('A','B','C'))
A B C variable value
1 1 1 1 V1 x
2 1 1 2 V1 a
3 1 1 1 V2 y
4 1 1 2 V2 b
5 1 1 1 V3 z
6 1 1 2 V3 c

How to stack only selected columns in pandas barh plot

Use position parameter to draw 2 columns on the same index:

fig, ax = plt.subplots()
d[['DC', 'DC - IDC']].plot.barh(width=0.4, position=0, stacked=True, ax=ax)
d[['No Address']].plot.barh(width=0.4, position=1, stacked=True, ax=ax, color='red')
plt.show()

barh

Stack some columns of dataframe

You are looking to pd.wide_to_long:

out = pd.wide_to_long(df.rename(columns={'price': 'price_mod'}),
stubnames=['package', 'price'],
i=['product_id', 'product_name'],
j='dummy', sep='_', suffix=r'\w+') \
.droplevel(level='dummy').reset_index()

Output:

>>> out
product_id product_name package price
0 2017210 potato 1KG 3.3900
1 2017210 potato 50g 0.1695
2 2017210 potato 100g 0.3390
3 2017081 onion 1KG 3.5900
4 2017081 onion 50g 0.1795
5 2017081 onion 100g 0.3590
6 2017217 lemon 1KG 2.7900
7 2017217 lemon 50g 0.1395
8 2017217 lemon 100g 0.2790

How to stack only some of pandas DataFrame series on bar chart

Try to plot twice with option stacked and position:

fig, ax = plt.subplots(figsize=(10,6))

# plot first 3 columns, stacked and shifted left
plotdata.iloc[:,:-1].plot.bar(stacked=True, position=1, width=.4, ax=ax)

# plot last column, shifted right
plotdata.iloc[:,-1:].plot.bar(position=0,color='C4', width=.4, ax=ax)

ax.set_xlim(-0.5, len(plotdata)-0.5)

Output:

Sample Image

Unstack or Pivot Only Some Columns

You can set multiindex by first 3 columns and use unstack with level=0.

Then you can rename columns names - or by levels and labels or by custom names.

#  COL1  COL2  COL3   COL4  year
#0 A 1 2 11032 2016
#1 A 2 2 1960 2017
#2 A 3 4 11400 2018
#3 B 1 2 11355 2019
#4 B 2 2 8 2020
#5 B 3 4 7 2021

DF = DF.set_index(['COL1', 'COL2', 'COL3'])
DF = DF.unstack(0).reset_index()
print DF
# COL2 COL3 COL4 year
#COL1 A B A B
#0 1 2 11032 11355 2016 2019
#1 2 2 1960 8 2017 2020
#2 3 4 11400 7 2018 2021

levels = DF.columns.levels
labels = DF.columns.labels
DF.columns = levels[0][labels[0]]
print DF
# COL2 COL3 COL4 COL4 year year
#0 1 2 11032 11355 2016 2019
#1 2 2 1960 8 2017 2020
#2 3 4 11400 7 2018 2021
DF.columns = ['COL2','COL3','COL4','COL5','COL6','COL7']
print DF
# COL2 COL3 COL4 COL5 COL6 COL7
#0 1 2 11032 11355 2016 2019
#1 2 2 1960 8 2017 2020
#2 3 4 11400 7 2018 2021

But if columns COL2 and COL3 have different order, you can use little hack by Andy Hayden:

#  COL1  COL2  COL3   COL4  year
#0 A 4 6 11032 2016
#1 A 9 2 1960 2017
#2 A 8 4 11400 2018
#3 B 4 6 11355 2019
#4 B 9 2 8 2020
#5 B 8 4 7 2021
DF = DF.set_index(['COL1', 'COL2', 'COL3'])

index = pd.MultiIndex(levels=[DF.index.get_level_values(1).unique(),
DF.index.get_level_values(2).unique()],
labels=[[0, 1, 2],
[0, 1, 2]])
DF = DF.unstack(0)
DF = DF.reindex(index).reset_index()
print DF
# level_0 level_1 COL4 year
#COL1 A B A B
#0 4 6 11032 11355 2016 2019
#1 9 2 1960 8 2017 2020
#2 8 4 11400 7 2018 2021
levels = DF.columns.levels
labels = DF.columns.labels
DF.columns = levels[0][labels[0]]
print DF
# level_0 level_1 COL4 COL4 year year
#0 4 6 11032 11355 2016 2019
#1 9 2 1960 8 2017 2020
#2 8 4 11400 7 2018 2021
DF.columns = ['COL2','COL3','COL4','COL5','COL6','COL7']
print DF
# COL2 COL3 COL4 COL5 COL6 COL7
#0 4 6 11032 11355 2016 2019
#1 9 2 1960 8 2017 2020
#2 8 4 11400 7 2018 2021

How to stack two columns of a pandas dataframe in python

You can create a list of the cols, and call squeeze to anonymise the data so it doesn't try to align on columns, and then call concat on this list, passing ignore_index=True creates a new index, otherwise you'll get the names as index values repeated:

cols = [df[col].squeeze() for col in df]
pd.concat(cols, ignore_index=True)

Pandas dataframe fillna() only some columns in place

You can select your desired columns and do it by assignment:

df[['a', 'b']] = df[['a','b']].fillna(value=0)

The resulting output is as expected:

     a    b    c
0 1.0 4.0 NaN
1 2.0 5.0 NaN
2 3.0 0.0 7.0
3 0.0 6.0 8.0

Stacking a number of columns into one column in python

You're looking for pd.wide_to_long. This should do:

pd.wide_to_long(df, stubnames='destination_', i=['source'], j='number')

The column destination_ will have the info you're looking for.

Example:

import pandas as pd
d = {'source': ['a', 'b'],
'destination_1': ['contact_1', 'contact_6'],
'destination_2': ['contact_2', 'contact_7']}
df = pd.DataFrame(d)
pd.wide_to_long(df, stubnames='destination_', i=['source'], j='number')

Output:

              destination_
source number
a 1 contact_1
b 1 contact_6
a 2 contact_2
b 2 contact_7

Selecting/excluding sets of columns in pandas

You can either Drop the columns you do not need OR Select the ones you need

# Using DataFrame.drop
df.drop(df.columns[[1, 2]], axis=1, inplace=True)

# drop by Name
df1 = df1.drop(['B', 'C'], axis=1)

# Select the ones you want
df1 = df[['a','d']]


Related Topics



Leave a reply



Submit