Pandas: Multiple columns into one column
Update: pandas has a built-in method for this, stack, which does what you want; see the other answer.
This was my first answer, written many years ago before I knew about stack:
In [227]:
df = pd.DataFrame({'Column 1':['A', 'B', 'C', 'D'],'Column 2':['E', 'F', 'G', 'H']})
df
Out[227]:
Column 1 Column 2
0 A E
1 B F
2 C G
3 D H
[4 rows x 2 columns]
In [228]:
pd.concat([df['Column 1'], df['Column 2']]).reset_index(drop=True)  # Series.append was removed in pandas 2.0
Out[228]:
0 A
1 B
2 C
3 D
4 E
5 F
6 G
7 H
dtype: object
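For comparison, here is a minimal sketch of the built-in approach mentioned in the update, using the same frame as above. One thing to watch: stack walks the frame row by row, so this sketch also shows unstack, which reproduces the column-by-column order of the output above. The variable names row_wise and column_wise are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({'Column 1': ['A', 'B', 'C', 'D'],
                   'Column 2': ['E', 'F', 'G', 'H']})

# stack() emits each row's values in turn, so the columns are interleaved
row_wise = df.stack().reset_index(drop=True)

# unstack() on a frame with a flat index emits each column in turn
column_wise = df.unstack().reset_index(drop=True)

print(row_wise.tolist())
print(column_wise.tolist())
```

Which of the two you want depends on whether the combined column should read down each column or across each row.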
How to stack/append all columns into one column in Pandas?
Very simply with melt:
import pandas as pd
df.melt().drop('variable',axis=1).rename({'value':'A'},axis=1)
A
0 1
1 2
2 3
3 4
4 5
5 6
6 7
7 8
8 9
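The input frame is not shown in this answer, so the sketch below assumes a hypothetical 3x3 frame with columns x, y and z. It also passes value_name to melt, which may be a slightly shorter way to get the same single column without the separate drop and rename steps.

```python
import pandas as pd

# Hypothetical input; the original frame is not shown in the answer
df = pd.DataFrame({'x': [1, 2, 3], 'y': [4, 5, 6], 'z': [7, 8, 9]})

# melt() stacks the columns top to bottom; value_name names the result
# column directly, and [['A']] keeps only that column as a DataFrame
single = df.melt(value_name='A')[['A']]
print(single)
```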
How to convert multiple columns in one column in pandas?
Use melt:
>>> df.melt(var_name='route', value_name='edge')
route edge
0 route1 19.0
1 route1 47.0
2 route1 56.0
3 route1 43.0
4 route2 51.0
5 route2 46.0
6 route2 37.0
7 route2 2.0
If there are columns you want to keep as identifiers rather than flatten, pass them as id_vars=['col1', 'col2', ...].
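As a rough illustration of id_vars, the sketch below adds a hypothetical 'day' column to the route data shown above; that column and its values are assumptions, not part of the original question.

```python
import pandas as pd

# Hypothetical frame: 'day' is an assumed identifier column
df = pd.DataFrame({
    'day': ['Mon', 'Tue', 'Wed', 'Thu'],
    'route1': [19.0, 47.0, 56.0, 43.0],
    'route2': [51.0, 46.0, 37.0, 2.0],
})

# 'day' is repeated next to every melted value instead of being flattened
long = df.melt(id_vars=['day'], var_name='route', value_name='edge')
print(long)
```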
Merge multiple column values into one column in python pandas
You can call apply and pass axis=1 to apply row-wise, then convert the dtype to str and join:
In [153]:
df['ColumnA'] = df[df.columns[1:]].apply(
lambda x: ','.join(x.dropna().astype(int).astype(str)),
axis=1
)
df
Out[153]:
Column1 Column2 Column3 Column4 Column5 ColumnA
0 a 1 2 3 4 1,2,3,4
1 a 3 4 5 NaN 3,4,5
2 b 6 7 8 NaN 6,7,8
3 c 7 7 NaN NaN 7,7
Here I call dropna to get rid of the NaN values; however, we need to cast back to int so we don't end up with floats as str.
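To make the int cast concrete, here is a small sketch that rebuilds the frame from the output above and compares the joined strings with and without the cast. The names numeric, as_float and as_int are illustrative only.

```python
import pandas as pd

nan = float('nan')
df = pd.DataFrame({
    'Column1': ['a', 'a', 'b', 'c'],
    'Column2': [1, 3, 6, 7],
    'Column3': [2, 4, 7, 7],
    'Column4': [3, 5, 8, nan],
    'Column5': [4, nan, nan, nan],
})
numeric = df[df.columns[1:]]

# Each row arrives as a float Series because NaN forces a float dtype
as_float = numeric.apply(lambda x: ','.join(x.dropna().astype(str)), axis=1)

# Casting to int after dropna() restores whole-number strings
as_int = numeric.apply(
    lambda x: ','.join(x.dropna().astype(int).astype(str)), axis=1
)
print(as_float.tolist())
print(as_int.tolist())
```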
Transpose multiple columns into one column using Python
This can be accomplished with melt:
df.melt(id_vars = ['Date'], value_vars = df.columns.drop('Date').tolist())
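The frame for this question is not shown, so the sketch below assumes a hypothetical one with a 'Date' column and two value columns. It suggests that the explicit value_vars list is optional here, since melt uses every column not named in id_vars when value_vars is omitted.

```python
import pandas as pd

# Hypothetical frame; only the 'Date' column name comes from the answer
df = pd.DataFrame({
    'Date': ['2020-01-01', '2020-01-02'],
    'open': [1.0, 2.0],
    'close': [1.5, 2.5],
})

explicit = df.melt(id_vars=['Date'],
                   value_vars=df.columns.drop('Date').tolist())
# With value_vars omitted, melt() uses every column not listed in id_vars
implicit = df.melt(id_vars=['Date'])
print(explicit.equals(implicit))
```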
Append multiple columns to single column
Try:
single_column_frame = pd.concat([df[col] for col in df.columns])
If you want to create a single column and get rid of month names:
df_new = df.melt()['value'].to_frame()
Or you can do:
single_column_frame = single_column_frame.reset_index().drop(columns=['index'])
You can also do:
single_column_frame = df.stack().reset_index().loc[:,0]
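If the goal is a clean 0..n-1 index, the concat and reset_index steps can probably be folded into one call with ignore_index=True. The sketch below assumes a hypothetical frame with month-named columns, since the original data is not shown.

```python
import pandas as pd

# Hypothetical frame with month-named columns
df = pd.DataFrame({'Jan': [1, 2], 'Feb': [3, 4], 'Mar': [5, 6]})

# ignore_index=True renumbers the combined Series 0..n-1 in one step
single_column = pd.concat([df[col] for col in df.columns], ignore_index=True)

# to_frame() turns the Series into a one-column DataFrame
single_column_frame = single_column.to_frame('value')
print(single_column_frame)
```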
Pandas: sum up multiple columns into one column without last column
You can first select by iloc and then sum:
df['Fruit Total']= df.iloc[:, -4:-1].sum(axis=1)
print (df)
Apples Bananas Grapes Kiwis Fruit Total
0 2.0 3.0 NaN 1.0 5.0
1 1.0 3.0 7.0 NaN 11.0
2 NaN NaN 2.0 3.0 2.0
To sum all columns, use:
df['Fruit Total']= df.sum(axis=1)
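Here is a self-contained sketch of the "all but the last column" case, rebuilding the fruit frame from the output above. It assumes the frame has exactly these four columns, in which case iloc[:, :-1] selects the same columns as iloc[:, -4:-1].

```python
import pandas as pd

nan = float('nan')
df = pd.DataFrame({
    'Apples': [2.0, 1.0, nan],
    'Bananas': [3.0, 3.0, nan],
    'Grapes': [nan, 7.0, 2.0],
    'Kiwis': [1.0, nan, 3.0],
})

# Every column except the last; sum() skips NaN by default
df['Fruit Total'] = df.iloc[:, :-1].sum(axis=1)
print(df)
```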
how to re-arrange multiple columns into one column with same index
This looks a little like the melt function in pandas, with the only difference being the index.
https://pandas.pydata.org/pandas-docs/stable/generated/pandas.melt.html
Here is some code you can run to test:
import pandas as pd
df = pd.DataFrame({'A': {0: 'a', 1: 'b', 2: 'c'},'B': {0: 1, 1: 3, 2: 5},'C': {0: 2, 1: 4, 2: 6}})
pd.melt(df)
With a little manipulation, you could solve for the indexing issue.
This is not particularly pythonic, but if you have a limited number of columns, you could make do with:
molten = pd.melt(df)
a = molten.merge(df, left_on='value', right_on = 'A')
b = molten.merge(df, left_on='value', right_on = 'B')
c = molten.merge(df, left_on='value', right_on = 'C')
merge = pd.concat([a,b,c])
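One possible way to handle the index without merging is the ignore_index parameter of melt, available from pandas 1.1. The sketch below reuses the test frame from above and keeps each value's original row label.

```python
import pandas as pd

df = pd.DataFrame({'A': {0: 'a', 1: 'b', 2: 'c'},
                   'B': {0: 1, 1: 3, 2: 5},
                   'C': {0: 2, 1: 4, 2: 6}})

# ignore_index=False (pandas 1.1+) keeps each value's original row label
molten = pd.melt(df, ignore_index=False)
print(molten)
```

The resulting index repeats 0, 1, 2 once per melted column, so rows can still be traced back to their source row.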
Combine different values of multiple columns into one column
Thanks to the comments on the Polars issue tracker from @cannero and @ritchie46, I was able to make it work.
This is a working version (Float64):
use polars::prelude::*;
fn my_black_box_function(a: f64, b: f64) -> f64 {
// do something
a
}
fn apply_multiples(lf: LazyFrame) -> Result<DataFrame> {
let ergebnis = lf
.select([col("struct_col").map(
|s| {
let ca = s.struct_()?;
let a = ca.field_by_name("a")?;
let b = ca.field_by_name("b")?;
let a = a.f64()?;
let b = b.f64()?;
let out: Float64Chunked = a
.into_iter()
.zip(b.into_iter())
.map(|(opt_a, opt_b)| match (opt_a, opt_b) {
(Some(a), Some(b)) => Some(my_black_box_function(a, b)),
_ => None,
})
.collect();
Ok(out.into_series())
},
GetOutput::from_type(DataType::Float64),
)])
.collect();
ergebnis
}
fn main() {
// We start with a normal DataFrame
let df = df![
"a" => [1.0, 2.0, 3.0],
"b" => [3.0, 5.1, 0.3]
]
.unwrap();
// We CONVERT the df into a StructChunked and WRAP this into a new LazyFrame
let lf = df![
"struct_col" => df.into_struct("StructChunked")
]
.unwrap()
.lazy();
let processed = apply_multiples(lf);
match processed {
Ok(..) => println!("We did it"),
Err(e) => println!("{:?}", e),
}
}
Here is a version for my initial question (String):
use polars::prelude::*;
fn my_fruit_box(fruit: String, color: String) -> String {
// do something
format!("{} has {} color", fruit, color)
}
fn apply_multiples(lf: LazyFrame) -> Result<DataFrame> {
let ergebnis = lf
.select([col("struct_col").map(
|s| {
let ca = s.struct_()?;
let fruit = ca.field_by_name("Fruit")?;
let color = ca.field_by_name("Color")?;
let color = color.utf8()?;
let fruit = fruit.utf8()?;
let out: Utf8Chunked = fruit
.into_iter()
.zip(color.into_iter())
.map(|(opt_fruit, opt_color)| match (opt_fruit, opt_color) {
(Some(fruit), Some(color)) => {
Some(my_fruit_box(fruit.to_string(), color.to_string()))
}
_ => None,
})
.collect();
Ok(out.into_series())
},
GetOutput::from_type(DataType::Utf8),
)])
.collect();
ergebnis
}
fn main() {
// We start with a normal DataFrame
let s1 = Series::new("Fruit", &["Apple", "Apple", "Pear"]);
let s2 = Series::new("Color", &["Red", "Yellow", "Green"]);
let df = DataFrame::new(vec![s1, s2]).unwrap();
// We CONVERT the df into a StructChunked and WRAP this into a new LazyFrame
let lf = df![
"struct_col" => df.into_struct("StructChunked")
]
.unwrap()
.lazy();
let processed = apply_multiples(lf);
match processed {
Ok(..) => println!("We did it"),
Err(e) => println!("{:?}", e),
}
}