Selecting multiple columns in a Pandas dataframe
The column names (which are strings) cannot be sliced in the manner you tried.
Here you have a couple of options. If you know from context which columns you want, you can select only those columns by passing a list of their names into the __getitem__ syntax (the []'s).
df1 = df[['a', 'b']]
Alternatively, if it matters to index them numerically and not by their name (say your code should automatically do this without knowing the names of the first two columns) then you can do this instead:
df1 = df.iloc[:, 0:2] # Remember that Python does not slice inclusive of the ending index.
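As a quick check, here is a minimal sketch of both options side by side. The frame and its column names ('a', 'b', 'c') are made up for illustration and are not from the question.

```python
import pandas as pd

# Hypothetical example frame; the column names are placeholders.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

by_name = df[["a", "b"]]       # select by a list of column labels
by_position = df.iloc[:, 0:2]  # select positions 0 and 1 (end index 2 is excluded)

# Both selections contain the same columns and values.
print(by_name.equals(by_position))
```

Selecting by name keeps working if the columns are reordered, while selecting by position keeps working if they are renamed.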
Additionally, you should familiarize yourself with the idea of a view into a Pandas object vs. a copy of that object. The first of the above methods returns a new copy in memory of the desired sub-object (the desired slices).
Sometimes, however, indexing conventions in Pandas don't do this and instead give you a new variable that refers to the same chunk of memory as the sub-object or slice in the original object. This can happen with the second way of indexing, so you can append the .copy() method to get a regular copy. Without it, changing what you think is the sliced object can sometimes alter the original object. It is always good to be on the lookout for this.
df1 = df.iloc[:, 0:2].copy() # To avoid the case where changing df1 also changes df
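To see what the explicit copy buys you, here is a small sketch on a made-up frame (the names and values are assumptions for illustration). After .copy(), writing to df1 leaves df untouched.

```python
import pandas as pd

# Hypothetical example frame.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Take the first two columns and force an independent copy.
df1 = df.iloc[:, 0:2].copy()

# Modify the copy only.
df1.loc[0, "a"] = 100

print(df1.loc[0, "a"])  # value in the copy
print(df.loc[0, "a"])   # value in the original, unchanged
```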
To use iloc, you need to know the column positions (or indices). As the column positions may change, instead of hard-coding indices, you can use iloc together with the get_loc method of the dataframe's columns attribute to obtain the column indices.
col_pos = {c: df.columns.get_loc(c) for c in df.columns}
Now you can use this dictionary to look up column positions by name and pass them to iloc.
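A minimal sketch of that lookup, assuming unique column names (get_loc returns a single integer only in that case). The frame, the names col_pos and wanted, and the chosen columns are all hypothetical.

```python
import pandas as pd

# Hypothetical example frame with unique column names.
df = pd.DataFrame({"a": [1, 2], "b": [3, 4], "c": [5, 6]})

# Map each column name to its current integer position.
col_pos = {name: df.columns.get_loc(name) for name in df.columns}

# Translate names to positions, then select with iloc.
wanted = ["a", "c"]
subset = df.iloc[:, [col_pos[name] for name in wanted]]

# Index.get_indexer does the same translation for a whole list at once.
same = df.iloc[:, df.columns.get_indexer(wanted)]

print(list(subset.columns))
```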
Select Multiple Columns in DataFrame Pandas. Slice + Select
You can do this in a couple of different ways:
Keeping the slice format you are currently trying to use, I think you will need to join col54 on separately.
df = df.loc[:,'col2':'col4'].join(df.loc[:,'col54'])
Another method, given that col2 is close to col4, would be to list the columns explicitly:
df = df.loc[:,['col2','col3','col4', 'col54']]
or simply
df = df[['col2','col3','col4','col54']]
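To check that these give the same result, here is a small sketch on a made-up one-row frame. Only the column names come from the question; the values and the variable names joined and listed are assumptions.

```python
import pandas as pd

# Hypothetical frame; only the column names mirror the question.
df = pd.DataFrame(
    [[0, 1, 2, 3, 4, 5]],
    columns=["col1", "col2", "col3", "col4", "col5", "col54"],
)

# Label slice (inclusive of both ends) plus a join of the extra column.
joined = df.loc[:, "col2":"col4"].join(df.loc[:, "col54"])

# Explicit list of labels.
listed = df[["col2", "col3", "col4", "col54"]]

print(list(joined.columns))
print(joined.equals(listed))
```

The join version scales better when the slice covers many columns; the explicit list is easier to read when there are only a few.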
Selecting multiple columns, both consecutive and non-consecutive, in a Pandas dataframe
Use np.r_:
import numpy as np
X = d.iloc[:, np.r_[13, 30, 35:45]].to_numpy()
Intermediate output of np.r_[13, 30, 35:45]:
array([13, 30, 35, 36, 37, 38, 39, 40, 41, 42, 43, 44])
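The same idea on a frame small enough to inspect by eye. This is only a sketch: the 10-column frame and the names c0 to c9 are invented for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with 10 columns named c0..c9.
df = pd.DataFrame(
    np.arange(20).reshape(2, 10),
    columns=[f"c{i}" for i in range(10)],
)

# Mix single positions and a slice; the slice end (8) is excluded.
positions = np.r_[1, 3, 5:8]
subset = df.iloc[:, positions]

print(positions)             # [1 3 5 6 7]
print(list(subset.columns))  # ['c1', 'c3', 'c5', 'c6', 'c7']
```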
How to select multiple columns in a data frame that has many columns (>100)?
df.iloc[:, lambda df: [*range(2, 50)] + [*range(59, 84)] + [*range(95, 110)]]
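Here is a scaled-down sketch of the same pattern, using a made-up 12-column frame so the result is easy to verify. Since the lambda above ignores its argument, passing the list of positions directly should give the same selection.

```python
import pandas as pd

# Hypothetical frame with 12 columns named c0..c11.
df = pd.DataFrame([list(range(12))], columns=[f"c{i}" for i in range(12)])

# Build one flat list of positions from several ranges.
positions = [*range(0, 2), *range(4, 6), *range(9, 11)]

# Plain list of positions.
subset = df.iloc[:, positions]

# Callable form: iloc calls it with the frame and uses the returned list.
same = df.iloc[:, lambda frame: positions]

print(list(subset.columns))  # ['c0', 'c1', 'c4', 'c5', 'c9', 'c10']
```

The callable form is mainly useful in method chains, where the intermediate frame has no name to refer to.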
Multiple selection of dataframe using multiple column slices
You can use numpy:
import numpy as np
df.iloc[:,np.r_[0:4, 5:9]]
np.r_ will concatenate the indexes for you.
Select multiple columns by labels in pandas
Name- or Label-Based (using regular expression syntax)
df.filter(regex='[A-CEG-I]') # does NOT depend on the column order
Note that any regular expression is allowed here, so this approach can be very general. E.g. if you wanted all columns starting with a capital or lowercase "A" you could use: df.filter(regex='^[Aa]')
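One thing worth knowing is that filter applies the pattern with re.search, so it matches anywhere in a column name, not only at the start. With single-letter names that makes no difference, but with longer names you may want anchors. A small sketch, with invented column names:

```python
import pandas as pd

# Hypothetical frame; "DB" contains a "B" even though it starts with "D".
df = pd.DataFrame([[1, 2, 3]], columns=["A", "D", "DB"])

# Unanchored: matches any name containing A, B or C.
loose = df.filter(regex="[A-C]")

# Anchored: matches only names that are exactly one of A, B or C.
strict = df.filter(regex="^[A-C]$")

print(list(loose.columns))   # ['A', 'DB']
print(list(strict.columns))  # ['A']
```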
Location-Based (depends on column order)
df[ list(df.loc[:,'A':'C']) + ['E'] + list(df.loc[:,'G':'I']) ]
Note that unlike the label-based method, this gives the result shown only if your columns are alphabetically sorted, because a label slice follows the order of the columns in the frame. This is not necessarily a problem, however. For example, if your columns go ['A','C','B'], then you could replace 'A':'C' above with 'A':'B'.
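A short sketch of that order dependence. Both frames are made up, and the second one uses the ['A','C','B'] ordering from the note above.

```python
import pandas as pd

# Hypothetical frame with alphabetically ordered columns A..I.
df = pd.DataFrame([[0] * 9], columns=list("ABCDEFGHI"))
cols = list(df.loc[:, "A":"C"]) + ["E"] + list(df.loc[:, "G":"I"])
subset = df[cols]
print(list(subset.columns))  # ['A', 'B', 'C', 'E', 'G', 'H', 'I']

# Same labels in a different order: the slice follows frame order.
shuffled = pd.DataFrame([[0, 0, 0]], columns=["A", "C", "B"])
a_to_c = list(shuffled.loc[:, "A":"C"])  # stops at 'C', so 'B' is missed
a_to_b = list(shuffled.loc[:, "A":"B"])  # runs through to 'B'
print(a_to_c)  # ['A', 'C']
print(a_to_b)  # ['A', 'C', 'B']
```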
The Long Way
And for completeness, you always have the option shown by @Magdalena of simply listing each column individually, although it could be much more verbose as the number of columns increases:
df[['A','B','C','E','G','H','I']] # does NOT depend on the column order
Results for any of the above methods
          A         B         C         E         G         H         I
0 -0.814688 -1.060864 -0.008088  2.697203 -0.763874  1.793213 -0.019520
1  0.549824  0.269340  0.405570 -0.406695 -0.536304 -1.231051  0.058018
2  0.879230 -0.666814  1.305835  0.167621 -1.100355  0.391133  0.317467