Python pandas slice dataframe by multiple index ranges
You can use numpy's r_ "slicing trick":
df = pd.DataFrame({'a':range(10,100)})
df.iloc[pd.np.r_[10:12, 25:28]]
NOTE: pd.np now gives the warning "The pandas.np module is deprecated and will be removed from pandas in a future version. Import numpy directly instead" (and recent pandas releases have removed pandas.np altogether). To avoid this, import numpy as np and then slice the following way:
df.iloc[np.r_[10:12, 25:28]]
This gives:
a
10 20
11 21
25 35
26 36
27 37
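To see why this works, it helps to look at what np.r_ returns on its own. A minimal sketch using the same frame as above (the names positions and subset are only for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': range(10, 100)})

# np.r_ expands each slice and concatenates the results into one flat array
positions = np.r_[10:12, 25:28]
print(positions)  # [10 11 25 26 27]

# iloc then picks the rows at those integer positions
subset = df.iloc[positions]
print(subset)
```

Because the result is an ordinary integer array, anything that accepts a list of positions accepts it too.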
How to select multi range of rows in pandas dataframe
Say you have a random pandas DataFrame with 30 rows and 4 columns as follows:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0,30,size=(30, 4)), columns=list('ABCD'))
You can then use np.r_ to index into the row ranges [0:5], [10:15] and [20:25] as follows:
df.loc[np.r_[0:5, 10:15, 20:25], :]
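One thing worth keeping in mind: loc treats the numbers from np.r_ as labels, so it matches the positional result only while the index is the default 0..n-1. A small sketch of the difference, assuming a hypothetical relabelled copy called shifted:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
df = pd.DataFrame(rng.integers(0, 30, size=(30, 4)), columns=list('ABCD'))

rows = np.r_[0:5, 10:15, 20:25]

# With the default RangeIndex, labels and positions coincide
by_label = df.loc[rows, :]
by_position = df.iloc[rows, :]

# Hypothetical relabelled copy: labels now run from 100 to 129
shifted = df.copy()
shifted.index = range(100, 130)

# iloc still selects by position; shifted.loc[rows, :] would raise a KeyError
# because none of the labels 0..24 exist any more
by_position_shifted = shifted.iloc[rows, :]
```

If the goal is "rows number so-and-so" regardless of labels, iloc is the safer choice.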
pandas: slice a MultiIndex DataFrame by range of secondary index
d.xs(1)[0:3]
0 1
0 -0.716206 0.119265
1 -0.782315 0.097844
2 2.042751 -1.116453
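The answer does not show how d was built, so here is a self-contained sketch with an assumed two-level index (outer level 0/1, inner level 0..4) and made-up values:

```python
import numpy as np
import pandas as pd

# Assumed layout: the original frame is not shown in the answer
index = pd.MultiIndex.from_product([[0, 1], range(5)])
d = pd.DataFrame(np.arange(20).reshape(10, 2), index=index)

# xs(1) selects outer label 1 and drops that level,
# then [0:3] takes the first three rows of what is left
first_three = d.xs(1)[0:3]

# The same selection with the positional slice spelled out
first_three_iloc = d.xs(1).iloc[0:3]
print(first_three)
```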
Pandas dataframe slicing with multiple column ranges
Slicing by multiple label ranges is more challenging and has less support, so let's try to slice on index ranges instead:
loc = df.columns.get_loc
df.iloc[:, np.r_[loc('lat'):loc('long')+1, loc('year'):loc('day')+1]]
lat long year month day
0 0.218559 0.418508 0.345499 0.166776 0.878559
1 0.572760 0.898007 0.702427 0.386477 0.694439
2 0.803740 0.983359 0.945517 0.649540 0.860832
3 0.873401 0.906277 0.463535 0.610538 0.496282
4 0.187359 0.687674 0.039455 0.647117 0.638054
5 0.169531 0.794548 0.352917 0.484498 0.697736
6 0.022867 0.375123 0.444112 0.498140 0.414346
7 0.729086 0.415919 0.430047 0.734766 0.556216
8 0.138769 0.614932 0.109311 0.539576 0.289299
9 0.037969 0.500108 0.758036 0.262273 0.100859
When indexing by position I need to add +1 to the right index since it is right-exclusive.
Another option is to slice individual sections and concatenate:
ranges = [('lat', 'long'), ('year', 'day')]
pd.concat([df.loc[:, i:j] for i, j in ranges], axis=1)
lat long year month day
0 0.218559 0.418508 0.345499 0.166776 0.878559
1 0.572760 0.898007 0.702427 0.386477 0.694439
2 0.803740 0.983359 0.945517 0.649540 0.860832
3 0.873401 0.906277 0.463535 0.610538 0.496282
4 0.187359 0.687674 0.039455 0.647117 0.638054
5 0.169531 0.794548 0.352917 0.484498 0.697736
6 0.022867 0.375123 0.444112 0.498140 0.414346
7 0.729086 0.415919 0.430047 0.734766 0.556216
8 0.138769 0.614932 0.109311 0.539576 0.289299
9 0.037969 0.500108 0.758036 0.262273 0.100859
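Both approaches can be checked against each other on a small frame. This is only a sketch: the extra columns id and name are hypothetical, added so that the two ranges are not adjacent.

```python
import numpy as np
import pandas as pd

# Hypothetical column layout with unrelated columns between the two ranges
cols = ['id', 'lat', 'long', 'name', 'year', 'month', 'day']
df = pd.DataFrame(np.arange(21).reshape(3, 7), columns=cols)

# Approach 1: translate labels to positions (+1 because positions are right-exclusive)
loc = df.columns.get_loc
by_position = df.iloc[:, np.r_[loc('lat'):loc('long') + 1, loc('year'):loc('day') + 1]]

# Approach 2: slice each label range (right-inclusive) and concatenate
ranges = [('lat', 'long'), ('year', 'day')]
by_label = pd.concat([df.loc[:, i:j] for i, j in ranges], axis=1)

print(list(by_position.columns))
```

Note that get_loc returns a single integer only when the column name is unique.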
pandas: slice a MultiIndex by range of secondary index
As Robbie-Clarken answers, since 0.14 you can pass a slice in the tuple you pass to loc:
In [11]: s.loc[('b', slice(2, 10))]
Out[11]:
b 2 -0.65394
4 0.08227
dtype: float64
Indeed, you can pass a slice for each level:
In [12]: s.loc[(slice('a', 'b'), slice(2, 10))]
Out[12]:
a 5 0.27919
b 2 -0.65394
4 0.08227
dtype: float64
Note: the slice is inclusive.
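A self-contained sketch of the same idea, with made-up values and an index that is already sorted (slicing a MultiIndex by label ranges generally needs a sorted index):

```python
import pandas as pd

# Made-up data; the tuples are listed in sorted order
index = pd.MultiIndex.from_tuples(
    [('a', 1), ('a', 5), ('b', 2), ('b', 4), ('b', 12)]
)
s = pd.Series([0.1, 0.2, 0.3, 0.4, 0.5], index=index)

# Inner labels 2..10 within outer label 'b'
inner = s.loc[('b', slice(2, 10))]

# A slice on each level
both = s.loc[(slice('a', 'b'), slice(2, 10))]

# Both endpoints are included: label 4 is part of the result
edge = s.loc[('b', slice(2, 4))]
```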
Old answer (note that .ix was deprecated in pandas 0.20 and removed in 1.0):
You can also do this using:
s.ix[1:10, "b"]
(It's good practice to do this in a single ix/loc/iloc call, since that form also allows assignment.)
This answer was written prior to the introduction of iloc in early 2013, i.e. position/integer location - which may be preferred in this case. The reason it was created was to remove the ambiguity from integer-indexed pandas objects, and be more descriptive: "I'm slicing on position".
s["b"].iloc[1:10]
That said, I kinda disagree with the docs that ix is:
most robust and consistent way
It's not; the most consistent way is to describe what you're doing:
- use loc for labels
- use iloc for position
- use ix for both (if you really have to)
Remember the zen of python:
explicit is better than implicit
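The ambiguity that loc and iloc remove is easiest to see on an integer-labelled object. A small illustration with made-up data:

```python
import pandas as pd

# Integer labels that do not match the positions
s = pd.Series(['p', 'q', 'r', 's'], index=[2, 4, 6, 8])

# loc: labels 2 through 6, both ends included
by_label = s.loc[2:6]

# iloc: positions 2 and 3, right end excluded
by_position = s.iloc[2:4]

print(by_label.tolist())     # ['p', 'q', 'r']
print(by_position.tolist())  # ['r', 's']
```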
Slice pandas multiindex dataframe using list of index values
Data:
L = ['abc', 'bcd']
print (df)
text
uid tid
abc x t1
abc1 x t1
bcd y t2
1. slicers
idx = pd.IndexSlice
df1 = df.loc[idx[L,:],:]
2. boolean indexing, building the mask with get_level_values + isin:
df1 = df[df.index.get_level_values(0).isin(L)]
3. query:
df1 = df.query('@L in uid')
print (df1)
text
uid tid
abc x t1
bcd y t2
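For reference, the three options can be run side by side on a frame rebuilt from the printed data. This is a sketch; the query is written here in the column-first spelling 'uid in @L', which is assumed to be equivalent to the one above.

```python
import pandas as pd

# Rebuild the sample frame shown above
df = pd.DataFrame(
    {'uid': ['abc', 'abc1', 'bcd'],
     'tid': ['x', 'x', 'y'],
     'text': ['t1', 't1', 't2']}
).set_index(['uid', 'tid'])

L = ['abc', 'bcd']

# 1. slicers
idx = pd.IndexSlice
via_slicers = df.loc[idx[L, :], :]

# 2. boolean mask on the first index level
via_mask = df[df.index.get_level_values(0).isin(L)]

# 3. query, referring to the index level by name and to L with @
via_query = df.query('uid in @L')

print(via_slicers)
```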
How to slice multiindex dataframe with list of labels on one level
To filter by multiple values, use Index.get_level_values with Index.isin and boolean indexing:
a = df[df.index.get_level_values('date').isin(('2020-03-10', '2020-03-11', '2020-03-12'))]
print (a)
name price
id date
10 2020-03-10 name_10 0.557772
11 2020-03-11 name_11 0.122315
12 2020-03-12 name_12 0.775976
However, the documentation says that the "key" parameter could be a tuple as well:
Using a tuple is possible, but it works differently: you select by labels on both levels at once, like:
b = df.xs((10, '2020-03-10'), drop_level=False)
print (b)
name name_10
price 0.348808
Name: (10, 2020-03-10 00:00:00), dtype: object
c = df.xs((10, '2020-03-10'), level=('id','date'), drop_level=False)
print (c)
name price
id date
10 2020-03-10 name_10 0.239876
As @yatu mentioned, another solution uses IndexSlice, with : for all values of the first level and a final : for all columns:
df = df.loc[pd.IndexSlice[:, ['2020-03-10', '2020-03-11', '2020-03-12']], :]
print (df)
name price
id date
10 2020-03-10 name_10 0.557488
11 2020-03-11 name_11 0.592082
12 2020-03-12 name_12 0.547747
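A self-contained version of the mask and IndexSlice approaches, on an assumed frame with the same id/date index layout (the prices are made up, and the wanted dates are converted to timestamps up front to keep the comparison explicit):

```python
import pandas as pd

# Assumed data with the same index layout as above
dates = pd.to_datetime(
    ['2020-03-09', '2020-03-10', '2020-03-11', '2020-03-12', '2020-03-13']
)
df = pd.DataFrame(
    {'name': [f'name_{i}' for i in range(9, 14)],
     'price': [0.1, 0.2, 0.3, 0.4, 0.5]},
    index=pd.MultiIndex.from_arrays(
        [list(range(9, 14)), dates], names=['id', 'date']
    ),
)

wanted = list(pd.to_datetime(['2020-03-10', '2020-03-11', '2020-03-12']))

# Boolean mask on the 'date' level
via_mask = df[df.index.get_level_values('date').isin(wanted)]

# IndexSlice: every id, the listed dates, every column
via_slice = df.loc[pd.IndexSlice[:, wanted], :]

print(via_mask)
```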
Slicing multiple ranges of columns in Pandas, by list of names
I think you need numpy.r_ to concatenate the positions of the columns, then use iloc for selecting:
print (df.iloc[:, np.r_[1:3, 6:len(df.columns)]])
and for the second approach, subset by a list:
print (df[years_month])
Sample:
df = pd.DataFrame({'2000-1':[1,3,5],
'2000-2':[5,3,6],
'2000-3':[7,8,9],
'2000-4':[1,3,5],
'2000-5':[5,3,6],
'2000-6':[7,8,9],
'2000-7':[1,3,5],
'2000-8':[5,3,6],
'2000-9':[7,4,3],
'A':[1,2,3],
'B':[4,5,6],
'C':[7,8,9]})
print (df)
2000-1 2000-2 2000-3 2000-4 2000-5 2000-6 2000-7 2000-8 2000-9 A \
0 1 5 7 1 5 7 1 5 7 1
1 3 3 8 3 3 8 3 3 4 2
2 5 6 9 5 6 9 5 6 3 3
B C
0 4 7
1 5 8
2 6 9
print (df.iloc[:, np.r_[1:3, 6:len(df.columns)]])
2000-2 2000-3 2000-7 2000-8 2000-9 A B C
0 5 7 1 5 7 1 4 7
1 3 8 3 3 4 2 5 8
2 6 9 5 6 3 3 6 9
You can also add ranges together (in Python 3, casting each range to a list is necessary):
rng = list(range(1,3)) + list(range(6, len(df.columns)))
print (rng)
[1, 2, 6, 7, 8, 9, 10, 11]
print (df.iloc[:, rng])
2000-2 2000-3 2000-7 2000-8 2000-9 A B C
0 5 7 1 5 7 1 4 7
1 3 8 3 3 4 2 5 8
2 6 9 5 6 3 3 6 9
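The two ways of building the positions give the same result. A quick check, using a frame with the same column names as the sample but placeholder values:

```python
import numpy as np
import pandas as pd

# Same column names as the sample above, placeholder values
columns = [f'2000-{m}' for m in range(1, 10)] + ['A', 'B', 'C']
df = pd.DataFrame(np.arange(24).reshape(2, 12), columns=columns)

# Positions via np.r_
pos_r = np.r_[1:3, 6:len(df.columns)]

# Positions via plain Python ranges
pos_list = list(range(1, 3)) + list(range(6, len(df.columns)))

selected = df.iloc[:, pos_r]
print(list(selected.columns))
```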
Slice pandas dataframe using .loc with both index values and multiple column values, then set values
The reason for the IndexingError is that you're calling df.loc with arrays of 2 different sizes.
df.loc[rel_index] has a length of 3 whereas df['col1'].isin(relc1) has a length of 10.
You need the index results to also have a length of 10. If you look at the output of df['col1'].isin(relc1), it is an array of booleans.
You can achieve a similar array with the proper length by replacing df.loc[rel_index] with df.index.isin([5,6,17]) so you end up with:
df.loc[df.index.isin([5,6,17]) & df['col1'].isin(relc1) & df['col2'].isin(relc2)]
which returns:
col1 col2
5 2 1
6 3 1
17 3 1
That said, I'm not sure why your index would ever look like this. Typically when slicing by index you would use df.iloc and your index would match the 0,1,2...etc. format.
Alternatively, you could first search by value, then assign the resulting dataframe to a new variable df2:
df2 = df.loc[df['col1'].isin(relc1) & df['col2'].isin(relc2)]
then df2.loc[rel_index] would work without issue.
As for your overall goal, you can simply do the following:
c3=[1,2,3]
c4=[5,6,7]
df2=pd.DataFrame(list(zip(c3,c4)),columns=['col1','col2'],index=rel_index)
df.loc[df.index.isin([5,6,17]) & df['col1'].isin(relc1) & df['col2'].isin(relc2)] = df2
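Since the question's data is not reproduced here, the following is a sketch on a small made-up frame (the values of df, relc1 and relc2 are assumptions) showing that all three conditions have the same length and that the assignment aligns on the index:

```python
import pandas as pd

# Made-up frame with a non-default index; relc1 and relc2 are assumed values
df = pd.DataFrame(
    {'col1': [1, 2, 3, 2, 3], 'col2': [1, 1, 1, 2, 1]},
    index=[3, 5, 6, 9, 17],
)
rel_index = [5, 6, 17]
relc1 = [2, 3]
relc2 = [1]

# Every condition is a boolean array with one entry per row of df
mask = df.index.isin(rel_index) & df['col1'].isin(relc1) & df['col2'].isin(relc2)
selected = df.loc[mask].copy()

# Replacement values, indexed by the rows they should land on
c3 = [1, 2, 3]
c4 = [5, 6, 7]
df2 = pd.DataFrame(list(zip(c3, c4)), columns=['col1', 'col2'], index=rel_index)

# The assignment aligns df2 to the selected rows by index label
df.loc[mask] = df2
print(df)
```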