Python Pandas Slice Dataframe by Multiple Index Ranges

Python pandas slice dataframe by multiple index ranges

You can use numpy's r_ "slicing trick":

df = pd.DataFrame({'a':range(10,100)})
df.iloc[pd.np.r_[10:12, 25:28]]

NOTE: on recent pandas 1.x releases this gives the warning "The pandas.np module is deprecated and will be removed from pandas in a future version. Import numpy directly instead." In pandas 2.0 and later, pd.np has been removed and the line above fails. Instead, import numpy as np and slice the following way:

df.iloc[np.r_[10:12, 25:28]]

This gives:

     a
10 20
11 21
25 35
26 36
27 37
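
To see why this works, it helps to look at what np.r_ returns: a flat integer array built by concatenating the ranges. A minimal sketch reusing the DataFrame above (the names positions and result are only for illustration):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': range(10, 100)})

# np.r_ expands each slice and concatenates them into one integer array
positions = np.r_[10:12, 25:28]
print(positions)  # [10 11 25 26 27]

# iloc then treats that array as a list of row positions
result = df.iloc[positions]
print(result)
```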

How to select multi range of rows in pandas dataframe

Say you have a random pandas DataFrame with 30 rows and 4 columns as follows:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0,30,size=(30, 4)), columns=list('ABCD'))

You can then use np.r_ to index into ranges of rows [0:5], [10:15] and [20:25] as follows:

df.loc[np.r_[0:5, 10:15, 20:25], :]
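
Note that loc selects by label, so the line above relies on the default 0..29 index, where labels and positions happen to coincide. The sketch below illustrates the difference; the shifted frame with labels 100..129 is a hypothetical case, not part of the original question:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.randint(0, 30, size=(30, 4)), columns=list('ABCD'))
rows = np.r_[0:5, 10:15, 20:25]

# With the default RangeIndex, label-based and positional selection agree
by_label = df.loc[rows, :]
by_position = df.iloc[rows, :]
print(by_label.equals(by_position))  # True

# Hypothetical case: the same data relabelled 100..129
shifted = df.copy()
shifted.index = range(100, 130)

# shifted.loc[rows, :] would fail here because labels 0, 1, ... do not exist;
# iloc still selects the intended rows by position
print(shifted.iloc[rows, :].index.tolist())
```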

pandas: slice a MultiIndex DataFrame by range of secondary index

Take the cross-section for the first-level label with xs, then slice the rows of the result:

d.xs(1)[0:3]

          0         1
0 -0.716206  0.119265
1 -0.782315  0.097844
2  2.042751 -1.116453
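
The answer does not show how d was built, so the following is only a sketch under assumed data: a two-level MultiIndex with two unnamed columns. It uses iloc for the second step to make the positional slice explicit.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: outer level 0..1, inner level 0..4, two random columns
index = pd.MultiIndex.from_product([[0, 1], range(5)])
d = pd.DataFrame(np.random.randn(10, 2), index=index)

# xs(1) keeps the rows whose outer label is 1 and drops that level
inner = d.xs(1)

# iloc[0:3] then takes the first three of those rows by position
first_three = inner.iloc[0:3]
print(first_three)
```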

Pandas dataframe slicing with multiple column ranges

Slicing by multiple label ranges is more challenging and has less support, so let's try to slice on index ranges instead:

loc = df.columns.get_loc
df.iloc[:, np.r_[loc('lat'):loc('long')+1, loc('year'):loc('day')+1]]

lat long year month day
0 0.218559 0.418508 0.345499 0.166776 0.878559
1 0.572760 0.898007 0.702427 0.386477 0.694439
2 0.803740 0.983359 0.945517 0.649540 0.860832
3 0.873401 0.906277 0.463535 0.610538 0.496282
4 0.187359 0.687674 0.039455 0.647117 0.638054
5 0.169531 0.794548 0.352917 0.484498 0.697736
6 0.022867 0.375123 0.444112 0.498140 0.414346
7 0.729086 0.415919 0.430047 0.734766 0.556216
8 0.138769 0.614932 0.109311 0.539576 0.289299
9 0.037969 0.500108 0.758036 0.262273 0.100859

When indexing by position, add 1 to the right-hand bound, since positional slices exclude the stop value.
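
A self-contained sketch of the same idea follows. The column layout is an assumption (the original frame is not shown); only the names lat, long, year, month and day come from the example above.

```python
import numpy as np
import pandas as pd

# Hypothetical column layout; only some names are taken from the example above
cols = ['id', 'lat', 'long', 'elev', 'year', 'month', 'day', 'value']
df = pd.DataFrame(np.random.rand(10, len(cols)), columns=cols)

loc = df.columns.get_loc  # maps a column label to its integer position

# Positional slices exclude the stop, hence the +1 to keep 'long' and 'day'
positions = np.r_[loc('lat'):loc('long') + 1, loc('year'):loc('day') + 1]
print(positions)  # [1 2 4 5 6]

subset = df.iloc[:, positions]
print(subset.columns.tolist())  # ['lat', 'long', 'year', 'month', 'day']
```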


Another option is to slice individual sections and concatenate:

ranges = [('lat', 'long'), ('year', 'day')]
pd.concat([df.loc[:, i:j] for i, j in ranges], axis=1)

lat long year month day
0 0.218559 0.418508 0.345499 0.166776 0.878559
1 0.572760 0.898007 0.702427 0.386477 0.694439
2 0.803740 0.983359 0.945517 0.649540 0.860832
3 0.873401 0.906277 0.463535 0.610538 0.496282
4 0.187359 0.687674 0.039455 0.647117 0.638054
5 0.169531 0.794548 0.352917 0.484498 0.697736
6 0.022867 0.375123 0.444112 0.498140 0.414346
7 0.729086 0.415919 0.430047 0.734766 0.556216
8 0.138769 0.614932 0.109311 0.539576 0.289299
9 0.037969 0.500108 0.758036 0.262273 0.100859
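
The concatenation approach can be sketched the same way. Again, the column layout below is assumed for illustration. Because loc slices by label and includes both endpoints, no +1 is needed here.

```python
import numpy as np
import pandas as pd

# Hypothetical column layout, as the original frame is not shown
cols = ['id', 'lat', 'long', 'elev', 'year', 'month', 'day', 'value']
df = pd.DataFrame(np.random.rand(10, len(cols)), columns=cols)

# Label slices with loc include both endpoints
ranges = [('lat', 'long'), ('year', 'day')]
pieces = [df.loc[:, start:stop] for start, stop in ranges]
subset = pd.concat(pieces, axis=1)

print(subset.columns.tolist())  # ['lat', 'long', 'year', 'month', 'day']
```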

pandas: slice a MultiIndex by range of secondary index

As Robbie-Clarken answers, since 0.14 you can pass a slice in the tuple you pass to loc:

In [11]: s.loc[('b', slice(2, 10))]
Out[11]:
b  2   -0.65394
   4    0.08227
dtype: float64

Indeed, you can pass a slice for each level:

In [12]: s.loc[(slice('a', 'b'), slice(2, 10))]
Out[12]:
a  5    0.27919
b  2   -0.65394
   4    0.08227
dtype: float64

Note: the slice is inclusive.
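
A small sketch of both forms, using a made-up series since the original s is not shown. The index is sorted, which label slicing on a MultiIndex generally requires. pd.IndexSlice is shown as an equivalent spelling of the same tuples.

```python
import pandas as pd

# Hypothetical series with a sorted two-level index
index = pd.MultiIndex.from_tuples(
    [('a', 1), ('a', 5), ('b', 2), ('b', 4), ('b', 12)]
)
s = pd.Series([10, 20, 30, 40, 50], index=index)

# Second level from 2 to 10 within 'b'; both endpoints are included
one_level = s.loc[('b', slice(2, 10))]
print(one_level.tolist())  # [30, 40]

# One slice per level: 'a' through 'b', then 2 through 10
both_levels = s.loc[(slice('a', 'b'), slice(2, 10))]
print(both_levels.tolist())  # [20, 30, 40]

# pd.IndexSlice builds the same tuple with ordinary slice syntax
idx = pd.IndexSlice
with_idx = s.loc[idx['a':'b', 2:10]]
print(with_idx.tolist())  # [20, 30, 40]
```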


Old answer:

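You can also do this using ix (note that ix was deprecated in pandas 0.20 and removed in 1.0, so this only applies to older versions):

s.ix[1:10, "b"]

(It's good practice to do the selection in a single ix/loc/iloc call, since that form also allows assignment.)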

This answer was written prior to the introduction of iloc in early 2013, i.e. position/integer location - which may be preferred in this case. The reason it was created was to remove the ambiguity from integer-indexed pandas objects, and be more descriptive: "I'm slicing on position".

s["b"].iloc[1:10]

That said, I kinda disagree with the docs that ix is:

most robust and consistent way

it's not, the most consistent way is to describe what you're doing:

  • use loc for labels
  • use iloc for position
  • use ix for both (if you really have to)

Remember the zen of python:

explicit is better than implicit
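
The ambiguity that loc and iloc remove is easiest to see on an integer-labelled object. A minimal sketch with a made-up series whose labels differ from its positions:

```python
import pandas as pd

# Hypothetical series: integer labels 2, 4, ... that do not match positions
s = pd.Series([10, 20, 30, 40, 50], index=[2, 4, 6, 8, 10])

# loc slices by label and includes the stop label
by_label = s.loc[2:6]
print(by_label.tolist())  # [10, 20, 30]

# iloc slices by position and excludes the stop position
by_position = s.iloc[2:6]
print(by_position.tolist())  # [30, 40, 50]
```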

Slice pandas multiindex dataframe using list of index values

Data:

L = ['abc', 'bcd']

print (df)
         text
uid  tid
abc  x     t1
abc1 x     t1
bcd  y     t2

1. slicers:

idx = pd.IndexSlice
df1 = df.loc[idx[L,:],:]

2. boolean indexing + mask with get_level_values + isin:

df1 = df[df.index.get_level_values(0).isin(L)]

3. query:

df1 = df.query('@L in uid')
print (df1)
        text
uid tid
abc x     t1
bcd y     t2
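
The three approaches can be checked against each other on a rebuilt copy of the sample data. This is a sketch; the query string is written here as 'uid in @L', with the operands in the opposite order from the answer above.

```python
import pandas as pd

L = ['abc', 'bcd']

# Rebuild the sample frame shown above
index = pd.MultiIndex.from_tuples(
    [('abc', 'x'), ('abc1', 'x'), ('bcd', 'y')], names=['uid', 'tid']
)
df = pd.DataFrame({'text': ['t1', 't1', 't2']}, index=index)

# 1. slicers: labels in L on the first level, everything on the second
idx = pd.IndexSlice
by_slicer = df.loc[idx[L, :], :]

# 2. boolean mask built from the first index level
by_mask = df[df.index.get_level_values(0).isin(L)]

# 3. query, referring to the index level by name and to L with @
by_query = df.query('uid in @L')

for result in (by_slicer, by_mask, by_query):
    print(result.index.tolist())  # [('abc', 'x'), ('bcd', 'y')]
```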

How to slice multiindex dataframe with list of labels on one level

To filter by multiple values, use Index.get_level_values with Index.isin and boolean indexing:

a = df[df.index.get_level_values('date').isin(('2020-03-10', '2020-03-11', '2020-03-12'))]
print (a)
                  name     price
id date
10 2020-03-10  name_10  0.557772
11 2020-03-11  name_11  0.122315
12 2020-03-12  name_12  0.775976

As for the point that, according to the docs, the "key" parameter of xs can also be a tuple: a tuple is possible, but it works differently. It selects by both labels at once, like:

b = df.xs((10, '2020-03-10'), drop_level=False)
print (b)
name      name_10
price    0.348808
Name: (10, 2020-03-10 00:00:00), dtype: object

c = df.xs((10, '2020-03-10'), level=('id','date'), drop_level=False)
print (c)
                  name     price
id date
10 2020-03-10  name_10  0.239876

As @yatu mentioned, another solution uses IndexSlice, with the first : selecting all values of the first level and the last : selecting all columns:

df = df.loc[pd.IndexSlice[:, ['2020-03-10', '2020-03-11', '2020-03-12']], :]
print (df)
                  name     price
id date
10 2020-03-10  name_10  0.557488
11 2020-03-11  name_11  0.592082
12 2020-03-12  name_12  0.547747
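
The sample frame is not shown in full, so the sketch below builds a hypothetical one with the same (id, date) index and compares the approaches. It passes Timestamp objects rather than strings to keep the label types explicit.

```python
import numpy as np
import pandas as pd

# Hypothetical frame with an (id, date) MultiIndex, one row per id
ids = list(range(8, 14))
dates = pd.date_range('2020-03-08', periods=6)
index = pd.MultiIndex.from_arrays([ids, dates], names=['id', 'date'])
df = pd.DataFrame(
    {'name': [f'name_{i}' for i in ids], 'price': np.random.rand(6)},
    index=index,
)

wanted = [pd.Timestamp(day) for day in ('2020-03-10', '2020-03-11', '2020-03-12')]

# Boolean mask on the 'date' level
a = df[df.index.get_level_values('date').isin(wanted)]

# xs with a tuple selects a single (id, date) combination
c = df.xs((10, pd.Timestamp('2020-03-10')), level=('id', 'date'), drop_level=False)

# IndexSlice: every id, the listed dates, every column
d = df.loc[pd.IndexSlice[:, wanted], :]

print(a.index.get_level_values('id').tolist())  # [10, 11, 12]
print(len(c))  # 1
```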

Slicing multiple ranges of columns in Pandas, by list of names

I think you need numpy.r_ to concatenate the positions of the columns, then use iloc for selecting:

print (df.iloc[:, np.r_[1:3, 6:len(df.columns)]])

For the second approach, subset by a list of column names:

print (df[years_month])

Sample:

df = pd.DataFrame({'2000-1':[1,3,5],
                   '2000-2':[5,3,6],
                   '2000-3':[7,8,9],
                   '2000-4':[1,3,5],
                   '2000-5':[5,3,6],
                   '2000-6':[7,8,9],
                   '2000-7':[1,3,5],
                   '2000-8':[5,3,6],
                   '2000-9':[7,4,3],
                   'A':[1,2,3],
                   'B':[4,5,6],
                   'C':[7,8,9]})

print (df)
   2000-1  2000-2  2000-3  2000-4  2000-5  2000-6  2000-7  2000-8  2000-9  A  \
0       1       5       7       1       5       7       1       5       7  1
1       3       3       8       3       3       8       3       3       4  2
2       5       6       9       5       6       9       5       6       3  3

   B  C
0  4  7
1  5  8
2  6  9

print (df.iloc[:, np.r_[1:3, 6:len(df.columns)]])
   2000-2  2000-3  2000-7  2000-8  2000-9  A  B  C
0       5       7       1       5       7  1  4  7
1       3       8       3       3       4  2  5  8
2       6       9       5       6       3  3  6  9

You can also concatenate ranges (casting to list is necessary in Python 3):

rng = list(range(1,3)) + list(range(6, len(df.columns)))
print (rng)
[1, 2, 6, 7, 8, 9, 10, 11]

print (df.iloc[:, rng])
   2000-2  2000-3  2000-7  2000-8  2000-9  A  B  C
0       5       7       1       5       7  1  4  7
1       3       8       3       3       4  2  5  8
2       6       9       5       6       3  3  6  9
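
Both ways of building the positions give the same result. A compact check on a hypothetical twelve-column frame (the column names c0..c11 are made up; only the positions matter):

```python
import numpy as np
import pandas as pd

# Twelve hypothetical columns named c0..c11
df = pd.DataFrame(
    np.arange(36).reshape(3, 12),
    columns=[f'c{i}' for i in range(12)],
)

n = len(df.columns)
with_r = np.r_[1:3, 6:n]                                # NumPy array of positions
with_lists = list(range(1, 3)) + list(range(6, n))      # plain Python list

print(with_r.tolist() == with_lists)  # True
print(df.iloc[:, with_r].columns.tolist())
```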

Slice pandas dataframe using .loc with both index values and multiple column values, then set values

The reason for the IndexingError is that you're calling df.loc with arrays of 2 different sizes.

df.loc[rel_index] has a length of 3 whereas df['col1'].isin(relc1) has a length of 10.

You need the index results to also have a length of 10. If you look at the output of df['col1'].isin(relc1), it is an array of booleans.

You can achieve a similar array with the proper length by replacing df.loc[rel_index] with df.index.isin([5,6,17]),

so you end up with:

df.loc[df.index.isin([5,6,17]) & df['col1'].isin(relc1) & df['col2'].isin(relc2)]

which returns:

    col1  col2
5      2     1
6      3     1
17     3     1

That said, I'm not sure why your index would ever look like this. Typically when slicing by index you would use df.iloc and your index would match the 0,1,2...etc. format.

Alternatively, you could first filter by value and assign the resulting DataFrame to a new variable, df2:

df2 = df.loc[df['col1'].isin(relc1) & df['col2'].isin(relc2)]

Then df2.loc[rel_index] works without issue, provided every label in rel_index is still present after the filter.

As for your overall goal, you can simply do the following:

c3=[1,2,3]
c4=[5,6,7]
df2=pd.DataFrame(list(zip(c3,c4)),columns=['col1','col2'],index=rel_index)

df.loc[df.index.isin([5,6,17]) & df['col1'].isin(relc1) & df['col2'].isin(relc2)] = df2
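
Putting the pieces together, here is an end-to-end sketch. The original df, relc1 and relc2 are not shown, so the data below is assumed; it was chosen so that rows 5, 6 and 17 match the output above.

```python
import pandas as pd

# Hypothetical data: ten rows with a non-contiguous integer index
df = pd.DataFrame(
    {'col1': [1, 1, 2, 3, 2, 1, 3, 3, 2, 1],
     'col2': [1, 2, 1, 1, 2, 1, 2, 1, 1, 2]},
    index=[1, 2, 5, 6, 8, 9, 12, 17, 20, 21],
)
rel_index = [5, 6, 17]
relc1 = [2, 3]
relc2 = [1]

# All three conditions have one entry per row, so they combine cleanly
mask = df['col1'].isin(relc1) & df['col2'].isin(relc2) & df.index.isin(rel_index)
print(df.loc[mask].index.tolist())  # [5, 6, 17]

# Replacement values, indexed by the same labels so the assignment aligns
df2 = pd.DataFrame({'col1': [1, 2, 3], 'col2': [5, 6, 7]}, index=rel_index)
df.loc[mask] = df2
print(df.loc[rel_index])
```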

