Select Multiple Ranges of Columns in Pandas Dataframe

Slicing multiple ranges of columns in Pandas, by list of names

I think you need numpy.r_ to concatenate the column positions, then iloc to select them:

print (df.iloc[:, np.r_[1:3, 6:len(df.columns)]])

And for the second approach, subset by a list of column names:

print (df[years_month])
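The list itself is not shown here, so as a minimal sketch, assuming years_month is meant to hold every column whose label starts with '2000-' (the small frame and the startswith rule are only for illustration):

```python
import pandas as pd

# Small hypothetical frame, not the one from the question
df = pd.DataFrame({'2000-1': [1, 3], '2000-2': [5, 3], '2000-3': [7, 8], 'A': [1, 2]})

# Assumed rule: keep every column whose label starts with '2000-'
years_month = [c for c in df.columns if c.startswith('2000-')]

# Subsetting with a list of labels keeps the columns in list order
subset = df[years_month]
print(subset.columns.tolist())
```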

Sample:

import numpy as np
import pandas as pd

df = pd.DataFrame({'2000-1':[1,3,5],
                   '2000-2':[5,3,6],
                   '2000-3':[7,8,9],
                   '2000-4':[1,3,5],
                   '2000-5':[5,3,6],
                   '2000-6':[7,8,9],
                   '2000-7':[1,3,5],
                   '2000-8':[5,3,6],
                   '2000-9':[7,4,3],
                   'A':[1,2,3],
                   'B':[4,5,6],
                   'C':[7,8,9]})

print (df)
   2000-1  2000-2  2000-3  2000-4  2000-5  2000-6  2000-7  2000-8  2000-9  A  \
0       1       5       7       1       5       7       1       5       7  1   
1       3       3       8       3       3       8       3       3       4  2   
2       5       6       9       5       6       9       5       6       3  3   

   B  C  
0  4  7  
1  5  8  
2  6  9  

print (df.iloc[:, np.r_[1:3, 6:len(df.columns)]])
   2000-2  2000-3  2000-7  2000-8  2000-9  A  B  C
0       5       7       1       5       7  1  4  7
1       3       8       3       3       4  2  5  8
2       6       9       5       6       3  3  6  9

You can also add the ranges together (casting to list is necessary in Python 3):

rng = list(range(1,3)) + list(range(6, len(df.columns)))
print (rng)
[1, 2, 6, 7, 8, 9, 10, 11]

print (df.iloc[:, rng])
   2000-2  2000-3  2000-7  2000-8  2000-9  A  B  C
0       5       7       1       5       7  1  4  7
1       3       8       3       3       4  2  5  8
2       6       9       5       6       3  3  6  9

Select multiple ranges of columns in Pandas DataFrame

Use np.r_, which concatenates slices and single positions into one index array:

np.r_[1:10, 15, 17, 50:100]

array([ 1,  2,  3,  4,  5,  6,  7,  8,  9, 15, 17, 50, 51, 52, 53, 54, 55,
       56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66, 67, 68, 69, 70, 71, 72,
       73, 74, 75, 76, 77, 78, 79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89,
       90, 91, 92, 93, 94, 95, 96, 97, 98, 99])

so you can do

df.iloc[:, np.r_[1:10, 15, 17, 50:100]]
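Here is a smaller version you can run as-is. The 20-column frame and the c0..c19 names are made up for the example. Keep in mind that every position np.r_ produces has to exist in the frame, otherwise iloc raises an IndexError:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with 20 columns named c0..c19
df = pd.DataFrame(np.arange(40).reshape(2, 20),
                  columns=[f'c{i}' for i in range(20)])

# Two slices and one single position, concatenated into one index array
idx = np.r_[1:4, 7, 10:13]
picked = df.iloc[:, idx]
print(picked.columns.tolist())

# Position 25 does not exist in a 20-column frame
try:
    df.iloc[:, np.r_[1:4, 25]]
except IndexError as exc:
    print('out of bounds:', exc)
```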

Selecting multiple ranges of dates from dataframe

Another way, filtering on the year and the day of the month:

df['date'] = pd.to_datetime(df['date'])
df[df.date.dt.year.isin([2015, 2016]) & df.date.dt.day.lt(3)]

        date  price
0 2015-01-01     78
1 2015-01-02     87
3 2016-01-01     94
4 2016-01-02     55
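If the ranges are arbitrary start/stop dates rather than "first days of certain years", one possible sketch is to OR together one Series.between mask per range. The ranges list is hypothetical, and the prices for the two rows that get filtered out are invented, since the original data only shows the matching rows:

```python
import pandas as pd

# Prices for 2015-01-03 and 2016-01-03 are made up for this example
df = pd.DataFrame({
    'date': ['2015-01-01', '2015-01-02', '2015-01-03',
             '2016-01-01', '2016-01-02', '2016-01-03'],
    'price': [78, 87, 52, 94, 55, 61],
})
df['date'] = pd.to_datetime(df['date'])

# Hypothetical (start, stop) pairs; between() includes both endpoints by default
ranges = [('2015-01-01', '2015-01-02'), ('2016-01-01', '2016-01-02')]

# Start with an all-False mask and OR in one mask per range
mask = pd.Series(False, index=df.index)
for start, stop in ranges:
    mask |= df['date'].between(start, stop)

result = df[mask]
print(result)
```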

Pandas dataframe slicing with multiple column ranges

Slicing by multiple label ranges is more challenging and has less support, so let's try to slice on index ranges instead:

loc = df.columns.get_loc
df.iloc[:, np.r_[loc('lat'):loc('long')+1, loc('year'):loc('day')+1]] 

        lat      long      year     month       day
0  0.218559  0.418508  0.345499  0.166776  0.878559
1  0.572760  0.898007  0.702427  0.386477  0.694439
2  0.803740  0.983359  0.945517  0.649540  0.860832
3  0.873401  0.906277  0.463535  0.610538  0.496282
4  0.187359  0.687674  0.039455  0.647117  0.638054
5  0.169531  0.794548  0.352917  0.484498  0.697736
6  0.022867  0.375123  0.444112  0.498140  0.414346
7  0.729086  0.415919  0.430047  0.734766  0.556216
8  0.138769  0.614932  0.109311  0.539576  0.289299
9  0.037969  0.500108  0.758036  0.262273  0.100859

Positional slices are right-exclusive, so I add 1 to the right-hand position to include the end column.

Another option is to slice individual sections and concatenate:

ranges = [('lat', 'long'), ('year', 'day')]
pd.concat([df.loc[:, i:j] for i, j in ranges], axis=1)

        lat      long      year     month       day
0  0.218559  0.418508  0.345499  0.166776  0.878559
1  0.572760  0.898007  0.702427  0.386477  0.694439
2  0.803740  0.983359  0.945517  0.649540  0.860832
3  0.873401  0.906277  0.463535  0.610538  0.496282
4  0.187359  0.687674  0.039455  0.647117  0.638054
5  0.169531  0.794548  0.352917  0.484498  0.697736
6  0.022867  0.375123  0.444112  0.498140  0.414346
7  0.729086  0.415919  0.430047  0.734766  0.556216
8  0.138769  0.614932  0.109311  0.539576  0.289299
9  0.037969  0.500108  0.758036  0.262273  0.100859
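Both options can be wrapped up and compared on a small frame. This is only a sketch: select_label_ranges is a hypothetical helper name, the extra columns (id, name, extra) are invented, and it assumes the column labels are unique so that get_loc returns a plain integer:

```python
import numpy as np
import pandas as pd

# Hypothetical column layout; only lat..long and year..day are wanted
cols = ['id', 'lat', 'long', 'name', 'year', 'month', 'day', 'extra']
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((3, len(cols))), columns=cols)

def select_label_ranges(frame, ranges):
    """Select inclusive label ranges by converting them to positions."""
    loc = frame.columns.get_loc  # assumes unique labels (returns an int)
    # +1 because positional slices exclude the right endpoint
    positions = np.r_[tuple(slice(loc(a), loc(b) + 1) for a, b in ranges)]
    return frame.iloc[:, positions]

ranges = [('lat', 'long'), ('year', 'day')]
by_position = select_label_ranges(df, ranges)
by_concat = pd.concat([df.loc[:, a:b] for a, b in ranges], axis=1)

print(by_position.columns.tolist())
print(by_concat.columns.tolist())
```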

How to efficiently select several value ranges in Pandas?

METHOD#1

You can use pd.cut to create the groups dynamically and save them in a dictionary, then refer to each key for the individual dataframe:

bins = [0,5,10,20,30,40,50,60,np.inf]
labels = ['five','ten','twenty','thirty','forty','fifty','sixty','over']

u = df1.assign(grp=pd.cut(df1['a'],bins,labels=labels))
d = dict(iter(u.groupby("grp")))

test runs:

print(f"""Group five is \n\n {d['five']}\n\n 
         Group forty is \n\n{d['forty']} \n\n Group over is \n\n{d['over']}""")

Group five is 

      x  a   grp
3    d  5  five
13  fc  2  five

 
Group forty is 

     x   a    grp
0    a  34  forty
10  cs  34  forty
11  ca  32  forty 

 Group over is 

     x     a   grp
4    e   120  over
8   cf    67  over
12  ac  1213  over
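To see how the bin edges map to the labels, here is a reduced sketch. The five-row df1 is an assumption (row 'b' is invented; the others reuse values from the output above). pd.cut uses right-closed bins by default, so 5 falls into (0, 5] and gets 'five'. Passing observed=True to groupby is my own addition, to keep only the groups that actually occur:

```python
import numpy as np
import pandas as pd

# Reduced, partly hypothetical version of df1
df1 = pd.DataFrame({'x': ['a', 'b', 'c', 'd', 'e'],
                    'a': [34, 7, 51, 5, 120]})

bins = [0, 5, 10, 20, 30, 40, 50, 60, np.inf]
labels = ['five', 'ten', 'twenty', 'thirty', 'forty', 'fifty', 'sixty', 'over']

# Each label names the upper edge of its right-closed bin
grp = pd.cut(df1['a'], bins, labels=labels)
print(grp.tolist())

# One sub-frame per label that is actually present
u = df1.assign(grp=grp)
d = dict(iter(u.groupby('grp', observed=True)))
print(d['over'])
```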

METHOD#2
You can also use locals() to save the dictionary keys as local variables, but the dict method is better (updating locals() only creates variables at module level, not inside a function):

bins = [0,5,10,20,30,40,50,60,np.inf]
labels = ['five','ten','twenty','thirty','forty','fifty','sixty','over']

u = df1.assign(grp=pd.cut(df1['a'],bins,labels=labels))
d = dict(iter(u.groupby("grp")))
for k,v in d.items():
    locals().update({k:v})

print(over,'\n\n',five,'\n\n',sixty)

     x     a   grp
4    e   120  over
8   cf    67  over
12  ac  1213  over 

      x  a   grp
3    d  5  five
13  fc  2  five 

     x   a    grp
2   c  51  sixty
7  cd  56  sixty
9  cv  54  sixty

Select multiple columns by labels in pandas

Name- or Label-Based (using regular expression syntax)

df.filter(regex='[A-CEG-I]')   # does NOT depend on the column order

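Note that any regular expression is allowed here, so this approach can be very general. E.g. if you wanted all columns starting with a capital or lowercase "A" you could use: df.filter(regex='^[Aa]'). The pattern is searched for anywhere in the column name, so with names longer than one character you may want to anchor it, e.g. '^[A-CEG-I]$'.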

Location-Based (depends on column order)

df[ list(df.loc[:,'A':'C']) + ['E'] + list(df.loc[:,'G':'I']) ]

Note that unlike the label-based method, this only gives the expected result if your columns are alphabetically sorted, because 'A':'C' selects every column positioned from 'A' through 'C'. This is not necessarily a problem, however. For example, if your columns go ['A','C','B'], then you could replace 'A':'C' above with 'A':'B'.

The Long Way

And for completeness, you always have the option shown by @Magdalena of simply listing each column individually, although it could be much more verbose as the number of columns increases:

df[['A','B','C','E','G','H','I']]   # does NOT depend on the column order

Results for any of the above methods

          A         B         C         E         G         H         I
0 -0.814688 -1.060864 -0.008088  2.697203 -0.763874  1.793213 -0.019520
1  0.549824  0.269340  0.405570 -0.406695 -0.536304 -1.231051  0.058018
2  0.879230 -0.666814  1.305835  0.167621 -1.100355  0.391133  0.317467
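As a quick check that the three approaches agree, here is a sketch on a made-up frame with columns A..I in alphabetical order (random data, so the numbers will not match the output above). The regex is anchored here as a precaution, which makes no difference for one-letter names:

```python
import numpy as np
import pandas as pd

# Hypothetical frame with single-letter columns in alphabetical order
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.standard_normal((3, 9)), columns=list('ABCDEFGHI'))

# Regex filter: independent of column order
by_regex = df.filter(regex='^[A-CEG-I]$')

# Label slices: iterating a DataFrame yields its column labels
by_slices = df[list(df.loc[:, 'A':'C']) + ['E'] + list(df.loc[:, 'G':'I'])]

# Explicit list
by_list = df[['A', 'B', 'C', 'E', 'G', 'H', 'I']]

print(by_regex.columns.tolist())
print(by_slices.columns.tolist())
print(by_list.columns.tolist())
```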

Pandas: Find values within multiple ranges defined by start- and stop-columns

I believe you need the parameter closed='both' in IntervalIndex.from_arrays:

intervals = pd.IntervalIndex.from_arrays(df2['start'], df2['stop'], closed='both')

And then select matching values:

df = df[intervals.get_indexer(df.age.values) != -1]
print (df)
   age  some_random_value
0    1                100
1    2                200
2    3                300
4    5                500
5    6                600
6    7                700

Detail (run on the original df, before the filtering above):

print (intervals.get_indexer(df.age.values))
[ 0  0  0 -1  1  1  1 -1 -1 -1]
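For a complete run, here is a sketch with input frames reconstructed from the output above. The exact df and df2 are my assumption (ages 1 to 10, intervals [1, 3] and [5, 7]). get_indexer returns the position of the matching interval, or -1 if there is none, and it needs the intervals to be non-overlapping:

```python
import pandas as pd

# Assumed inputs, chosen to reproduce the output shown above
df = pd.DataFrame({'age': range(1, 11),
                   'some_random_value': range(100, 1100, 100)})
df2 = pd.DataFrame({'start': [1, 5], 'stop': [3, 7]})

# closed='both' makes start and stop inclusive
intervals = pd.IntervalIndex.from_arrays(df2['start'], df2['stop'], closed='both')

# Position of the containing interval for each age, -1 if none contains it
pos = intervals.get_indexer(df['age'].to_numpy())
print(pos)

matched = df[pos != -1]
print(matched)
```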