How to Select Rows by Range

How can I select rows by range?

For MySQL, use LIMIT. Without an ORDER BY, the order of the returned rows is not guaranteed. You can run queries such as:

SELECT * FROM tbl LIMIT 100; -- get the first 100 records
SELECT * FROM tbl LIMIT 100, 200; -- get 200 records beginning with row 101

For Oracle, you can use ROWNUM.

See the MySQL SELECT documentation for the syntax and usage of LIMIT.

For SQLite, use LIMIT and OFFSET. I haven't used SQLite myself, but I checked this in the SQLite documentation, which also includes an example.
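As a rough illustration of the LIMIT/OFFSET idea, here is a minimal sketch using Python's built-in sqlite3 module. The `items` table, its columns, and the `fetch_range` helper are made up for this example; the ORDER BY is there so that the range is well defined.

```python
import sqlite3

# Hypothetical in-memory table with ids 1..500, just for demonstration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, label TEXT)")
conn.executemany(
    "INSERT INTO items (id, label) VALUES (?, ?)",
    [(i, f"row {i}") for i in range(1, 501)],
)


def fetch_range(conn, offset, count):
    """Skip `offset` rows, then return the ids of the next `count` rows."""
    cur = conn.execute(
        "SELECT id FROM items ORDER BY id LIMIT ? OFFSET ?",
        (count, offset),
    )
    return [row[0] for row in cur]


first_100 = fetch_range(conn, 0, 100)    # rows 1..100
next_200 = fetch_range(conn, 100, 200)   # 200 rows beginning with row 101
```

The second call corresponds to MySQL's `LIMIT 100, 200`: the offset comes first there, while SQLite's `LIMIT ... OFFSET ...` form puts the count first.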

How to select rows from a data frame based on the values in a table of ranges

Try:

First, replace ' ' with NaN via the replace() method:

df1=df1.replace(r'\s+',float('NaN'),regex=True)
#^ replaces any cell containing one or more whitespace characters with NaN

The idea is to turn the string ranges into an actual list of the combined range values:

s=df1.set_index('name').stack().dropna() 
s=s.str.split(',').map(lambda x:range(int(x[0]),int(x[1])+1)).explode().unique()

Finally:

out=df2[df2['colour'].isin(s)]
#OR
out=df2.loc[df2['colour'].isin(s)]

Output of out:

  Supplier  colour
0      Abi       1
1     John     678
3      Tim    1570
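The same idea can be sketched end to end with plain Python doing the range expansion. The data below is invented for illustration (the original frames are not shown in the answer), and it assumes each range cell is a "low,high" string with blank cells meaning "no range".

```python
import pandas as pd

# Hypothetical table of ranges: each cell is "low,high"; " " means no range.
ranges = pd.DataFrame({
    "name": ["a", "b"],
    "r1": ["1,5", "670,680"],
    "r2": ["1565,1575", " "],
})

# Hypothetical data frame to filter.
suppliers = pd.DataFrame({
    "Supplier": ["Abi", "John", "Mary", "Tim"],
    "colour": [1, 678, 900, 1570],
})

# Expand every non-blank "low,high" cell into the integers it covers.
allowed = set()
for cell in ranges.set_index("name").to_numpy().ravel():
    cell = str(cell).strip()
    if not cell:
        continue  # skip blank cells
    low, high = (int(part) for part in cell.split(","))
    allowed.update(range(low, high + 1))  # inclusive upper bound

# Keep only the rows whose colour falls inside one of the ranges.
out = suppliers[suppliers["colour"].isin(allowed)]
```

Expanding the ranges into a set is simple, but it can use a lot of memory for very wide ranges; comparing against the bounds directly would avoid that.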

To select any range of records from SQL Server database

In SQL Server 2012 and later, you can use OFFSET and FETCH as follows:

SELECT *
FROM tbl
ORDER BY name
OFFSET 20 ROWS
FETCH NEXT 25 ROWS ONLY

In older versions, you have to use ROW_NUMBER() as follows:

SELECT * 
FROM (
SELECT *, ROW_NUMBER() OVER (ORDER BY name) as rn
FROM tbl
) x
WHERE rn > 20 and rn <= 45
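To check that the two approaches pick the same rows, here is a small sketch that uses SQLite (through Python's sqlite3 module) as a stand-in, not SQL Server itself. SQLite spells the paging clause `LIMIT ... OFFSET ...` rather than `OFFSET ... FETCH`, and it needs version 3.25 or later for ROW_NUMBER(). The table contents are invented.

```python
import sqlite3

# Hypothetical table with 100 distinct names: name000 .. name099.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (name TEXT)")
conn.executemany(
    "INSERT INTO tbl (name) VALUES (?)",
    [(f"name{i:03d}",) for i in range(100)],
)

offset, fetch = 20, 25  # skip 20 rows, then take the next 25

# Paging clause (SQLite's counterpart of OFFSET 20 ROWS FETCH NEXT 25 ROWS ONLY).
paged = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM tbl ORDER BY name LIMIT ? OFFSET ?",
        (fetch, offset),
    )
]

# ROW_NUMBER() variant: keep rows numbered offset+1 .. offset+fetch.
numbered = [
    row[0]
    for row in conn.execute(
        """
        SELECT name FROM (
            SELECT name, ROW_NUMBER() OVER (ORDER BY name) AS rn
            FROM tbl
        ) AS x
        WHERE rn > ? AND rn <= ?
        ORDER BY rn
        """,
        (offset, offset + fetch),
    )
]
```

Both lists hold rows 21 through 45 in name order, which is why the ROW_NUMBER() filter is written as `rn > 20 and rn <= 45`.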

select a range of specific rows with pandas

Try something like:

df.iloc[[*range(1, 5), *range(10, 13)]]
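A small sketch of what that expression selects, using a made-up single-column frame. Note that `range` excludes its upper bound, so the positions are 1 to 4 and 10 to 12.

```python
import pandas as pd

# Hypothetical frame with 20 rows holding the values 100..119.
df = pd.DataFrame({"value": range(100, 120)})

# Unpack both ranges into one flat list of row positions.
positions = [*range(1, 5), *range(10, 13)]  # [1, 2, 3, 4, 10, 11, 12]

# iloc selects by position, regardless of the index labels.
subset = df.iloc[positions]
```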

Select rows where column values are between a given range

Option 1
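pd.Series.between seems suited for this task. In pandas 1.3 and later, inclusive takes a string such as 'neither'; older versions used inclusive=False.

df[~df['values'].between('2017-03-02', '2017-03-05', inclusive='neither')]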

               values
2018-01-01 2017-03-01
2018-01-02 2017-03-02
2018-01-05 2017-03-05
2018-01-06 2017-03-06

Details
between identifies all items within the range -

m = df['values'].between('2017-03-02', '2017-03-05', inclusive='neither')
m

2018-01-01 False
2018-01-02 False
2018-01-03 True
2018-01-04 True
2018-01-05 False
2018-01-06 False
Freq: D, Name: values, dtype: bool

Use the mask to filter on df -

df = df[~m]

Option 2
Alternatively, with good old logical operators (a negated AND) -

df[~(df['values'].gt('2017-03-02') & df['values'].lt('2017-03-05'))]

               values
2018-01-01 2017-03-01
2018-01-02 2017-03-02
2018-01-05 2017-03-05
2018-01-06 2017-03-06

Note that both options work with datetime objects as well as string date columns (in which case, the comparison is lexicographic).
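For reference, a self-contained sketch that rebuilds a frame like the one in the outputs above and applies both options. The frame construction is a guess at the original data, and it assumes pandas 1.3 or later for the string form of `inclusive`.

```python
import pandas as pd

# Reconstructed example data: six consecutive dates, indexed by six other dates.
df = pd.DataFrame(
    {"values": pd.date_range("2017-03-01", periods=6, freq="D")},
    index=pd.date_range("2018-01-01", periods=6, freq="D"),
)

low = pd.Timestamp("2017-03-02")
high = pd.Timestamp("2017-03-05")

# Option 1: mask of values strictly between the bounds, then negate it.
inside = df["values"].between(low, high, inclusive="neither")
outside_between = df[~inside]

# Option 2: the same mask spelled out with gt/lt and a logical AND.
outside_logical = df[~(df["values"].gt(low) & df["values"].lt(high))]
```

Both results keep the rows for 2017-03-01, 2017-03-02, 2017-03-05 and 2017-03-06, because the bounds themselves are not strictly inside the range.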

How to select multi range of rows in pandas dataframe

Say you have a random pandas DataFrame with 30 rows and 4 columns as follows:

import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0,30,size=(30, 4)), columns=list('ABCD'))

You can then use np.r_ to index into ranges of rows [0:5], [10:15] and [20:25] as follows:

df.loc[np.r_[0:5, 10:15, 20:25], :]
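One caveat worth sketching: `df.loc` matches index labels, so the call above relies on the default integer index. With any other index, `iloc` with the same `np.r_` positions is the safer choice. The string index below is invented for the example.

```python
import numpy as np
import pandas as pd

# Hypothetical frame whose index labels are strings, not 0..29.
df = pd.DataFrame({"A": range(30)}, index=[f"r{i}" for i in range(30)])

# np.r_ concatenates the slices into one array of row positions.
positions = np.r_[0:5, 10:15, 20:25]

# iloc selects by position, so it works whatever the index labels are.
subset = df.iloc[positions]
```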

