How can I select rows by range?
For MySQL you can use LIMIT. For example:
SELECT * FROM table LIMIT 100 -- get the first 100 records
SELECT * FROM table LIMIT 100, 200 -- get 200 records beginning with row 101
For Oracle you can use ROWNUM.
See the MySQL SELECT syntax documentation for LIMIT usage.
For SQLite, you have LIMIT and OFFSET. I haven't used SQLite, but I checked the SQLite documentation, which includes an example.
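As a rough illustration of the LIMIT and OFFSET form, here is a minimal sketch using Python's built-in sqlite3 module. The table name `items` and its contents are made up for the example, and an ORDER BY is added because row order is otherwise not guaranteed.

```python
import sqlite3

# In-memory table with ids 1..500; "items" is a hypothetical table name.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE items (id INTEGER PRIMARY KEY)")
conn.executemany("INSERT INTO items (id) VALUES (?)", [(i,) for i in range(1, 501)])

# First 100 rows. ORDER BY makes the row order deterministic.
first_page = [r[0] for r in conn.execute(
    "SELECT id FROM items ORDER BY id LIMIT 100")]

# 200 rows starting at row 101: skip 100 rows, then take 200.
next_rows = [r[0] for r in conn.execute(
    "SELECT id FROM items ORDER BY id LIMIT 200 OFFSET 100")]

print(first_page[0], first_page[-1])   # 1 100
print(next_rows[0], next_rows[-1])     # 101 300
conn.close()
```

Note that MySQL's `LIMIT 100, 200` puts the offset first and the count second, while `LIMIT 200 OFFSET 100` puts the count first.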
How to select rows from a data frame based on the values in a table of ranges
Try:
First, replace ' ' with NaN via the replace() method:
df1=df1.replace(r'\s+',float('NaN'),regex=True)
# ^ \s+ matches one or more whitespace characters, so blank cells become NaN
The idea is to turn the string ranges into one combined array of the values they cover:
s=df1.set_index('name').stack().dropna()
s=s.str.split(',').map(lambda x:range(int(x[0]),int(x[1])+1)).explode().unique()
Finally:
out=df2[df2['colour'].isin(s)]
#OR
out=df2.loc[df2['colour'].isin(s)]
Output of out:
  Supplier  colour
0      Abi       1
1     John     678
3      Tim    1570
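The original frames are not shown, so the following is only a sketch with invented inputs (the column names `range1` and `range2` and all values are assumptions). It follows the same idea but expands the ranges with a plain loop into a set rather than the map/explode/unique chain, and it uses the stricter pattern `^\s*$` so that only blank cells are turned into NaN.

```python
import pandas as pd

# Hypothetical inputs: each range cell is "start,end"; blank cells hold a single space.
df1 = pd.DataFrame({
    "name": ["a", "b"],
    "range1": ["1,5", "600,700"],
    "range2": ["1500,1600", " "],
})
df2 = pd.DataFrame({
    "Supplier": ["Abi", "John", "Mary", "Tim"],
    "colour": [1, 678, 9000, 1570],
})

# Blank-only cells become NaN so they can be dropped.
cleaned = df1.replace(r"^\s*$", float("nan"), regex=True)
ranges = cleaned.set_index("name").stack().dropna()

# Expand every "start,end" string into the integers it covers (end inclusive).
allowed = set()
for cell in ranges:
    start, end = (int(part) for part in cell.split(","))
    allowed.update(range(start, end + 1))

# Keep only the rows whose colour falls inside one of the ranges.
out = df2[df2["colour"].isin(allowed)]
print(out)
```

Expanding ranges into explicit values is simple but uses memory proportional to the total width of the ranges, so it fits best when the ranges are small.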
To select any range of records from SQL Server database
In SQL Server 2012 and above you can use OFFSET and FETCH as follows:
SELECT *
FROM tbl
ORDER BY name
OFFSET 20 ROWS
FETCH NEXT 25 ROWS ONLY
In older versions you have to use ROW_NUMBER() as follows:
SELECT *
FROM (
SELECT *, ROW_NUMBER() OVER (ORDER BY name) as rn
FROM tbl
) x
WHERE rn > 20 and rn <= 45
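To check that the two forms select the same rows, here is a small sketch that uses SQLite as a stand-in for SQL Server (an assumption made only so the example can run locally). SQLite spells the paging clause `LIMIT 25 OFFSET 20`, and its ROW_NUMBER() support needs SQLite 3.25 or newer. The table contents are invented.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (name TEXT)")
conn.executemany("INSERT INTO tbl (name) VALUES (?)",
                 [(f"name{i:03d}",) for i in range(1, 101)])

# SQLite spelling of "OFFSET 20 ROWS FETCH NEXT 25 ROWS ONLY".
paged = conn.execute(
    "SELECT name FROM tbl ORDER BY name LIMIT 25 OFFSET 20").fetchall()

# ROW_NUMBER() form with the same bounds: rows 21 through 45.
numbered = conn.execute("""
    SELECT name FROM (
        SELECT name, ROW_NUMBER() OVER (ORDER BY name) AS rn FROM tbl
    ) x
    WHERE rn > 20 AND rn <= 45
    ORDER BY rn
""").fetchall()

print(paged == numbered, len(paged))   # True 25
conn.close()
```

In general, skipping `offset` rows and taking `count` rows corresponds to `rn > offset AND rn <= offset + count`.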
select a range of specific rows with pandas
Try something like:
df.iloc[[*range(1, 5), *range(10, 13)]]
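A short sketch of what that expression selects, using a made-up single-column frame: the unpacked ranges form one flat list of positions, which iloc then uses for positional lookup.

```python
import pandas as pd

df = pd.DataFrame({"x": range(100, 120)})  # 20 rows of made-up data

# Unpacking both ranges gives a single flat list of row positions.
positions = [*range(1, 5), *range(10, 13)]
print(positions)             # [1, 2, 3, 4, 10, 11, 12]

subset = df.iloc[positions]  # positional selection, end of each range excluded
print(subset["x"].tolist())  # [101, 102, 103, 104, 110, 111, 112]
```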
Select rows where column values are between a given range
Option 1
pd.Series.between seems suited for this task.
df[~df['values'].between('2017-03-02', '2017-03-05', inclusive='neither')]  # pandas >= 1.3; older versions took inclusive=False
values
2018-01-01 2017-03-01
2018-01-02 2017-03-02
2018-01-05 2017-03-05
2018-01-06 2017-03-06
Details
between identifies all items within the range:
m = df['values'].between('2017-03-02', '2017-03-05', inclusive='neither')  # pandas >= 1.3; older versions took inclusive=False
m
2018-01-01 False
2018-01-02 False
2018-01-03 True
2018-01-04 True
2018-01-05 False
2018-01-06 False
Freq: D, Name: values, dtype: bool
Use the mask to filter df:
df = df[~m]
Option 2
Alternatively, combine two plain comparisons with a logical AND and negate the result:
df[~(df['values'].gt('2017-03-02') & df['values'].lt('2017-03-05'))]
values
2018-01-01 2017-03-01
2018-01-02 2017-03-02
2018-01-05 2017-03-05
2018-01-06 2017-03-06
Note that both options work with datetime objects as well as string date columns (in which case, the comparison is lexicographic).
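For reference, a self-contained sketch of the exclusion filter. The frame is reconstructed from the printed output above (a daily index with a datetime column named `values`), so treat the data as an assumption. It uses the string form `inclusive="neither"`, which needs pandas 1.3 or later.

```python
import pandas as pd

# Reconstructed sample: six daily rows, datetime column "values".
df = pd.DataFrame(
    {"values": pd.date_range("2017-03-01", periods=6, freq="D")},
    index=pd.date_range("2018-01-01", periods=6, freq="D"),
)

# True only strictly inside the range (both endpoints excluded).
inside = df["values"].between("2017-03-02", "2017-03-05", inclusive="neither")

# Negating the mask keeps everything outside the open interval.
outside = df[~inside]
print(outside["values"].dt.strftime("%Y-%m-%d").tolist())
# ['2017-03-01', '2017-03-02', '2017-03-05', '2017-03-06']
```

With string columns instead of datetimes the same code compares lexicographically, which only orders dates correctly when they are in a fixed-width format such as YYYY-MM-DD.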
How to select multi range of rows in pandas dataframe
Say you have a random pandas DataFrame with 30 rows and 4 columns as follows:
import numpy as np
import pandas as pd
df = pd.DataFrame(np.random.randint(0,30,size=(30, 4)), columns=list('ABCD'))
You can then use np.r_ to index into the row ranges [0:5], [10:15] and [20:25] as follows:
df.loc[np.r_[0:5, 10:15, 20:25], :]
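A brief sketch of what np.r_ produces here, using deterministic made-up data instead of random values. One caveat worth keeping in mind: df.loc looks rows up by label, so it matches positional selection only while the frame has the default 0..n-1 index. df.iloc is the positional equivalent.

```python
import numpy as np
import pandas as pd

# 30 rows x 4 columns of made-up, deterministic data.
df = pd.DataFrame(np.arange(120).reshape(30, 4), columns=list("ABCD"))

# np.r_ concatenates the slices into one integer array (slice ends excluded).
rows = np.r_[0:5, 10:15, 20:25]
print(rows.tolist())

# Positional selection; equals df.loc[rows] only for a default RangeIndex.
subset = df.iloc[rows]
print(subset.shape)   # (15, 4)
```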