Python Pandas: drop rows of a time series based on time range
using query
df.query('index < @start_remove or index > @end_remove')
using loc
df.loc[(df.index < start_remove) | (df.index > end_remove)]
using date slicing
This includes the end points
pd.concat([df[:start_remove], df[end_remove:]])
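And without the end points (this assumes start_remove and end_remove are both labels present in the index; otherwise drop raises a KeyError)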
pd.concat([df[:start_remove], df[end_remove:]]).drop([start_remove, end_remove])
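The difference between the boolean-mask and slicing approaches is easiest to see on a small frame. The following is a minimal sketch, assuming a hypothetical daily index and a made-up `value` column; it uses `.loc` for the label slices, which behaves like the `df[:start_remove]` form above.

```python
import pandas as pd

# Hypothetical data: ten daily rows (date_range defaults to daily frequency).
idx = pd.date_range("2020-01-01", periods=10)
df = pd.DataFrame({"value": range(10)}, index=idx)

start_remove = pd.Timestamp("2020-01-03")
end_remove = pd.Timestamp("2020-01-05")

# Boolean mask: keeps only rows strictly outside [start_remove, end_remove].
outside = df.loc[(df.index < start_remove) | (df.index > end_remove)]

# Label slicing is inclusive on both ends, so the two boundary rows survive.
with_endpoints = pd.concat([df.loc[:start_remove], df.loc[end_remove:]])

print(len(outside))         # 7 rows: Jan 3, 4 and 5 are gone
print(len(with_endpoints))  # 9 rows: only Jan 4 is gone
```

Which variant is appropriate depends on whether the boundary timestamps themselves should be removed.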
Pandas Drop Rows Outside of Time Range
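You can use the between_time method directly (pandas 1.4 and later take an inclusive argument; the older include_start and include_end arguments were removed in pandas 2.0):
ts.between_time(datetime.time(18), datetime.time(9), inclusive="neither")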
Original answer:
You can use the indexer_between_time method of the index.
For example, to include those times between 9am and 6pm (inclusive), using iloc because indexer_between_time returns integer positions (the .ix indexer originally used here was removed in pandas 1.0):
ts.iloc[ts.index.indexer_between_time(datetime.time(9), datetime.time(18))]
To do the opposite and keep only those times between 6pm and 9am (exclusive):
ts.iloc[ts.index.indexer_between_time(datetime.time(18), datetime.time(9),
                                      include_start=False, include_end=False)]
Note: indexer_between_time's arguments include_start and include_end are True by default. Setting include_start to False means that datetimes whose time part is precisely start_time (the first argument), in this case 6pm, will not be included.
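To make the inclusive and exclusive windows concrete, here is a minimal sketch assuming a hypothetical hourly series spanning two days. The values are placeholders; only the index matters.

```python
import datetime
import pandas as pd

# Hypothetical hourly series covering two full days (48 rows).
rng = pd.date_range("2000-01-01", periods=48, freq=pd.Timedelta(hours=1))
ts = pd.Series(range(48), index=rng)

# Inclusive daytime window: 09:00 through 18:00.
day_pos = ts.index.indexer_between_time(datetime.time(9), datetime.time(18))
day = ts.iloc[day_pos]

# Exclusive overnight window: strictly after 18:00 and strictly before 09:00.
# The window wraps around midnight because the start time is later than the end time.
night_pos = ts.index.indexer_between_time(
    datetime.time(18), datetime.time(9), include_start=False, include_end=False
)
overnight = ts.iloc[night_pos]

print(len(day), len(overnight))  # 20 28
```

With these settings the two selections do not overlap and together cover every row, so either one can be used to drop the other.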
Example (assuming datetime, numpy as np and pandas as pd have been imported):
In [1]: rng = pd.date_range('1/1/2000', periods=24, freq='h')
In [2]: ts = pd.Series(np.random.randn(len(rng)), index=rng)
In [3]: ts.iloc[ts.index.indexer_between_time(datetime.time(10), datetime.time(14))]
Out[3]:
2000-01-01 10:00:00 1.312561
2000-01-01 11:00:00 -1.308502
2000-01-01 12:00:00 -0.515339
2000-01-01 13:00:00 1.536540
2000-01-01 14:00:00 0.108617
Note: the same syntax (using iloc) works for a DataFrame:
In [4]: df = pd.DataFrame(ts)
In [5]: df.iloc[df.index.indexer_between_time(datetime.time(10), datetime.time(14))]
Out[5]:
                            0
2000-01-01 10:00:00  1.312561
2000-01-01 11:00:00 -1.308502
2000-01-01 12:00:00 -0.515339
2000-01-01 13:00:00  1.536540
2000-01-01 14:00:00  0.108617
Pandas drop rows in time series with fewer than x observations
Use:
# convert the index to a Series
s = df.index.to_series()
# flag each row that does not follow its predecessor by exactly 1 minute,
# then take the cumulative sum so every consecutive run gets its own label
a = s.diff().ne(pd.Timedelta(minutes=1)).cumsum()
# keep only rows whose run contains more than N rows, e.g. 3
N = 3
df = df[a.map(a.value_counts()).gt(N)]
print (df)
value
timestamp
2018-01-08 06:13:00 143
2018-01-08 06:14:00 324
2018-01-08 06:15:00 324
2018-01-08 06:16:00 324
2018-01-08 06:17:00 324
2018-01-08 06:35:00 324
2018-01-08 06:36:00 324
2018-01-08 06:37:00 324
2018-01-08 06:38:00 324
2018-01-08 06:39:00 324
2018-01-08 06:40:00 324
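The same run-labelling idea can be wrapped in a small helper. This is a sketch under stated assumptions: the function name `drop_short_runs`, its parameters and the sample timestamps are hypothetical, and it keeps runs of at least `min_len` rows (`ge`), whereas the snippet above keeps runs strictly longer than N (`gt`).

```python
import pandas as pd

# Hypothetical minute-level data with three runs of lengths 2, 4 and 1.
idx = pd.to_datetime([
    "2018-01-08 06:00", "2018-01-08 06:01",
    "2018-01-08 06:13", "2018-01-08 06:14",
    "2018-01-08 06:15", "2018-01-08 06:16",
    "2018-01-08 06:30",
])
df = pd.DataFrame({"value": range(len(idx))}, index=idx)


def drop_short_runs(frame, min_len, step=pd.Timedelta(minutes=1)):
    """Keep rows belonging to runs of at least min_len rows spaced exactly `step` apart."""
    # True wherever a new run starts (the first row's diff is NaT, which is != step).
    starts = frame.index.to_series().diff().ne(step)
    run_id = starts.cumsum()
    # Length of the run each row belongs to.
    run_len = run_id.map(run_id.value_counts())
    return frame[run_len.ge(min_len).to_numpy()]


kept = drop_short_runs(df, min_len=3)
print(kept)
```

Only the four rows from 06:13 to 06:16 remain, since the other two runs are shorter than three rows.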
Delete rows with dates before the required date point based on key value
You can just filter your dataframe using Boolean indexing. There is no groupwise operation here. Just remember to convert your series to datetime first.
df['date'] = pd.to_datetime(df['date'])
res = df[~(df['date'] < '2018-04-01')]
print(res)
key_value date
2 value_01 2018-04-02
3 value_01 2018-05-13
4 value_01 2018-05-16
7 value_02 2018-04-01
8 value_02 2018-05-16
9 value_02 2018-05-22
11 value_03 2018-04-14
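One detail worth knowing about the negated comparison: it treats missing dates differently from a direct `>=` test. The sketch below uses a hypothetical four-row frame (with one missing date added purely for illustration) to show the difference.

```python
import pandas as pd

# Hypothetical data; the last row has a missing date.
df = pd.DataFrame({
    "key_value": ["value_01", "value_01", "value_02", "value_02"],
    "date": ["2018-03-30", "2018-04-02", "2018-04-01", None],
})
df["date"] = pd.to_datetime(df["date"])
cutoff = pd.Timestamp("2018-04-01")

# Comparisons against NaT are always False, so negating "<" keeps the NaT row...
negated = df[~(df["date"] < cutoff)]
# ...while a direct ">=" drops it.
direct = df[df["date"] >= cutoff]

print(len(negated), len(direct))  # 3 2
```

If the date column has no missing values, the two forms select the same rows.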
How do I delete rows with same hour and minute in timeseries data based on conditions?
I'm not sure I understand the question, but you can group by 'Hour' and 'Minute', count the NaN values in every group, and drop any group that has too many of them (more than 5 in the code below):
all_groups = df.groupby(['Hour', 'Minute'])
for name, group in all_groups:
    # number of missing Voltage readings in this (Hour, Minute) group
    count = group['Voltage'].isna().sum()
    #print('name:', name, 'count:', count)
    if count > 5:
        df.drop(group.index, inplace=True)
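The loop can also be written without explicit iteration. The following is a sketch of one possible vectorized alternative, not the answer's own method; the sample data is hypothetical and the threshold of 5 is taken from the code above.

```python
import numpy as np
import pandas as pd

# Hypothetical data: six readings at 10:00 (all missing) and six at 10:01 (one missing).
df = pd.DataFrame({
    "Hour": [10] * 12,
    "Minute": [0] * 6 + [1] * 6,
    "Voltage": [np.nan] * 6 + [230.0, np.nan, 231.0, 229.5, 230.2, 230.1],
})

# Per-row count of missing Voltage values within the row's (Hour, Minute) group.
# The flags are cast to int first so the summed counts stay integers.
nan_count = (
    df["Voltage"].isna().astype(int)
    .groupby([df["Hour"], df["Minute"]])
    .transform("sum")
)

# Keep only rows from groups with at most 5 missing readings.
filtered = df[nan_count <= 5]
print(filtered)
```

This returns a new frame rather than modifying `df` in place, which avoids dropping rows while iterating over groups built from the same frame.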
Dropping dataframe rows in time series dataframe using pandas
Use Series.ne + Series.shift along with the optional parameter fill_value to create a boolean mask, then use this mask to filter/drop the rows:
mask = df['id'].ne(404) & df['id'].shift(fill_value=404).ne(404)
df = df[~mask]
Result:
print(df)
id start end duration
0 303 2012-06-25 17:59:43 2012-06-25 18:01:29 105
1 404 2012-06-25 18:01:29 2012-06-25 18:01:55 25
2 303 2012-06-25 18:01:56 2012-06-25 18:02:06 10
4 404 2012-06-25 18:02:45 2012-06-25 18:02:51 6
5 303 2012-06-25 18:02:54 2012-06-25 18:03:17 23
6 404 2012-06-25 18:03:24 2012-06-25 18:03:41 17
7 303 2012-06-25 18:03:43 2012-06-25 18:05:51 128
9 404 2012-06-25 18:24:24 2012-06-25 18:25:25 61
10 101 2012-06-25 18:25:25 2012-06-25 18:25:46 21
11 404 2012-06-25 18:25:49 2012-06-25 18:26:00 11
12 101 2012-06-25 18:26:01 2012-06-25 18:26:04 3
13 404 2012-06-25 18:26:05 2012-06-25 18:28:49 164
14 202 2012-06-25 18:28:52 2012-06-25 18:28:57 5
15 404 2012-06-25 18:29:00 2012-06-25 18:29:24 24
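To see how the mask is built, here is a minimal sketch on a hypothetical `id` column only. It splits the one-line mask into its two parts; the intermediate names `is_other` and `prev_is_other` are introduced here for illustration.

```python
import pandas as pd

# Hypothetical ids; 404 marks the separator rows.
df = pd.DataFrame({"id": [303, 404, 303, 303, 404, 101, 202, 404]})

# True where the current row is not a 404 row.
is_other = df["id"].ne(404)
# True where the previous row is not a 404 row; fill_value=404 makes the
# first row behave as if it were preceded by a 404.
prev_is_other = df["id"].shift(fill_value=404).ne(404)

# A row is dropped when neither it nor its predecessor is a 404 row.
mask = is_other & prev_is_other
result = df[~mask]

print(result["id"].tolist())  # [303, 404, 303, 404, 101, 404]
```

Rows 3 and 6 are removed because each follows another non-404 row.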