How do I find closest value to an input number in pandas column / series?
Here's your logic in code, though not quite a one-liner:
# distance to 5
s=df['B'].sub(5).abs()
# groupby 'A' and find the min
df['C'] = s == s.groupby(df['A']).transform('min')
Output:
A B C
0 C1 1 False
1 C1 3 False
2 C1 6 True
3 C1 9 False
4 C2 2 False
5 C2 3 False
6 C2 4 True
7 C2 8 False
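The sketch below is a self-contained version of the approach. It rebuilds the DataFrame from the printed output, with columns A and B as above, and then keeps only the flagged rows. Because each distance is compared with its group's minimum, every tied row in a group would be kept.

```python
import pandas as pd

# DataFrame reconstructed from the printed output above
df = pd.DataFrame({'A': ['C1'] * 4 + ['C2'] * 4,
                   'B': [1, 3, 6, 9, 2, 3, 4, 8]})
target = 5

# absolute distance of every value to the target
s = df['B'].sub(target).abs()
# True where the distance equals the minimum distance within the row's group
df['C'] = s == s.groupby(df['A']).transform('min')

# keep only the closest row(s) of each group
closest = df[df['C']]
print(closest)
```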
Pandas: find the nearest value in a column
Convert column year to the index, then subtract the value and take absolute values. Finally, get the index label (here the year) of the nearest value, i.e. the minimum, with DataFrame.idxmin:
val = 830000
s = df.set_index('year').sub(val).abs().idxmin()
print (s)
pop1 2
pop2 1
dtype: int64
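idxmin only returns the year. To also see the nearest value in each column, you can look it up at that year. The question's DataFrame is not shown, so the numbers below are hypothetical and were chosen only to reproduce the printed result.

```python
import pandas as pd

# Hypothetical data picked to reproduce the output above (pop1 -> 2, pop2 -> 1)
df = pd.DataFrame({'year': [1, 2, 3],
                   'pop1': [800000, 830500, 900000],
                   'pop2': [829000, 700000, 1000000]})
val = 830000

by_year = df.set_index('year')
# per column: the year whose value is nearest to val
years = by_year.sub(val).abs().idxmin()
# per column: the nearest value itself, looked up at that year
nearest = pd.Series({col: by_year.at[year, col] for col, year in years.items()})
print(years)
print(nearest)
```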
Find the row with the value closest to an input
Subtract the value with sub, get absolute values with abs, find the index of the minimal value with idxmin, and finally select the row with loc:
idx = df['delta_n'].sub(delta_n).abs().idxmin()
# double brackets [[]] return a one-row DataFrame
df1 = df.loc[[idx]]
print (df1)
delta_n column_1
1 20 0
# single brackets [] return a Series
s = df.loc[idx]
print (s)
delta_n 20
column_1 0
Name: 1, dtype: int64
Details:
print (df['delta_n'].sub(delta_n))
0 -10.5
1 -0.5
2 9.5
Name: delta_n, dtype: float64
print (df['delta_n'].sub(delta_n).abs())
0 10.5
1 0.5
2 9.5
Name: delta_n, dtype: float64
print (df['delta_n'].sub(delta_n).abs().idxmin())
1
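The steps above fit into a small reusable function. In the sketch below, closest_row is a hypothetical helper name. The delta_n values [10, 20, 30] are inferred from the printed differences for a target of 20.5. The column_1 values other than row 1 are made up.

```python
import pandas as pd

def closest_row(df, col, value):
    """Hypothetical helper: the row whose `col` is nearest to `value`,
    returned as a one-row DataFrame (double brackets keep it a DataFrame)."""
    idx = df[col].sub(value).abs().idxmin()
    return df.loc[[idx]]

# delta_n inferred from the differences printed above; column_1 partly made up
df = pd.DataFrame({'delta_n': [10, 20, 30], 'column_1': [5, 0, 7]})
print(closest_row(df, 'delta_n', 20.5))
```

Since idxmin defaults to skipna=True, missing values in the column are ignored rather than picked.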
Another NumPy solution: get the position with numpy.argmin and select by iloc:
pos = df['delta_n'].sub(delta_n).abs().values.argmin()
print (pos)
1
df1 = df.iloc[[pos]]
print (df1)
delta_n column_1
1 20 0
s = df.iloc[pos]
print (s)
delta_n 20
column_1 0
Name: 1, dtype: int64
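argmin returns a position, not a label. It only matches loc when the index is the default 0..n-1 range, which is why iloc is the right selector here. The sketch below uses a hypothetical non-default index to show the difference.

```python
import pandas as pd

# Hypothetical frame whose index labels differ from positions
df = pd.DataFrame({'delta_n': [10, 20, 30], 'column_1': [5, 0, 7]},
                  index=[100, 200, 300])
delta_n = 20.5

# argmin returns a position (0, 1, 2, ...), not an index label
pos = df['delta_n'].sub(delta_n).abs().to_numpy().argmin()

row_df = df.iloc[[pos]]   # one-row DataFrame, selected by position
row_s = df.iloc[pos]      # Series, selected by position
# df.loc[pos] would raise KeyError here, because 1 is not a label
print(row_df)
```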
Find the closest values in a sorted pandas dataframe to values in a list
You can use broadcasting in numpy to obtain the differences and then get the position containing the minimum absolute value:
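import numpy as np
a = np.array([4, 7.6, 10]).reshape(1, -1)  # shape (1, 3), same as np.array([[4, 7.6, 10]])
# (n, 1) column minus (1, 3) row broadcasts to an (n, 3) matrix; argmin(0) gives the row position per target
df.iloc[np.abs(df.col1.to_numpy()[:, None] - a).argmin(0)]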
idx col1 col2
1 2 3.0 22
4 5 7.5 6
6 7 10.1 11
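Because the dataframe is sorted, you can avoid building the n-by-m difference matrix. The sketch below uses np.searchsorted instead of broadcasting. It finds each target's insertion point and keeps whichever neighbour is closer. The helper name nearest_positions_sorted and the sample data are hypothetical. The sketch assumes col1 is sorted ascending and has at least two rows. Ties go to the lower position, as with argmin.

```python
import numpy as np
import pandas as pd

def nearest_positions_sorted(sorted_vals, targets):
    """Position of the nearest element of an ascending array for each target.
    Assumes at least two elements; ties resolve to the lower position."""
    sorted_vals = np.asarray(sorted_vals, dtype=float)
    targets = np.asarray(targets, dtype=float)
    # insertion points: sorted_vals[right - 1] < t <= sorted_vals[right]
    right = np.searchsorted(sorted_vals, targets)
    # clip so both neighbours exist for targets outside the value range
    right = np.clip(right, 1, len(sorted_vals) - 1)
    left = right - 1
    # pick the right neighbour only when it is strictly closer
    closer_right = np.abs(sorted_vals[right] - targets) < np.abs(sorted_vals[left] - targets)
    return np.where(closer_right, right, left)

# hypothetical sorted data
df = pd.DataFrame({'col1': [1.0, 3.0, 5.2, 7.5, 10.1],
                   'col2': [10, 20, 30, 40, 50]})
pos = nearest_positions_sorted(df['col1'], [4, 7.6, 10])
print(df.iloc[pos])
```

Memory stays O(n + m) instead of O(n·m), and each lookup costs O(log n).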
How to find k nearest values in a pandas data frame column to an input value x in O(logn)?
You can easily find the k smallest values with .nsmallest(), and the closest values are the ones with the smallest absolute difference:
>>> (df1['A'] - 0.21).abs().nsmallest(10)
969 0.000014
889 0.000442
779 0.003299
259 0.003637
843 0.003700
84 0.003818
651 0.004264
403 0.004360
648 0.004421
543 0.005088
Name: A, dtype: float64
You can then reuse the index of this result to access the matching rows:
>>> df1.loc[(df1['A'] - 0.21).abs().nsmallest(10).index]
A ID
969 0.210014 237
889 0.210442 225
779 0.206701 127
259 0.213637 883
843 0.206300 330
84 0.206182 17
651 0.205736 64
403 0.205640 388
648 0.214421 964
543 0.204912 616
Note that the documentation of nsmallest says: "Faster than .sort_values().head(n) for small n relative to the size of the Series object."
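The two selections are interchangeable, and only the speed differs. Here is a quick check on hypothetical random data standing in for df1.

```python
import numpy as np
import pandas as pd

# hypothetical random data standing in for df1
rng = np.random.default_rng(42)
df1 = pd.DataFrame({'A': rng.random(1000), 'ID': rng.integers(0, 1000, 1000)})

dist = (df1['A'] - 0.21).abs()
# both pick the 10 smallest distances; nsmallest avoids sorting everything
via_nsmallest = dist.nsmallest(10).index
via_sort = dist.sort_values().head(10).index
print(df1.loc[via_nsmallest])
```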
A word on complexity, since your values aren't sorted:
- the bare minimum complexity is O(n) if you want to find the single closest value. You could do a binary search to get O(log(n)), but that requires sorting first, so it's in fact O(n log(n)).
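If you only need the k nearest values and not their order, NumPy's np.argpartition selects them in linear time on average without sorting the data. This NumPy technique is a substitute for nsmallest and is not part of the answer above. The helper name k_closest_unordered and the data are hypothetical, and the sketch assumes 0 < k <= len(s).

```python
import numpy as np
import pandas as pd

def k_closest_unordered(s, val, k):
    """Labels of the k values of Series `s` nearest to `val`, in no particular order.
    Assumes 0 < k <= len(s)."""
    dist = np.abs(s.to_numpy() - val)
    # argpartition moves the k smallest distances into the first k slots (unsorted)
    pos = np.argpartition(dist, k - 1)[:k]
    return s.index[pos]

rng = np.random.default_rng(1)
s = pd.Series(rng.random(1000))
labels = k_closest_unordered(s, 0.21, 10)
# sort the small result afterwards if order matters: O(k log k)
print(s.loc[labels].sub(0.21).abs().sort_values())
```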
Suppose your dataframe is sorted on A:
>>> df1.sort_values('A', inplace=True)
Then we can use Series.searchsorted, which returns the row position (not the index value):
>>> df1['A'].searchsorted(0.21)
197
This means we can use that position to take the k rows on either side (2k candidates) and then apply our previous method to this 2k-row slice:
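def find_closest(df, val, k):
    # full scan: the k rows whose 'A' is nearest to val
    return df.loc[df['A'].sub(val).abs().nsmallest(k).index]

def find_closest_sorted(df, val, k):
    # df must be sorted on 'A'; closest is a row position
    closest = df['A'].searchsorted(val)
    # only the k rows on each side of the insertion point can be among the k nearest
    if closest < k:
        return find_closest(df.iloc[:closest + k], val, k)
    return find_closest(df.iloc[closest - k:closest + k], val, k)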
>>> find_closest_sorted(df1, 0.21, 10)
A ID
969 0.210014 237
889 0.210442 225
779 0.206701 127
259 0.213637 883
843 0.206300 330
84 0.206182 17
651 0.205736 64
403 0.205640 388
648 0.214421 964
543 0.204912 616
The complexity here should be:
- O(n log(n)) for sorting (which can be amortized over many lookups)
- O(log(n)) for the sorted search
- O(k) for the final step.
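To check that the windowed search agrees with the full scan, here is a sketch on hypothetical random data. This sketch makes two changes to the original answer. The column name is a parameter (col). The closest < k branch is folded into max(closest - k, 0), which slices the same rows.

```python
import numpy as np
import pandas as pd

def find_closest(df, val, k, col='A'):
    # full scan: k smallest absolute differences
    return df.loc[df[col].sub(val).abs().nsmallest(k).index]

def find_closest_sorted(df, val, k, col='A'):
    # assumes df is sorted ascending on col
    closest = df[col].searchsorted(val)
    # the k nearest rows lie within k positions of the insertion point
    return find_closest(df.iloc[max(closest - k, 0):closest + k], val, k, col)

# hypothetical data, sorted once up front
rng = np.random.default_rng(7)
df1 = pd.DataFrame({'A': rng.random(1000), 'ID': rng.integers(0, 1000, 1000)})
df1 = df1.sort_values('A')

# include targets at both ends of the value range
for val in (0.0, 0.21, 0.5, 1.0):
    full = find_closest(df1, val, 10)
    windowed = find_closest_sorted(df1, val, 10)
    assert full.index.equals(windowed.index)
print(find_closest_sorted(df1, 0.21, 10))
```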
Replace a pandas series with the closest values from another series
Setup
import pandas as pd
A = pd.Series([1.3, 4.5, 10.11])
B = pd.Series([0.8, 5.1, 10.1, 0.3])
Option 1
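Use pd.Series.searchsorted. This searches through A for each element of B and finds where in A that element of B should be inserted. It assumes A is sorted ascending, and it returns the first value of A that is greater than or equal to each element of B. That value is not necessarily the nearest: 5.1 maps to 10.11 rather than 4.5. An element larger than every value in A would give position len(A), which iloc rejects.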
A.iloc[A.searchsorted(B)]
0 1.30
2 10.11
2 10.11
0 1.30
dtype: float64
Option 2
But to get the nearest value, you could hack the pd.Series.reindex method:
pd.Series(A.values, A.values).reindex(B.values, method='nearest')
0.8 1.30
5.1 4.50
10.1 10.11
0.3 1.30
dtype: float64
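Another option is pd.merge_asof with direction='nearest', which matches each key to the closest key in the other frame. This technique is an alternative to the reindex hack above, not part of the original answer. merge_asof requires both keys to be sorted. The sketch therefore remembers B's original positions and restores them at the end. Unlike reindex on a duplicated index, it also tolerates duplicate values in A.

```python
import pandas as pd

A = pd.Series([1.3, 4.5, 10.11])
B = pd.Series([0.8, 5.1, 10.1, 0.3])

# merge_asof needs both keys sorted, so keep B's original index in a column
left = B.rename('b').to_frame().reset_index().sort_values('b')
right = A.rename('a').sort_values().to_frame()

# for each b, take the a with the smallest absolute distance
merged = pd.merge_asof(left, right, left_on='b', right_on='a', direction='nearest')

# restore B's original order and index
nearest = merged.set_index('index')['a'].sort_index()
print(nearest)
```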
Identifying closest value in a column for each filter using Pandas
You can create a column of absolute differences:
df['dif'] = (df['values'] - 2).abs()
df
Out:
category values dif
0 a 1 1
1 b 2 0
2 b 3 1
3 b 4 2
4 c 5 3
5 a 4 2
6 b 3 1
7 c 2 0
8 c 1 1
9 a 0 2
And then use groupby.transform to check whether the minimum value of each group is equal to the difference you calculated:
df['is_closest'] = df.groupby('category')['dif'].transform('min') == df['dif']
df
Out:
category values dif is_closest
0 a 1 1 True
1 b 2 0 True
2 b 3 1 False
3 b 4 2 False
4 c 5 3 False
5 a 4 2 False
6 b 3 1 False
7 c 2 0 True
8 c 1 1 False
9 a 0 2 False
df.groupby('category')['dif'].idxmin() would also give you the indices of the closest values for each category. You can use that for mapping too.
For selection:
df.loc[df.groupby('category')['dif'].idxmin()]
Out:
category values dif
0 a 1 1
1 b 2 0
7 c 2 0
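The selection can also be done without adding a helper column to df, by grouping the distance Series directly by the category column. closest_per_group is a hypothetical helper name. The data is the example above.

```python
import pandas as pd

def closest_per_group(df, group_col, value_col, target):
    """Hypothetical helper: one row per group, the one whose value_col is
    nearest to target (first occurrence wins on ties, as with idxmin)."""
    dist = df[value_col].sub(target).abs()
    return df.loc[dist.groupby(df[group_col]).idxmin()]

df = pd.DataFrame({'category': list('abbbcabcca'),
                   'values': [1, 2, 3, 4, 5, 4, 3, 2, 1, 0]})
print(closest_per_group(df, 'category', 'values', 2))
```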
For assignment:
df['is_closest'] = False
df.loc[df.groupby('category')['dif'].idxmin(), 'is_closest'] = True
df
Out:
category values dif is_closest
0 a 1 1 True
1 b 2 0 True
2 b 3 1 False
3 b 4 2 False
4 c 5 3 False
5 a 4 2 False
6 b 3 1 False
7 c 2 0 True
8 c 1 1 False
9 a 0 2 False
The difference between these approaches is that checking equality against the difference gives True for every tied row, whereas with idxmin only the first occurrence in each group is True.
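The tie behaviour is easiest to see on a tiny example. In the hypothetical data below, 1 and 3 are both at distance 1 from 2 in group 'a'.

```python
import pandas as pd

# hypothetical data with a tie in group 'a': 1 and 3 are both 1 away from 2
df = pd.DataFrame({'category': ['a', 'a', 'b', 'b'],
                   'values': [1, 3, 2, 5]})
df['dif'] = (df['values'] - 2).abs()

# equality against the group minimum: every tied row is True
by_transform = df.groupby('category')['dif'].transform('min') == df['dif']

# idxmin: only the first occurrence per group is True
by_idxmin = pd.Series(False, index=df.index)
by_idxmin.loc[df.groupby('category')['dif'].idxmin()] = True

print(pd.DataFrame({'transform': by_transform, 'idxmin': by_idxmin}))
```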