How to Find the Closest Values in a Pandas Series to an Input Number

How do I find the closest value to an input number in a pandas column / Series?

Here is your logic translated into code, though not quite in one line:

# distance to 5
s=df['B'].sub(5).abs()

# groupby 'A' and find the min
df['C'] = s == s.groupby(df['A']).transform('min')

Output:

    A  B      C
0  C1  1  False
1  C1  3  False
2  C1  6   True
3  C1  9  False
4  C2  2  False
5  C2  3  False
6  C2  4   True
7  C2  8  False
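
For reference, here is a self-contained sketch of the same approach. The DataFrame is rebuilt from the output shown above; the names `target`, `dist` and `closest` are illustrative and not part of the original answer.

```python
import pandas as pd

# Sample data reconstructed from the output shown above.
df = pd.DataFrame({
    "A": ["C1"] * 4 + ["C2"] * 4,
    "B": [1, 3, 6, 9, 2, 3, 4, 8],
})
target = 5  # the input number

# Absolute distance of every row to the target.
dist = df["B"].sub(target).abs()

# Flag rows whose distance equals the smallest distance within their group.
df["C"] = dist == dist.groupby(df["A"]).transform("min")

# Keep only the flagged rows to see the closest value per group.
closest = df.loc[df["C"], ["A", "B"]]
print(closest)
```

Filtering on the boolean column, as in the last step, is one way to go from the flag to the actual closest rows.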

Pandas: find the nearest value in a column

Set the year column as the index, subtract the value, take absolute values, and then get the index label (here the year) of the nearest value, which is the minimal difference, with DataFrame.idxmin:

val = 830000

s = df.set_index('year').sub(val).abs().idxmin()
print (s)
pop1 2
pop2 1
dtype: int64
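
The question's DataFrame is not shown, so the following is only a sketch with made-up data (the `year`, `pop1` and `pop2` values are assumptions chosen for illustration). It shows that `idxmin` returns, for each column, the year whose value is nearest to `val`.

```python
import pandas as pd

# Hypothetical data: the original question's DataFrame is not shown.
df = pd.DataFrame({
    "year": [1, 2, 3],
    "pop1": [700_000, 820_000, 900_000],
    "pop2": [840_000, 600_000, 500_000],
})
val = 830_000

# One nearest year per column, returned as a Series indexed by column name.
nearest_year = df.set_index("year").sub(val).abs().idxmin()
print(nearest_year)
```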

Find row closest value to input

Subtract the value with sub, take absolute values with abs, find the index label of the minimal value with idxmin, and finally select the row with loc:

idx = df['delta_n'].sub(delta_n).abs().idxmin()

# double brackets [[ ]] return a one-row DataFrame
df1 = df.loc[[idx]]
print (df1)
delta_n column_1
1 20 0

# single brackets [ ] return a Series
s = df.loc[idx]
print (s)
delta_n 20
column_1 0
Name: 1, dtype: int64

Details:

print (df['delta_n'].sub(delta_n))
0 -10.5
1 -0.5
2 9.5
Name: delta_n, dtype: float64

print (df['delta_n'].sub(delta_n).abs())
0 10.5
1 0.5
2 9.5
Name: delta_n, dtype: float64

print (df['delta_n'].sub(delta_n).abs().idxmin())
1

Another solution uses numpy.argmin to get the position and selects by iloc. Because argmin returns an integer position rather than an index label, the selection must use iloc, not loc:

pos = df['delta_n'].sub(delta_n).abs().values.argmin()
print (pos)
1

df1 = df.iloc[[pos]]
print (df1)
delta_n column_1
1 20 0

s = df.iloc[pos]
print (s)
delta_n 20
column_1 0
Name: 1, dtype: int64
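
Whether you have a label or a position only matters once the index is not the default 0, 1, 2, ... range. The sketch below uses made-up data with string labels (an assumption for illustration) to show that `idxmin` pairs with `loc` and `argmin` pairs with `iloc`.

```python
import pandas as pd

# Hypothetical data with string labels, so labels and positions differ.
df = pd.DataFrame(
    {"delta_n": [10, 20, 30], "column_1": [5, 0, 7]},
    index=["a", "b", "c"],
)
delta_n = 20.5

dist = df["delta_n"].sub(delta_n).abs()

label = dist.idxmin()            # index label of the closest row
pos = dist.to_numpy().argmin()   # integer position of the closest row

row_by_label = df.loc[[label]]   # label-based selection
row_by_pos = df.iloc[[pos]]      # position-based selection

print(label, pos)
print(row_by_label.equals(row_by_pos))
```

With this index, `df.loc[[pos]]` would raise a KeyError, because 1 is not one of the labels.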

Find the closest values in a sorted pandas dataframe to values in a list

You can use broadcasting in numpy to obtain all pairwise differences and then get the position of the minimum absolute difference for each value in the list:

a = np.array([4,7.6,10]).reshape(1,-1) #np.array([[4,7.6,10]])
df.iloc[abs(df.col1.to_numpy()[:,None] - a).argmin(0)]

idx col1 col2
1 2 3.0 22
4 5 7.5 6
6 7 10.1 11
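
To make the broadcasting step easier to follow, here is a sketch that spells out the intermediate shapes. The DataFrame is hypothetical (the question's full data is not shown), and the names `targets`, `diffs` and `positions` are illustrative.

```python
import numpy as np
import pandas as pd

# Hypothetical sorted data; the question's full DataFrame is not shown.
df = pd.DataFrame({
    "col1": [1.0, 3.0, 5.5, 6.0, 7.5, 9.0, 10.1],
    "col2": [10, 22, 8, 3, 6, 15, 11],
})
targets = np.array([4, 7.6, 10])

# Shape (n_rows, 1) minus shape (n_targets,) broadcasts to (n_rows, n_targets).
diffs = np.abs(df["col1"].to_numpy()[:, None] - targets)

# argmin along axis 0 gives, for each target, the position of the closest row.
positions = diffs.argmin(axis=0)
result = df.iloc[positions]
print(result)
```

Note that the intermediate array holds one entry per (row, target) pair, so memory use grows with the product of the two sizes.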

How to find k nearest values in a pandas data frame column to an input value x in O(logn)?

You can easily find the k smallest values with .nsmallest(), and the closest values are the ones with the smallest absolute difference:

>>> (df1['A'] - 0.21).abs().nsmallest(10)
969 0.000014
889 0.000442
779 0.003299
259 0.003637
843 0.003700
84 0.003818
651 0.004264
403 0.004360
648 0.004421
543 0.005088
Name: A, dtype: float64

You can then reuse the index of this result if you want to access the matching rows:

>>> df1.loc[(df1['A'] - 0.21).abs().nsmallest(10).index]
A ID
969 0.210014 237
889 0.210442 225
779 0.206701 127
259 0.213637 883
843 0.206300 330
84 0.206182 17
651 0.205736 64
403 0.205640 388
648 0.214421 964
543 0.204912 616

Note that the documentation of nsmallest says:

Faster than .sort_values().head(n) for small n relative to the size of the Series object.

A word on complexity, since your values aren’t sorted:

  • the bare minimum complexity is O(n) if you want to find the single closest value, because every value has to be inspected at least once
  • you could use a binary search to get O(log(n)) per lookup, but that requires sorting first, so the total cost is in fact O(n log(n)).

Suppose your dataframe is sorted on A:

>>> df1.sort_values('A', inplace=True)

Then we can try to use the sorted search function, which returns the row number (not index value):

>>> df1['A'].searchsorted(0.21)
197

This means we can use that position to slice out up to 2k candidate rows around it (k on each side) and then apply our previous method to this small DataFrame:

def find_closest(df, val, k):
    return df.loc[df['A'].sub(val).abs().nsmallest(k).index]

def find_closest_sorted(df, val, k):
    closest = df['A'].searchsorted(val)
    if closest < k:
        return find_closest(df.iloc[:closest + k], val, k)

    return find_closest(df.iloc[closest - k:closest + k], val, k)

>>> find_closest_sorted(df1, 0.21, 10)
A ID
969 0.210014 237
889 0.210442 225
779 0.206701 127
259 0.213637 883
843 0.206300 330
84 0.206182 17
651 0.205736 64
403 0.205640 388
648 0.214421 964
543 0.204912 616

The resulting complexity is:

  • O(n log(n)) for sorting (which can be amortized over many lookups)
  • O(log(n)) for the sorted search
  • O(k) for the final step.
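
As a sanity check, the sorted lookup can be compared against the brute-force version. This is only a sketch: the random data stands in for df1, which is not shown in the answer, and the `max(pos - k, 0)` clamp is a small variation on the function above rather than the original code.

```python
import numpy as np
import pandas as pd

def find_closest(df, val, k):
    """Brute force: the k rows of df whose column 'A' is nearest to val."""
    return df.loc[df["A"].sub(val).abs().nsmallest(k).index]

def find_closest_sorted(df, val, k):
    """Same result, assuming df is already sorted on 'A'."""
    pos = df["A"].searchsorted(val)
    lo = max(pos - k, 0)  # clamp so the slice never starts at a negative position
    return find_closest(df.iloc[lo:pos + k], val, k)

# Hypothetical random data standing in for df1.
rng = np.random.default_rng(0)
df1 = pd.DataFrame({"A": rng.random(1000), "ID": np.arange(1000)})
df1_sorted = df1.sort_values("A")  # pay the O(n log(n)) sorting cost once

brute = find_closest(df1, 0.21, 10)
fast = find_closest_sorted(df1_sorted, 0.21, 10)

# Both approaches should select the same rows.
print(sorted(brute.index) == sorted(fast.index))
```

The window of k rows on each side of the insertion point is enough because, in sorted order, the k nearest values always sit next to that point.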

Replace a pandas Series with the closest values from another Series

Setup

A = pd.Series([1.3, 4.5, 10.11])
B = pd.Series([0.8, 5.1, 10.1, 0.3])

Option 1

Use pd.Series.searchsorted

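This searches through A (which must be sorted) for each element of B and finds the position at which that element would be inserted. Note that this picks the first value in A that is greater than or equal to the element, which is not necessarily the nearest one, and it fails for elements larger than every value in A.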

A.iloc[A.searchsorted(B)]

0 1.30
2 10.11
2 10.11
0 1.30
dtype: float64

Option 2

To get the truly nearest value, you could repurpose the pd.Series.reindex method with method='nearest':

pd.Series(A.values, A.values).reindex(B.values, method='nearest')

0.8 1.30
5.1 4.50
10.1 10.11
0.3 1.30
dtype: float64
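
A third possibility, not part of the original answer, is to keep searchsorted but compare each insertion point with its left neighbour. The sketch below assumes A is sorted ascending and has at least two elements; the names `left`, `right` and `nearest` are illustrative.

```python
import numpy as np
import pandas as pd

A = pd.Series([1.3, 4.5, 10.11])        # must be sorted ascending
B = pd.Series([0.8, 5.1, 10.1, 0.3])

a = A.to_numpy()
b = B.to_numpy()

# Insertion positions, clipped so both neighbours a[right - 1] and a[right] exist.
right = np.clip(np.searchsorted(a, b), 1, len(a) - 1)
left = right - 1

# Pick whichever neighbour is closer; ties go to the left (smaller) value.
use_left = (b - a[left]) <= (a[right] - b)
nearest = pd.Series(np.where(use_left, a[left], a[right]), index=B.index)
print(nearest.tolist())
```

Unlike the reindex approach, this keeps B's original index and works when A contains duplicate values.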

Identifying closest value in a column for each filter using Pandas

You can create a column of absolute differences:

df['dif'] = (df['values'] - 2).abs()

df
Out:
category values dif
0 a 1 1
1 b 2 0
2 b 3 1
3 b 4 2
4 c 5 3
5 a 4 2
6 b 3 1
7 c 2 0
8 c 1 1
9 a 0 2

And then use groupby.transform to check whether the minimum value of each group is equal to the difference you calculated:

df['is_closest'] = df.groupby('category')['dif'].transform('min') == df['dif']

df
Out:
category values dif is_closest
0 a 1 1 True
1 b 2 0 True
2 b 3 1 False
3 b 4 2 False
4 c 5 3 False
5 a 4 2 False
6 b 3 1 False
7 c 2 0 True
8 c 1 1 False
9 a 0 2 False

df.groupby('category')['dif'].idxmin() would also give you the indices of the closest values for each category. You can use that for mapping too.

For selection:

df.loc[df.groupby('category')['dif'].idxmin()]
Out:
category values dif
0 a 1 1
1 b 2 0
7 c 2 0

For assignment:

df['is_closest'] = False
df.loc[df.groupby('category')['dif'].idxmin(), 'is_closest'] = True
df
Out:
category values dif is_closest
0 a 1 1 True
1 b 2 0 True
2 b 3 1 False
3 b 4 2 False
4 c 5 3 False
5 a 4 2 False
6 b 3 1 False
7 c 2 0 True
8 c 1 1 False
9 a 0 2 False

The difference between these approaches is that if you check equality against the difference, you would get True for all rows in case of ties. However, with idxmin it will return True for the first occurrence (only one for each group).
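
The tie behaviour is easy to see on a small example. The data below is made up so that category "a" contains two values equally close to 2; the names `by_equality` and `by_idxmin` are illustrative.

```python
import pandas as pd

# Hypothetical data: in category "a", the values 1 and 3 are equally close to 2.
df = pd.DataFrame({
    "category": ["a", "a", "a", "b", "b"],
    "values": [1, 3, 7, 2, 5],
})
df["dif"] = (df["values"] - 2).abs()
grouped = df.groupby("category")["dif"]

# Equality check: every tied row is flagged.
by_equality = grouped.transform("min") == df["dif"]

# idxmin: only the first row with the minimal distance in each group is flagged.
by_idxmin = pd.Series(False, index=df.index)
by_idxmin.loc[grouped.idxmin()] = True

print(by_equality.tolist())
print(by_idxmin.tolist())
```

Which behaviour is preferable depends on whether you want every tied row or exactly one row per group.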


