Get the Min of Two Columns

Get the min of two columns

You want the parallel minimum, which is implemented in the function pmin(). For example, using your data:

dat <- read.table(text = "ID    Parm1   Parm2
1 1 2
2 0 1
3 2 1
4 1 0
5 2 0", header = TRUE)

you can use transform() to add a min column as the output of pmin(Parm1, Parm2); inside transform() you can refer to the columns of dat directly, without indexing:

dat <- transform(dat, min = pmin(Parm1, Parm2))

This gives:

> dat
  ID Parm1 Parm2 min
1  1     1     2   1
2  2     0     1   0
3  3     2     1   1
4  4     1     0   0
5  5     2     0   0

What's the best way to select the minimum value from several columns?

There are likely to be many ways to accomplish this. My suggestion is to use Case/When. With three columns, it's not too bad.

Select Id,
       Case When Col1 <= Col2 And Col1 <= Col3 Then Col1
            When Col2 <= Col1 And Col2 <= Col3 Then Col2
            Else Col3
       End As TheMin   -- <= rather than < so ties still return the correct value
From YourTableNameHere

pandas get the row-wise minimum value of two or more columns

If you are trying to get the row-wise minimum of two or more columns, use pandas.DataFrame.min. Note that the default is axis=0 (column-wise minimums), so you need to specify axis=1 to get the row-wise minimum.

data['min_c_h'] = data[['flow_h','flow_c']].min(axis=1)

# display(data)
   flow_c  flow_d  flow_h  min_c_h
0      82      36      43       43
1      52      48      12       12
2      33      28      77       33
3      91      99      11       11
4      44      95      27       27
5       5      94      64        5
6      98       3      88       88
7      73      39      92       73
8      26      39      62       26
9      56      74      50       50
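
For clarity on the axis argument, here is a minimal sketch with made-up values (not part of the answer above): axis=0 reduces down each column and gives one value per column, while axis=1 reduces across each row.

import pandas as pd

# small hypothetical frame, only to contrast the two axes
data = pd.DataFrame({'flow_c': [82, 52, 33], 'flow_h': [43, 12, 77]})

data[['flow_h', 'flow_c']].min(axis=0)   # per-column minimums: flow_h 12, flow_c 33
data[['flow_h', 'flow_c']].min(axis=1)   # per-row minimums: 43, 12, 33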

How to find a minimum of two columns based on a condition?

One possibility

library(dplyr)
goodrow <- df %>%
  filter(ID1 + ID2 <= 9) %>%
  mutate(sumval = Value1 + Value2) %>%
  filter(sumval == min(sumval))

If I understand your question correctly, consider the crossing() function (from tidyr). It computes all combinations of ID1 and ID2:

library(dplyr)
library(tidyr)   # crossing() comes from tidyr

df  <- as.data.frame(cbind(ID1, Value1))
df2 <- as.data.frame(cbind(ID2, Value2))
df_test <- crossing(df, df2)

goodrow <- df_test %>%
  filter(ID1 + ID2 <= 9) %>%
  mutate(sumval = Value1 + Value2) %>%
  filter(sumval == min(sumval))

Min and Max across multiple columns with NAs

You can use hablar's min_() and max_() functions, which return NA if all values are NA.

library(dplyr)
library(hablar)

dat %>%
  rowwise() %>%
  mutate(min = min_(c_across(-ID)),
         max = max_(c_across(-ID)))

You can also use these with apply():

cbind(dat, t(apply(dat[-1], 1, function(x) c(min = min_(x), max = max_(x)))))

#  ID PM TP2 Sigma min max
#1  1  1   2     3   1   3
#2  2  0  NA     1   0   1
#3  3  2   1    NA   1   2
#4  4  1   0     2   0   2
#5 NA NA  NA    NA  NA  NA
#6  5  2   0     7   0   7

Pandas: get the min value between 2 dataframe columns

Use df.min(axis=1)

df['c'] = df.min(axis=1)
df
Out[41]:
   A  B  c
0  2  1  1
1  2  1  1
2  2  4  2
3  2  4  2
4  3  5  3
5  3  5  3
6  3  6  3
7  3  6  3

This returns the row-wise minimum (when passing axis=1).

For homogeneous dtypes and large DataFrames you can use numpy.min, which will be quicker:

In[42]:
df['c'] = np.min(df.values,axis=1)
df

Out[42]:
   A  B  c
0  2  1  1
1  2  1  1
2  2  4  2
3  2  4  2
4  3  5  3
5  3  5  3
6  3  6  3
7  3  6  3

Timings:

In[45]:
df = pd.DataFrame({'A': [2, 2, 2, 2, 3, 3, 3, 3],
                   'B': [1, 1, 4, 4, 5, 5, 6, 6]})
df = pd.concat([df]*1000, ignore_index=True)
df.shape

Out[45]: (8000, 2)

So for an 8K-row df:

%timeit df.min(axis=1)
%timeit np.min(df.values,axis=1)
314 µs ± 3.63 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
34.4 µs ± 161 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

You can see that the numpy version is nearly 10x quicker (note that I pass df.values, so we operate on a numpy array); this will become even more of a factor with larger DataFrames.

Note

For pandas versions 0.24.0 or greater, use to_numpy();

so the above becomes:

df['c'] = np.min(df.to_numpy(),axis=1)

Timings:

%timeit df.min(axis=1)
%timeit np.min(df.values,axis=1)
%timeit np.min(df.to_numpy(),axis=1)
314 µs ± 3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
35.2 µs ± 680 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
35.5 µs ± 262 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

There is a minor difference between .values and to_numpy(): which to use depends on whether you know upfront that the dtype is not mixed and whether the exact dtype matters (e.g. float16 vs float32). Pandas does a little more checking when calling to_numpy().
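
As a rough illustration of the dtype point (the frame and values here are made up, not taken from the timings above): to_numpy() accepts an explicit dtype argument, whereas .values simply returns whatever common dtype pandas infers.

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [2, 2, 3], 'B': [1.0, 4.5, 5.0]})   # int and float columns

df.values.dtype                         # float64 -- pandas upcasts to a common dtype
df.to_numpy(dtype='float32').dtype      # float32 -- dtype requested explicitly
df['c'] = np.min(df.to_numpy(dtype='float32'), axis=1)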

pandas - get minimum value between two columns and assign if two columns are not null

If you only want to calculate min() when neither value is null / NaN, this does it.

import io
import numpy as np
import pandas as pd

df = pd.read_csv(io.StringIO("""col1    col2    col3
347 933 338
500 NaN 200
938 523 211"""), sep=r"\s+")

df = df.assign(
    tempCol=lambda dfa: np.where(dfa["col2"].isna() | dfa["col3"].isna(),
                                 np.nan,
                                 dfa.loc[:, ["col2", "col3"]].min(axis=1))
)

Output:

   col1   col2  col3  tempCol
0   347  933.0   338    338.0
1   500    NaN   200      NaN
2   938  523.0   211    211.0
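
As an aside, the same result can be had without the explicit mask by relying on DataFrame.min()'s skipna argument (a sketch on the same df as above): min() normally ignores NaN, but with skipna=False the result is NaN whenever either column is NaN.

# equivalent to the np.where mask above: NaN propagates when col2 or col3 is NaN
df['tempCol'] = df[['col2', 'col3']].min(axis=1, skipna=False)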

