Get the Min of Two Columns

Get the min of two columns

You want the parallel minimum, which is implemented in the function pmin(). For example, using your data:

dat <- read.table(text = "ID    Parm1   Parm2
1 1 2
2 0 1
3 2 1
4 1 0
5 2 0", header = TRUE)

you can use transform() to add a min column as the output of pmin(Parm1, Parm2); inside transform() you can refer to the columns of dat directly, without indexing:

dat <- transform(dat, min = pmin(Parm1, Parm2))

This gives:

> dat
  ID Parm1 Parm2 min
1  1     1     2   1
2  2     0     1   0
3  3     2     1   1
4  4     1     0   0
5  5     2     0   0

What's the best way to select the minimum value from several columns?

There are likely to be many ways to accomplish this. My suggestion is to use Case/When. With three columns, it's not too bad.

Select Id,
       Case When Col1 <= Col2 And Col1 <= Col3 Then Col1
            When Col2 <= Col1 And Col2 <= Col3 Then Col2
            Else Col3
       End As TheMin   -- <= rather than < so ties still return the correct value
From YourTableNameHere

pandas get the row-wise minimum value of two or more columns

If you are trying to get the row-wise minimum of two or more columns, use pandas.DataFrame.min. Note that the default is axis=0 (column-wise minimums), so you need to specify axis=1 to get the row-wise minimum.

data['min_c_h'] = data[['flow_h','flow_c']].min(axis=1)

# display(data)
   flow_c  flow_d  flow_h  min_c_h
0      82      36      43       43
1      52      48      12       12
2      33      28      77       33
3      91      99      11       11
4      44      95      27       27
5       5      94      64        5
6      98       3      88       88
7      73      39      92       73
8      26      39      62       26
9      56      74      50       50
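
For clarity on the axis argument, here is a minimal sketch with made-up values (not part of the answer above): axis=0 reduces down each column and gives one value per column, while axis=1 reduces across each row.

import pandas as pd

# small hypothetical frame, only to contrast the two axes
data = pd.DataFrame({'flow_c': [82, 52, 33], 'flow_h': [43, 12, 77]})

data[['flow_h', 'flow_c']].min(axis=0)   # per-column minimums: flow_h 12, flow_c 33
data[['flow_h', 'flow_c']].min(axis=1)   # per-row minimums: 43, 12, 33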

How to find a minimum of two columns based on a condition?

One possibility

library(dplyr)
goodrow <- df %>%
  filter(ID1 + ID2 <= 9) %>%
  mutate(sumval = Value1 + Value2) %>%
  filter(sumval == min(sumval))

If I understand your question correctly, consider the crossing() function (from tidyr). It computes all combinations of ID1 and ID2:

library(dplyr)
library(tidyr)   # crossing() comes from tidyr

df  <- as.data.frame(cbind(ID1, Value1))
df2 <- as.data.frame(cbind(ID2, Value2))
df_test <- crossing(df, df2)

goodrow <- df_test %>%
  filter(ID1 + ID2 <= 9) %>%
  mutate(sumval = Value1 + Value2) %>%
  filter(sumval == min(sumval))

Min and Max across multiple columns with NAs

You can use hablar's min_() and max_() functions, which return NA if all values are NA.

library(dplyr)
library(hablar)

dat %>%
  rowwise() %>%
  mutate(min = min_(c_across(-ID)),
         max = max_(c_across(-ID)))

You can also use these with apply():

cbind(dat, t(apply(dat[-1], 1, function(x) c(min = min_(x), max = max_(x)))))

#  ID PM TP2 Sigma min max
#1  1  1   2     3   1   3
#2  2  0  NA     1   0   1
#3  3  2   1    NA   1   2
#4  4  1   0     2   0   2
#5 NA NA  NA    NA  NA  NA
#6  5  2   0     7   0   7

Pandas: get the min value between 2 dataframe columns

Use df.min(axis=1)

df['c'] = df.min(axis=1)
df
Out[41]:
   A  B  c
0  2  1  1
1  2  1  1
2  2  4  2
3  2  4  2
4  3  5  3
5  3  5  3
6  3  6  3
7  3  6  3

This returns the row-wise minimum (when passing axis=1).

For homogeneous dtypes and large DataFrames you can use numpy.min, which will be quicker:

In[42]:
df['c'] = np.min(df.values,axis=1)
df

Out[42]:
   A  B  c
0  2  1  1
1  2  1  1
2  2  4  2
3  2  4  2
4  3  5  3
5  3  5  3
6  3  6  3
7  3  6  3

Timings:

In[45]:
df = pd.DataFrame({'A': [2, 2, 2, 2, 3, 3, 3, 3],
                   'B': [1, 1, 4, 4, 5, 5, 6, 6]})
df = pd.concat([df]*1000, ignore_index=True)
df.shape

Out[45]: (8000, 2)

So for an 8K-row df:

%timeit df.min(axis=1)
%timeit np.min(df.values,axis=1)
314 µs ± 3.63 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
34.4 µs ± 161 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

You can see that the numpy version is nearly 10x quicker (note that I pass df.values, so we operate on a numpy array); this will become even more of a factor with larger DataFrames.

Note

For pandas versions 0.24.0 or greater, use to_numpy();

so the above becomes:

df['c'] = np.min(df.to_numpy(),axis=1)

Timings:

%timeit df.min(axis=1)
%timeit np.min(df.values,axis=1)
%timeit np.min(df.to_numpy(),axis=1)
314 µs ± 3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
35.2 µs ± 680 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
35.5 µs ± 262 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)

There is a minor difference between .values and to_numpy(): which to use depends on whether you know upfront that the dtype is not mixed and whether the exact dtype matters (e.g. float16 vs float32). Pandas does a little more checking when calling to_numpy().
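
As a rough illustration of the dtype point (the frame and values here are made up, not taken from the timings above): to_numpy() accepts an explicit dtype argument, whereas .values simply returns whatever common dtype pandas infers.

import numpy as np
import pandas as pd

df = pd.DataFrame({'A': [2, 2, 3], 'B': [1.0, 4.5, 5.0]})   # int and float columns

df.values.dtype                         # float64 -- pandas upcasts to a common dtype
df.to_numpy(dtype='float32').dtype      # float32 -- dtype requested explicitly
df['c'] = np.min(df.to_numpy(dtype='float32'), axis=1)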

pandas - get minimum value between two columns and assign if two columns are not null

If you only want to calculate min() when neither value is null / NaN, this does it.

import io
import numpy as np
import pandas as pd

df = pd.read_csv(io.StringIO("""col1    col2    col3
347 933 338
500 NaN 200
938 523 211"""), sep=r"\s+")

df = df.assign(
    tempCol=lambda dfa: np.where(dfa["col2"].isna() | dfa["col3"].isna(),
                                 np.nan,
                                 dfa.loc[:, ["col2", "col3"]].min(axis=1))
)

Output:

   col1   col2  col3  tempCol
0   347  933.0   338    338.0
1   500    NaN   200      NaN
2   938  523.0   211    211.0
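
As an aside, the same result can be had without the explicit mask by relying on DataFrame.min()'s skipna argument (a sketch on the same df as above): min() normally ignores NaN, but with skipna=False the result is NaN whenever either column is NaN.

# equivalent to the np.where mask above: NaN propagates when col2 or col3 is NaN
df['tempCol'] = df[['col2', 'col3']].min(axis=1, skipna=False)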

