Get the min of two columns
You want the parallel minimum, implemented in the function pmin(). For example, using your data:
dat <- read.table(text = "ID Parm1 Parm2
1 1 2
2 0 1
3 2 1
4 1 0
5 2 0", header = TRUE)
you can use transform() to add the min column as the output of pmin(Parm1, Parm2), accessing the elements of dat without indexing:
dat <- transform(dat, min = pmin(Parm1, Parm2))
This gives:
> dat
ID Parm1 Parm2 min
1 1 1 2 1
2 2 0 1 0
3 3 2 1 1
4 4 1 0 0
5 5 2 0 0
What's the best way to select the minimum value from several columns?
There are likely to be many ways to accomplish this. My suggestion is to use CASE/WHEN. With 3 columns, it's not too bad.
Select Id,
Case When Col1 <= Col2 And Col1 <= Col3 Then Col1
When Col2 <= Col1 And Col2 <= Col3 Then Col2
Else Col3
End As TheMin
From YourTableNameHere
(Note the <= comparisons: with strict <, a tie such as Col1 = Col2 < Col3 would fail both branches and wrongly return Col3.)
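For comparison, the same three-column row minimum is a one-liner in pandas; the frame and column names below are hypothetical, mirroring YourTableNameHere:

```python
import pandas as pd

# Hypothetical data standing in for YourTableNameHere
df = pd.DataFrame({"Id":   [1, 2, 3],
                   "Col1": [5, 2, 7],
                   "Col2": [3, 2, 9],
                   "Col3": [4, 8, 1]})

# min(axis=1) handles ties correctly without any CASE/WHEN logic
df["TheMin"] = df[["Col1", "Col2", "Col3"]].min(axis=1)
print(df["TheMin"].tolist())  # [3, 2, 1]
```

Row 2 is a tie (Col1 = Col2 = 2), which min(axis=1) resolves without special handling.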
pandas get the row-wise minimum value of two or more columns
If you are trying to get the row-wise minimum of two or more columns, use pandas.DataFrame.min. Note that axis=0 is the default, so specifying axis=1 is necessary.
data['min_c_h'] = data[['flow_h','flow_c']].min(axis=1)
# display(data)
flow_c flow_d flow_h min_c_h
0 82 36 43 43
1 52 48 12 12
2 33 28 77 33
3 91 99 11 11
4 44 95 27 27
5 5 94 64 5
6 98 3 88 88
7 73 39 92 73
8 26 39 62 26
9 56 74 50 50
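To see why the axis argument matters, here is a minimal sketch using the same column names as above but made-up values:

```python
import pandas as pd

data = pd.DataFrame({"flow_c": [82, 52, 33],
                     "flow_h": [43, 12, 77]})

# axis=0 (the default) collapses each column to a single minimum...
print(data[["flow_h", "flow_c"]].min(axis=0))

# ...while axis=1 compares across columns within each row:
data["min_c_h"] = data[["flow_h", "flow_c"]].min(axis=1)
print(data["min_c_h"].tolist())  # one value per row
```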
How to find a minimum of two columns based on a condition?
One possibility:
library(dplyr)
goodrow <- filter(df, ID1 + ID2 <= 9) %>% mutate(sumval = Value1 + Value2) %>% filter(sumval == min(sumval))
If I understand your question correctly, consider using the crossing() function, which will compute all combinations of ID1 and ID2:
library(dplyr)
library(tidyr)  # crossing() comes from tidyr, not dplyr
df <- data.frame(ID1, Value1)
df2 <- data.frame(ID2, Value2)
df_test <- crossing(df, df2)
goodrow <- filter(df_test, ID1 + ID2 <= 9) %>% mutate(sumval = Value1 + Value2) %>% filter(sumval == min(sumval))
Min and Max across multiple columns with NAs
You can use hablar's min_() and max_() functions, which return NA if all values are NA.
library(dplyr)
library(hablar)
dat %>%
rowwise() %>%
mutate(min = min_(c_across(-ID)),
max = max_(c_across(-ID)))
You can also use this with apply:
cbind(dat, t(apply(dat[-1], 1, function(x) c(min = min_(x), max = max_(x)))))
# ID PM TP2 Sigma min max
#1 1 1 2 3 1 3
#2 2 0 NA 1 0 1
#3 3 2 1 NA 1 2
#4 4 1 0 2 0 2
#5 NA NA NA NA NA NA
#6 5 2 0 7 0 7
Pandas: get the min value between 2 dataframe columns
Use df.min(axis=1)
df['c'] = df.min(axis=1)
df
Out[41]:
A B c
0 2 1 1
1 2 1 1
2 2 4 2
3 2 4 2
4 3 5 3
5 3 5 3
6 3 6 3
7 3 6 3
This returns the min row-wise (when passing axis=1).
For homogeneous dtypes and large dfs you can use numpy.min, which will be quicker:
In[42]:
df['c'] = np.min(df.values,axis=1)
df
Out[42]:
A B c
0 2 1 1
1 2 1 1
2 2 4 2
3 2 4 2
4 3 5 3
5 3 5 3
6 3 6 3
7 3 6 3
Timings:
In[45]:
df = pd.DataFrame({'A': [2, 2, 2, 2, 3, 3, 3, 3],
'B': [1, 1, 4, 4, 5, 5, 6, 6]})
df = pd.concat([df]*1000, ignore_index=True)
df.shape
Out[45]: (8000, 2)
So for a 8K row df:
%timeit df.min(axis=1)
%timeit np.min(df.values,axis=1)
314 µs ± 3.63 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
34.4 µs ± 161 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
You can see that the numpy version is nearly 10x quicker (note that I pass df.values so we operate on a numpy array). This will become even more of a factor for larger dfs.
Note: for pandas versions 0.24.0 or greater, use to_numpy(), so the above becomes:
df['c'] = np.min(df.to_numpy(),axis=1)
Timings:
%timeit df.min(axis=1)
%timeit np.min(df.values,axis=1)
%timeit np.min(df.to_numpy(),axis=1)
314 µs ± 3 µs per loop (mean ± std. dev. of 7 runs, 1000 loops each)
35.2 µs ± 680 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
35.5 µs ± 262 ns per loop (mean ± std. dev. of 7 runs, 10000 loops each)
There is a minor timing discrepancy between .values and to_numpy(): pandas does a little more checking when calling to_numpy(). Which to prefer depends on whether you know upfront that the dtype is not mixed and whether the exact dtype matters (e.g. float16 vs float32); see the to_numpy() documentation for further explanation.
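As a rough sketch of the dtype behaviour behind that caveat: with mixed column dtypes, both .values and to_numpy() upcast to a common dtype before numpy sees the data (the values below are made up):

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"A": [2, 2, 3],         # int64 column
                   "B": [1.5, 4.0, 5.0]})  # float64 column

# Mixed int/float columns are upcast to float64 in both cases
print(df.values.dtype, df.to_numpy().dtype)

# Row-wise min on the upcast array, as in the timings above
print(np.min(df.to_numpy(), axis=1))
```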
pandas - get minimum value between two columns and assign if two columns are not null
If you only want to calculate min() when neither value is null/NaN, this does it.
import io
import numpy as np
import pandas as pd

df = pd.read_csv(io.StringIO("""col1 col2 col3
347 933 338
500 NaN 200
938 523 211"""), sep=r"\s+")
df = df.assign(
    tempCol=lambda dfa: np.where(dfa["col2"].isna() | dfa["col3"].isna(),
                                 np.nan,
                                 dfa.loc[:, ["col2", "col3"]].min(axis=1))
)
Output:
col1 col2 col3 tempCol
0 347 933.0 338 338.0
1 500 NaN 200 NaN
2 938 523.0 211 211.0
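If the goal is simply "NaN when either input is NaN", the same result can likely be had with skipna=False, avoiding the np.where guard entirely (a sketch on the same data):

```python
import io
import pandas as pd

df = pd.read_csv(io.StringIO("""col1 col2 col3
347 933 338
500 NaN 200
938 523 211"""), sep=r"\s+")

# skipna=False makes min() propagate NaN instead of ignoring it
df["tempCol"] = df[["col2", "col3"]].min(axis=1, skipna=False)
print(df["tempCol"].tolist())
```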