R's which() and which.min() Equivalent in Python

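NumPy has built-in functions for this: np.where() covers which(), and np.argmin() covers which.min(). Keep in mind that NumPy indices are 0-based, while R's are 1-based.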

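import numpy as np

x = np.array([1, 2, 3, 4, 0, 1, 2, 3, 4, 11])
np.where(x == 2)          # indices where x equals 2, like which(x == 2)
np.min(np.where(x == 2))  # smallest such index
np.argmin(x)              # index of the first minimum, like which.min(x)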

np.where(x == 2)
Out[9]: (array([1, 6], dtype=int64),)

np.min(np.where(x==2))
Out[10]: 1

np.argmin(x)
Out[11]: 4
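
If you want results that line up with R's 1-based output, a thin wrapper can do the shift. The helper names r_which and r_which_min below are hypothetical (they are not part of NumPy), and this is only a minimal sketch assuming a 1-D array:

```python
import numpy as np

def r_which(mask):
    """Return the 1-based positions of the True entries, like R's which()."""
    return np.flatnonzero(mask) + 1

def r_which_min(values):
    """Return the 1-based position of the first minimum, like R's which.min()."""
    return int(np.argmin(values)) + 1

x = np.array([1, 2, 3, 4, 0, 1, 2, 3, 4, 11])
positions = r_which(x == 2)   # array([2, 7])
first_min = r_which_min(x)    # 5
print(positions, first_min)
```

np.flatnonzero() returns a plain array rather than the one-element tuple that np.where() gives, which makes the shift by one a little tidier.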

What is the equivalent of Python's idxmin() in R?

which.min() is R's equivalent of idxmin(). Both find the minimum value and return the index of its first occurrence, which matters when there are ties. Note that idxmin() returns the index label, whereas which.min() returns the 1-based position.
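
To see the tie-breaking on the pandas side, here is a small sketch with a made-up Series (the data and labels are assumptions for illustration only):

```python
import pandas as pd

# The minimum value 1 appears twice, at labels "b" and "c".
s = pd.Series([3, 1, 1, 2], index=["a", "b", "c", "d"])

label = s.idxmin()                      # label of the first minimum: 'b'
position = int(s.to_numpy().argmin())   # 0-based position of the first minimum: 1
print(label, position)
```

Adding 1 to the position gives the number that which.min() would report in R.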

Pandas Equivalent of R's which()

I may not have fully understood the question, but the answer looks simpler than you think.

Using a pandas DataFrame:

df['colname'] > somenumberIchoose

This returns a boolean pandas Series (True/False values) that carries the DataFrame's original index.

Then you can use that boolean series on the original DataFrame and get the subset you are looking for:

df[df['colname'] > somenumberIchoose]

should be enough.

See http://pandas.pydata.org/pandas-docs/stable/indexing.html#boolean-indexing
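
If you need the matching indices themselves, as which() would give you, rather than the subset, something like the following should work. The column name, data, and threshold are placeholders I made up for the sketch:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({"colname": [5, 12, 7, 20]}, index=["w", "x", "y", "z"])
threshold = 6  # stands in for somenumberIchoose

mask = df["colname"] > threshold

labels = df.index[mask].tolist()                      # index labels of matching rows
positions = np.flatnonzero(mask.to_numpy()).tolist()  # 0-based row positions
subset = df[mask]                                     # the matching rows themselves

print(labels)     # ['x', 'y', 'z']
print(positions)  # [1, 2, 3]
```

The positions are 0-based, so add 1 if you are comparing against R output.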

R equivalent of Python's range() function

There is no exact equivalent. seq() is not a drop-in replacement because it infers the sign of the by argument from its endpoints, so seq(1, 0) counts down (giving 1 0) instead of returning an empty sequence, and it raises an error if you explicitly pass a positive by when to < from. If you need an exact match, write a very simple wrapper.

py_range <- function(from, to) {
  if (to <= from) return(integer(0))
  seq(from = from, to = to - 1)
}

py_range(1, 4)
#> [1] 1 2 3
py_range(1, 0)
#> integer(0)
py_range(1, 1)
#> integer(0)

The wrapper behaves as expected in a loop, printing nothing when the range is empty.

for (i in py_range(1, 4)) {
  print(i)
}
#> [1] 1
#> [1] 2
#> [1] 3

for (i in py_range(1, 0)) {
  print(i)
}
#> Nothing was actually printed here!

for (i in py_range(1, 1)) {
  print(i)
}
#> Nothing was actually printed here!
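
For comparison, this is the Python behaviour the wrapper is meant to mirror, using the same three calls. It is just a reference sketch; the variable names are mine:

```python
# range(start, stop) excludes stop and is empty whenever stop <= start.
ascending = list(range(1, 4))
reversed_bounds = list(range(1, 0))
equal_bounds = list(range(1, 1))

print(ascending)        # [1, 2, 3]
print(reversed_bounds)  # []
print(equal_bounds)     # []

# Looping over an empty range runs the body zero times.
for i in range(1, 0):
    print(i)
```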

R `summary` function closest equivalent in Python

Without pandas:

from scipy import stats
import numpy as np

a = np.random.rand(100,3)
summary = stats.describe(a, axis = 0)

print(summary.mean)
print(summary.minmax)
...

Using pandas:

import pandas as pd

summary_across_rows = pd.DataFrame(a).describe() # across axis=0
print(summary_across_rows)
                0           1           2
count  100.000000  100.000000  100.000000
mean     0.495204    0.573827    0.476202
std      0.275131    0.246189    0.271626
min      0.005202    0.037195    0.023595
25%      0.295210    0.399358    0.258712
50%      0.512023    0.562181    0.417322
75%      0.710216    0.790970    0.712047
max      0.998371    0.997717    0.980840

Note: for the summary across the other dimension you need:
summary_across_columns = pd.DataFrame(a.T).describe() # across axis=1
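
If you want the same six numbers that R's summary() prints for a numeric vector, a NumPy-only sketch could look like this. The function name r_summary is hypothetical, and it assumes R's default quantile method (type 7), which corresponds to NumPy's default linear interpolation:

```python
import numpy as np

def r_summary(values):
    """Return the six statistics R's summary() reports for a numeric vector."""
    arr = np.asarray(values, dtype=float)
    q1, median, q3 = np.percentile(arr, [25, 50, 75])  # default: linear interpolation
    return {
        "Min.": float(arr.min()),
        "1st Qu.": float(q1),
        "Median": float(median),
        "Mean": float(arr.mean()),
        "3rd Qu.": float(q3),
        "Max.": float(arr.max()),
    }

result = r_summary([1, 2, 3, 4, 5])
for name, value in result.items():
    print(f"{name:>8}: {value}")
```

Unlike describe(), this leaves out the count and standard deviation, which R's summary() does not report either.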

What are Python pandas equivalents for R functions like str(), summary(), and head()?

summary() ~ describe()
head() ~ head()

I'm not sure about the str() equivalent.
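
For str(), the closest thing I know of in pandas is DataFrame.info(), optionally together with the dtypes attribute. It is not an exact match, so treat the following as a rough sketch on a made-up DataFrame:

```python
import pandas as pd

df = pd.DataFrame({
    "id": [1, 2, 3],
    "name": ["a", "b", "c"],
    "score": [0.5, 0.7, 0.9],
})

df.info()                  # roughly str(): column names, non-null counts, dtypes
column_types = df.dtypes   # just the type of each column
stats = df.describe()      # roughly summary(): numeric columns only by default
first_rows = df.head(2)    # same idea as head(df, 2) in R

print(column_types)
print(stats)
print(first_rows)
```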

R equivalent of performing operations on an empty list in Python?

The R equivalent of Python's S1 = []; S1.append(x) is S1 <- list(); S1 <- c(S1, list(x)).

In your example c(S1, x) will work because the numeric value you are trying to append will be automatically wrapped in a list, but it's safer to do it explicitly. If x is already a list, then c(S1, x) will append its elements to S1, while c(S1, list(x)) will append a single entry containing a copy of x to S1.

You could use the append() function in R, but then remember that it's rare for R functions to modify their arguments, so you would write

S1 <- append(S1, list(x))

In this situation it's essentially identical to c().
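
On the Python side, the distinction between c(S1, list(x)) and c(S1, x) for a list x is loosely the same as append() versus extend(). The analogy is mine and not exact, but it may help:

```python
x = [1, 2]

# Like c(S1, list(x)): x is added as a single entry.
appended = []
appended.append(x)

# Like c(S1, x) when x is a list: the elements of x are added one by one.
extended = []
extended.extend(x)

print(appended)  # [[1, 2]]
print(extended)  # [1, 2]
```

One difference to keep in mind: Python's append() stores a reference to x and modifies the list in place, while R's c() returns a new list that you have to assign back.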

What is the equivalent of R's lm function for fitting simple linear regressions in Python?

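Use the OLS implementation from statsmodels and call the .summary() method on the fitted result. Don't forget to add the constant (intercept) term manually with add_constant, since OLS does not add one by default.

import statsmodels.api as sm

# y is the response and X holds the predictor(s); add_constant adds the intercept column
reg = sm.OLS(y, sm.add_constant(X)).fit()
print(reg.summary())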
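
To see why the constant matters, here is a plain NumPy sketch of the same least-squares fit with and without an intercept column. This is not statsmodels, just np.linalg.lstsq on made-up data that lies exactly on the line y = 2 + 3x:

```python
import numpy as np

x = np.array([0.0, 1.0, 2.0, 3.0, 4.0])
y = 2.0 + 3.0 * x  # illustrative data, no noise

# With a column of ones (what add_constant provides), the intercept is estimated.
design = np.column_stack([np.ones_like(x), x])
coef, *_ = np.linalg.lstsq(design, y, rcond=None)
intercept, slope = coef

# Without it, the fitted line is forced through the origin and the slope is biased.
coef_no_const, *_ = np.linalg.lstsq(x[:, None], y, rcond=None)
slope_no_const = coef_no_const[0]

print(intercept, slope)   # close to 2.0 and 3.0
print(slope_no_const)     # noticeably larger than 3.0
```

R's lm(y ~ x) includes the intercept automatically, which is why the missing constant is easy to overlook when moving to statsmodels.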
