Selecting Only Numeric Columns from a Data Frame

EDIT: updated to avoid use of ill-advised sapply.

Since a data frame is a list, we can use the list-apply functions:

nums <- unlist(lapply(x, is.numeric), use.names = FALSE)

Then use standard subsetting:

x[ , nums]

## don't use sapply, even though it's less code
## nums <- sapply(x, is.numeric)

For more idiomatic modern R, I'd now recommend:

x[ , purrr::map_lgl(x, is.numeric)]

Shorter, less tied to R's particular quirks, more straightforward, and robust on database-backed tibbles:

dplyr::select_if(x, is.numeric)

Newer versions of dplyr (1.0.0 and later) also support the following syntax, which supersedes select_if:

x %>% dplyr::select(where(is.numeric))

Pandas way to do arithmetic operations on only numeric columns

You can find the numeric columns using select_dtypes:

s = df.select_dtypes("number").columns
df[s] *= 100

print (df)

  name1  val1 name2  val21  val22
0     a   100    aa    100    500
1     b   200    bb    200    600
2     c   300    cc    300    700
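
As a self-contained sketch of the snippet above: the input frame here is an assumption, reconstructed from the printed result by dividing the numeric columns by 100, since the original data is not shown. The name numeric_cols is only illustrative.

```python
import pandas as pd

# Hypothetical input, inferred from the output shown above
df = pd.DataFrame({
    "name1": ["a", "b", "c"],
    "val1": [1, 2, 3],
    "name2": ["aa", "bb", "cc"],
    "val21": [1, 2, 3],
    "val22": [5, 6, 7],
})

# Labels of all columns with a numeric dtype (ints and floats, not strings)
numeric_cols = df.select_dtypes("number").columns

# Scale only those columns; the string columns are left untouched
df[numeric_cols] *= 100

print(df)
```

Keeping the column labels in a variable, rather than chaining directly, is what makes the in-place assignment possible: select_dtypes returns a new frame, so multiplying its result directly would not modify df.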

R: how to select only continuous numeric columns

The schoolmath package provides is.decimal and is.whole functions:

library(schoolmath)
x <- c(1, 1.5)
any(is.decimal(x))
TRUE

So you could run the same check on each column of your data frame with apply:

decimal_cols <- apply(df, 2, function(x) any(is.decimal(x)))

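The result is a logical vector with one entry per column; the TRUE entries mark the columns that contain decimal values. Note that apply first converts the data frame to a matrix, so this works as intended only when every column is numeric.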

Select the numeric columns of each dataframe in a list

It sounds like you want to subset each data.frame in a list of data.frames to their numeric columns.

You can test which columns of a data.frame called df are numeric with

sapply(df, is.numeric)

This returns a logical vector, which can be used to subset your data.frame like this:

df[sapply(df, is.numeric)]

This returns the numeric columns of that data.frame. To do this over a list of data.frames df_list and return a list of subsetted data.frames:

lapply(df_list, function(df) df[sapply(df, is.numeric)])

Edit: Thanks to @Richard Scriven for the simplifying suggestion.

How to check column-wise whether a pandas dataframe contains only numeric values?

You can check that using to_numeric and coercing errors:

pd.to_numeric(df['column'], errors='coerce').notnull().all()

For all columns, you can iterate through the columns or just use apply:

df.apply(lambda s: pd.to_numeric(s, errors='coerce').notnull().all())

E.g.

import numpy as np
import pandas as pd

df = pd.DataFrame({'col' : [1, 2, 10, np.nan, 'a'],
                   'col2': ['a', 10, 30, 40, 50],
                   'col3': [1, 2, 3, 4, 5.0]})

Outputs

col     False
col2    False
col3     True
dtype: bool
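
One caveat with the check above is that a column containing only numbers and NaN is also reported as False, because the missing values fail notnull(). If missing values should be tolerated, a minimal sketch could look like the following. The helper name numeric_or_missing and the extra column col4 are hypothetical, added only for illustration.

```python
import numpy as np
import pandas as pd

def numeric_or_missing(s: pd.Series) -> bool:
    """True if every value is either parseable as a number or already missing."""
    converted = pd.to_numeric(s, errors="coerce")
    # A value fails only if it was present originally but became NaN on coercion
    return bool((converted.notna() | s.isna()).all())

df = pd.DataFrame({
    "col": [1, 2, 10, np.nan, "a"],   # contains a non-numeric string
    "col2": ["a", 10, 30, 40, 50],    # contains a non-numeric string
    "col3": [1, 2, 3, 4, 5.0],        # fully numeric
    "col4": [1, 2, np.nan, 4, 5],     # numeric with a missing value (hypothetical column)
})

result = df.apply(numeric_or_missing)
print(result)
```

Here col and col2 are still rejected because of the strings, while col4 passes even though it contains NaN.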

extract only numeric columns from data frame

How about

new_df <- df[sapply(df,is.numeric)]

how to get numeric column names in pandas dataframe

Use select_dtypes with np.number to select all numeric columns:

import numpy as np
import pandas as pd

df = pd.DataFrame({'A':list('abcdef'),
                   'B':[4.5,5,4,5,5,4],
                   'C':[7.4,8,9,4,2,3],
                   'D':[1,3,5,7,1,0],
                   'E':list('aaabbb')})

print (df)
   A    B    C  D  E
0  a  4.5  7.4  1  a
1  b  5.0  8.0  3  a
2  c  4.0  9.0  5  a
3  d  5.0  4.0  7  b
4  e  5.0  2.0  1  b
5  f  4.0  3.0  0  b

print (df.dtypes)
A     object
B    float64
C    float64
D      int64
E     object
dtype: object

cols = df.select_dtypes([np.number]).columns
print (cols)
Index(['B', 'C', 'D'], dtype='object')

It is also possible to specify float64 and int64 explicitly; note that the int32 column D is then excluded:

df = pd.DataFrame({'A':list('abcdef'),
                   'B':[4.5,5,4,5,5,4],
                   'C':[7,8,9,4,2,3],
                   'D':[1,3,5,7,1,0],
                   'E':list('aaabbb')})

df['D'] = df['D'].astype(np.int32)
print (df.dtypes)
A     object
B    float64
C      int64
D      int32
E     object
dtype: object

cols = df.select_dtypes([np.int64,np.float64]).columns
print (cols)
Index(['B', 'C'], dtype='object')
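
If the goal is to catch every integer or every float width rather than one exact dtype, select_dtypes also accepts the abstract NumPy types. The following is a sketch under that assumption, reusing the frame from the previous example; the variable names are illustrative only.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": list("abcdef"),
    "B": [4.5, 5, 4, 5, 5, 4],
    "C": [7, 8, 9, 4, 2, 3],
    "D": [1, 3, 5, 7, 1, 0],
    "E": list("aaabbb"),
})
df["D"] = df["D"].astype(np.int32)

# np.integer matches any integer width, so both C and the int32 column D are kept
int_cols = df.select_dtypes(include=[np.integer]).columns.tolist()

# np.floating matches any float width
float_cols = df.select_dtypes(include=[np.floating]).columns.tolist()

# exclude= inverts the selection: everything that is not numeric
non_numeric_cols = df.select_dtypes(exclude=[np.number]).columns.tolist()

print(int_cols, float_cols, non_numeric_cols)
```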
