Selecting Only Numeric Columns from a Data Frame

Selecting only numeric columns from a data frame

EDIT: updated to avoid use of ill-advised sapply.

Since a data frame is a list we can use the list-apply functions:

nums <- unlist(lapply(x, is.numeric), use.names = FALSE)  

Then use standard subsetting:

x[ , nums]

## don't use sapply, even though it's less code
## nums <- sapply(x, is.numeric)

For a more idiomatic modern R I'd now recommend

x[ , purrr::map_lgl(x, is.numeric)]

The following is shorter, relies less on R's particular quirks, is more straightforward, and also works on database-backed tibbles:

dplyr::select_if(x, is.numeric)

Newer versions of dplyr (1.0.0 and later) also support the following syntax, which supersedes select_if:

x %>% dplyr::select(where(is.numeric))

Pandas way to do arithmetic operations on only numeric columns

You can find the numeric columns using select_dtypes:

s = df.select_dtypes("number").columns
df[s] *= 100

print (df)

  name1  val1 name2  val21  val22
0     a   100    aa    100    500
1     b   200    bb    200    600
2     c   300    cc    300    700
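The answer does not show the input frame, so here is a minimal, self-contained sketch of the same idea. The input data below is an assumption, reconstructed so that the result matches the printed output; the name numeric_cols is also just illustrative.

```python
import pandas as pd

# Assumed input: reconstructed so the result matches the output printed above.
df = pd.DataFrame({
    "name1": ["a", "b", "c"],
    "val1": [1, 2, 3],
    "name2": ["aa", "bb", "cc"],
    "val21": [1, 2, 3],
    "val22": [5, 6, 7],
})

# Labels of the columns with a numeric dtype; the string columns are skipped.
numeric_cols = df.select_dtypes("number").columns

# Scale only those columns; the string columns are left untouched.
df[numeric_cols] *= 100

print(df)
```

Keeping the column labels in a variable makes it easy to reuse the same selection for further arithmetic.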

R: how to select only continuous numeric columns

There is a library schoolmath with is.decimal and is.whole functions:

library(schoolmath)
x <- c(1, 1.5)
any(is.decimal(x))
TRUE

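So you could apply this to your data frame with apply:

decimal_cols <- apply(df, 2, function(x) any(is.decimal(x)))

The positions of the TRUE values in the result identify the columns that contain decimal values. Note that apply converts the data frame to a matrix first, so this assumes all columns are numeric.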
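For readers working in pandas rather than R, a rough equivalent of the same check can be sketched as follows. This is not the schoolmath approach; decimal_columns is a hypothetical helper and the example frame is made up for illustration.

```python
import pandas as pd

def decimal_columns(frame: pd.DataFrame) -> list:
    """Return labels of numeric columns holding at least one non-integer value."""
    numeric = frame.select_dtypes("number")
    # A value has a fractional part when its remainder modulo 1 is non-zero.
    # Missing values are dropped first so that NaN does not count as a decimal.
    return [col for col in numeric.columns
            if (numeric[col].dropna() % 1 != 0).any()]

# Made-up example data.
example = pd.DataFrame({
    "count": [1, 2, 3],
    "whole_float": [1.0, 2.0, 3.0],
    "ratio": [1.0, 1.5, None],
    "label": ["x", "y", "z"],
})

result = decimal_columns(example)
print(result)
```

Selecting numeric columns first avoids applying the modulo to strings, which is the pandas counterpart of the matrix-coercion caveat with apply in R.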

Select the numeric columns of each data frame in a list

It sounds like you want to subset each data.frame in a list of data.frames to their numeric columns.

You can test which columns of a data.frame called df are numeric with

sapply(df, is.numeric)

This returns a logical vector, which can be used to subset your data.frame like this:

df[sapply(df, is.numeric)]

The result contains only the numeric columns of that data.frame. To do this over a list of data.frames, df_list, and return a list of subsetted data.frames:

lapply(df_list, function(df) df[sapply(df, is.numeric)])

Edit: Thanks to @Richard Scriven for the simplifying suggestion.
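The same pattern carries over to pandas. The sketch below assumes a plain Python list of DataFrames; the names frames and numeric_frames and the sample data are hypothetical.

```python
import pandas as pd

# Hypothetical list of frames, standing in for df_list.
frames = [
    pd.DataFrame({"id": [1, 2], "name": ["a", "b"], "score": [0.5, 1.5]}),
    pd.DataFrame({"city": ["x", "y"], "pop": [10, 20]}),
]

# Keep only the numeric columns of each frame; the result is a new list.
numeric_frames = [frame.select_dtypes("number") for frame in frames]

for frame in numeric_frames:
    print(list(frame.columns))
```

The list comprehension plays the role of lapply, and select_dtypes replaces the sapply(df, is.numeric) mask.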

How to check if a pandas dataframe contains only numeric column wise?

You can check that using to_numeric and coercing errors:

pd.to_numeric(df['column'], errors='coerce').notnull().all()

For all columns, you can iterate through the columns or just use apply:

df.apply(lambda s: pd.to_numeric(s, errors='coerce').notnull().all())

E.g.

import numpy as np
import pandas as pd

df = pd.DataFrame({'col': [1, 2, 10, np.nan, 'a'],
                   'col2': ['a', 10, 30, 40, 50],
                   'col3': [1, 2, 3, 4, 5.0]})

Outputs

col     False
col2    False
col3     True
dtype: bool
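One consequence of this check is that a missing value makes a column fail, just like a non-numeric string does. If missing values should be tolerated, a variant along these lines might help. This is only a sketch; is_numeric_column and its allow_missing parameter are hypothetical names, and the data is made up.

```python
import numpy as np
import pandas as pd

def is_numeric_column(col: pd.Series, allow_missing: bool = False) -> bool:
    """True if every value in col can be parsed as a number."""
    converted = pd.to_numeric(col, errors="coerce")
    ok = converted.notna()
    if allow_missing:
        # Values that were already missing do not count as failures.
        ok = ok | col.isna()
    return bool(ok.all())

# Made-up example data.
df = pd.DataFrame({
    "a": [1, 2, np.nan],      # numeric with a missing value
    "b": ["1", "2", "x"],     # contains a non-numeric string
    "c": ["1.5", "2", "3"],   # numeric strings only
})

strict = df.apply(is_numeric_column)
lenient = df.apply(is_numeric_column, allow_missing=True)
print(strict)
print(lenient)
```

Extra keyword arguments given to DataFrame.apply are forwarded to the function, which is why allow_missing=True can be passed directly.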

extract only numeric columns from data frame

How about

new_df <- df[sapply(df,is.numeric)]

?

how to get numeric column names in pandas dataframe

Use select_dtypes with np.number to select all numeric columns:

df = pd.DataFrame({'A':list('abcdef'),
'B':[4.5,5,4,5,5,4],
'C':[7.4,8,9,4,2,3],
'D':[1,3,5,7,1,0],
'E':list('aaabbb')})

print (df)
   A    B    C  D  E
0  a  4.5  7.4  1  a
1  b  5.0  8.0  3  a
2  c  4.0  9.0  5  a
3  d  5.0  4.0  7  b
4  e  5.0  2.0  1  b
5  f  4.0  3.0  0  b

print (df.dtypes)
A     object
B    float64
C    float64
D      int64
E     object
dtype: object

cols = df.select_dtypes([np.number]).columns
print (cols)
Index(['B', 'C', 'D'], dtype='object')

It is also possible to specify float64 and int64 explicitly; note that the int32 column D is then excluded:

df = pd.DataFrame({'A':list('abcdef'),
'B':[4.5,5,4,5,5,4],
'C':[7,8,9,4,2,3],
'D':[1,3,5,7,1,0],
'E':list('aaabbb')})

df['D'] = df['D'].astype(np.int32)
print (df.dtypes)
A     object
B    float64
C      int64
D      int32
E     object
dtype: object

cols = df.select_dtypes([np.int64,np.float64]).columns
print (cols)
Index(['B', 'C'], dtype='object')

