Selecting only numeric columns from a data frame
EDIT: updated to avoid use of ill-advised sapply.
Since a data frame is a list we can use the list-apply functions:
nums <- unlist(lapply(x, is.numeric), use.names = FALSE)
Then use standard subsetting:
x[ , nums]
## don't use sapply, even though it's less code
## nums <- sapply(x, is.numeric)
For more idiomatic modern R I'd now recommend:
x[ , purrr::map_lgl(x, is.numeric)]
Less code, less dependent on R's particular quirks, more straightforward, and robust to use on database-backed tibbles:
dplyr::select_if(x, is.numeric)
Newer versions of dplyr also support the following syntax:
x %>% dplyr::select(where(is.numeric))
Pandas way to do arithmetic operations on only numeric columns
You can find the numeric columns using select_dtypes:
s = df.select_dtypes("number").columns
df[s] *= 100
print (df)
  name1  val1 name2  val21  val22
0     a   100    aa    100    500
1     b   200    bb    200    600
2     c   300    cc    300    700
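The snippet above does not show how df was built. As a minimal sketch, assuming an input frame with the same column names and values one hundredth of those printed (the input data is an assumption reconstructed from the output, not taken from the original question), the whole round trip might look like this:

```python
import pandas as pd

# Hypothetical input, reconstructed from the printed result above.
df = pd.DataFrame({
    "name1": ["a", "b", "c"],
    "val1": [1, 2, 3],
    "name2": ["aa", "bb", "cc"],
    "val21": [1, 2, 3],
    "val22": [5, 6, 7],
})

# Labels of every column with a numeric dtype.
s = df.select_dtypes("number").columns

# Scale only those columns; the string columns are left untouched.
df[s] *= 100

print(df)
```

Selecting the labels first and then assigning through df[s] keeps the operation in place, so the non-numeric columns never take part in the arithmetic.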
R: how to select only continuous numeric columns
There is a library, schoolmath, with is.decimal and is.whole functions:
library(schoolmath)
x <- c(1, 1.5)
any(is.decimal(x))
[1] TRUE
So you could apply this to each column of your data frame with apply:
decimal_cols <- apply(df, 2, function(x) any(is.decimal(x)))
The index values of the returned TRUEs will be the columns with decimal values.
Select the numeric columns of a dataframe in a list
It sounds like you want to subset each data.frame in a list of data.frames to their numeric columns.
You can test which columns of a data.frame called df are numeric with:
sapply(df, is.numeric)
This returns a logical vector, which can be used to subset your data.frame like this:
df[sapply(df, is.numeric)]
This returns the numeric columns of that data.frame. To do this over a list of data.frames df_list and return a list of subsetted data.frames:
lapply(df_list, function(df) df[sapply(df, is.numeric)])
Edit: Thanks @Richard Scriven for the simplifying suggestion.
How to check if a pandas dataframe contains only numeric column wise?
You can check that using to_numeric and coercing errors:
pd.to_numeric(df['column'], errors='coerce').notnull().all()
For all columns, you can iterate through the columns or just use apply:
df.apply(lambda s: pd.to_numeric(s, errors='coerce').notnull().all())
E.g.
import numpy as np
import pandas as pd
df = pd.DataFrame({'col': [1, 2, 10, np.nan, 'a'],
                   'col2': ['a', 10, 30, 40, 50],
                   'col3': [1, 2, 3, 4, 5.0]})
Outputs
col     False
col2    False
col3     True
dtype: bool
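Note that with this check a missing value (NaN) counts as non-numeric, because it is still null after coercion. A small sketch of a helper that makes that choice explicit; the function name is_numeric_column and its allow_missing parameter are made up for this illustration and are not part of pandas:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'col': [1, 2, 10, np.nan, 'a'],
                   'col2': ['a', 10, 30, 40, 50],
                   'col3': [1, 2, 3, 4, 5.0]})


def is_numeric_column(s, allow_missing=False):
    """Return True if every value in the Series s can be read as a number."""
    converted = pd.to_numeric(s, errors='coerce')
    if allow_missing:
        # A value is only "bad" if it was present but failed to convert.
        return bool((converted.notna() | s.isna()).all())
    # Strict mode: missing values count as non-numeric, as in the one-liner above.
    return bool(converted.notna().all())


# Strict check on every column of the example frame.
result = df.apply(is_numeric_column)
print(result)

# A column that is numeric apart from a missing value.
with_gap = pd.Series([1, np.nan, 3])
print(is_numeric_column(with_gap))                      # strict
print(is_numeric_column(with_gap, allow_missing=True))  # tolerant
```

Which behaviour is right depends on whether missing data should disqualify a column in your use case.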
extract only numeric columns from data frame
How about:
new_df <- df[sapply(df, is.numeric)]
how to get numeric column names in pandas dataframe
Use select_dtypes with np.number to select all numeric columns:
import numpy as np
import pandas as pd
df = pd.DataFrame({'A': list('abcdef'),
                   'B': [4.5, 5, 4, 5, 5, 4],
                   'C': [7.4, 8, 9, 4, 2, 3],
                   'D': [1, 3, 5, 7, 1, 0],
                   'E': list('aaabbb')})
print (df)
   A    B    C  D  E
0  a  4.5  7.4  1  a
1  b  5.0  8.0  3  a
2  c  4.0  9.0  5  a
3  d  5.0  4.0  7  b
4  e  5.0  2.0  1  b
5  f  4.0  3.0  0  b
print (df.dtypes)
A     object
B    float64
C    float64
D      int64
E     object
dtype: object
cols = df.select_dtypes([np.number]).columns
print (cols)
Index(['B', 'C', 'D'], dtype='object')
It is also possible to specify float64 and int64 explicitly:
df = pd.DataFrame({'A': list('abcdef'),
                   'B': [4.5, 5, 4, 5, 5, 4],
                   'C': [7, 8, 9, 4, 2, 3],
                   'D': [1, 3, 5, 7, 1, 0],
                   'E': list('aaabbb')})
df['D'] = df['D'].astype(np.int32)
print (df.dtypes)
A     object
B    float64
C      int64
D      int32
E     object
dtype: object
cols = df.select_dtypes([np.int64,np.float64]).columns
print (cols)
Index(['B', 'C'], dtype='object')
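Because np.number is the parent of NumPy's numeric scalar types, it also matches the int32 column that the explicit [np.int64, np.float64] list skips. A short sketch comparing the selections on the same frame; the variable names all_numeric, only_64bit and non_numeric are made up for this illustration, and it assumes a platform where pandas stores plain Python integers as int64:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'A': list('abcdef'),
                   'B': [4.5, 5, 4, 5, 5, 4],
                   'C': [7, 8, 9, 4, 2, 3],
                   'D': [1, 3, 5, 7, 1, 0],
                   'E': list('aaabbb')})
df['D'] = df['D'].astype(np.int32)

# Every numeric dtype, whatever its width.
all_numeric = df.select_dtypes(include=[np.number]).columns.tolist()

# Only the two dtypes named explicitly; the int32 column D is skipped.
only_64bit = df.select_dtypes(include=[np.int64, np.float64]).columns.tolist()

# The complement: everything that is not numeric.
non_numeric = df.select_dtypes(exclude=[np.number]).columns.tolist()

print(all_numeric)
print(only_64bit)
print(non_numeric)
```

Using np.number is usually the safer default when the goal is "all numeric columns", since listing dtypes by hand silently drops any width you forget.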