apply() not working when checking column class in a data.frame
You could use
sapply(xdat, class)
# x1 x2
# "integer" "factor"
Using apply would coerce the data frame to a matrix first, and a matrix can hold only a single type. If any column is character (or a factor, which as.matrix converts to character), the whole matrix becomes character, so every column reports the same class. To understand this, check:
str(apply(xdat, 2, I))
#chr [1:20, 1:2] "1" "2" "3" "4" "1" "2" "3" "4" "1" ...
#- attr(*, "dimnames")=List of 2
# ..$ : NULL
# ..$ : chr [1:2] "x1" "x2"
Now compare this with lapply, which passes each column to the function unchanged:
str(lapply(xdat, I))
#List of 2
#$ x1:Class 'AsIs' int [1:20] 1 2 3 4 1 2 3 4 1 2 ...
#$ x2: Factor w/ 4 levels "a","b","c","d": 1 2 3 4 1 2 3 4 1 2 ...
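The sketch below puts the comparison above in one runnable place. The question's xdat is not shown, so the data frame here is a hypothetical reconstruction that matches the str() output: an integer x1 and a four-level factor x2, 20 rows.

```r
# Hypothetical reconstruction of xdat, matching the str() output above
xdat <- data.frame(
  x1 = rep(1:4, 5),
  x2 = factor(rep(c("a", "b", "c", "d"), 5))
)

# apply() first turns xdat into a character matrix, so both columns look the same
apply(xdat, 2, class)     # x1 = "character", x2 = "character"

# sapply() hands each original column to class() unchanged
sapply(xdat, class)       # x1 = "integer", x2 = "factor"

# The coercion happens in as.matrix(), before class() is ever called
typeof(as.matrix(xdat))   # "character"
```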
Checking class of all columns in data.frame
You are looking for lapply(diamonds, class)
apply also runs, but the result is wrong: it reports every column as character. apply works on arrays and matrices, not data frames, so when you use it on a data frame it first converts the data frame to a matrix.
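Here is a minimal sketch of the same effect that does not require loading ggplot2. The data frame is a hypothetical two-row stand-in for diamonds, with a numeric carat column and an ordered-factor cut column. It also suggests why lapply is a safe choice here. An ordered factor has two classes, so class() returns vectors of different lengths and sapply cannot simplify the result into a vector.

```r
# Hypothetical stand-in for two columns of ggplot2::diamonds
d <- data.frame(
  carat = c(0.23, 0.21),
  cut   = factor(c("Ideal", "Premium"),
                 levels = c("Fair", "Good", "Very Good", "Premium", "Ideal"),
                 ordered = TRUE)
)

# apply() coerces d to a character matrix, so every column reports "character"
apply(d, 2, class)

# lapply() keeps each column intact: carat is "numeric", cut is c("ordered", "factor")
lapply(d, class)

# class() results have different lengths here, so sapply() falls back to a list
is.list(sapply(d, class))  # TRUE
```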
Applying a function to a dataframe does not work
TL;DR:
What you are looking for is .applymap() (renamed to .map() in pandas 2.1, where applymap is deprecated).
Details:
Your function is written well and can be used with .apply() as-is on a pandas.Series object. If you are experiencing issues, I assume it is because you are using it on a pandas.DataFrame, against multiple columns.
In that case, the argument passed to num_repair is itself a pandas.Series (one whole column at a time), which num_repair is not really meant to support.
I can only assume this, since the code that calls num_repair isn't given; consider adding it for the completeness of the question.
If that is the case, you can use applymap as follows:
import pandas as pd

# num_repair is the function from the question (not shown here)
df = pd.DataFrame([
    ['1M', '1B', '1TR'],
    ['22M', '22B', '22TR'],
], columns=[1990, 1991, 1992])
df.applymap(num_repair)
output:
1990 1991 1992
0 1000000 1000000000 1000000000000
1 22000000 22000000000 22000000000000
Side Note
If you want to apply it to all columns except country (a country name may itself contain B, TR, or M), you can do the following:
import pandas as pd

df = pd.DataFrame([
    ['countryM', '1M', '1B', '1TR'],
    ['countryB', '22M', '22B', '22TR'],
], columns=['country', 1990, 1991, 1992])
# Select every column except 'country' and convert only those
cols = df.columns.drop('country')
df.loc[:, cols] = df.loc[:, cols].applymap(num_repair)
df
output:
country 1990 1991 1992
0 countryM 1000000 1000000000 1000000000000
1 countryB 22000000 22000000000 22000000000000
Why do columns in data frames change class when subsetted versus apply?
You can use lapply or sapply to find the class of each variable. As I understand it, apply first converts the data frame to a matrix. Because your columns are factors, that matrix is character, so every column apply passes to the function is a character vector and the output shows character. lapply and sapply, by contrast, work on the variables themselves, so they give the actual class of each variable, whether character or factor.
lapply(df,class)
$a
[1] "factor"
$b
[1] "factor"
sapply(df,class)
a b
"factor" "factor"
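To see what apply actually hands to the function, the sketch below inspects each column it receives. The df here is hypothetical because the question's data is not shown. Like the output above, it has two factor columns, a and b. Each call gets one whole column of the intermediate matrix, and that column is already character.

```r
# Hypothetical df with two factor columns, like the one in the question
df <- data.frame(a = factor(c("x", "y", "z")), b = factor(c("u", "v", "w")))

# Each call receives one full column (3 values), not a single element...
apply(df, 2, length)        # a = 3, b = 3

# ...but that column comes from as.matrix(df), so it is a plain character vector
apply(df, 2, is.character)  # a = TRUE, b = TRUE

# sapply() passes the original columns, so the factor class survives
sapply(df, class)           # a = "factor", b = "factor"
```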
Why is.factor() used in apply() and sapply() returns different values?
From the documentation of apply:
Returns a vector or array or list of values obtained by applying a
function to margins of an array or matrix.
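Therefore, apply first converts your input object to a matrix (array), and all elements of a matrix must share one atomic type. Your data get coerced to character because a matrix cannot keep the factor class: as.matrix turns factor columns into their character labels, and then every other column is converted to character too.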
> as.matrix(X)
X1 X2
[1,] "1" "f1"
[2,] "2" "f2"
[3,] "3" "f3"
[4,] "4" "f4"
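The following is a minimal reproduction of the difference in the title. It assumes X is built as the printed matrix suggests: an integer X1 and a factor X2 with levels f1 to f4. The question's own construction is not shown.

```r
# Hypothetical X reconstructed from the as.matrix() output above
X <- data.frame(X1 = 1:4, X2 = factor(paste0("f", 1:4)))

# apply() tests columns of the character matrix, so no column is a factor
apply(X, 2, is.factor)   # X1 = FALSE, X2 = FALSE

# sapply() tests the data frame's own columns
sapply(X, is.factor)     # X1 = FALSE, X2 = TRUE
```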
Checking data type for each variable in a R data frame
apply converts the data to a matrix first, and a matrix can hold data of only one type. If the data frame has mixed classes (numeric and character), the numeric columns are converted to character values, so is.numeric returns FALSE for them.
Here's an example to demonstrate what you are observing.
DF <- data.frame(a = 1:5, b = letters[1:5])
apply(DF, 2, is.numeric)
# a b
#FALSE FALSE
sapply(DF, is.numeric)
# a b
# TRUE FALSE
In contrast, if all the columns of the data frame are numeric, apply will return TRUE.
DF <- data.frame(a = 1:5, b = 1:5)
apply(DF, 2, is.numeric)
# a b
#TRUE TRUE
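If you want a column-wise check that also guarantees the shape of the result, vapply is an option. Like sapply, it passes each column unchanged. Its FUN.VALUE argument (here logical(1)) makes R stop with an error if the function ever returns something other than a single logical. This sketch reuses the mixed DF from the first example.

```r
DF <- data.frame(a = 1:5, b = letters[1:5])

# One logical per column; each column keeps its own type
vapply(DF, is.numeric, logical(1))      # a = TRUE, b = FALSE

# Usage: keep only the numeric columns
DF[vapply(DF, is.numeric, logical(1))]
```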
How to find out the classes of a data.frame
apply doesn't work for you because, as the documentation says:
If ‘X’ is not an array but an object of a class with a non-null
‘dim’ value (such as a data frame), ‘apply’ attempts to coerce it
to an array via ‘as.matrix’ if it is two-dimensional (e.g., a data
frame) or via ‘as.array’.
so your data frame becomes a matrix in which every column is coerced to the simplest single type that can represent all of your columns; in this case, a character matrix:
> as.matrix(Example)
Col1 Col2 Col3
[1,] " 2" "Hello" " TRUE"
[2,] " 5" "I am a" "FALSE"
[3,] "10" "Factor" " TRUE"
Use sapply
> sapply(Example,class)
Col1 Col2 Col3
"numeric" "factor" "logical"
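The sketch below rebuilds Example from the printed matrix. The question's definition is not shown, so the column contents are assumptions. It shows that the matrix type depends on the mix of columns. With the factor column included, everything becomes character. The numeric and logical columns alone give a numeric matrix, so apply would report "numeric" even for the logical column.

```r
# Hypothetical Example matching the as.matrix() output above
Example <- data.frame(
  Col1 = c(2, 5, 10),
  Col2 = factor(c("Hello", "I am a", "Factor")),
  Col3 = c(TRUE, FALSE, TRUE)
)

sapply(Example, class)        # Col1 = "numeric", Col2 = "factor", Col3 = "logical"
typeof(as.matrix(Example))    # "character", because of the factor column

# Without the factor, as.matrix() builds a numeric matrix (TRUE/FALSE become 1/0)
num_part <- Example[c("Col1", "Col3")]
typeof(as.matrix(num_part))   # "double"
apply(num_part, 2, class)     # both "numeric", although Col3 is logical
```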