Apply() Not Working When Checking Column Class in a Data.Frame

You could use

sapply(xdat, class)
# x1 x2
# "integer" "factor"

Using apply coerces the data frame to a matrix first, and a matrix can hold only a single type. If any column is character (or a factor, which as.matrix converts to character), every column ends up as character. To see this, check:

 str(apply(xdat, 2, I))
#chr [1:20, 1:2] "1" "2" "3" "4" "1" "2" "3" "4" "1" ...
#- attr(*, "dimnames")=List of 2
# ..$ : NULL
# ..$ : chr [1:2] "x1" "x2"

Now compare lapply, which passes each column to the function unchanged:

 str(lapply(xdat, I))
#List of 2
#$ x1:Class 'AsIs' int [1:20] 1 2 3 4 1 2 3 4 1 2 ...
#$ x2: Factor w/ 4 levels "a","b","c","d": 1 2 3 4 1 2 3 4 1 2 ...

Checking class of all columns in data.frame

You are looking for lapply(diamonds, class)

Also, apply still runs, but the result is not right: it converts every column to character.

apply works on arrays and matrices, not data frames. When you use it on a data frame, it first converts the data frame to a matrix.

Applying a function to a dataframe does not work

TL;DR:

What you are looking for is .applymap() (renamed .map() in pandas 2.1, where .applymap() is deprecated)

Details:

Your function is written correctly and can be used with .apply() as-is on a pandas.Series. If you are having issues, I assume it is because you are calling it on a pandas.DataFrame with multiple columns.
In that case, DataFrame.apply passes each whole column (a pandas.Series) to num_repair rather than a single value, and num_repair is not written to handle a Series.
This is only an assumption, since the code that calls num_repair isn't shown; consider adding it to the question for completeness.

If so, you can use it as follows:

import pandas as pd

# num_repair is the function from the question (not shown here)
df = pd.DataFrame([
    ['1M', '1B', '1TR'],
    ['22M', '22B', '22TR'],
], columns=[1990, 1991, 1992])
df.applymap(num_repair)

output:

       1990         1991            1992
0   1000000   1000000000   1000000000000
1  22000000  22000000000  22000000000000
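The question's num_repair isn't shown, so here is a hypothetical version that reproduces the output above. It is only a guess: the suffix table (M = million, B = billion, TR = trillion) is an assumption read off the example values, and it is written in plain Python so it runs without pandas. The point it illustrates is that an element-wise call hands the function one cell at a time, which is exactly what applymap does.

```python
# Hypothetical num_repair: the question's own version is not shown, so this
# is an assumed reimplementation that matches the outputs above.
SUFFIX_FACTORS = {"TR": 10**12, "B": 10**9, "M": 10**6}


def num_repair(value):
    """Convert one cell such as '22B' into 22000000000."""
    text = str(value).strip()
    for suffix, factor in SUFFIX_FACTORS.items():
        if text.endswith(suffix):
            number = text[: -len(suffix)]
            # Keep whole numbers exact; float() covers inputs like '1.5M'.
            return int(number) * factor if number.isdigit() else float(number) * factor
    return value  # no known suffix: leave the cell unchanged


rows = [["1M", "1B", "1TR"], ["22M", "22B", "22TR"]]
# Element-wise, like applymap: num_repair only ever sees a single cell.
print([[num_repair(cell) for cell in row] for row in rows])
# [[1000000, 1000000000, 1000000000000], [22000000, 22000000000, 22000000000000]]
```

Note that this sketch raises ValueError on a cell like 'countryM', because the text before the suffix is not a number; that is one reason to keep a name column out of the conversion.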

Side Note

If you want to apply it to every column except country (a country name may itself end in B, TR or M), you can do the following:

df = pd.DataFrame([
    ['countryM', '1M', '1B', '1TR'],
    ['countryB', '22M', '22B', '22TR'],
], columns=['country', 1990, 1991, 1992])
# Every column except 'country'
year_cols = df.columns.drop('country')
df.loc[:, year_cols] = df.loc[:, year_cols].applymap(num_repair)
df

output:

    country      1990         1991            1992
0  countryM   1000000   1000000000   1000000000000
1  countryB  22000000  22000000000  22000000000000

Why do columns in data frames change class when subsetted versus apply?

You can use the lapply or sapply function to find the class of each variable. apply first converts the data frame to a matrix, which holds a single type, so every element becomes character and the output shows character for all columns. lapply and sapply instead work on each variable (column) as it is, so they return each variable's real class, such as character or factor.

lapply(df,class)
$a
[1] "factor"

$b
[1] "factor"

sapply(df,class)
a b
"factor" "factor"

Why is.factor() used in apply() and sapply() returns different values?

From the reference of apply:

Returns a vector or array or list of values obtained by applying a
function to margins of an array or matrix.

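Therefore, it first converts your input object to a matrix (array), and every element of a matrix must share the same atomic type. A matrix cannot keep the factor class, so as.matrix() turns the factor column into character, the numeric column is coerced to character along with it, and is.factor() is FALSE for every column inside apply():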

> as.matrix(X)
X1 X2
[1,] "1" "f1"
[2,] "2" "f2"
[3,] "3" "f3"
[4,] "4" "f4"

Checking data type for each variable in a R data frame

apply converts the data to a matrix first. Since a matrix can hold only one type, a data frame with mixed classes (numeric and character) becomes a character matrix: the numeric columns are turned into character values, so is.numeric returns FALSE for them.

Here's an example to demonstrate what you are observing.

DF <- data.frame(a = 1:5, b = letters[1:5])
apply(DF, 2, is.numeric)

# a b
#FALSE FALSE

sapply(DF, is.numeric)

# a b
# TRUE FALSE

In contrast, if all the columns of the data frame are numeric, apply returns TRUE for each of them.

DF <- data.frame(a = 1:5, b = 1:5)
apply(DF, 2, is.numeric)

# a b
#TRUE TRUE
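To make the rule concrete, here is a toy model in Python (the other language used on this page). It is not R code and only roughly mimics R: a "data frame" is a dict of columns, and as_matrix, is_numeric, apply_like and sapply_like are hypothetical stand-ins for R's as.matrix, is.numeric, apply and sapply. It reproduces the three results above: mixed columns give FALSE FALSE through apply and TRUE FALSE through sapply, while all-numeric columns give TRUE TRUE.

```python
# Toy model only: a "data frame" is a dict of named columns, like R's list of columns.

def is_numeric(column):
    """Stand-in for R's is.numeric(): True only if every value is a number."""
    return all(isinstance(v, (int, float)) for v in column)


def as_matrix(frame):
    """Stand-in for R's as.matrix(): every cell must share one type.

    If any column is non-numeric, every cell is turned into a string,
    roughly what as.matrix does to a mixed data frame in R.
    """
    if all(is_numeric(col) for col in frame.values()):
        return {name: list(col) for name, col in frame.items()}
    return {name: [str(v) for v in col] for name, col in frame.items()}


def sapply_like(frame, fn):
    """Column-wise over the ORIGINAL columns, so each keeps its own type."""
    return {name: fn(col) for name, col in frame.items()}


def apply_like(frame, fn):
    """Coerce to a single-type matrix first, then go column by column."""
    return {name: fn(col) for name, col in as_matrix(frame).items()}


mixed = {"a": [1, 2, 3, 4, 5], "b": ["a", "b", "c", "d", "e"]}
all_numeric = {"a": [1, 2, 3, 4, 5], "b": [1, 2, 3, 4, 5]}

print(apply_like(mixed, is_numeric))        # {'a': False, 'b': False}
print(sapply_like(mixed, is_numeric))       # {'a': True, 'b': False}
print(apply_like(all_numeric, is_numeric))  # {'a': True, 'b': True}
```

The only difference between apply_like and sapply_like is the as_matrix step, which is the whole point: the per-column information is lost before the function ever sees the data.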

How to find out the classes of a data.frame

apply doesn't work for you because, as the documentation says:

 If ‘X’ is not an array but an object of a class with a non-null
‘dim’ value (such as a data frame), ‘apply’ attempts to coerce it
to an array via ‘as.matrix’ if it is two-dimensional (e.g., a data
frame) or via ‘as.array’.

so your data frame becomes a matrix whose single type is the most general one that can represent all of your columns, in this case a character matrix:

> as.matrix(Example)
Col1 Col2 Col3
[1,] " 2" "Hello" " TRUE"
[2,] " 5" "I am a" "FALSE"
[3,] "10" "Factor" " TRUE"

Use sapply

> sapply(Example,class)
Col1 Col2 Col3
"numeric" "factor" "logical"