Avoid rbind()/cbind() Conversion from Numeric to Factor

You can call rbind.data.frame() and cbind.data.frame() directly instead of rbind() and cbind(). They return data frames, so each column keeps its own type.
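As a minimal sketch of the difference (the vectors id and label and the one-row data frames are hypothetical, made up for illustration):

```r
# Hypothetical inputs, for illustration only.
id <- 1:3
label <- c("a", "b", "c")

# cbind() on plain vectors builds a matrix, so everything becomes character.
m <- cbind(id, label)
typeof(m)

# cbind.data.frame() builds a data frame, so id stays an integer column.
d <- cbind.data.frame(id = id, label = label)
str(d)

# rbind.data.frame() stacks rows and also keeps per-column types.
row1 <- data.frame(id = 1L, label = "a")
row2 <- data.frame(id = 2L, label = "b")
r <- rbind.data.frame(row1, row2)
str(r)
```

Whether label ends up as character or factor in the data frames depends on the stringsAsFactors default of your R version.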

cbind converting factor to numeric

cbind() is binding vectors here, so the output is a matrix, and a matrix can hold only a single type. If any of the vectors is character, the whole matrix is coerced to character. Because cbind() also drops the factor class of the first column, you get that factor's internal integer codes instead of its labels. It is better to use data.frame:

data.frame(Event=df2$EVTYPE,Total = df2$TOTAL_INJURIES,Severity="INJURE")

Alternatively, use bind_cols() from dplyr, or data_frame() (which has since been deprecated in favour of tibble()).
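The effect described above can be reproduced in base R. This is only a sketch: the vectors event and total are hypothetical stand-ins for df2$EVTYPE and df2$TOTAL_INJURIES, which are not shown in the question.

```r
# Hypothetical stand-ins for df2$EVTYPE and df2$TOTAL_INJURIES.
event <- factor(c("TORNADO", "FLOOD", "HAIL"))
total <- c(120, 45, 3)

# cbind() drops the factor class and, because "INJURE" is character,
# coerces the whole matrix to character.
m <- cbind(event, total, "INJURE")
m[, "event"]   # the factor's integer codes as strings, not the labels

# data.frame() keeps each column's own type.
d <- data.frame(Event = event, Total = total, Severity = "INJURE")
str(d)
```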

Using cbind causes wrong interpretation of numeric variable

The following would do the conversion:

cntrydata$CPI <- as.numeric(as.character(cntrydata$CPI))
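The detour through as.character() matters because as.numeric() on a factor returns the internal codes rather than the values. A small sketch with a made-up factor f:

```r
# Hypothetical factor holding numbers as labels.
f <- factor(c("7.1", "3.6", "8.7"))

codes <- as.numeric(f)                 # internal codes, not the values
values <- as.numeric(as.character(f))  # the actual numbers

codes
values
```

Here the levels are sorted as "3.6", "7.1", "8.7", so codes holds the positions 2, 1, 3 while values holds 7.1, 3.6, 8.7.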

If you were to construct the data frame as follows, you wouldn't have the issue and you'd also get the column names:

cntrydata <- data.frame(
  cntry = c('BE', 'BG', 'CH', 'CY', 'CZ', 'DE', 'DK', 'EE',
            'ES', 'FI', 'FR', 'GB', 'GR', 'HR', 'HU', 'IE',
            'IL', 'LT', 'NL', 'NO', 'PL', 'PT', 'RU', 'SE',
            'SI', 'SK', 'UA'),
  mode = c('C', 'P', 'C', 'P', 'P', 'C',
           'C', 'C', 'C', 'C', 'C', 'C', 'P', 'P', 'P', 'C',
           'P', 'P', 'C', 'C', 'P', 'C', 'P', 'C', 'P', 'P', 'P'),
  CPI = c(7.1, 3.6, 8.7, 6.3, 4.6, 7.9, 9.3, 6.5,
          6.1, 9.1, 6.8, 7.6, 3.5, 4.1, 4.7, 8, 6.1, 5, 8.8,
          8.6, 5.3, 6, 2.1, 9.2, 6.4, 4.3, 2.4))

R Dataframe Factor conversion to numeric issue

I re-ran your script with stringsAsFactors = FALSE added in three places, and it seems to be working fine now:

fgdp <- read.csv("fgdp.csv", skip = 4, header = TRUE, stringsAsFactors = FALSE)
fed <- read.csv("fed.csv", header = TRUE, stringsAsFactors = FALSE)

dt <- merge(fgdp, fed, by.x = "CountryCode", by.y = "CountryCode", all = TRUE, stringsAsFactors = FALSE)

Let me know if it worked for you
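To see what stringsAsFactors changes without needing the original files, here is a self-contained sketch. The CSV content in csv_text is invented for illustration and is not the asker's data.

```r
# Made-up CSV content; the real fgdp.csv and fed.csv are not available here.
csv_text <- "CountryCode,GDP\nUSA,100\nCHN,90\nJPN,50"

as_factor <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = TRUE)
as_string <- read.csv(text = csv_text, header = TRUE, stringsAsFactors = FALSE)

class(as_factor$CountryCode)  # factor
class(as_string$CountryCode)  # character
class(as_string$GDP)          # numeric columns are unaffected
```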

S3 dispatching of `rbind` and `cbind`

attributes("abc")
#NULL

A character vector has no class attribute; its class is implicit. I don't think rbind can dispatch a method on implicit classes.
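A small experiment along these lines, as a sketch only: the class name myclass and both methods are hypothetical and defined purely for the test. The behaviour matches the dispatch rules described in ?rbind, which look at the class attribute of each argument.

```r
# Hypothetical method for an explicit class attribute.
rbind.myclass <- function(...) "myclass method called"
x <- structure(list(a = 1), class = "myclass")
res_explicit <- rbind(x, x)

# Hypothetical method named after the implicit class of character vectors.
rbind.character <- function(...) "character method called"
res_implicit <- rbind("abc", "def")

res_explicit   # the myclass method was dispatched
res_implicit   # an ordinary character matrix: rbind.character was not used
```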

Meaning of cbind()

Usually when you run a logistic regression, your outcome variable (on the left of the regression formula) is a simple TRUE/FALSE (or 1/0 or success/failure) column that indicates absence or presence of the characteristic you are trying to model.

However, we can sometimes have the situation where we have a column of counts of successes and another column of counts of failures.

For example, suppose I get 8 men to take 10 shots at a basketball hoop and record their scores. I also measure their height because I want to know whether this predicts their accuracy.

My data might look something like this:

baskets <- data.frame(height = c(1.5, 1.95, 1.8, 1.76, 1.52, 1.91, 1.66, 1.68),
shots_on_target = c(4, 9, 6, 8, 3, 9, 5, 5))

baskets
#> height shots_on_target
#> 1 1.50 4
#> 2 1.95 9
#> 3 1.80 6
#> 4 1.76 8
#> 5 1.52 3
#> 6 1.91 9
#> 7 1.66 5
#> 8 1.68 5

If I want to run a logistic regression on these data, I need to pass a column of successes and a column of failures as the outcome variable. Fortunately, glm allows us to do just that. All we need to do is bind the columns together with cbind - this will convert the success / failures columns into a single 2-column matrix.

Of course, I don't have a failures column, but since I know that each person had 10 shots, I can easily create it by doing 10 - shots_on_target
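To make the shape of that outcome concrete, here is a quick sketch of the matrix that cbind produces from the scores above (the name outcome is just for illustration):

```r
shots_on_target <- c(4, 9, 6, 8, 3, 9, 5, 5)

# Successes in the first column, failures in the second.
outcome <- cbind(shots_on_target, 10 - shots_on_target)

dim(outcome)      # one row per person, two columns
rowSums(outcome)  # every row adds up to the 10 shots taken
```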

Therefore, my model can be created like so:

model <- glm(cbind(shots_on_target, 10 - shots_on_target) ~ height, 
data = baskets, family = binomial)

summary(model)
#>
#> Call:
#> glm(formula = cbind(shots_on_target, 10 - shots_on_target) ~
#> height, family = binomial, data = baskets)
#>
#> Deviance Residuals:
#> Min 1Q Median 3Q Max
#> -0.94529 -0.31349 0.00671 0.52595 0.80363
#>
#> Coefficients:
#> Estimate Std. Error z value Pr(>|z|)
#> (Intercept) -10.094 3.038 -3.323 0.000892 ***
#> height 6.182 1.788 3.457 0.000546 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 16.944 on 7 degrees of freedom
#> Residual deviance: 2.564 on 6 degrees of freedom
#> AIC: 26.534
#>
#> Number of Fisher Scoring iterations: 4

and we can see that height was positively predictive of the number of shots on target.
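As an aside, glm also accepts a binomial outcome given as a proportion of successes, with the number of trials passed as weights. The sketch below assumes the same data as above; the names attempts, n_shots, model_counts and model_prop are introduced here for illustration. Both forms should give the same coefficients.

```r
baskets <- data.frame(
  height = c(1.5, 1.95, 1.8, 1.76, 1.52, 1.91, 1.66, 1.68),
  shots_on_target = c(4, 9, 6, 8, 3, 9, 5, 5)
)
attempts <- 10  # each person took 10 shots

# Form 1: two-column matrix of successes and failures.
model_counts <- glm(cbind(shots_on_target, attempts - shots_on_target) ~ height,
                    data = baskets, family = binomial)

# Form 2: proportion of successes, weighted by the number of trials.
baskets$n_shots <- attempts
model_prop <- glm(shots_on_target / n_shots ~ height,
                  data = baskets, family = binomial, weights = n_shots)

all.equal(coef(model_counts), coef(model_prop))
```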

In your example, the format for the outcome variable is similar to my example. However, it doesn't make a lot of sense for the numbers to be normalized to between 0 and 1. The two input columns should really be integers for this approach to make sense.

Created on 2021-11-04 by the reprex package (v2.0.0)

R: numeric vector becoming non-numeric after cbind of dates

The reason is that cbind returns a matrix, and a matrix can only hold one data type, so the numeric vector is coerced to character. You could use a data.frame instead:

n <- 1:10
b <- LETTERS[1:10]
m <- cbind(n,b)
str(m)
chr [1:10, 1:2] "1" "2" "3" "4" "5" "6" "7" "8" "9" ...
- attr(*, "dimnames")=List of 2
..$ : NULL
..$ : chr [1:2] "n" "b"

d <- data.frame(n,b)
str(d)
'data.frame': 10 obs. of 2 variables:
$ n: int 1 2 3 4 5 6 7 8 9 10
$ b: Factor w/ 10 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10
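A related case, sketched here under the assumption that the dates are stored as Date objects rather than strings (the vectors d and n are hypothetical): cbind strips the Date class and keeps only the underlying day counts, while data.frame preserves it.

```r
# Hypothetical data: two dates and two numbers.
d <- as.Date(c("2021-01-01", "2021-01-02"))
n <- c(1.5, 2.5)

# The matrix is numeric: dates turn into days since 1970-01-01.
m <- cbind(d, n)
m

# The data frame keeps d as a Date column and n as numeric.
d_frame <- data.frame(d, n)
str(d_frame)
```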

Ensure column is numeric / conversion does not work

Your question is a duplicate. However, here is the short version of an answer, based on the detailed answer to the question "How to convert a data frame column to numeric type?":

df <- data.frame(X = c("1", "2", "3"),
                 Y = c("3", "4", "5"))

sum(df$X) # you'll get an error
class(df$X)

# as.character() first, so a factor column yields its labels, not its codes
df <- transform(df, X = as.numeric(as.character(X)))

sum(df$X) # no more error, due to conversion to numeric
class(df$X)

# Update
# After discussion in chat, the following lines helped.

# Convert all columns of a data.frame to a certain type, here numeric
# (as.character() first, so factor columns yield their labels, not their codes)
df[] <- lapply(df, function(col) as.numeric(as.character(col)))

# Convert an individual column to a certain type, here numeric.
# In your example, df would be Coefficients
# and the column name would be Serial_Number.
df$columnname <- as.numeric(as.character(df$columnname))

