Avoid rbind()/cbind() conversion from numeric to factor
You can use rbind.data.frame and cbind.data.frame instead of rbind and cbind.
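As a minimal sketch of the difference (the vectors id and grp are made up for illustration and do not come from the question):

```r
# Illustrative data: one numeric vector and one factor
id  <- c(1.5, 2.5, 3.5)
grp <- factor(c("a", "b", "a"))

# Plain cbind() on vectors builds a matrix, so the factor is
# reduced to its underlying integer codes
m <- cbind(id, grp)

# cbind.data.frame() builds a data frame, so each column keeps its own type
d <- cbind.data.frame(id = id, grp = grp)
sapply(d, class)

# rbind.data.frame() likewise keeps per-column types when stacking rows
r <- rbind.data.frame(d, d)
sapply(r, class)
```

Calling the data.frame methods directly is only needed when none of the arguments is already a data frame; otherwise cbind() and rbind() pick those methods on their own.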
cbind converting factor to numeric
We are cbinding vectors, so the output is a matrix. A matrix can hold only a single type, so if any vector is non-numeric, the whole matrix is converted to 'character'. Because the first column is a factor, what ends up in the matrix is the factor's integer codes rather than its labels. It is better to use data.frame:
data.frame(Event=df2$EVTYPE,Total = df2$TOTAL_INJURIES,Severity="INJURE")
Or we can use bind_cols or data_frame from dplyr (data_frame is deprecated in current releases in favor of tibble).
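The effect can be reproduced with a small sketch. The event names below are placeholders, not the values in df2:

```r
# Placeholder factor standing in for df2$EVTYPE
f <- factor(c("TORNADO", "FLOOD", "HAIL"))

# cbind() with a character value gives a character matrix; the factor
# contributes its integer codes (as strings), not its labels
m <- cbind(f, "INJURE")
m[, 1]

# data.frame() keeps the factor intact and recycles the constant
d <- data.frame(Event = f, Severity = "INJURE")
as.character(d$Event)
```

The levels are sorted alphabetically (FLOOD, HAIL, TORNADO), which is why the codes in the matrix do not follow the order of the input.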
Using cbind causes wrong interpretation of numeric variable
The following would do the conversion:
cntrydata$CPI <- as.numeric(as.character(cntrydata$CPI))
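The as.character() step matters because as.numeric() applied directly to a factor returns the level codes. A short sketch with three made-up CPI values:

```r
# Made-up values stored as a factor, as happens after cbind() of mixed vectors
cpi <- factor(c("7.1", "3.6", "8.7"))

codes  <- as.numeric(cpi)                # positions in the sorted levels
values <- as.numeric(as.character(cpi))  # the numbers the labels spell out

codes
values
```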
If you were to construct the data frame as follows, you wouldn't have the issue and you'd also get the column names:
cntrydata <- data.frame(cntry=c('BE', 'BG', 'CH', 'CY', 'CZ', 'DE', 'DK', 'EE',
'ES', 'FI', 'FR', 'GB', 'GR', 'HR', 'HU', 'IE',
'IL', 'LT', 'NL', 'NO', 'PL', 'PT', 'RU', 'SE',
'SI', 'SK', 'UA'), mode=c('C', 'P', 'C', 'P', 'P', 'C',
'C', 'C', 'C', 'C', 'C', 'C', 'P', 'P', 'P', 'C',
'P', 'P', 'C', 'C', 'P', 'C', 'P', 'C', 'P', 'P', 'P'),
CPI=c(7.1, 3.6, 8.7, 6.3, 4.6, 7.9, 9.3, 6.5,
6.1, 9.1, 6.8, 7.6, 3.5, 4.1, 4.7, 8, 6.1, 5, 8.8,
8.6, 5.3, 6, 2.1, 9.2, 6.4, 4.3, 2.4))
R Dataframe Factor conversion to numeric issue
I re-ran your script after adding stringsAsFactors = FALSE in 3 places, and it seems to be working fine now:
fgdp <- read.csv("fgdp.csv", skip = 4, header = TRUE, stringsAsFactors = FALSE)
fed <- read.csv("fed.csv", header = TRUE, stringsAsFactors = FALSE)
dt <- merge(fgdp, fed, by.x = "CountryCode", by.y = "CountryCode", all = TRUE, stringsAsFactors = FALSE)
Let me know if it worked for you
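To see what the argument changes without the original files, here is a small sketch that reads an inline CSV string. The column names and values are invented for illustration:

```r
# Invented two-column CSV; the GDP column contains a non-numeric entry
csv_text <- "CountryCode,GDP\nUSA,100\nCHN,..\nJPN,300"

# With stringsAsFactors = TRUE, text columns become factors
as_factor <- read.csv(text = csv_text, stringsAsFactors = TRUE)

# With stringsAsFactors = FALSE, they stay character
as_char <- read.csv(text = csv_text, stringsAsFactors = FALSE)

sapply(as_factor, class)
sapply(as_char, class)

# Character columns convert directly; ".." becomes NA (with a warning)
gdp <- suppressWarnings(as.numeric(as_char$GDP))
```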
S3 dispatching of `rbind` and `cbind`
attributes("abc")
#NULL
A character vector doesn't have a class attribute. I don't think a method can be dispatched by rbind for the implicit classes.
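A quick way to check this is sketched below. Both methods and the class name "myclass" are hypothetical and exist only for this experiment:

```r
# Hypothetical method for the implicit class "character"
rbind.character <- function(...) "character method called"

# A plain character vector has no class attribute, so rbind() falls
# through to the default behaviour and returns a matrix
res_chr <- rbind("abc", "def")

# Hypothetical method for an explicitly set class attribute
rbind.myclass <- function(...) "myclass method called"
x <- structure("abc", class = "myclass")

# Here the class attribute is present, so the method is picked up
res_obj <- rbind(x, x)

oldClass("abc")  # NULL: no class attribute to dispatch on
is.matrix(res_chr)
res_obj
```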
Meaning of cbind()
Usually when you run a logistic regression, your outcome variable (on the left of the regression formula) is a simple TRUE/FALSE (or 1/0 or success/failure) column that indicates absence or presence of the characteristic you are trying to model.
However, we can sometimes have the situation where we have a column of counts of successes and another column of counts of failures.
For example, suppose I get 8 men to take 10 shots at a basketball hoop and record their scores. I also measure their height because I want to know whether this predicts their accuracy.
My data might look something like this:
baskets <- data.frame(height = c(1.5, 1.95, 1.8, 1.76, 1.52, 1.91, 1.66, 1.68),
shots_on_target = c(4, 9, 6, 8, 3, 9, 5, 5))
baskets
#>   height shots_on_target
#> 1   1.50               4
#> 2   1.95               9
#> 3   1.80               6
#> 4   1.76               8
#> 5   1.52               3
#> 6   1.91               9
#> 7   1.66               5
#> 8   1.68               5
If I want to run a logistic regression on these data, I need to pass a column of successes and a column of failures as the outcome variable. Fortunately, glm allows us to do just that. All we need to do is bind the columns together with cbind - this will convert the success / failure columns into a single 2-column matrix.
Of course, I don't have a failures column, but since I know that each person had 10 shots, I can easily create it by doing 10 - shots_on_target.
Therefore, my model can be created like so:
model <- glm(cbind(shots_on_target, 10 - shots_on_target) ~ height,
data = baskets, family = binomial)
summary(model)
#>
#> Call:
#> glm(formula = cbind(shots_on_target, 10 - shots_on_target) ~
#> height, family = binomial, data = baskets)
#>
#> Deviance Residuals:
#>      Min        1Q    Median        3Q       Max
#> -0.94529  -0.31349   0.00671   0.52595   0.80363
#>
#> Coefficients:
#>             Estimate Std. Error z value Pr(>|z|)
#> (Intercept)  -10.094      3.038  -3.323 0.000892 ***
#> height         6.182      1.788   3.457 0.000546 ***
#> ---
#> Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1
#>
#> (Dispersion parameter for binomial family taken to be 1)
#>
#> Null deviance: 16.944 on 7 degrees of freedom
#> Residual deviance: 2.564 on 6 degrees of freedom
#> AIC: 26.534
#>
#> Number of Fisher Scoring iterations: 4
We can see that height positively predicts the number of shots on target.
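For comparison, the same binomial model can also be written with a proportion as the outcome and the number of trials as weights. The sketch below rebuilds the baskets data and checks that the two forms give the same coefficients. The names prop, m_counts and m_prop are introduced here for illustration only:

```r
baskets <- data.frame(height = c(1.5, 1.95, 1.8, 1.76, 1.52, 1.91, 1.66, 1.68),
                      shots_on_target = c(4, 9, 6, 8, 3, 9, 5, 5))

# The response that cbind() hands to glm(): an 8 x 2 matrix of
# successes and failures
y <- with(baskets, cbind(shots_on_target, 10 - shots_on_target))
dim(y)

# Form 1: two-column matrix of counts
m_counts <- glm(cbind(shots_on_target, 10 - shots_on_target) ~ height,
                data = baskets, family = binomial)

# Form 2: proportion of successes, weighted by the number of trials
baskets$prop <- baskets$shots_on_target / 10
m_prop <- glm(prop ~ height, data = baskets, family = binomial,
              weights = rep(10, nrow(baskets)))

all.equal(coef(m_counts), coef(m_prop))
```

The weights carry the information about how many shots each proportion is based on, which the cbind() form carries implicitly through the counts.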
In your example, the format for the outcome variable is similar to my example. However, it doesn't make a lot of sense for the numbers to be normalized to between 0 and 1. The two input columns should really be integers for this approach to make sense.
Created on 2021-11-04 by the reprex package (v2.0.0)
R: numeric vector becoming non-numeric after cbind of dates
The reason is that cbind returns a matrix, and a matrix can only hold one data type. You could use a data.frame instead:
n <- 1:10
b <- LETTERS[1:10]
m <- cbind(n, b)
str(m)
#> chr [1:10, 1:2] "1" "2" "3" "4" "5" "6" "7" "8" "9" ...
#> - attr(*, "dimnames")=List of 2
#> ..$ : NULL
#> ..$ : chr [1:2] "n" "b"
d <- data.frame(n, b)
str(d)
#> 'data.frame': 10 obs. of 2 variables:
#> $ n: int 1 2 3 4 5 6 7 8 9 10
#> $ b: Factor w/ 10 levels "A","B","C","D",..: 1 2 3 4 5 6 7 8 9 10
# (from R 4.0.0 on, b is shown as chr unless stringsAsFactors = TRUE is set)
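Since the question involves dates, here is a hedged sketch of the same effect with a Date vector. The dates and values are made up:

```r
# Made-up dates and measurements
dates  <- as.Date("2020-01-01") + 0:2
values <- c(10.5, 11.2, 9.8)

# cbind() drops the Date class, leaving the underlying day counts
m <- cbind(dates, values)
is.numeric(m)

# cbind() with a character vector turns everything into strings
m_chr <- cbind(dates, values, label = c("a", "b", "c"))
is.character(m_chr)

# data.frame() keeps each column's own class
d <- data.frame(dates, values, label = c("a", "b", "c"))
sapply(d, class)
```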
Ensure column is numeric /conversion does not work
Your question is a duplicate; however, here is the short version of an answer, based on the detailed answer to the question "How to convert a data frame column to numeric type?":
df <- data.frame(X = c("1", "2", "3"),
                 Y = c("3", "4", "5")
)
sum(df$X) # you'll get an error
class(df$X)
df <- transform(df, X = as.numeric(df$X))
sum(df$X) # no more error, due to conversion to numeric
class(df$X)
#Update
#after discussion in chat the following lines helped
#convert all columns of data.frame to certain type, here numeric
df[] <- lapply(df, as.numeric)
#convert an individual column, to a certain type, here numeric
#check ?transform for accepted types for conversion
#df would be Coefficients in your example,
#column name would be Serial_Number
df <- transform(df, columnname = as.numeric(columnname))
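If some columns may be factors rather than character vectors, converting through as.character() first avoids getting level codes. A minimal sketch, where to_numeric is a hypothetical helper and not part of base R:

```r
# Hypothetical helper: convert a factor or character vector to numeric
to_numeric <- function(x) {
  if (is.factor(x)) x <- as.character(x)  # use the labels, not the codes
  as.numeric(x)
}

# Columns deliberately created as factors to show the difference
df <- data.frame(X = c("10", "20", "30"),
                 Y = c("3", "4", "5"),
                 stringsAsFactors = TRUE)

codes_sum <- sum(as.numeric(df$X))  # sums the level codes 1, 2, 3

# Convert every column while keeping the data frame structure
df[] <- lapply(df, to_numeric)

values_sum <- sum(df$X)             # sums the actual values
sapply(df, class)
```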