Why doesn't + operate on characters in R?
@Dirk: For once, you're not quite right. It's not the parser.
One can write methods in R for "+": help("+") goes to "Arithmetic operators" and mentions that these are generic and that you can write methods for them. And of course many package writers do; e.g., we do for the 'Matrix' package, and I also do for the 'Rmpfr' package.
But Dirk is also right (of course!) that you currently cannot do it in R just by defining a method for "+.character".
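A minimal sketch of both points, assuming a hypothetical S3 class named "strcat" (not from any package). A "+" method is found for an object that carries a class attribute, whereas a method named "+.character" is ignored for plain strings, because the Ops group generic only dispatches when an argument has a class attribute.

```r
# Hypothetical S3 class that wraps a character vector
strcat <- function(x) structure(as.character(x), class = "strcat")

# "+" is an internal generic, so this method is dispatched for "strcat" objects
`+.strcat` <- function(e1, e2) strcat(paste0(unclass(e1), unclass(e2)))

x <- strcat("foo") + "bar"
unclass(x)  # "foobar"

# A method for the implicit class "character" is never dispatched on:
# plain character vectors have no class attribute, so the built-in "+" runs
`+.character` <- function(e1, e2) paste0(e1, e2)
msg <- tryCatch("foo" + "bar", error = conditionMessage)
msg  # "non-numeric argument to binary operator"
rm(`+.character`)
```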
About three years ago, I started a thread on R-devel (the R mailing list on R development; very much recommended if you are interested in these topics; you can also access it through Gmane if you don't want to subscribe): r-devel archived msg
It led to an interesting discussion with quite a few pros and cons,
notably John Chambers ("the father of S and hence R") quite strongly opposing the use of "+" for an operation that is not commutative,
and also r-devel archived msg2 (by another R Core member), supporting the view that we (R Core) should not adopt or support the idea; and if people *really* wanted it, they could define
%+% for that.
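A minimal sketch of that %+% alternative. The operator name is only a convention, and using paste0() for the body is an assumption, not something settled in the thread. Because it is a user-defined infix operator rather than a method for "+", it makes no implicit promise of commutativity.

```r
# Any function whose name has the form %...% can be called as an infix operator
`%+%` <- function(e1, e2) paste0(e1, e2)

"foo" %+% "bar"        # "foobar"
c("a", "b") %+% "_x"   # "a_x" "b_x"

# Unlike "+" on numbers, the order of the operands matters
"ab" %+% "cd"          # "abcd"
"cd" %+% "ab"          # "cdab"
```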
R: factor to character does not work
In R versions before 4.0.0, data.frame has the default argument stringsAsFactors = TRUE (since R 4.0.0 the default is FALSE). When you call data.frame(affrete), it converts characters to factors. You can either:
- Call data.frame(affrete, stringsAsFactors = FALSE) instead.
- Turn this behaviour off for the rest of your session with options(stringsAsFactors = FALSE).
- Fix it after the fact, once it's already in the list, with
list_attribute$affrete$affrete <- as.character(list_attribute$affrete$affrete)
- Use tibbles from the tidyverse, i.e. call tibble(affrete) instead. These never convert characters, among other benefits.
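A small sketch of the first and third options, using a hypothetical affrete vector (the real data is not shown). Passing stringsAsFactors = TRUE explicitly reproduces the old default on any R version.

```r
# Hypothetical example data; the original affrete vector is not shown
affrete <- c("yes", "no", "yes")

df_fct <- data.frame(affrete, stringsAsFactors = TRUE)   # the pre-4.0.0 default
class(df_fct$affrete)   # "factor"

df_chr <- data.frame(affrete, stringsAsFactors = FALSE)
class(df_chr$affrete)   # "character"

# Converting after the fact: as.character() returns the labels,
# whereas as.integer() returns the internal level codes
as.integer(df_fct$affrete)   # 2 1 2
df_fct$affrete <- as.character(df_fct$affrete)
class(df_fct$affrete)   # "character"
```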
data frame with column character type doesn't work with rbind
Make sure you are using rbind to join data frames that have the same column names:
df = data.frame(x = character(0), stringsAsFactors = FALSE)
df = rbind(df, data.frame(x = c('foo')))
df = rbind(df, data.frame(x = c('bar')))
df
# x
#1 foo
#2 bar
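To see why matching column names matter, a minimal sketch with hypothetical data frames a and b: rbind() on data frames matches columns by name and stops with an error when the names differ; renaming the columns fixes it.

```r
a <- data.frame(x = "foo", stringsAsFactors = FALSE)
b <- data.frame(y = "bar", stringsAsFactors = FALSE)

# Different column names: rbind.data.frame refuses to combine them
msg <- tryCatch(rbind(a, b), error = conditionMessage)
msg   # "names do not match previous names"

# Give b the same column names; then rbind works and x stays character
names(b) <- names(a)
ab <- rbind(a, b)
ab$x  # "foo" "bar"
```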
Why doesn't mutate work if ifelse returns results of both character and numeric type?
From the ifelse documentation:
ifelse(test, yes, no)
ifelse returns a vector of the same length and attributes (including
dimensions and "class") as test and data values from the values of yes
or no. The mode of the answer will be coerced from logical to
accommodate first any values taken from yes and then any values taken
from no.
Basically, you can't usefully mix characters and numbers in the yes/no values: ifelse returns a single type, so the numbers are coerced to character. It is not a good idea to mix characters and numbers in the same variable anyway. Consider using NA_real_ instead of "--". If you must do it your way, you can try using as.character(mean(c(var1, var2))), but then your means are returned as characters.
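A quick check of the coercion rule quoted above, outside of any data frame: a character yes value combined with a numeric no value gives a character result, while NA_real_ keeps the result numeric.

```r
test <- c(TRUE, FALSE)

mixed <- ifelse(test, "--", 1.5)
mixed           # "--" "1.5"
typeof(mixed)   # "character"

numeric_na <- ifelse(test, NA_real_, 1.5)
numeric_na          # NA 1.5
typeof(numeric_na)  # "double"
```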
library(dplyr)
df %>%
group_by(level) %>%
mutate(result = ifelse(is.na(var1) | is.na(var2), "--", as.character(mean(c(var1, var2)))))
# A tibble: 14 x 4
# Groups:   level [8]
   level  var1  var2 result
   <dbl> <dbl> <dbl> <chr>
 1     1     1     2 1.5
 2     2     1    NA --
 3     3     2     1 1.5
 4     4     3     2 4.25
 5     5     4     3 5
 6     6     5     4 6.25
 7     7     6     5 6.25
 8     8     7     6 7
 9     8     8     7 7
10     6     8     8 6.25
11     7     6     8 6.25
12     5     7     6 5
13     4     5     7 4.25
14     2     4     5 NA
Note: if you only want NA to be shown as "--" in your report, you can keep the column numeric and use write.csv(df, "report.csv", na = "--").
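A minimal sketch of that approach, with a small hypothetical data frame standing in for df and a temporary file instead of report.csv: the result column stays numeric (NA included), and "--" only appears in the written file.

```r
df <- data.frame(level = c(1, 2), result = c(1.5, NA))

path <- tempfile(fileext = ".csv")
write.csv(df, path, na = "--", row.names = FALSE)

lines <- readLines(path)
lines   # "\"level\",\"result\"" "1,1.5" "2,--"
```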
gsub() not working if I reference a column using a character vector?
gsub is being given a vector of strings, and it does what it knows: it works on the strings themselves. It doesn't know that they should be an indirect reference to columns. (Nothing will know that it should be indirect.)
You have two options:
- The canonical way in data.table for this is likely to use .SDcols:
preferences[, (cols) := lapply(.SD, gsub, pattern = "UN1", replacement = "A"), .SDcols = cols]
preferences
# Pref_1
# <char>
# 1: A
# 2: Food and Agriculture Organization (F...
# 3: United Nations Educational, Scientif...
# 4: United Nations Development Programme...
# 5: Commission on Narcotic Drugs (CND)
# 6: Commission on Narcotic Drugs (CND)
# 7: Human Rights Council (HRC)
# 8: A
# 9: Human Rights Council (HRC)
# 10: A
This does two things: (i) using .SDcols to iterate over a dynamic set of columns is preferred and faster, and it allows programmatic determination of those columns (what you need); (ii) using lapply lets you do this to one or more columns. If you know you'll always do just one column, this still works well with very little overhead.
- You can get/mget the data. This is the way to tell something to grab the contents of a variable identified in a string vector. If you know that you will always have exactly one column, then you can use get:
preferences[, (cols) := gsub(get(cols), pattern = "UN1", replacement = "A")]
If there is even a chance that you'll have more than one, I strongly recommend mget. (Even if you think you'll always have one, this is still safe.)
preferences[, (cols) := lapply(mget(cols), gsub, pattern = "UN1", replacement = "A")]
Data
library(data.table)
preferences <- setDT(structure(list(Pref_1 = c("UN1", "Food and Agriculture Organization (FAO)", "United Nations Educational, Scientific and Cultural Organization (UNESCO)", "United Nations Development Programme (UNDP)", "Commission on Narcotic Drugs (CND)", "Commission on Narcotic Drugs (CND)", "Human Rights Council (HRC)", "UN1", "Human Rights Council (HRC)", "UN1")), class = c("data.table", "data.frame"), row.names = c(NA, -10L)))
cols <- "Pref_1"
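For comparison, the same idea in base R without data.table (a swapped-in technique, not part of the answer above): indexing with [ and a character vector of column names is the indirect reference that gsub() cannot make on its own. The data frame prefs is a trimmed, hypothetical copy of preferences.

```r
prefs <- data.frame(Pref_1 = c("UN1", "Human Rights Council (HRC)", "UN1"),
                    stringsAsFactors = FALSE)
cols <- "Pref_1"

# gsub() on the string "Pref_1" just rewrites that string, not the column
gsub("UN1", "A", cols)   # "Pref_1"

# Indirect reference: look the column(s) up by name, then apply gsub() to each
prefs[cols] <- lapply(prefs[cols], gsub, pattern = "UN1", replacement = "A")
prefs$Pref_1   # "A" "Human Rights Council (HRC)" "A"
```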
summaryBy doesn't work properly
I guess the problem is caused by R coding characters as factors when creating data frames (the default before R 4.0.0; stringsAsFactors = TRUE below forces it). See the following comparisons.
library(doBy)
temp <- data.frame(Comment=c("fa", "fsa", "", "", "fasdf", "rew"),
Prob=c(0.40768666, 0.61956024, NA, 0.12916298, 0.09724928, 0.47395962),
stringsAsFactors = TRUE)
c_fun <- function(x){c(example=head(x,n=1),mcnt=sum(as.character(x)==""))}
summaryBy(Comment~., data= temp, FUN= c_fun)
# Comment.example Comment.mcnt
# 1 2 0
temp <- data.frame(Comment=c("fa", "fsa", "", "", "fasdf", "rew"),
Prob=c(0.40768666, 0.61956024, NA, 0.12916298, 0.09724928, 0.47395962),
stringsAsFactors = FALSE)
summaryBy(Comment~., data= temp, FUN= c_fun)
# Comment.example Comment.mcnt
# 1 fa 2
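The 2 in the first output is consistent with a factor's integer level code being returned instead of its label. A minimal sketch of that mapping, reusing the Comment values from above (levels are sorted, so "fa" is the second level after ""):

```r
comment_chr <- c("fa", "fsa", "", "", "fasdf", "rew")
comment_fct <- factor(comment_chr)

levels(comment_fct)                  # "" "fa" "fasdf" "fsa" "rew"
as.integer(head(comment_fct, 1))     # 2   (the level code)
as.character(head(comment_fct, 1))   # "fa" (the label)
sum(comment_chr == "")               # 2
```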
RStudio can't deal with file names with unicode characters
This is a bug in the current release of RStudio (2021.09.2+382). We're currently working on getting a patch release out, but in the interim, you can download the previous release from:
https://s3.amazonaws.com/rstudio-ide-build/desktop/windows/RStudio-2021.09.1-372.exe