Why Doesn't "+" Operate on Characters in R

Why doesn't + operate on characters in R?

@Dirk: For once, you're not quite right. It's not the parser.
One can write methods in R for "+": help("+") goes to "Arithmetic operators", which says that these operators are generic and that you can write methods for them. Many package writers do exactly that; for example, we do it for the 'Matrix' package, and I also do it for the "Rmpfr" package.
But Dirk is also right (of course!) that you currently cannot do it in R
just by defining a method for "+.character".
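Why a "+.character" method is not enough can be sketched in a few lines. This assumes the standard behaviour of internal generics such as "+": they dispatch only when an argument has a class attribute (is.object() is TRUE), and a plain character vector has none. The class name "mystr" below is hypothetical and only there for illustration.

```r
# Hypothetical method: R never calls it for plain strings, because the
# internal generic "+" only dispatches on arguments with a class attribute.
"+.character" <- function(e1, e2) paste0(e1, e2)

plain_failed <- tryCatch({ "a" + "b"; FALSE }, error = function(e) TRUE)
print(plain_failed)  # TRUE: "a" + "b" still signals an error

# Give the string an explicit (hypothetical) class, and S3 dispatch happens.
"+.mystr" <- function(e1, e2) {
  structure(paste0(unclass(e1), unclass(e2)), class = "mystr")
}
s <- structure("a", class = "mystr")
print(unclass(s + "b"))  # "ab"
```

This is the same mechanism packages like 'Matrix' rely on: their objects carry a class, so "+" can dispatch to the package's method.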

About three years ago, I started a thread on R-devel (the R mailing list on R development; very much recommended if you are interested in these topics; you can also access it through Gmane if you don't want to subscribe): r-devel archived msg

It led to an interesting discussion with quite a few pros and cons,
notably John Chambers ("the father of S and hence R") quite strongly opposing the use of "+" for an operation that is not commutative,
and also r-devel archived msg2 (by another R-core member), supporting the view that we (R Core) should not adopt or support the idea; and if people *really* wanted it, they could define
%+% for that.
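Defining such an operator yourself is a one-liner, since R treats any function whose name has the form %...% as an infix operator. A minimal sketch, assuming plain concatenation via paste0() is the behaviour you want (note that ggplot2 also exports a %+%, so one of the two may mask the other when that package is attached):

```r
# User-defined infix operator: any function named %...% can be used infix.
`%+%` <- function(a, b) paste0(a, b)

"foo" %+% "bar"        # "foobar"
"x" %+% c("1", "2")    # "x1" "x2" (vectorised, like paste0)
```

Because the name differs from "+", nobody reading the code expects commutativity, which was the main objection raised in the thread.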

R: factor to character does not work

In R versions before 4.0.0, data.frame has the argument stringsAsFactors = TRUE by default, so calling data.frame(affrete) converts character columns to factors. (Since R 4.0.0 the default is FALSE.) You can:

  1. Call data.frame(affrete, stringsAsFactors = FALSE) instead
  2. Turn this behaviour off for the rest of your session with options(stringsAsFactors = FALSE) (only needed before R 4.0.0)
  3. Fix after the fact once it's already in the list with list_attribute$affrete$affrete <- as.character(list_attribute$affrete$affrete)
  4. Use tibbles from the tidyverse, i.e. call tibble(affrete) instead. These never convert characters to factors, among other benefits.
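Options 1 and 3 can be compared directly in base R. The sketch below uses a hypothetical three-element vector as a stand-in for the question's affrete and sets stringsAsFactors explicitly, so it behaves the same in any R version:

```r
affrete <- c("A", "B", "A")  # hypothetical stand-in for the question's data

df_factor <- data.frame(affrete, stringsAsFactors = TRUE)   # old default
df_char   <- data.frame(affrete, stringsAsFactors = FALSE)  # option 1

class(df_factor$affrete)  # "factor"
class(df_char$affrete)    # "character"

# Option 3: convert the factor column back after the fact
df_factor$affrete <- as.character(df_factor$affrete)
class(df_factor$affrete)  # "character"
```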

data frame with column character type doesn't work with rbind

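Make sure the data.frames you join with rbind have the same column names:

# stringsAsFactors = FALSE keeps x character in R < 4.0.0 as well
df = data.frame(x = character(0), stringsAsFactors = FALSE)
df = rbind(df, data.frame(x = c('foo'), stringsAsFactors = FALSE))
df = rbind(df, data.frame(x = c('bar'), stringsAsFactors = FALSE))
df
#     x
# 1 foo
# 2 bar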
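To see why the column names matter: rbind.data.frame matches columns by name, so frames whose names differ are rejected rather than stacked. A small sketch with made-up one-row frames:

```r
a <- data.frame(x = "foo", stringsAsFactors = FALSE)
b <- data.frame(y = "bar", stringsAsFactors = FALSE)  # different column name

res <- tryCatch(rbind(a, b), error = function(e) e)
inherits(res, "error")  # TRUE: rbind matches columns by name

names(b) <- names(a)    # align the names, then rbind works
fixed <- rbind(a, b)
fixed$x                 # "foo" "bar"
```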

Why doesn't mutate work if ifelse returns results of both character and numeric type?

From the ifelse documentation:

ifelse(test, yes, no)

ifelse returns a vector of the same length and attributes (including
dimensions and "class") as test and data values from the values of yes
or no. The mode of the answer will be coerced from logical to
accommodate first any values taken from yes and then any values taken
from no

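The last sentence is the key: ifelse only coerces to accommodate the values it actually takes. In a group where the test is FALSE for every row, no "--" is used and the result stays numeric; in a group with at least one TRUE, the result is character. Grouped mutate cannot combine those inconsistent types across groups. It is not a good idea to mix characters and numbers in the same variable anyway; consider using NA_real_ instead of "--". If you must do it your way, you can use as.character(mean(c(var1, var2))), but then your means are returned as characters.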

library(dplyr)

df %>%
  group_by(level) %>%
  mutate(result = ifelse(is.na(var1) | is.na(var2), "--", as.character(mean(c(var1, var2)))))

# A tibble: 14 x 4
# Groups:   level [8]
   level  var1  var2 result
   <dbl> <dbl> <dbl> <chr>
 1     1     1     2 1.5
 2     2     1    NA --
 3     3     2     1 1.5
 4     4     3     2 4.25
 5     5     4     3 5
 6     6     5     4 6.25
 7     7     6     5 6.25
 8     8     7     6 7
 9     8     8     7 7
10     6     8     8 6.25
11     7     6     8 6.25
12     5     7     6 5
13     4     5     7 4.25
14     2     4     5 NA
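The type behaviour quoted from the documentation can be checked without dplyr. In this base R sketch (made-up test vectors), the same yes/no values give a numeric result when only no is used and a character result as soon as one "--" is taken, while NA_real_ keeps everything numeric:

```r
all_false <- ifelse(c(FALSE, FALSE), "--", c(1.5, 2))
some_true <- ifelse(c(TRUE, FALSE), "--", c(1.5, 2))

class(all_false)  # "numeric": only values from `no` were taken
class(some_true)  # "character": one "--" forced coercion
some_true         # "--" "2"

# Using NA_real_ instead of "--" keeps the result numeric in every case
keep_numeric <- ifelse(c(TRUE, FALSE), NA_real_, c(1.5, 2))
class(keep_numeric)  # "numeric"
```

In a grouped mutate, each group is one such call, which is how different groups end up returning different types.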

Note:

You can use write.csv(df, "report.csv", na = "--") if you only want to replace NA with "--" in your report.

gsub() not working if I reference a column using a character vector?

gsub is being given a vector of strings (the column names themselves), and it does what it always does: it works on those strings. It doesn't know that they should be treated as an indirect reference to a column. (Nothing will know that they should be indirect.)
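The failure mode is easy to reproduce. This sketch assumes the data.table package and uses a hypothetical three-row table in place of preferences; gsub receives the literal string "Pref_1", finds nothing to replace, and := recycles that single value down the column:

```r
library(data.table)

# Hypothetical small stand-in for `preferences`
dt <- data.table(Pref_1 = c("UN1", "Human Rights Council (HRC)", "UN1"))
cols <- "Pref_1"

# gsub() works on the string "Pref_1", not on the column it names
dt[, (cols) := gsub("UN1", "A", cols)]
dt$Pref_1  # "Pref_1" "Pref_1" "Pref_1"
```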

You have two options:

  1. The canonical way in data.table for this is likely to use .SDcols.

    preferences[, (cols) := lapply(.SD, gsub, pattern = "UN1", replacement = "A"), .SDcols = cols]
    preferences
    # Pref_1
    # <char>
    # 1: A
    # 2: Food and Agriculture Organization (F...
    # 3: United Nations Educational, Scientif...
    # 4: United Nations Development Programme...
    # 5: Commission on Narcotic Drugs (CND)
    # 6: Commission on Narcotic Drugs (CND)
    # 7: Human Rights Council (HRC)
    # 8: A
    # 9: Human Rights Council (HRC)
    # 10: A

    This has two advantages: (i) using .SDcols to iterate over a dynamic set of columns is the preferred and faster approach, and it lets you choose those columns programmatically (which is what you need); (ii) using lapply lets you apply the function to one or more columns. If you know you'll always work on just one column, this still works well with very little overhead.

  2. You can use get/mget to fetch the data. This is how you tell R to grab the contents of a variable whose name is stored in a character vector.

    If you know that you will always have exactly one column, then you can use get:

    preferences[, (cols) := gsub(get(cols), pattern = "UN1", replacement = "A")]

    If there is even a chance that you'll have more than one, I strongly recommend mget. (Even if you think you'll always have one, this is still safe.)

    preferences[, (cols) := lapply(mget(cols), gsub, pattern = "UN1", replacement = "A")]
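To illustrate why mget is the safer choice, here is the same call applied to two columns at once. The table and its column names a and b are hypothetical; mget returns a named list, so lapply handles any number of columns:

```r
library(data.table)

# Hypothetical two-column table
dt <- data.table(a = c("UN1", "x"), b = c("y", "UN1"))
cols <- c("a", "b")

# mget() returns a named list of the columns; lapply() runs gsub on each
dt[, (cols) := lapply(mget(cols), gsub, pattern = "UN1", replacement = "A")]
dt$a  # "A" "x"
dt$b  # "y" "A"
```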

Data

library(data.table)

preferences <- setDT(structure(list(Pref_1 = c("UN1", "Food and Agriculture Organization (FAO)", "United Nations Educational, Scientific and Cultural Organization (UNESCO)", "United Nations Development Programme (UNDP)", "Commission on Narcotic Drugs (CND)", "Commission on Narcotic Drugs (CND)", "Human Rights Council (HRC)", "UN1", "Human Rights Council (HRC)", "UN1")), class = c("data.table", "data.frame"), row.names = c(NA, -10L)))
cols <- "Pref_1"

summaryBy doesn't work properly

The problem is most likely caused by R coding characters as factors when creating data frames (the default before R 4.0.0, and what stringsAsFactors = TRUE requests explicitly). Compare the following:

library(doBy)

temp <- data.frame(Comment=c("fa", "fsa", "", "", "fasdf", "rew"),
                   Prob=c(0.40768666, 0.61956024, NA, 0.12916298, 0.09724928, 0.47395962),
                   stringsAsFactors = TRUE)

c_fun <- function(x){c(example=head(x,n=1),mcnt=sum(as.character(x)==""))}
summaryBy(Comment~., data= temp, FUN= c_fun)
#   Comment.example Comment.mcnt
# 1               2            0

temp <- data.frame(Comment=c("fa", "fsa", "", "", "fasdf", "rew"),
                   Prob=c(0.40768666, 0.61956024, NA, 0.12916298, 0.09724928, 0.47395962),
                   stringsAsFactors = FALSE)
summaryBy(Comment~., data= temp, FUN= c_fun)
#   Comment.example Comment.mcnt
# 1              fa            2
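The 2 reported as Comment.example in the factor case is most likely the level code of "fa" rather than its label: a factor stores integer codes into its sorted levels, and combining it with a number in c() keeps only the code. A base R sketch of that mechanism, using the same comment values:

```r
f <- factor(c("fa", "fsa", "", "", "fasdf", "rew"))

levels(f)                 # ""  "fa"  "fasdf"  "fsa"  "rew"
as.integer(head(f, 1))    # 2: the level code of "fa"
as.character(head(f, 1))  # "fa": the label you actually wanted
```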

RStudio can't deal with file names with Unicode characters

This is a bug in the current release of RStudio (2021.09.2+382). We're currently working on getting a patch release out, but in the interim, you can download the previous release from:

https://s3.amazonaws.com/rstudio-ide-build/desktop/windows/RStudio-2021.09.1-372.exe


