What is the difference between colnames(x[1]) <- 'name' and colnames(x)[1] <- 'name'?
The too much information answer:
If you look at what each of the options "de-sugars" to:
# 1.
`[<-`(x, 1, value=`colnames<-`(x[1], 'Name'))
# 2.
`colnames<-`(x, `[<-`(colnames(x), 1, 'Name'))
The first option makes a new data.frame from just the first column, renames that column (successfully), and then tries to assign that data.frame back over the first column. [<-.data.frame will propagate the values, but it will not rename existing columns based on the names of value.
The second option gets the colnames of the data.frame, updates the first value, and creates a new data.frame with the updated names.
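To see the two de-sugared forms side by side, here is a minimal sketch that runs them on a small data frame. The data frame x and the result names opt1 and opt2 are made up for illustration; the function calls are the ones written out above.

```r
# Hypothetical example data; any data.frame would do.
x <- data.frame(a = 1:3, b = 4:6)

# Option 1: what colnames(x[1]) <- 'Name' expands to.
# The renamed one-column data.frame is assigned back, but only its values are used.
opt1 <- `[<-`(x, 1, value = `colnames<-`(x[1], 'Name'))
colnames(opt1)  # "a" "b"

# Option 2: what colnames(x)[1] <- 'Name' expands to.
# The vector of names is updated first, then assigned as the new colnames.
opt2 <- `colnames<-`(x, `[<-`(colnames(x), 1, 'Name'))
colnames(opt2)  # "Name" "b"
```

Only the second form changes the column name; the values are the same in both results.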
(Answer to @Peng Peng's question here because I can't figure out how to get backtick quoting to work in a comment...)
The backtick is to quote the variable name. Consider the difference here:
x<-1
`x<-`<-1
The first assigns 1 to a variable called x, but the second assigns to a variable called x<-. These unusual variable names are actually used by the <- primitive function: you are allowed arbitrary function calls on the lhs of an assignment, and a function with <- appended to the name specifies how to perform the update (similar to setf in lisp).
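As a rough illustration of that mechanism, the sketch below defines a replacement function of its own. The name second<- and the vectors v and w are hypothetical, chosen only for this example; R requires the last argument of such a function to be called value.

```r
# Hypothetical replacement function: sets the second element of a vector.
`second<-` <- function(x, value) {
  x[2] <- value
  x  # a replacement function must return the modified object
}

v <- 1:5
second(v) <- 10L   # R rewrites this as v <- `second<-`(v, value = 10L)
v  # 1 10 3 4 5

# The same update written out by hand.
w <- 1:5
w <- `second<-`(w, value = 10L)
```

Both v and w end up with the same contents, which is the sense in which the function named with a trailing <- "specifies how to perform the update".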
Row names & column names in R
As Oscar Wilde said:
Consistency is the last refuge of the unimaginative.
R is more of an evolved language than a designed one, so these things happen. names() and colnames() work on a data.frame, but names() does not work on a matrix:
R> DF <- data.frame(foo=1:3, bar=LETTERS[1:3])
R> names(DF)
[1] "foo" "bar"
R> colnames(DF)
[1] "foo" "bar"
R> M <- matrix(1:9, ncol=3, dimnames=list(1:3, c("alpha","beta","gamma")))
R> names(M)
NULL
R> colnames(M)
[1] "alpha" "beta" "gamma"
R>
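One way to read the session above: a data.frame is a list of columns, so its names are the column names, while a matrix keeps its row and column names in the dimnames attribute. A small sketch using the same DF and M:

```r
DF <- data.frame(foo = 1:3, bar = LETTERS[1:3])
M <- matrix(1:9, ncol = 3, dimnames = list(1:3, c("alpha", "beta", "gamma")))

# For a data.frame, names() and colnames() return the same thing.
identical(names(DF), colnames(DF))        # TRUE

# A matrix has no names attribute; its column names live in dimnames.
is.null(names(M))                         # TRUE
identical(colnames(M), dimnames(M)[[2]])  # TRUE
```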
How to calculate difference between columns in different data frames with similar pattern in column names?
If the only difference between the two sets of names is the last character, then we could use adist:
a = which(adist(names(DF1), names(DF2)) == 1, arr.ind = TRUE)
result = DF1[, a[, 1]] - DF2[, a[, 2]]
setNames(result, sub("_[A-Z]$", '', names(result)))
w_H_11 w_H_16 w_13_12
1 2 0 2
2 2 4 10
With the updated table, it seems we should strip everything after the last underscore, so you could do:
a = which(do.call(adist, lapply(list(names(DF1), names(DF2)), sub, pattern = "_[^_]*$", replacement = "")) == 0, arr.ind = TRUE)
and the rest remains the same.
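Since the question's data is not shown here, the following is a self-contained sketch of the first approach on made-up data. DF1, DF2 and their column names are assumptions for illustration only; the columns are deliberately in a different order in the two data frames to show that the pairing comes from the names.

```r
# Hypothetical data: matching columns differ only in the final letter.
DF1 <- data.frame(w_H_11_A = c(5, 6), w_H_16_A = c(7, 8))
DF2 <- data.frame(w_H_16_B = c(1, 2), w_H_11_B = c(3, 4))

# Pairs of names at edit distance 1; column 1 indexes DF1, column 2 indexes DF2.
a <- which(adist(names(DF1), names(DF2)) == 1, arr.ind = TRUE)

# Subtract the paired columns, then drop the trailing "_<letter>" from the names.
result <- DF1[, a[, 1]] - DF2[, a[, 2]]
result <- setNames(result, sub("_[A-Z]$", "", names(result)))
result
```

Note that an edit distance of 1 would also pair names that differ in a single character elsewhere, so this relies on the names being as regular as in the question.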
What does 'col.names' do in 'as.data.frame' in R?
If you want to avoid assigning the column names after creating the dataframe, you can utilize the dnn parameter in the table function to specify your "name" column, and the responseName parameter in the as.data.frame function to specify the "freq" column.
x <- c('a','b','c','a')
x_df <- as.data.frame(table(x, dnn = list("name")), responseName = "freq")
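For comparison, here is a short sketch of what the column names look like with and without those two parameters. The object names default_df and named_df are made up for this example, and dnn is passed as a plain character string here.

```r
x <- c('a', 'b', 'c', 'a')

# Defaults: the column is named after the variable, and the counts go in "Freq".
default_df <- as.data.frame(table(x))
names(default_df)  # "x" "Freq"

# Names chosen up front via dnn (in table) and responseName (in as.data.frame).
named_df <- as.data.frame(table(x, dnn = "name"), responseName = "freq")
names(named_df)    # "name" "freq"
```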
R Replacing colnames based on matching names in another dataframe
We can use match to get the index:
i1 <- match(colnames(df), namelist$Code)
i2 <- !is.na(i1) # to take care of non matches which are NA
names(df)[i2] <- namelist$Name[i1[i2]]
names(df)
#[1] "S10" "S11" "S12" "S13" "S14" "S15" "S16" "S17" "S18"
#[10] "S19" "NRROX3720Q" "AJDIO5627R" "PNGQI9045F" "PMRKH3945W" "AWTUS8801K" "FAUSS0775K" "RHMDT7354P" "EHFXN5677T"
#[19] "DEXAD5460Z" "XNPJU6465R" "ISLKV8962F" "ZVAAT4099D" "MWCLD5013G" "MSSCG1315D" "NKJBC5303V" "EDHHR9300M" "CVWHP7658I"
#[28] "BPUSL4348S" "LPEWZ1407A" "QACRV3987M" "XMHYQ8544N" "UJGRX9778J" "KPAYY3203M" "JTETK9509P" "VYNYF6624P" "RDDZD3099N"
#[37] "SHUES3288G" "CGFKB5625F" "WTUEX0452E" "BSDUR3721G" "BZMND9193I" "F51" "F52" "F53" "F54"
#[46] "F55" "F56" "F57" "F58" "F59" "F60" "F61" "F62" "F63"
#[55] "F64" "F65" "F66" "F67" "F68" "F69"
i.e. if there is no match, the column name remains as it was.
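The question's df and namelist are not reproduced here, so the same steps are sketched below on a tiny made-up example. The column names and the Code/Name values are assumptions for illustration only.

```r
# Hypothetical data: two of the three columns have an entry in the lookup table.
df <- data.frame(S10 = 1:2, A1 = 3:4, B2 = 5:6)
namelist <- data.frame(Code = c("B2", "A1"),
                       Name = c("beta", "alpha"),
                       stringsAsFactors = FALSE)

i1 <- match(colnames(df), namelist$Code)  # NA 2 1: position of each column in Code
i2 <- !is.na(i1)                          # FALSE TRUE TRUE: columns that have a match
names(df)[i2] <- namelist$Name[i1[i2]]    # rename only the matched columns
names(df)  # "S10" "alpha" "beta"
```

The unmatched column S10 keeps its name because it is excluded by i2.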
Difference between `names(df[1]) <- ` and `names(df)[1] <- `
What I think is happening is that replacement into a data frame ignores the attributes of the data frame that is drawn from. I am not 100% sure of this, but the following experiments appear to back it up:
df <- data.frame(a = 1:3, b = 5:7)
# a b
# 1 1 5
# 2 2 6
# 3 3 7
df2 <- data.frame(c = 10:12)
# c
# 1 10
# 2 11
# 3 12
df[1] <- df2[1] # in this case `df[1] <- df2` is equivalent
Which produces:
# a b
# 1 10 5
# 2 11 6
# 3 12 7
Notice how the values changed for df, but not the names. Basically the replacement operator `[<-` only replaces the values. This is why the name was not updated. I believe this explains all the issues.
In the scenario:
names(df[2]) <- "x"
You can think of the assignment as follows (this is a simplification, see end of post for more detail):
tmp <- df[2]
# b
# 1 5
# 2 6
# 3 7
names(tmp) <- "x"
# x
# 1 5
# 2 6
# 3 7
df[2] <- tmp # `tmp` has "x" for names, but it is ignored!
# a b
# 1 10 5
# 2 11 6
# 3 12 7
The last step of which is an assignment with `[<-`, which doesn't respect the names attribute of the RHS.
But in the scenario:
names(df)[2] <- "x"
you can think of the assignment as (again, a simplification):
tmp <- names(df)
# [1] "a" "b"
tmp[2] <- "x"
# [1] "a" "x"
names(df) <- tmp
# a x
# 1 10 5
# 2 11 6
# 3 12 7
Notice how we directly assign to names, instead of assigning to df, which ignores attributes.
df[2] <- 2 works because we are assigning directly to the values, not the attributes, so there are no problems here.
EDIT: based on some commentary from @AriB.Friedman, here is a more elaborate version of what I think is going on (note I'm omitting the S3 dispatch to `[.data.frame`, etc., for clarity):
Version 1, names(df[2]) <- "x", translates to:
df <- `[<-`(
df, 2,
value=`names<-`( # `names<-` here returns a re-named one column data frame
`[`(df, 2),
value="x"
) )
Version 2, names(df)[2] <- "x", translates to:
df <- `names<-`(
df,
`[<-`(
names(df), 2, "x"
) )
Also, turns out this is "documented" in R Inferno Section 8.2.34 (Thanks @Frank):
right <- wrong <- c(a=1, b=2)
names(wrong[1]) <- 'changed'
wrong
# a b
# 1 2
names(right)[1] <- 'changed'
right
# changed b
# 1 2