Difference between Names and Colnames

What is the difference between colnames(x[1]) <- 'Name' and colnames(x)[1] <- 'Name'?

The too-much-information answer: look at what each of the options "de-sugars" to:

# 1.
`[<-`(x, 1, value=`colnames<-`(x[1], 'Name'))
# 2.
`colnames<-`(x, `[<-`(colnames(x), 1, 'Name'))

The first option makes a new data.frame from just the first column, renames that column (successfully), and then tries to assign that data.frame back over the first column. [<-.data.frame propagates the values, but it does not rename existing columns based on the names of value.

The second option gets the colnames of the data.frame, updates the first value, and creates a new data.frame with the updated names.
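
The two behaviours can be checked side by side. This is a minimal sketch on a hypothetical two-column data frame (the names x1, x2 and the column values are made up for illustration):

```r
# Hypothetical data frame, for illustration only
x <- data.frame(a = 1:3, b = 4:6)

# Option 1: rename a one-column copy, then assign it back over column 1
x1 <- x
colnames(x1[1]) <- "Name"
colnames(x1)   # still "a" "b": the values were copied back, the name was not

# Option 2: update the first element of the names vector, then reassign it
x2 <- x
colnames(x2)[1] <- "Name"
colnames(x2)   # "Name" "b"
```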


(Answer to @Peng Peng's question here because I can't figure out how to get backtick quoting to work in a comment...)

The backtick is to quote the variable name. Consider the difference here:

x<-1
`x<-`<-1

The first assigns 1 to a variable called x, but the second assigns to a variable called x<-. These unusual variable names are actually used by the <- primitive function: you are allowed arbitrary function calls on the left-hand side of an assignment, and a function with <- appended to its name specifies how to perform the update (similar to setf in Lisp).
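
To make the mechanism concrete, here is a small sketch that defines its own replacement function. The name `second<-` is hypothetical and exists only for this example; the one firm requirement is that the last argument is called value.

```r
# `second<-` is a hypothetical replacement function defined only for this sketch.
# It returns a modified copy of x with the second element replaced.
`second<-` <- function(x, value) {
  x[2] <- value
  x
}

v <- c(10, 20, 30)
second(v) <- 99   # sugar for: v <- `second<-`(v, value = 99)
v                 # 10 99 30

# Backticks also let a non-syntactic name like `x<-` hold an ordinary value
`x<-` <- 1
`x<-`             # 1
```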

Row names & column names in R

As Oscar Wilde said:

"Consistency is the last refuge of the unimaginative."

R is an evolved rather than a designed language, so these things happen. names() and colnames() both work on a data.frame, but names() returns NULL on a matrix:

R> DF <- data.frame(foo=1:3, bar=LETTERS[1:3])
R> names(DF)
[1] "foo" "bar"
R> colnames(DF)
[1] "foo" "bar"
R> M <- matrix(1:9, ncol=3, dimnames=list(1:3, c("alpha","beta","gamma")))
R> names(M)
NULL
R> colnames(M)
[1] "alpha" "beta" "gamma"
R>
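
One way to read this result: a data frame is a list of columns, so names() reaches the column names, whereas a matrix keeps its row and column names in dimnames. A short sketch, reusing the matrix from above and a hypothetical conversion to a data frame:

```r
M <- matrix(1:9, ncol = 3, dimnames = list(1:3, c("alpha", "beta", "gamma")))

is.null(names(M))    # TRUE: a matrix has no names attribute by default
dimnames(M)[[2]]     # the column names live here; same as colnames(M)

# After conversion to a data frame, names() and colnames() agree
DF <- as.data.frame(M)
identical(names(DF), colnames(DF))   # TRUE
```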

How to calculate difference between columns in different data frames with similar pattern in column names?

If the difference between the two sets of names is just the last character, then we could use adist:

a = which(adist(names(DF1), names(DF2)) == 1, arr.ind = TRUE)
result = DF1[, a[, 1]] - DF2[, a[, 2]]
setNames(result, sub("_[A-Z]$", "", names(result)))
#   w_H_11 w_H_16 w_13_12
# 1      2      0       2
# 2      2      4      10

With the updated table, it seems that everything from the last underscore onwards should be dropped before comparing, so you could do:

a = which(do.call(adist, lapply(list(names(DF1), names(DF2)), sub, pattern = "_[^_]*$", replacement = "")) == 0, arr.ind = TRUE)

and the rest remains the same.
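
For a runnable picture of the adist approach, here is a sketch on made-up data. DF1 and DF2 below are hypothetical stand-ins whose column names differ only in the final letter; the values are invented for illustration.

```r
# Hypothetical data: column names differ only in the trailing letter
DF1 <- data.frame(w_H_11_A = c(3, 5), w_H_16_A = c(4, 8))
DF2 <- data.frame(w_H_11_B = c(1, 3), w_H_16_B = c(4, 4))

# (row, col) index pairs where a DF1 name and a DF2 name are one edit apart
a <- which(adist(names(DF1), names(DF2)) == 1, arr.ind = TRUE)

# Subtract the matched columns; the result keeps DF1's column names
result <- DF1[, a[, 1]] - DF2[, a[, 2]]

# Drop the trailing "_A" so the names describe the difference
result <- setNames(result, sub("_[A-Z]$", "", names(result)))
result
```

Note that an edit distance of 1 would also match names that differ in some other single character, so this only suits naming schemes where that cannot happen.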

What does 'col.names' do in 'as.data.frame' in R?

If you want to avoid assigning the column names after creating the dataframe, you can utilize the dnn parameter in the table function to specify your "name" column, and the responseName parameter in the as.data.frame function to specify the "freq" column.

x <- c('a','b','c','a')
x_df <- as.data.frame(table(x, dnn = list("name")), responseName = "freq")
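
As a quick check of what this changes, the sketch below compares the default column names with the customised ones. The variable default_df is introduced here only for the comparison.

```r
x <- c("a", "b", "c", "a")

# Defaults: the dimension is named after the variable, the counts are "Freq"
default_df <- as.data.frame(table(x))
names(default_df)   # "x" "Freq"

# Custom names supplied up front via dnn and responseName
x_df <- as.data.frame(table(x, dnn = "name"), responseName = "freq")
names(x_df)         # "name" "freq"
```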

R Replacing colnames based on matching names in another dataframe

We can use match to get the index

i1 <- match(colnames(df), namelist$Code)
i2 <- !is.na(i1) # to take care of non matches which are NA
names(df)[i2] <- namelist$Name[i1[i2]]
names(df)
#[1] "S10" "S11" "S12" "S13" "S14" "S15" "S16" "S17" "S18"
#[10] "S19" "NRROX3720Q" "AJDIO5627R" "PNGQI9045F" "PMRKH3945W" "AWTUS8801K" "FAUSS0775K" "RHMDT7354P" "EHFXN5677T"
#[19] "DEXAD5460Z" "XNPJU6465R" "ISLKV8962F" "ZVAAT4099D" "MWCLD5013G" "MSSCG1315D" "NKJBC5303V" "EDHHR9300M" "CVWHP7658I"
#[28] "BPUSL4348S" "LPEWZ1407A" "QACRV3987M" "XMHYQ8544N" "UJGRX9778J" "KPAYY3203M" "JTETK9509P" "VYNYF6624P" "RDDZD3099N"
#[37] "SHUES3288G" "CGFKB5625F" "WTUEX0452E" "BSDUR3721G" "BZMND9193I" "F51" "F52" "F53" "F54"
#[46] "F55" "F56" "F57" "F58" "F59" "F60" "F61" "F62" "F63"
#[55] "F64" "F65" "F66" "F67" "F68" "F69"

i.e. if there is no match, the column name remains as it is.
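
The same steps on a small made-up input may be easier to follow. The df and namelist below are hypothetical; only the Code and Name column names are taken from the answer above.

```r
# Hypothetical inputs: `df` has coded column names, `namelist` maps codes to names
df <- data.frame(S10 = 1, A1 = 2, B2 = 3, F51 = 4)
namelist <- data.frame(Code = c("B2", "A1", "Z9"),
                       Name = c("Beta", "Alpha", "Zeta"),
                       stringsAsFactors = FALSE)

i1 <- match(colnames(df), namelist$Code)   # NA 2 1 NA
i2 <- !is.na(i1)                           # TRUE only where a code was found
names(df)[i2] <- namelist$Name[i1[i2]]
names(df)                                  # "S10" "Alpha" "Beta" "F51"
```

Setting stringsAsFactors = FALSE keeps Name as character in older R versions, where it would otherwise be a factor.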

Difference between `names(df[1]) <- ` and `names(df)[1] <- `

What I think is happening is that replacement into a data frame ignores the attributes of the data frame that is drawn from. I am not 100% sure of this, but the following experiments appear to back it up:

df <- data.frame(a = 1:3, b = 5:7)
# a b
# 1 1 5
# 2 2 6
# 3 3 7

df2 <- data.frame(c = 10:12)
# c
# 1 10
# 2 11
# 3 12

df[1] <- df2[1] # in this case `df[1] <- df2` is equivalent

Which produces:

#    a b
# 1 10 5
# 2 11 6
# 3 12 7

Notice how the values changed for df, but not the names. Basically the replacement operator `[<-` only replaces the values. This is why the name was not updated. I believe this explains all the issues.

In the scenario:

names(df[2]) <- "x"

You can think of the assignment as follows (this is a simplification, see end of post for more detail):

tmp <- df[2]
# b
# 1 5
# 2 6
# 3 7

names(tmp) <- "x"
# x
# 1 5
# 2 6
# 3 7

df[2] <- tmp # `tmp` has "x" for names, but it is ignored!
# a b
# 1 10 5
# 2 11 6
# 3 12 7

The last step of which is an assignment with `[<-`, which doesn't respect the names attribute of the RHS.

But in the scenario:

names(df)[2] <- "x"

you can think of the assignment as (again, a simplification):

tmp <- names(df)
# [1] "a" "b"

tmp[2] <- "x"
# [1] "a" "x"

names(df) <- tmp
# a x
# 1 10 5
# 2 11 6
# 3 12 7

Notice how we directly assign to names, instead of assigning to df which ignores attributes.

df[2] <- 2

works because we are assigning directly to the values, not the attributes, so there are no problems here.


EDIT: based on some commentary from @AriB.Friedman, here is a more elaborate version of what I think is going on (note I'm omitting the S3 dispatch to `[.data.frame`, etc., for clarity):

Version 1 names(df[2]) <- "x" translates to:

df <- `[<-`(
  df, 2,
  value = `names<-`(   # `names<-` here returns a renamed one-column data frame
    `[`(df, 2),
    value = "x"
  )
)

Version 2 names(df)[2] <- "x" translates to:

df <- `names<-`(
  df,
  `[<-`(
    names(df), 2, "x"
  )
)
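
Both translations can be run directly as ordinary function calls. The sketch below stores the results in v1 and v2 (hypothetical names) rather than overwriting df, so the two outcomes can be compared; it is a simplification in the same spirit as the translations above.

```r
df <- data.frame(a = 1:3, b = 5:7)

# Version 1 as explicit calls: rename a one-column copy, then assign it back
v1 <- `[<-`(df, 2, value = `names<-`(`[`(df, 2), value = "x"))
names(v1)   # "a" "b": the new name is dropped by `[<-`

# Version 2 as explicit calls: edit the names vector, then set it on df
v2 <- `names<-`(df, `[<-`(names(df), 2, "x"))
names(v2)   # "a" "x"
```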

Also, it turns out this is "documented" in The R Inferno, Section 8.2.34 (thanks @Frank):

right <- wrong <- c(a=1, b=2)
names(wrong[1]) <- 'changed'
wrong
# a b
# 1 2
names(right)[1] <- 'changed'
right
# changed b
# 1 2

