R: Assign Variable Labels of Data Frame Columns


You can do this by creating a list from the named vector var.labels and assigning it with label(). I've used match() so that each value of var.labels is assigned to its corresponding column in data, even if var.labels is in a different order from the columns of data.

library(Hmisc)

data = data.frame(age=c(12, 14, 16), sex=c(1, 0, 1))  # example data for illustration
var.labels = c(age="Age in Years", sex="Sex of the participant")

label(data) = as.list(var.labels[match(names(data), names(var.labels))])

label(data)
age sex
"Age in Years" "Sex of the participant"
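Hmisc isn't strictly required for this. As far as I know, Hmisc's label() keeps each column's label in a "label" attribute (the labelled and sjlabelled packages use the same attribute name), so here is a minimal base-R sketch of the same match() idea, assuming that storage convention. The data is made up, and var.labels is deliberately out of order to show that match() lines the labels up by name:

```r
# Made-up example data; var.labels is deliberately in a different order
data <- data.frame(age = c(12, 14, 16), sex = c(1, 0, 1))
var.labels <- c(sex = "Sex of the participant", age = "Age in Years")

# Look up the label for each column by name (NA if a column has no label)
matched <- var.labels[match(names(data), names(var.labels))]

# Store each label as a "label" attribute on its column
for (i in seq_along(data)) {
  if (!is.na(matched[i])) attr(data[[i]], "label") <- unname(matched[i])
}

# Read the labels back as a named character vector
sapply(data, function(col) attr(col, "label"))
```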

Original Answer

My original answer used lapply, which isn't actually necessary. Here's the original answer for archival purposes:

You can assign the labels using lapply:

label(data) = lapply(names(data), function(x) var.labels[match(x, names(var.labels))])

lapply applies a function to each element of a list or vector. In this case the function is applied to each value of names(data) and it picks out the label value from var.labels that corresponds to the current value of names(data).
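To see why lapply isn't needed, this sketch compares both approaches on a made-up vector of column names. match() is already vectorised, so a single call yields the same labels as one lapply() call per name:

```r
var.labels <- c(age = "Age in Years", sex = "Sex of the participant")
cols <- c("sex", "age")  # made-up column names, in a different order

# lapply: one match() call per column name, returning a list
via_lapply <- lapply(cols, function(x) var.labels[match(x, names(var.labels))])

# Vectorised: match() handles all names in one call
via_match <- as.list(var.labels[match(cols, names(var.labels))])

# Flatten both to named character vectors and compare
identical(unlist(via_lapply), unlist(via_match))
```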

Reading through a few tutorials is a good way to get the general idea, but you'll really get the hang of it if you start using lapply in different situations and see how it behaves.

How to use variable labels in an R data frame

Unfortunately, labels are not supported by the basic indexing operations. The closest basic subsetting approach to what you have is

table(heights[, label(heights)=="Heights in feet"])

If this is a common operation, you could define a method for an existing operator so that it does this for a data.frame. For example:

# Uses Hmisc's label(); returns the column(s) whose label equals lbl
`%%.data.frame` <- function(x, lbl) {
  x[,label(x)==lbl]
}

table(heights%%"Heights in feet")
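The operator above relies on Hmisc's label(). If you'd rather not define a method for %% itself, here is a minimal base-R sketch of the same idea using a new, hypothetical %lbl% operator. It assumes the labels are stored in a "label" attribute on each column, and the data is made up:

```r
# Made-up example data with a "label" attribute on each column
heights <- data.frame(h = c(5, 6, 5, 6, 6), w = c(150, 180, 140, 170, 175))
attr(heights$h, "label") <- "Heights in feet"
attr(heights$w, "label") <- "Weight in pounds"

# Hypothetical operator: select the column(s) whose "label" attribute equals lbl
`%lbl%` <- function(x, lbl) {
  col_labels <- vapply(x, function(col) {
    l <- attr(col, "label")
    if (is.null(l)) NA_character_ else l  # unlabelled columns never match
  }, character(1))
  x[, which(col_labels == lbl)]
}

table(heights %lbl% "Heights in feet")
```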

You could even make an assignment version:

# There is no base generic for `%%<-`, so define one that dispatches on x
`%%<-` <- function(x, ...) UseMethod("%%<-")
`%%<-.data.frame` <- function(x, lbl, value) {
  x[,label(x)==lbl] <- value
  x
}
heights%%"Heights in feet" <- heights%%"Heights in feet"+1

Of course, this is very non-standard, so I probably wouldn't recommend it; I'm just pointing out the possibility.

How to add variable labels from one dataframe into another in R?

It really depends on where you want to use your variable "labels". While doing your data analysis, you definitely want to keep short, concise variable names; otherwise you end up in a scenario like

lm(Sex of Participant ~ `Year of Participation`, data=data)

which is not valid syntax, and a heck of a bother to type again and again and agian (whoops, typos!).

And when you've finished your analysis, your boss asks you to rename the age "label" to "Participant age", and there goes the analysis until you've searched and replaced every occurrence of the previous variable name.

So, the case should be clear for keeping concise variable names during coding (and you are not arguing against this in your question).

I am guessing you want variable labels for presentation. How to apply variable labels depends entirely on how you are presenting your data. I'll give a few examples.

Output to console:

> data
age sex year
1 12 1 1998
2 14 0 1997
3 16 1 1994

In this case I would store the labels in a named vector, which also defines the order of the columns. Then we can:

labels <- c(age='Age of participant', sex="Sex of Participant", year="Year of Participation")
present <- data[,names(labels)]
colnames(present) <- labels
> present
Age of participant Sex of Participant Year of Participation
1 12 1 1998
2 14 0 1997
3 16 1 1994
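A slightly more compact sketch of the same idea, with the same example data, builds the presentation copy in one step with setNames(). Indexing with data[names(labels)] rather than data[,names(labels)] also keeps the result a data frame if labels ever names only a single column, where the comma form would drop to a plain vector:

```r
data <- data.frame(age = c(12, 14, 16), sex = c(1, 0, 1), year = c(1998, 1997, 1994))
labels <- c(age = "Age of participant", sex = "Sex of Participant",
            year = "Year of Participation")

# Select/reorder columns by the names of `labels`, then use the labels as column names
present <- setNames(data[names(labels)], labels)
present

# The original data keeps its short names for the analysis itself
names(data)
```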

Plotting data:

plot(data[,c('age','year')])

Want to print proper labels? Use xlab and ylab:

plot(data[,c('age','year')], xlab='Age of participant', ylab='Year of participation')

Plotting data using ggplot2:

Again, the axis labels are a finishing touch and are applied separately:

library(ggplot2)
ggplot(data, aes(x=age, y=year)) + geom_point() + labs(x='Age of participant', y='Year of participation')

And if you wanted to make a really small plot, perhaps you would scoot in a newline (\n) to break the label into two lines.

Formatted tables using xtable:

This is actually the same approach as with "output to console".

Conclusion:

I hope I have convinced you why there is no trivial answer here: variable labels "are not a thing" in R, because how they are applied differs so widely.

Admittedly, the renaming example supports the case for having labels. However, there is no structure for carrying this metadata throughout an R analysis, as many functions from hordes of packages routinely strip input data.frames of their attributes.
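As a small illustration of that last point, assuming the label is stored as a plain "label" attribute with no special class (made-up data), ordinary base-R subsetting is already enough to lose it:

```r
x <- c(12, 14, 16)
attr(x, "label") <- "Age of participant"

# `[` on a plain vector keeps only names, dim and dimnames, so the label is lost
attr(x[1:2], "label")

# Row-subsetting a data frame subsets each column with `[`, dropping its label too
df <- data.frame(age = c(12, 14, 16))
attr(df$age, "label") <- "Age of participant"
attr(df[1:2, , drop = FALSE]$age, "label")
```

Both calls return NULL, while the unsubsetted x and df$age still carry their labels.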

You are more than welcome to ask a new question here on Stack Overflow when you have a specific use case in mind for displaying variable labels.

How to label values in a column based on a dataframe dictionary

You can try this, though there are likely more elegant solutions:

df <- data.frame(col1 = c(33, 924, 33, 12, 924))
dic <- data.frame(col1 = c(12, 33, 924),
                  col2 = c("London","Paris","Singapore"))

library(labelled)
# Add one value label per dictionary row: code dic$col1[i] is named dic$col2[i]
for (i in seq_len(nrow(dic))) {
  val_label(df, dic$col1[i]) <- as.character(dic$col2[i])
}
str(df)
# 'data.frame': 5 obs. of 1 variable:
# $ col1: dbl+lbl [1:5] 33, 924, 33, 12, 924
# ..@ labels: Named num 12 33 924
# .. ..- attr(*, "names")= chr [1:3] "London" "Paris" "Singapore"
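If you only need the dictionary names for display or analysis, rather than value-label metadata, base R's factor() can map the codes to the dictionary entries directly. This is a swapped-in technique, not what the labelled-based answer above does, and it produces an ordinary factor instead of a labelled numeric column:

```r
df <- data.frame(col1 = c(33, 924, 33, 12, 924))
dic <- data.frame(col1 = c(12, 33, 924),
                  col2 = c("London", "Paris", "Singapore"))

# Codes become factor levels; the dictionary names become the level labels
df$city <- factor(df$col1, levels = dic$col1, labels = dic$col2)
df
```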

How to label colnames of a dataframe directly with a vector

I now found the solution with the sjlabelled package:

library(sjlabelled)

df <- data.frame(matrix(0, ncol = 30, nrow = 2))
label_colname <- sprintf("label_colname[%d]", seq_len(30))

df <- set_label(df, label = label_colname) # set_label from sjlabelled package
