R: Assign variable labels of data frame columns
You can do this by creating a list from the named vector of var.labels and assigning that to the label values. I've used match to ensure that values of var.labels are assigned to their corresponding column in data even if the order of var.labels is different from the order of the data columns.
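To make the role of match concrete, here is a minimal base-R sketch that does not need Hmisc. The two-row data frame and the deliberately reversed var.labels are made up for illustration; they are not data from the question.

```r
# Toy data frame; the values are invented for illustration.
data <- data.frame(age = c(12, 14), sex = c(1, 0))

# Labels deliberately listed in a different order than the columns.
var.labels <- c(sex = "Sex of the participant", age = "Age in Years")

# For each column name, find its position in names(var.labels).
idx <- match(names(data), names(var.labels))
idx
# [1] 2 1

# Indexing with idx reorders the labels to follow the column order.
ordered.labels <- var.labels[idx]
names(ordered.labels)
# [1] "age" "sex"
```

A column with no entry in var.labels gets NA from match, so its label would come out as NA rather than being silently shifted onto the wrong column.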
library(Hmisc)
var.labels = c(age="Age in Years", sex="Sex of the participant")
label(data) = as.list(var.labels[match(names(data), names(var.labels))])
label(data)
                     age                      sex
          "Age in Years" "Sex of the participant"
Original Answer
My original answer used lapply, which isn't actually necessary. Here's the original answer for archival purposes:
You can assign the labels using lapply:
label(data) = lapply(names(data), function(x) var.labels[match(x, names(var.labels))])
lapply applies a function to each element of a list or vector. In this case the function is applied to each value of names(data) and it picks out the label value from var.labels that corresponds to the current value of names(data).
Reading through a few tutorials is a good way to get the general idea, but you'll really get the hang of it if you start using lapply in different situations and see how it behaves.
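As a small way to see how lapply behaves here, the sketch below runs the same anonymous function on a plain character vector. col.names is a hypothetical stand-in for names(data), so no data frame or Hmisc is needed.

```r
var.labels <- c(age = "Age in Years", sex = "Sex of the participant")

# Hypothetical stand-in for names(data); the order differs on purpose.
col.names <- c("sex", "age")

# lapply always returns a list with one element per input value.
picked <- lapply(col.names, function(x) var.labels[match(x, names(var.labels))])

length(picked)   # one element per column name
picked[[1]]      # the label looked up for "sex"
picked[[2]]      # the label looked up for "age"
```

Because the result is a list in the same order as the input, it lines up with the columns regardless of how var.labels was ordered.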
How to use variable labels in an R data frame
Unfortunately, labels are not supported by the basic indexing operations. The closest basic subsetting strategy to what you have is
table(heights[, label(heights)=="Heights in feet"])
If this is a common operation, you could overload an operator to do that kind of lookup for a data.frame. For example
`%%.data.frame` <- function(x, lbl) {
x[,label(x)==lbl]
}
table(heights%%"Heights in feet")
You could even make an assignment version
`%%<-` <- function(x, ...) UseMethod("%%<-")
`%%<-.data.frame` <- function(x, lbl, value) {
x[,label(x)==lbl] <- value
x
}
heights%%"Heights in feet" <- heights%%"Heights in feet"+1
Of course this is very non-standard, so I probably wouldn't recommend it; I'm just pointing out the possibility.
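A more conventional alternative to overloading an operator is an ordinary helper function. The sketch below is only an illustration: col_by_label is a hypothetical name, the toy heights data is invented, and the "label" attribute is set by hand on the assumption that this is where the label is stored.

```r
# Toy data; the "label" attribute is set by hand (assumed storage location).
heights <- data.frame(h = c(5.5, 6.0, 5.5), w = c(150, 180, 160))
attr(heights$h, "label") <- "Heights in feet"

# Hypothetical helper: return the first column whose "label" attribute equals lbl.
col_by_label <- function(df, lbl) {
  labs <- vapply(df, function(col) {
    l <- attr(col, "label", exact = TRUE)
    if (is.null(l)) NA_character_ else as.character(l)[1]
  }, character(1))
  hit <- which(labs == lbl)   # which() treats NA (unlabelled columns) as no match
  if (length(hit) == 0) stop("no column labelled '", lbl, "'")
  df[[hit[1]]]
}

table(col_by_label(heights, "Heights in feet"))
```

Unlike the operator version, this fails loudly when no column carries the requested label, and it does not change how %% behaves for data frames elsewhere in a session.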
How to add variable labels from one dataframe into another in R?
It really depends on when you want to use your variable "labels". While doing your data analysis, you definitely want to keep your short, concise variable names, otherwise you end up in a scenario of
lm(Sex of Participant ~ `Year of Participation`, data=data)
which is not valid syntax, and a heck of a bother to type again and again and agian (whops, typos!).
And when you've finished your analysis, your boss asks you to rename the age "label" to "Participant age", and there goes the analysis until you've searched and replaced every occurrence of the previous variable name.
So, the case should be clear for keeping concise variable names during coding (and you are not arguing against this in your question).
I am guessing you want variable labels for presentation. How to apply variable labels depends entirely on how you are presenting your data. I'll give a few examples.
Output to console:
> data
  age sex year
1  12   1 1998
2  14   0 1997
3  16   1 1994
In this case I would store the labels in a named vector, which also defines the order of the columns. We can then select the columns by name and swap in the labels as column names:
labels <- c(age='Age of participant', sex="Sex of Participant", year="Year of Participation")
present <- data[,names(labels)]
colnames(present) <- labels
> present
  Age of participant Sex of Participant Year of Participation
1                 12                  1                  1998
2                 14                  0                  1997
3                 16                  1                  1994
Plotting data:
plot(data[,c('age','year')])
Want to print proper labels? Use xlab and ylab:
plot(data[,c('age','year')], xlab='Age of participant', ylab='Year of participation')
Plotting data using ggplot2:
Again, the axis labels are polish and are applied separately:
ggplot(data, aes(x=age, y=year)) + geom_point() + labs(x='Age of participant', y='Year of participation')
And if you wanted to make a really small plot, perhaps you would scoot in a newline (\n) to break the label into two lines.
Formatted tables using xtable:
This is actually the same approach as with "output to console".
Conclusion:
I hope I have convinced you why this is not a trivial answer, that variable labels "are not a thing" in R, because their application differs widely.
The renaming example does support the case for having labels. There is, however, no structure for carrying this metadata throughout an R analysis, as many functions from hordes of packages routinely strip input data.frames of their attributes.
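The attribute loss is easy to reproduce in base R. In this minimal sketch the label is stored by hand as a plain "label" attribute on a made-up vector; the point is only that everyday operations return a bare vector.

```r
age <- c(12, 14, 16)
attr(age, "label") <- "Age of participant"

attr(age, "label")                # the label is there

# Subsetting an atomic vector keeps names but drops other attributes.
attr(age[1:2], "label")           # NULL

# Coercion with as.numeric() also returns a vector without attributes.
attr(as.numeric(age), "label")    # NULL
```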
You are more than welcome to ask a new question here on Stack Overflow when you have a specific use case in mind for displaying labels for variables.
How to label values in a column based on a dataframe dictionary
You can try this, though there are likely more elegant solutions:
df <- data.frame(col1 = c(33, 924, 33, 12, 924))
dic <- data.frame(col1 = c(12, 33, 924),
col2 = c("London","Paris","Singapore"))
library(labelled)
ct <- 1
for(i in dic$col1){
val_label(df, i) <- as.character(dic[ct,2])
ct <- ct+1
}
str(df)
# > str(df)
# 'data.frame': 5 obs. of 1 variable:
# $ col1: dbl+lbl [1:5] 33, 924, 33, 12, 924
# ..@ labels: Named num 12 33 924
# .. ..- attr(*, "names")= chr [1:3] "London" "Paris" "Singapore"
# >
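If a plain character (or factor) column is enough and labelled vectors are not required, a base-R lookup with match is one possible alternative. This is a sketch of a different technique from the loop above, not a replacement for val_label; the city column name is made up for illustration.

```r
df <- data.frame(col1 = c(33, 924, 33, 12, 924))
dic <- data.frame(col1 = c(12, 33, 924),
                  col2 = c("London", "Paris", "Singapore"))

# Find each code's row in the dictionary, then pull the matching label.
# as.character() guards against col2 being a factor in older R versions.
df$city <- as.character(dic$col2[match(df$col1, dic$col1)])
df$city
# [1] "Paris"     "Singapore" "Paris"     "London"    "Singapore"
```

Codes that are missing from the dictionary come back as NA, which makes gaps in the dictionary easy to spot.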
How to label colnames of a dataframe directly with a vector
I now found the solution with the sjlabelled package:
library(sjlabelled)
df <- data.frame(matrix(0, ncol = 30, nrow = 2))
label_colname <- sprintf("label_colname[%d]", 1:30)
df <- set_label(df, label = label_colname) # set_label from sjlabelled package
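For comparison, here is a base-R sketch of the same idea without the package. It assumes that a variable label is just a "label" attribute on each column, which is an assumption about storage rather than a statement about how set_label works internally; three columns are used instead of 30 to keep the output short.

```r
df <- data.frame(matrix(0, ncol = 3, nrow = 2))
label_colname <- sprintf("label_colname[%d]", 1:3)

# Attach one label per column as a plain "label" attribute.
for (i in seq_along(df)) {
  attr(df[[i]], "label") <- label_colname[i]
}

# Read the labels back, keyed by column name.
labs <- vapply(df, function(col) attr(col, "label"), character(1))
labs
```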