Removing attributes of columns in data.frames on multilevel lists in R
This is perhaps too late to answer on this thread, but I wanted to share.
Two solutions:
1. Use the stripAttributes function from the merTools package.
2. Remove the attribute ATT from variable VAR in your data frame MyData directly:
attr(MyData$VAR, "ATT") <- NULL
If you want to remove several attributes from all variables:
for (var in colnames(MyData)) {
  # MyData[[var]] addresses the column by its name
  attr(MyData[[var]], "ATT_1") <- NULL
  attr(MyData[[var]], "ATT_2") <- NULL
}
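For anyone who wants to try the loop end to end, here is a minimal sketch. The toy data frame and the attribute values are made up for illustration; only the names MyData, ATT_1 and ATT_2 come from the answer above.

```r
# Hypothetical toy data; the attribute values are placeholders
MyData <- data.frame(x = 1:3, y = c(2.5, 3.5, 4.5))

# Attach two custom attributes to every column
for (var in colnames(MyData)) {
  attr(MyData[[var]], "ATT_1") <- "some label"
  attr(MyData[[var]], "ATT_2") <- "some unit"
}

# Remove them again, looping over columns and attribute names
for (var in colnames(MyData)) {
  for (att in c("ATT_1", "ATT_2")) {
    attr(MyData[[var]], att) <- NULL
  }
}

attributes(MyData$x)  # no custom attributes are left on the column
```

Looping over a vector of attribute names avoids repeating one line per attribute.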
I hope this helps,
Regards
How to erase all attributes?
To remove all attributes, how about this
df1[] <- lapply(df1, function(x) { attributes(x) <- NULL; x })
str(df1)
#'data.frame': 3 obs. of 4 variables:
# $ id: int 1 2 3
# $ V1: int 4 5 6
# $ V2: int 7 8 9
# $ V3: int 10 11 12
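One caveat worth knowing: attributes(x) <- NULL also drops class and levels, so a factor column is reduced to its integer codes. The sketch below uses a made-up data frame (the column g and the "label" attribute are assumptions, not from the question) and shows a gentler variant that removes only one named attribute.

```r
# Hypothetical data: one labelled integer column and one factor column
df1 <- data.frame(id = 1:3, g = factor(c("a", "b", "a")))
attr(df1$id, "label") <- "identifier"

# Strip everything: the factor loses its levels and class
stripped <- df1
stripped[] <- lapply(stripped, function(x) { attributes(x) <- NULL; x })
class(stripped$g)   # integer codes, no longer a factor

# Strip only the "label" attribute: the factor survives
kept <- df1
kept[] <- lapply(kept, function(x) { attr(x, "label") <- NULL; x })
class(kept$g)       # still a factor
```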
Behavior of <- NULL on lists versus data.frames for removing data
DISCLAIMER : This is a relatively long answer, not very clear, and not very interesting, so feel free to skip it or to only read the (sort of) conclusion.
I've tried a bit of tracing on [<-.data.frame, as suggested by Ari B. Friedman. Debugging starts on line 162 of the function, where there is a test to determine if value (the replacement value argument) is not a list.
Case 1: value is not a list
Then it is considered as a vector. Matrices and arrays are considered as one vector, as the help page says:
Note that when the replacement value is an array (including a matrix)
it is not treated as a series of columns (as 'data.frame' and
'as.data.frame' do) but inserted as a single column.
If only one column of the data frame is selected in the LHS, then the only constraint is that the number of rows to be replaced must be equal to or a multiple of length(value). If this is the case, value is recycled with rep if necessary and converted to a list. If length(value)==0, there is no recycling (as it is impossible), and value is just converted to a list.
If several columns of the data frame are selected in the LHS, then the constraint is a bit more complex: the total number of elements to be replaced, i.e. the number of rows * the number of columns, must be equal to or a multiple of length(value).
The exact test is the following:
(m < n * p && (m == 0L || (n * p)%%m))
Where n is the number of rows, p the number of columns, and m the length of value. If the condition is FALSE, then value is converted into an n x p matrix (thus recycled if necessary) and the matrix is split by columns into a list.
If value is NULL, then the condition is TRUE as m==0, and the function is stopped.
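To see the test at work, here is a small sketch that reimplements just that condition. The function name replacement_fails is made up, and "%% m != 0" is written out explicitly where the original relies on a non-zero remainder being treated as TRUE.

```r
# Hypothetical helper mirroring the quoted condition:
# m = length(value), n = number of rows, p = number of columns
replacement_fails <- function(m, n, p) {
  m < n * p && (m == 0L || (n * p) %% m != 0)
}

n <- 32L  # e.g. the rows of a 32-row data frame
p <- 2L   # two columns selected

replacement_fails(0L, n, p)   # TRUE:  NULL or numeric(0) has length 0
replacement_fails(64L, n, p)  # FALSE: exactly n * p values
replacement_fails(2L, n, p)   # FALSE: 64 is a multiple of 2
replacement_fails(3L, n, p)   # TRUE:  64 is not a multiple of 3
```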
Note that the problem occurs for every value of length 0. For example,
cars1[,c("mpg")] <- numeric(0)
works, whereas:
cars1[,c("mpg","disp")] <- numeric(0)
fails in the same way as cars1[,c("mpg","disp")] <- NULL
Case 2: value is a list
If value is a list, then it is used to replace several columns at the same time. For example:
cars1[,c("mpg","disp")] <- list(1,2)
will replace cars1$mpg with a vector of 1s, and cars1$disp with a vector of 2s.
There is a sort of "double recycling" which happens here:
- first, the length of the value list must be less than or equal to the number of columns to be replaced. If it is less, then a classic recycling is done.
- second, for each element of the value list, its length must be equal to the number of rows to be replaced, divide it evenly, or exceed it. If it is less, the element is recycled to match the number of rows. If it is more, a warning is displayed.
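The two levels of recycling can be tried out as follows. This is only a sketch: it assumes cars1 is a plain copy of the built-in mtcars data set (32 rows), which the answer does not state explicitly.

```r
# Assumption: cars1 is a copy of mtcars
cars1 <- mtcars

# One list element per column, each recycled down the rows
cars1[, c("mpg", "disp")] <- list(1, 2)
unique(cars1$mpg)   # only 1s
unique(cars1$disp)  # only 2s

# A one-element list is recycled across both columns, and its
# length-2 vector is recycled down the 32 rows
cars1[, c("mpg", "disp")] <- list(1:2)
head(cars1[, c("mpg", "disp")])
```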
When the value in RHS is list(NULL), nothing really happens, as recycling is impossible (rep(NULL, 10) is always NULL). But the code continues and in the end each column to be replaced is assigned NULL, i.e. is removed.
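The contrast between the two cases can be reproduced with a short sketch, again assuming cars1 is a copy of the built-in mtcars data set. The tryCatch wrapper is only there so that the script keeps running after the error.

```r
# Assumption: cars1 is a copy of mtcars
cars1 <- mtcars

# Case 1: bare NULL for two columns is rejected because its length is 0
msg <- tryCatch({
  cars1[, c("mpg", "disp")] <- NULL
  "no error"
}, error = function(e) conditionMessage(e))
msg  # the error message raised by [<-.data.frame

# Case 2: list(NULL) goes through the list branch and drops both columns
cars2 <- mtcars
cars2[, c("mpg", "disp")] <- list(NULL)
names(cars2)  # mpg and disp are gone
```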
Summary and (sort of) conclusion
data.frame and list behave differently because of the specific constraint on data frames, where each element must be of the same length. Removing several columns by assigning NULL fails not because of the NULL value by itself, but because NULL is of length 0. The error comes from a test which verifies that the number of elements to be replaced (number of rows * number of columns) is a multiple of the length of the assigned value.
Handling the case of value=NULL for multiple columns doesn't seem difficult (by adding about four lines of simple code), but it requires treating NULL as a special case. I'm not able to determine whether it is not handled because it would break the logic of the function implementation, or because it would have side effects I don't know about.
Converting nested list to dataframe
You can also use rbindlist from the data.table package (v1.9.3 or later):
library(data.table)
rbindlist(mylist, fill=TRUE)
##      Hit Project Year Rating      Launch  ID    Dept            Error
## 1:  True    Blue 2011      4 26 Jan 2012  19 1, 2, 4               NA
## 2: False      NA   NA     NA          NA  NA      NA Record not found
## 3:  True   Green 2004      8 29 Feb 2004 183    6, 8               NA
Sort (order) data frame rows by multiple columns
You can use the order() function directly without resorting to add-on tools -- see this simpler answer which uses a trick right from the top of the example(order) code:
R> dd[with(dd, order(-z, b)), ]
    b x y z
4 Low C 9 2
2 Med D 3 1
1  Hi A 8 1
3  Hi A 9 1
Edit some 2+ years later: It was just asked how to do this by column index. The answer is to simply pass the desired sorting column(s) to the order() function:
R> dd[order(-dd[,4], dd[,1]), ]
    b x y z
4 Low C 9 2
2 Med D 3 1
1  Hi A 8 1
3  Hi A 9 1
R>
rather than using the name of the column (and with() for easier/more direct access).
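For a self-contained run, here is a sketch with a dd rebuilt from the printed output above. The exact definition is an assumption: in particular, b is taken to be an ordered factor with levels Low < Med < Hi, which is what the printed ordering suggests.

```r
# Assumed reconstruction of dd from the output shown in the answer
dd <- data.frame(
  b = factor(c("Hi", "Med", "Hi", "Low"),
             levels = c("Low", "Med", "Hi"), ordered = TRUE),
  x = c("A", "D", "A", "C"),
  y = c(8, 3, 9, 9),
  z = c(1, 1, 1, 2)
)

# Sort by z descending, then by b ascending, using column names
by_name <- dd[with(dd, order(-z, b)), ]

# The same sort using column positions
by_index <- dd[order(-dd[, 4], dd[, 1]), ]

rownames(by_name)  # row order after sorting
```

The minus sign works here because z is numeric; for a non-numeric key in descending order, order(..., decreasing = TRUE) or -xtfrm() would be needed instead.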
How assign value to new variable using multiple relational operators in data with missing values in [r]?
Here are a few pointers. In R, it is common to assign values to names using the <- operator. To be fair, I didn't even know you could assign a length to a vector, so I learned something new.
A <- seq(1, 6)
length(A) <- 7
B <- seq(2, 4)
length(B) <- 7
m <- cbind(A, B)
The difference between a matrix and a data.frame is that a matrix is a single vector with a dim attribute specifying the dimensions (also true for arrays), whereas a data.frame is a list of columns of equal length (the number of rows).
What this means in practice is that data.frames can hold a different type in each column, e.g. one might be a character and another an integer, whereas matrices can only contain data of the same type.
> attributes(m)
$dim
[1] 7 2
$dimnames
$dimnames[[1]]
NULL
$dimnames[[2]]
[1] "A" "B"
> df <- as.data.frame(m)
> attributes(df)
$names
[1] "A" "B"
$class
[1] "data.frame"
$row.names
[1] 1 2 3 4 5 6 7
> is.list(m)
[1] FALSE
> is.list(df)
[1] TRUE
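To make the type difference concrete, here is a small sketch with made-up data (m2, df2 and their columns are not from the question). Binding a number column and a text column into a matrix coerces everything to character, while the data.frame keeps one type per column.

```r
# Hypothetical example data
ids   <- 1:3
names <- c("a", "b", "c")

m2  <- cbind(id = ids, name = names)   # matrix: one common type
df2 <- data.frame(id = ids, name = names,
                  stringsAsFactors = FALSE)  # one type per column

typeof(m2)          # the whole matrix is character
sapply(df2, class)  # id stays integer, name stays character
```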
The if-else statements you are using to try to assign values to a column are not working because if and else are not vectorised: they require a single TRUE or FALSE, not a vector of logicals. You can see that the expression is longer than one by evaluating it and asking for its length:
> df$Acat == "low"
[1] TRUE TRUE FALSE FALSE FALSE FALSE NA
> length(df$Acat == "low")
[1] 7
Instead, you can build a named vector with the values you want, and use a subsetting operation to get them to the right place:
df$Acat <- cut(df$A,
breaks=c(-Inf,2.5,4.5,Inf),
labels=c("low","mod","hi"))
named_vec <- c("low" = 1, "mod" = 2, "hi" = 3)
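# as.character() makes the lookup go by name rather than by the factor's integer codes
df$C <- unname(named_vec[as.character(df$Acat)])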
Which gives you this data.frame:
> df
   A  B Acat  C
1  1  2  low  1
2  2  3  low  1
3  3  4  mod  2
4  4 NA  mod  2
5  5 NA   hi  3
6  6 NA   hi  3
7 NA NA <NA> NA
There are multiple other options to get the same result, but I think subsetting by name is relatively intuitive.
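Two of those other options, sketched on a standalone copy of the A column so the snippet runs on its own: nested ifelse(), which is the vectorised counterpart of if/else, and match() against the label vector. The variable names C_ifelse and C_match are made up for the illustration.

```r
# Same A values and cut points as in the answer above
A    <- c(1:6, NA)
Acat <- cut(A,
            breaks = c(-Inf, 2.5, 4.5, Inf),
            labels = c("low", "mod", "hi"))

# Option 1: nested ifelse(), evaluated element by element; NA stays NA
C_ifelse <- ifelse(Acat == "low", 1,
                   ifelse(Acat == "mod", 2, 3))

# Option 2: position of each label in the label vector
C_match <- match(as.character(Acat), c("low", "mod", "hi"))

C_ifelse  # 1 1 2 2 3 3 NA
C_match   # 1 1 2 2 3 3 NA
```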
How to flatten a list of lists?
I expect that unlist(foolist) will help you. It has an option recursive which is TRUE by default.
So unlist(foolist, recursive = FALSE) will return the list of the documents, and then you can combine them by:
do.call(c, unlist(foolist, recursive=FALSE))
do.call just applies the function c to the elements of the obtained list.
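Here is a minimal sketch of both steps. The contents of foolist are made up (plain character vectors standing in for the documents), since the original list is not shown.

```r
# Hypothetical nested list: two inner lists holding character vectors
foolist <- list(
  list(c("a", "b"), "c"),
  list("d")
)

# Remove one level of nesting: a flat list of the inner elements
one_level <- unlist(foolist, recursive = FALSE)
length(one_level)  # 3 elements

# Combine those elements with c()
combined <- do.call(c, one_level)
combined  # "a" "b" "c" "d"
```

For plain vectors the result matches unlist(foolist); the two-step form matters when the elements are objects with their own c() method.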