Removing Attributes of Columns in Data.Frames on Multilevel Lists in R

This is perhaps too late to answer on this thread, but I wanted to share.

Two solutions:

  1. Use the function stripAttributes from the merTools package.

  2. To remove the attribute ATT from variable VAR in your data frame MyData:

    attr(MyData$VAR, "ATT") <- NULL

If you want to remove several attributes from all variables:

    for (var in colnames(MyData)) {
      # [[var]] selects the column by name, including non-syntactic names
      attr(MyData[[var]], "ATT_1") <- NULL
      attr(MyData[[var]], "ATT_2") <- NULL
    }
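
For what it's worth, the same idea can be wrapped in a small function. This is only a sketch: the helper name drop_attrs and the sample data are made up for illustration and are not part of any package.

```r
# Hypothetical helper: remove the named attributes from every column.
drop_attrs <- function(df, which) {
  for (var in names(df)) {
    for (a in which) {
      attr(df[[var]], a) <- NULL  # harmless if the attribute is absent
    }
  }
  df
}

# Made-up sample data with attributes attached to two columns.
MyData <- data.frame(x = 1:3, y = c(2.5, 3.5, 4.5))
attr(MyData$x, "ATT_1") <- "some label"
attr(MyData$x, "ATT_2") <- "some unit"
attr(MyData$y, "ATT_1") <- "another label"

cleaned <- drop_attrs(MyData, c("ATT_1", "ATT_2"))
attributes(cleaned$x)  # NULL
```

The function returns a modified copy, so MyData itself keeps its attributes until you reassign it.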

I hope this helps,
Regards

How to erase all attributes?

To remove all attributes, how about this:

df1[] <- lapply(df1, function(x) { attributes(x) <- NULL; x })
str(df1)
#'data.frame': 3 obs. of 4 variables:
# $ id: int 1 2 3
# $ V1: int 4 5 6
# $ V2: int 7 8 9
# $ V3: int 10 11 12
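
One thing to watch for: stripping every attribute also removes class and levels, so a factor column is reduced to its integer codes. A small sketch with made-up data (the column names and attribute names here are assumptions for illustration):

```r
# Made-up data: one plain column with a custom attribute, one factor.
df1 <- data.frame(id = 1:3, f = factor(c("a", "b", "a")))
attr(df1$id, "label") <- "identifier"

# df1[] <- ... replaces the columns but keeps the data.frame's own
# attributes (names, class, row.names).
df1[] <- lapply(df1, function(x) { attributes(x) <- NULL; x })

attributes(df1$id)  # NULL
df1$f               # integer codes 1 2 1, the levels are gone
```

If factors or dates need to survive, removing only the named attributes with attr(x, "name") <- NULL is the safer route.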

Behavior of <- NULL on lists versus data.frames for removing data

DISCLAIMER : This is a relatively long answer, not very clear, and not very interesting, so feel free to skip it or to only read the (sort of) conclusion.

I've tried a bit of tracing on [<-.data.frame, as suggested by Ari B. Friedman. Debugging starts on line 162 of the function, where there is a test to determine if value (the replacement value argument) is not a list.

Case 1 : value is not a list

Then it is considered as a vector. Matrices and arrays are considered as one vector, like the help page says :

Note that when the replacement value is an array (including a matrix)
it is not treated as a series of columns (as 'data.frame' and
'as.data.frame' do) but inserted as a single column.

If only one column of the data frame is selected in the LHS, then the only constraint is that the number of rows to be replaced must be equal to or a multiple of length(value). If this is the case, value is recycled with rep if necessary and converted to a list. If length(value)==0, there is no recycling (as it is impossible), and value is just converted to a list.

If several columns of the data frame are selected in the LHS, then the constraint is a bit more complex: the total number of elements to be replaced, i.e. the number of rows * the number of columns, must be equal to or a multiple of length(value).

The exact test is the following :

(m < n * p && (m == 0L || (n * p)%%m))

Where n is the number of rows, p the number of columns, and m the length of value. If the condition is FALSE, then value is converted into an n x p matrix (thus recycled if necessary) and the matrix is split by columns into a list.
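
The test is easy to try outside the function to see which replacement lengths get rejected. A minimal sketch; the helper name length_check_fails is made up here and is not part of base R:

```r
# Hypothetical helper mirroring the condition quoted above.
# m: length(value), n: rows to replace, p: columns to replace.
length_check_fails <- function(m, n, p) {
  m < n * p && (m == 0L || (n * p) %% m != 0)
}

length_check_fails(0, 32, 2)   # TRUE: a zero-length value is rejected
length_check_fails(3, 32, 2)   # TRUE: 64 is not a multiple of 3
length_check_fails(32, 32, 2)  # FALSE: recycled twice to fill 64 cells
length_check_fails(64, 32, 2)  # FALSE: exact fit
```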

If value is NULL, then the condition is TRUE as m==0, and the function is stopped.
Note that the problem occurs for every value of length 0. For example,

cars1[,c("mpg")] <- numeric(0)

works, whereas :

cars1[,c("mpg","disp")] <- numeric(0)

fails in the same way as cars1[,c("mpg","disp")] <- NULL

Case 2 : value is a list

If value is a list, then it is used to replace several columns at the same time. For example :

cars1[,c("mpg","disp")] <- list(1,2)

will replace cars1$mpg with a vector of 1s, and cars1$disp with a vector of 2s.
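
Here is the same thing on a small made-up data frame (the column names a, b, c are just for illustration), so the result is easy to check by eye:

```r
df <- data.frame(a = 1:3, b = 4:6, c = 7:9)

# One list element per selected column; each length-1 element is
# recycled down the rows. Column c is not selected and stays as it was.
df[, c("a", "b")] <- list(1, 2)

df$a  # 1 1 1
df$b  # 2 2 2
```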

There is a sort of "double recycling" which happens here :

  • first, the length of the value list must be less than or equal to the number of columns to be replaced. If it is less, then a classic recycling is done.
  • second, for each element of the value list, its length must be equal to, greater than or a multiple of the number of rows to be replaced. If it is less, another recycling is done for each list element to match the number of rows. If it is more, a warning is displayed.

When the value in RHS is list(NULL), nothing really happens, as recycling is impossible (rep(NULL, 10) is always NULL). But the code continues and in the end each column to be replaced is assigned NULL, i.e. it is removed.
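
A small sketch of the list(NULL) route, with a plain list alongside for comparison. The data are made up; the bare-NULL attempt is wrapped in try() on a copy because, per the analysis above, it is expected to stop at the length test, and I have not asserted its outcome here.

```r
df <- data.frame(a = 1:3, b = 4:6, c = 7:9)

# Bare NULL for several columns: expected to fail the length test.
# Done on a copy inside try() so the rest runs whatever the outcome.
tmp <- df
res <- try(tmp[, c("a", "b")] <- NULL, silent = TRUE)

# Wrapping NULL in a list takes the list branch and drops both columns.
df[, c("a", "b")] <- list(NULL)
names(df)   # "c"

# A plain list accepts a bare NULL for several elements at once.
lst <- list(a = 1:3, b = 4:6, c = 7:9)
lst[c("a", "b")] <- NULL
names(lst)  # "c"
```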

Summary and (sort of) conclusion

data.frame and list behave differently because of the specific constraint on data frames, where each element must be of the same length. Removing several columns by assigning NULL fails not because of the NULL value by itself, but because NULL is of length 0. The error comes from a test which verifies if the length of the assigned value is a multiple of the number of elements to be replaced (number of rows * number of columns).

Handling the case of value=NULL for multiple columns doesn't seem difficult (about four lines of simple code would do), but it requires treating NULL as a special case. I can't tell whether it is left unhandled because doing so would break the logic of the function implementation, or because it would have side effects I'm not aware of.

Converting nested list to dataframe

You can also use rbindlist from the data.table package (v1.9.3 or later):

library(data.table)

rbindlist(mylist, fill=TRUE)

##      Hit Project Year Rating      Launch  ID    Dept            Error
## 1:  True    Blue 2011      4 26 Jan 2012  19 1, 2, 4               NA
## 2: False      NA   NA     NA          NA  NA      NA Record not found
## 3:  True   Green 2004      8 29 Feb 2004 183    6, 8               NA
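
If you want something to run as-is, here is a cut-down sketch. The records list below is invented for illustration (it is not the mylist from the question) and it assumes data.table is installed:

```r
library(data.table)

# Made-up records: the second one lacks most fields and adds a new one.
records <- list(
  list(Hit = TRUE,  Project = "Blue",  Year = 2011),
  list(Hit = FALSE, Error = "Record not found"),
  list(Hit = TRUE,  Project = "Green", Year = 2004)
)

# fill = TRUE matches columns by name and pads the missing ones with NA.
dt <- rbindlist(records, fill = TRUE)
dt
```

Without fill = TRUE, records with differing sets of fields cannot be stacked this way.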

Sort (order) data frame rows by multiple columns

You can use the order() function directly without resorting to add-on tools -- see this simpler answer which uses a trick right from the top of the example(order) code:

R> dd[with(dd, order(-z, b)), ]
    b x y z
4 Low C 9 2
2 Med D 3 1
1  Hi A 8 1
3  Hi A 9 1

Edit some 2+ years later: It was just asked how to do this by column index. The answer is to simply pass the desired sorting column(s) to the order() function:

R> dd[order(-dd[,4], dd[,1]), ]
    b x y z
4 Low C 9 2
2 Med D 3 1
1  Hi A 8 1
3  Hi A 9 1

rather than using the name of the column (and with() for easier/more direct access).
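
For a self-contained run, the data frame below is my reconstruction of dd, chosen to be consistent with the printed output above (treat it as an assumption). It shows that the two forms give the same result:

```r
# Assumed reconstruction of dd; b is an ordered factor Low < Med < Hi.
dd <- data.frame(
  b = factor(c("Hi", "Med", "Hi", "Low"),
             levels = c("Low", "Med", "Hi"), ordered = TRUE),
  x = c("A", "D", "A", "C"),
  y = c(8, 3, 9, 9),
  z = c(1, 1, 1, 2)
)

# Sort by z descending, then by b ascending, by name and by index.
by_name  <- dd[with(dd, order(-z, b)), ]
by_index <- dd[order(-dd[, 4], dd[, 1]), ]

identical(by_name, by_index)  # TRUE
```

Note that the unary minus only works on numeric columns, which is why it is applied to z here and not to b.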

How to assign a value to a new variable using multiple relational operators in data with missing values in R?

Here are a few pointers. In R, it is common to assign values to names using the <- operator. To be fair, I didn't even know you could assign a length to a variable, so I learned something new.

A <- seq(1, 6)
length(A) <- 7
B <- seq(2, 4)
length(B) <- 7

m <- cbind(A, B)

The difference between a matrix and a data.frame is that a matrix is a single atomic vector with a dim attribute specifying the dimensions (also true for arrays), whereas a data.frame is a list of columns, each of equal length (the number of rows).

What this means in practice is that data.frames can have anything in different columns, e.g. one might be a character and another an integer, whereas matrices can only contain data of the same type.

> attributes(m)
$dim
[1] 7 2

$dimnames
$dimnames[[1]]
NULL

$dimnames[[2]]
[1] "A" "B"
> df <- as.data.frame(m)
> attributes(df)
$names
[1] "A" "B"

$class
[1] "data.frame"

$row.names
[1] 1 2 3 4 5 6 7

> is.list(m)
[1] FALSE
> is.list(df)
[1] TRUE

The if-else statements you are using to try to assign values to a column are not working because these are not vectorised: they require a single TRUE or FALSE, not a vector of logicals. You can see that the expression is longer than one by evaluating it, asking for the length:

> df$Acat == "low"
[1] TRUE TRUE FALSE FALSE FALSE FALSE NA

> length(df$Acat == "low")
[1] 7

Instead, you can build a named vector with the values you want, and use a subsetting operation to get them to the right place:

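df$Acat <- cut(df$A,
               breaks=c(-Inf,2.5,4.5,Inf),
               labels=c("low","mod","hi"))

named_vec <- c("low" = 1, "mod" = 2, "hi" = 3)
# cut() returns a factor; convert it to character so the lookup is by
# name rather than by the factor's integer codes
df$C <- named_vec[as.character(df$Acat)]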

Which gives you this data.frame:

> df
   A  B Acat  C
1  1  2  low  1
2  2  3  low  1
3  3  4  mod  2
4  4 NA  mod  2
5  5 NA   hi  3
6  6 NA   hi  3
7 NA NA <NA> NA

There are multiple other options to get the same result, but subsetting by name is, I would think, relatively intuitive.
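
One of those other options is the vectorised ifelse(), which takes a whole logical vector where if takes a single value. A rough sketch using the same cut points as above (the variable names are just for illustration):

```r
A <- c(1, 2, 3, 4, 5, 6, NA)

# Nested ifelse(): evaluated element by element; a missing value in A
# gives NA in the result because the comparison itself is NA.
C <- ifelse(A < 2.5, 1,
     ifelse(A < 4.5, 2, 3))

C  # 1 1 2 2 3 3 NA
```

This is fine for two or three categories; with more, cut() plus a lookup vector tends to stay more readable.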

How to flatten a list of lists?

I expect that unlist(foolist) will help you. It has an option recursive which is TRUE by default.

So unlist(foolist, recursive = FALSE) will return the list of the documents, and then you can combine them by:

do.call(c, unlist(foolist, recursive=FALSE))

do.call just applies the function c to the elements of the obtained list.
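
A tiny sketch with a made-up foolist, in case the two steps are easier to follow on concrete data:

```r
# Made-up nested list: two inner lists holding integer vectors.
foolist <- list(list(1:2, 3:4), list(5:6))

# Remove one level of nesting: a flat list of three vectors.
one_level <- unlist(foolist, recursive = FALSE)
length(one_level)  # 3

# Combine the vectors; same as c(1:2, 3:4, 5:6).
flat <- do.call(c, one_level)
flat  # 1 2 3 4 5 6
```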
