Assignment to empty index (empty square brackets x[]<-) on LHS

This is an intentional and documented feature. As joran mentioned, the documentation page "Extract" includes this in the "Atomic Vectors" section:

An empty index selects all values: this is most often used to replace all the entries but keep the attributes.

However, in the case of recursive objects (data.frames or lists, for example), the attributes are only kept for the subsetted object. Its parts don't get such protection.

Here's an example:

animals <- factor(c('cat', 'dog', 'fish'))
df_factor <- data.frame(x = animals)
rownames(df_factor) <- c('meow', 'bark', 'blub')
str(df_factor)
# 'data.frame': 3 obs. of 1 variable:
# $ x: Factor w/ 3 levels "cat","dog","fish": 1 2 3

df_factor[] <- 'cat'
str(df_factor)
# 'data.frame': 3 obs. of 1 variable:
# $ x: chr "cat" "cat" "cat"
rownames(df_factor)
# [1] "meow" "bark" "blub"

df_factor kept its rownames attribute, but the x column is just the character vector used in the assignment instead of a factor. We can keep the class and levels of x by specifically replacing its values:

df_factor <- data.frame(x = animals)
df_factor$x[] <- 'cat'
str(df_factor)
# 'data.frame': 3 obs. of 1 variable:
# $ x: Factor w/ 3 levels "cat","dog","fish": 1 1 1

So replacement with empty subsetting is very safe for vectors, matrices, and arrays, because their elements can't have their own attributes. But it requires some care when dealing with list-like objects.

Applying empty brackets in R drops attributes? (reading the R language definition)

I think it may simply be mis-documented in the current R language definition document.

As you've found, the behaviour is opposite to what is described. Note that, in your example, if you subset using v[1:length(v)], you get the behaviour you expected from v[]. So the empty [] is the exception that returns the attributes unchanged.

Looking for the answer, I found an illustrative commit/comment (see diffs here: https://github.com/wch/r-source/commit/6b3480e05e9671a517d70c80b9f3aac53b6afd9d#diff-3347e77b1c102d875a744a2cd7fa86e5). The author describes the behaviour that you have observed:

Subsetting (other than by an empty index) generally drops all attributes
except @code{names}, @code{dim} and @code{dimnames} which are reset as
appropriate. On the other hand, subassignment generally preserves
attributes even if the length is changed. Coercion drops all attributes.

I think if the subset [] is empty, the object that is returned is simply a copy of the original object.

EDIT (from comments below):

The reason that the attributes of v and v[] appear in a different order is likely the way the attributes are assigned to the new subset in this special case of subsetting with an empty index. Further, the different order shouldn't be considered a bug, because attributes are not supposed to have an order (see help(attributes)). Note that in help(`[`), the behaviour you observed is accurately described (unlike in the language definition you referenced), along with the reason one would want this behaviour:

An empty index selects all values: this is most often used to replace all the entries but keep the attributes.

Make a function work on the LHS of the assignment operator in R

According to ?rownames

row.names returns a character vector.

row.names<- returns a data frame with the row names changed.

`my.rownames<-` <- `rownames<-`

Also,

There are generic functions for getting and setting row names, with default methods for arrays. The description here is for the data.frame method.

.rowNamesDF<- is a (non-generic replacement) function to set row names for data frames, with extra argument make.names. This function only exists as workaround as we cannot easily change the row.names<- generic without breaking legacy code in existing packages.

So this should work (here foo stands for the new row names):

data(mtcars)
my.rownames(mtcars) <- foo
head(mtcars)
# mpg cyl disp hp drat wt qsec vs am gear carb
#x Mazda RX4 21.0 6 160 110 3.90 2.620 16.46 0 1 4 4
#x Mazda RX4 Wag 21.0 6 160 110 3.90 2.875 17.02 0 1 4 4
#x Datsun 710 22.8 4 108 93 3.85 2.320 18.61 1 1 4 1
#x Hornet 4 Drive 21.4 6 258 110 3.08 3.215 19.44 1 0 3 1
#x Hornet Sportabout 18.7 8 360 175 3.15 3.440 17.02 0 0 3 2
#x Valiant 18.1 6 225 105 2.76 3.460 20.22 1 0 3 1

Syntax in R for breaking up LHS of assignment over multiple lines

You can put a line break between any 2 characters that aren't part of a name, and that doesn't leave a syntactically complete expression before the line break (so that the parser knows to look for more). None of these look great, but basically after any [[ or $ or before ]] you can put a line break. For example:

results$
cases[[i]]$
samples[[j]]$
portions[[k]]$
analytes[[l]]$
column <- x

Or going to the extreme, putting in every syntactically valid line break (without introducing parentheses which would let you do even more):

results$
cases[[
i
]]$
samples[[
j
]]$
portions[[
k
]]$
analytes[[
l
]]$
column <-
x

With parentheses, we lose the "doesn't leave a syntactically complete expression" rule, because the expression won't be complete until the parentheses close. You can add breaks anywhere except in the middle of a name (object or function name). I won't bother with nested indentation for this example.

(
results
$
cases
[[
i
]]
$
samples
[[
j
]]
$
portions
[[
k
]]
$
analytes
[[
l
]]
$
column
<-
x
)

If you want to bring attention to the x being assigned, you could also use right assignment.

x -> results$cases[[i]]$samples[[j]]$
portions[[k]]$analytes[[l]]$column

How do I preserve the data frame structure after using paste()?

Use [] to preserve the structure.

list1[] <- paste(list1,"example")

str(list1)
#List of 2
# $ : chr "A1 example"
# $ : chr "A2 example"

Assignment of variable to list using indexing

Your code reads:

def func(first):
    third = first[0]
    first[0][0] = 5
    print(third)

first = [[3,4]]
func(first)

What's happening is this:

  • In func(), the argument first contains a reference to a list of lists with value [[3,4]].
  • After the assignment to third, third contains a reference to the list [3,4] at position 0 in the list referenced by first. No new list object has been created and no copy of a list has taken place, rather a new reference to the existing list has been created and stored in the variable third.
  • In the line first[0][0] = 5, the item at position 0 in the list [3,4] is updated so that the list is now [5,4]. Note that the list [3,4] that was modified is an element of the list of lists referenced by first, and it is also the one referenced by third. Because the object (namely, the list) that is referenced by third has now been modified, any use of this reference to access the list (such as print(third)) will reflect its updated value, which is [5,4].
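The aliasing described in the bullets above can be checked directly with the `is` operator, which tests whether two names refer to the same object. This is only a minimal sketch: the function name `show_alias` and the variable `outer` are made up for illustration and are not part of the original question.

```python
def show_alias(first):
    # third is a second reference to the inner list, not a copy
    third = first[0]
    first[0][0] = 5
    # Report the value seen through third and whether it is the same object
    return third, third is first[0]

outer = [[3, 4]]
third, same_object = show_alias(outer)
print(third)        # [5, 4]
print(same_object)  # True
```

Because `third` and `outer[0]` are one object, the update is visible through either name.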

UPDATE:

The code for your updated question is:

def func(first):
    third = first[0][0:2]
    first[0][0] = 5
    print(third)

first = [[3,4]]
func(first)

In this case, the assignment third = first[0][0:2] takes a slice of the list [3,4] at position 0 in the list of lists referenced by first. Taking a slice in this way creates a new object which is a copy of the subsequence indicated by the arguments specified in the square brackets, so after the assignment, the variable third contains a reference to a newly created list with value [3,4]. The subsequent assignment first[0][0] = 5 updates the value of the list [3,4] in position 0 of the list of lists referenced by first, with the result that the value of the list becomes [5,4], and has no effect on the value of third which is an independent object with value [3,4].

Importantly (and potentially confusingly), slice notation used on the left-hand side of an assignment works very differently. For example, first[0][0:2] = [5,4] would change the contents of the list first[0] such that the elements at indices 0 and 1 are replaced by 5 and 4 (which in this case means the value of the list object would be changed from [3,4] to [5,4], but it would be the same object).
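The contrast between a slice on the right-hand side (copy) and a slice on the left-hand side (in-place update) can be sketched as follows. The name `inner` and the replacement values `[7, 8]` are assumptions chosen for illustration only.

```python
inner = [3, 4]
first = [inner]

# Slicing on the right-hand side builds a new list (a shallow copy)
third = first[0][0:2]
first[0][0] = 5
print(third)           # [3, 4]
print(third is inner)  # False

# Slice assignment on the left-hand side mutates the existing list in place
first[0][0:2] = [7, 8]
print(inner)              # [7, 8]
print(first[0] is inner)  # True
```

The copy in `third` is unaffected by later changes, while the slice assignment changes the contents of `inner` without creating a new object.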

How to scale data frame but retain type of dataframe?

There are really two questions here. The first is how to insert into a data.frame without losing the data.frame structure. In this case use [<- like so:

df <- data.frame(a=1:3, b=10:12, c=20:22)
df[] <- scale(df)
df
# a b c
#1 -1 -1 -1
#2 0 0 0
#3 1 1 1

This is covered in a lot more detail here: R: Easy assignments with empty square brackets? x[]<-

The second question is how to update a subset of a data.frame. Again, use [<- but this time match the selections on the left and right sides of the <- :

df <- data.frame(a=1:3, b=10:12, c=20:22)
df[1:2] <- scale(df[1:2])
df
# a b c
#1 -1 -1 20
#2 0 0 21
#3 1 1 22

Multiple assignment semantics

One case when you need to include more structure on the left-hand side of the assignment is when you're asking Python to unpack a slightly more complicated sequence. E.g.:

# Works
>>> a, (b, c) = [1, [2, 3]]

# Does not work
>>> a, b, c = [1, [2, 3]]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
ValueError: need more than 2 values to unpack

This has proved useful for me in the past, for example, when using enumerate to iterate over a sequence of 2-tuples. Something like:

>>> d = { 'a': 'x', 'b': 'y', 'c': 'z' }
>>> for i, (key, value) in enumerate(d.iteritems()):
...     print (i, key, value)
(0, 'a', 'x')
(1, 'c', 'z')
(2, 'b', 'y')
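The session above is Python 2 (it uses `iteritems()`). A rough Python 3 sketch of the same nested unpacking, assuming Python 3.7 or later so that the dict keeps insertion order, might look like this; the list `rows` is introduced here only to collect the results.

```python
d = {'a': 'x', 'b': 'y', 'c': 'z'}

rows = []
# enumerate yields (index, (key, value)); the nested target unpacks both levels
for i, (key, value) in enumerate(d.items()):
    rows.append((i, key, value))

print(rows)  # [(0, 'a', 'x'), (1, 'b', 'y'), (2, 'c', 'z')]

# Without the parentheses the shapes do not match and unpacking fails
try:
    a, b, c = [1, [2, 3]]
except ValueError as err:
    print(type(err).__name__)  # ValueError
```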

Dynamically responding to an unpacking assignment statement

You could use the traceback module:

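import traceback

def diabolically_invoke_traceback():
    # Second-to-last stack entry is the caller; element 3 is its source line
    call = traceback.extract_stack()[-2]
    print(call[3])
    # Count the comma-separated targets on the left of the '='
    unpackers = call[3].split('=')[0].split(',')
    print(len(unpackers))
    return range(len(unpackers))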

In [63]: a, b, c = diabolically_invoke_traceback()
a, b, c = diabolically_invoke_traceback()
3

In [64]: a
Out[64]: 0

In [65]: b
Out[65]: 1

In [66]: c
Out[66]: 2

Convert all columns to characters in a data.frame

EDIT: 2021-03-01

Beginning with dplyr 1.0.0, the _all() function variants are superseded. The new way to accomplish this is with the across() function.

library(dplyr)
mtcars %>%
mutate(across(everything(), as.character))

With across(), we choose the set of columns we want to modify using tidyselect helpers (here we use everything() to choose all columns), and then specify the function we want to apply to each of the selected columns. In this case, that is as.character().

Original answer:

You can also use dplyr::mutate_all.

library(dplyr)
mtcars %>%
mutate_all(as.character)

