Equivalent of a Python Dict in R

Equivalent of a Python dict in R

The closest thing to a Python dict in R is simply a list. Like most R data types, lists can have a names attribute, which lets a list act like a set of name-value pairs:

> l <- list(a = 1, b = "foo", c = 1:5)
> l
$a
[1] 1

$b
[1] "foo"

$c
[1] 1 2 3 4 5

> l[['c']]
[1] 1 2 3 4 5
> l[['b']]
[1] "foo"

Now for the usual disclaimer: they are not exactly the same; there will be differences. So you will be inviting disappointment if you try to use lists exactly the way you would use a dict in Python.
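
For instance, two of the differences (a minimal illustration, not an exhaustive list): list names need not be unique, and a missing key quietly returns NULL rather than raising an error:

> l <- list(a = 1, a = 2)   # duplicate names are legal in a list, unlike dict keys
> l$a                       # only the first match is returned
[1] 1
> l[["missing"]]            # no KeyError equivalent
NULL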

Is it possible to pass a dictionary as function parameters in R?

R doesn’t have a built-in dictionary data structure. The closest equivalent, depending on your use-case, is either a named vector, a named list, or an environment.

So, in your case, you’d define params as

params = list(param1 = 1, param2 = 'str', param3 = TRUE)

… of course this doesn’t allow using variables for the names, but you can assign the names after the fact to fix that, or use setNames; e.g.:

params = setNames(list(1, 'str', TRUE), paste0('param', 1 : 3))

These work “well enough”, as long as your dictionary keys are strings. Otherwise, you’ll either need a data.frame with a lookup key column and a value column, or a proper data structure.
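
The environment mentioned above never gets shown, so here is a minimal sketch of that variant. Its keys are still strings, so it doesn't solve the non-string-key case either, but environments give hashed lookup and reference semantics, which is closest in spirit to a Python dict (params_env below is just an illustrative name):

params_env <- new.env(hash = TRUE)   # hashed, mutable, passed by reference
params_env[["param1"]] <- 1
params_env[["param2"]] <- 'str'
params_env[["param3"]] <- TRUE

ls(params_env)                        # the "keys"
# [1] "param1" "param2" "param3"
params_env[["param2"]]                # lookup by key
# [1] "str"
as.list(params_env)                   # convert back to a named list when needed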

Correspondingly, there’s also no specific syntactic sugar for the creation of dictionaries. But R’s syntax is flexible, so we can have some fun.

R also doesn’t have a splat operator as in Python, but R has something much more powerful: thanks to metaprogramming, we can generate dynamic function call expressions on the fly. And since calling functions with a list of parameters is such a common operation, R provides a special wrapper for it: do.call.

For instance, we first need some syntactic sugar to generate dictionaries. The dict() constructor used below is not part of base R (the original answer linked to an implementation); a minimal sketch, relying on the fact that 'param1': 1 parses as a call to `:`, could look like this:
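
dict <- function (...) {
    # An assumed, minimal implementation: each argument arrives as an
    # unevaluated 'key': value expression, i.e. a call to `:`, which we
    # take apart instead of evaluating.
    caller <- parent.frame()
    args <- as.list(match.call())[-1L]
    keys <- vapply(args, function (a) as.character(a[[2L]]), character(1L))
    values <- lapply(args, function (a) eval(a[[3L]], envir = caller))
    setNames(values, keys)
}

With that in place: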

params = dict('param1': 1, 'param2': 'str', 'param3': TRUE)

myfunc = function (param1, param2, param3) {
    toString(as.list(environment()))
}

do.call('myfunc', params)
# [1] "1, str, TRUE"

How to use a dictionary for a large data frame in R?

A list of hashmap dictionaries:

dat <-
  structure(
    list(
      ...1 = c("category 1", NA, NA, NA, "total", "category 2",
               NA, NA, NA, "total"),
      Items = c(
        "product 1",
        "product 2",
        "product 3",
        "product 4",
        NA,
        "product 1",
        "product 2",
        "product 3",
        "product 4",
        NA
      ),
      price = c(1, 2, 3, 4, 10, 3, 4, 5, 6, 18)
    ),
    row.names = c(NA, -10L),
    class = c("tbl_df", "tbl", "data.frame")
  )

library(hashmap)

dat_clean <- tidyr::fill(dat[!is.na(dat[["Items"]]), ], 1)

list_of_dicts <- lapply(split(dat_clean, dat_clean[[1]]), function(d) {
  hashmap(d[["Items"]], d[["price"]])
})

list_of_dicts
# $`category 1`
# ## (character) => (numeric)
# ## [product 1] => [+1.000000]
# ## [product 3] => [+3.000000]
# ## [product 4] => [+4.000000]
# ## [product 2] => [+2.000000]
#
# $`category 2`
# ## (character) => (numeric)
# ## [product 1] => [+3.000000]
# ## [product 3] => [+5.000000]
# ## [product 4] => [+6.000000]
# ## [product 2] => [+4.000000]

# get totals:
lapply(list_of_dicts, function(dict) {
  sum(dict$values())
})
# $`category 1`
# [1] 10
#
# $`category 2`
# [1] 18
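
A single price can then be looked up by key; assuming the `[[` lookup provided by the hashmap package, that is simply:

list_of_dicts[["category 1"]][["product 2"]]
# [1] 2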

Does R have 'dict' as in Python or 'map' as in C++?

Yes, it does, and it is called a list.

> x <- list(a=1, b="foo", c=c(1,1,2,3,5))
> x
$a
[1] 1

$b
[1] "foo"

$c
[1] 1 1 2 3 5

In Python it is called dict, for what it's worth.

Comparing key, value pairs equivalent in R

You can do the following (it takes into account a possibly different ordering of the two lists):

> unlist(list_two[names(list_one)])!=unlist(list_one)
a b c
FALSE FALSE TRUE
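
The two lists aren't shown in the question; a pair consistent with the output above would look something like this (the values here are an assumption):

> list_one <- list(a = 1, b = "x", c = 3)
> list_two <- list(b = "x", c = 4, a = 1)   # same keys, different order; only 'c' differs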

Python equivalent of R list()

The Python documentation at https://docs.python.org/3/tutorial/introduction.html implies that you can create recursive structures ("recursive" being the proper R term for structures that can have a tree-like character) of varying types with the "[" operator:

>>> a = ['a', 'b', 'c']
>>> n = [1, 2, 3]
>>> x = [a, n]
>>> x
[['a', 'b', 'c'], [1, 2, 3]]

I'm just an R guy but that would seem to imply that Python's "list" data-type strongly resembles R's list type.

To get named "recursive" structures, it appears one needs to use a "dictionary" (created with flanking "{" and "}").

>>> x = {'a':a, 'n':n}
>>> x
{'a': ['a', 'b', 'c'], 'n': [1, 2, 3]}

It appears that Python requires names for its dictionary entries while R allows both named and unnamed entries in a list.
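
As a quick R-side illustration of that last point, a list can mix named and unnamed entries, something a dict cannot do directly:

> x <- list(a = 1, "unnamed", b = 3)
> names(x)
[1] "a" ""  "b"

Back in Python, dictionaries also nest freely: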

>>> x = {'a':a, 'n':n, 'z':[1,2,3], 'zz':{'s':[4,5,6], 'd':['t','y']} }
>>> x
{'a': ['a', 'b', 'c'], 'n': [1, 2, 3], 'z': [1, 2, 3], 'zz': {'s': [4, 5, 6], 'd': ['t', 'y']}}

Accessing items in a Python dict resembles item access in R:

>>> x['zz']
{'s': [4, 5, 6], 'd': ['t', 'y']}
>>> x['zz']['s']
[4, 5, 6]
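
The R-side counterpart of that access, using a named list, is nearly a mirror image (a small sketch):

> x <- list(a = c('a', 'b', 'c'), n = c(1, 2, 3), z = c(1, 2, 3),
+           zz = list(s = c(4, 5, 6), d = c('t', 'y')))
> x[['zz']]
$s
[1] 4 5 6

$d
[1] "t" "y"

> x[['zz']][['s']]
[1] 4 5 6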

How to create a dictionary and insert key with a list of value in R?

You could use the hash package for this task:

library(hash)

h <- hash()

for (word in file) {                    # 'file' and dosomecalculation() come from the question
  key <- dosomecalculation(word)
  if (!has.key(key, h)) {
    h[[key]] <- list()                  # start an empty list for a new key
  }
  h[[key]] <- append(h[[key]], word)    # then append the word
}

Using [[ for indexing (e.g. h[["foo"]]) will then return the corresponding list.
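
Once the loop has run, the keys can be listed with keys() from the same package and a single entry retrieved with [[ (the key name below is purely illustrative):

keys(h)          # all keys currently in the hash
h[["somekey"]]   # the list of words accumulated under "somekey"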

How to call Python method from R reticulate

As pointed out in Type Conversions, Python's dict objects become named lists in R. So, to access the equivalent of "dictionary keys" in R you would use names:

```{r}
names(py$fruits)
```
## [1] "melon" "apple" "banana"

You may choose to convert the result back to a dict-like object using reticulate::dict(). The resulting object would then function as you want:

```{r}
reticulate::dict( py$fruits )
```
## {'melon': 7, 'apple': 53, 'banana': None}

```{r}
reticulate::dict( py$fruits )$keys()
```
## ['melon', 'apple', 'banana']

dictionary and list comprehension in R

We're talking about speed, so let's do some benchmarking:
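
The benchmark uses foo, l1, l2 and bar from the original question, which aren't shown here; a minimal setup along these lines (the exact definitions are assumptions) makes it reproducible:

foo <- function(x) x^2   # assumed: any vectorised function will do
l1 <- 1:26               # values to store
l2 <- letters            # names to use as "keys"
bar <- list()            # target list for the loop approach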

library(microbenchmark)
microbenchmark(op = {for (i in 1:length(l1)) bar[l2[i]] <- foo(l1[i])},
               lapply = setNames(lapply(l1, foo), l2),
               vectorised = setNames(as.list(foo(l1)), l2))

Unit: microseconds
       expr   min    lq     mean median     uq    max neval
         op 7.982 9.122 10.81052  9.693 10.548 36.206   100
     lapply 5.987 6.557  7.73159  6.842  7.270 55.877   100
 vectorised 4.561 5.132  6.72526  5.417  5.987 80.964   100

But these small values don't mean much, so I pumped up the vector length to 10,000 where you'll really see a difference:

l <- 10000
l1 <- seq_len(l)
l2 <- sample(letters, l, replace = TRUE)

microbenchmark(op = {bar <- list(); for (i in 1:length(l1)) bar[l2[i]] <- foo(l1[i])},
               lapply = setNames(lapply(l1, foo), l2),
               vectorised = setNames(as.list(foo(l1)), l2),
               times = 100)

Unit: microseconds
       expr       min        lq       mean     median        uq       max neval
         op 30122.865 33325.788 34914.8339 34769.8825 36721.428 41515.405   100
     lapply 13526.397 14446.078 15217.5309 14829.2320 15351.933 19241.767   100
 vectorised   199.559   259.997   349.0544   296.9155   368.614  3189.523   100

But tacking onto what everyone else said, it doesn't have to be a list. If you remove the list requirement:

microbenchmark(setNames(foo(l1), l2))

Unit: microseconds
                  expr    min      lq     mean  median     uq      max neval
 setNames(foo(l1), l2) 22.522 23.8045 58.06888 25.0875 48.322 1427.417   100

