Equivalent of a Python Dict in R

Equivalent of a Python dict in R

The closest thing to a Python dict in R is simply a list. Like most R data types, lists can have a names attribute, which lets a list act like a set of name-value pairs:

> l <- list(a = 1, b = "foo", c = 1:5)
> l
$a
[1] 1

$b
[1] "foo"

$c
[1] 1 2 3 4 5

> l[['c']]
[1] 1 2 3 4 5
> l[['b']]
[1] "foo"

Now for the usual disclaimer: they are not exactly the same; there will be differences. So you will be inviting disappointment if you try to use lists exactly the way you would use a dict in Python.
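
For instance, two of the differences (a minimal illustration, not an exhaustive list): list names need not be unique, and a missing key quietly returns NULL rather than raising an error:

> l <- list(a = 1, a = 2)   # duplicate names are legal in a list, unlike dict keys
> l$a                       # only the first match is returned
[1] 1
> l[["missing"]]            # no KeyError equivalent
NULL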

Is it possible to pass a dictionary as function parameters in R?

R doesn’t have a built-in dictionary data structure. The closest equivalent, depending on your use-case, is either a named vector, a named list, or an environment.

So, in your case, you’d define params as

params = list(param1 = 1, param2 = 'str', param3 = TRUE)

… of course this doesn’t allow using variables for the names, but you can assign the names after the fact to fix that, or use setNames; e.g.:

params = setNames(list(1, 'str', TRUE), paste0('param', 1 : 3))

These work “well enough”, as long as your dictionary keys are strings. Otherwise, you’ll either need a data.frame with a lookup key column and a value column, or a proper data structure.
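
The environment mentioned above never gets shown, so here is a minimal sketch of that variant. Its keys are still strings, so it doesn't solve the non-string-key case either, but environments give hashed lookup and reference semantics, which is closest in spirit to a Python dict (params_env below is just an illustrative name):

params_env <- new.env(hash = TRUE)   # hashed, mutable, passed by reference
params_env[["param1"]] <- 1
params_env[["param2"]] <- 'str'
params_env[["param3"]] <- TRUE

ls(params_env)                        # the "keys"
# [1] "param1" "param2" "param3"
params_env[["param2"]]                # lookup by key
# [1] "str"
as.list(params_env)                   # convert back to a named list when needed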

Correspondingly, there’s also no specific syntactic sugar for the creation of dictionaries. But R’s syntax is flexible, so we can have some fun.

R also doesn’t have a splat operator as in Python, but R has something much more powerful: thanks to metaprogramming, we can generate dynamic function call expressions on the fly. And since calling functions with a list of parameters is such a common operation, R provides a special wrapper for it: do.call.

For instance, we first need some syntactic sugar to generate dictionaries. The dict() constructor used below is not part of base R (the original answer linked to an implementation); a minimal sketch, relying on the fact that 'param1': 1 parses as a call to `:`, could look like this:
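
dict <- function (...) {
    # An assumed, minimal implementation: each argument arrives as an
    # unevaluated 'key': value expression, i.e. a call to `:`, which we
    # take apart instead of evaluating.
    caller <- parent.frame()
    args <- as.list(match.call())[-1L]
    keys <- vapply(args, function (a) as.character(a[[2L]]), character(1L))
    values <- lapply(args, function (a) eval(a[[3L]], envir = caller))
    setNames(values, keys)
}

With that in place: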

params = dict('param1': 1, 'param2': 'str', 'param3': TRUE)

myfunc = function (param1, param2, param3) {
    toString(as.list(environment()))
}

do.call('myfunc', params)
# [1] "1, str, TRUE"

How to use a dictionary for a large data frame in R?

A list of hashmap dictionaries:

dat <-
  structure(
    list(
      ...1 = c("category 1", NA, NA, NA, "total", "category 2",
               NA, NA, NA, "total"),
      Items = c(
        "product 1",
        "product 2",
        "product 3",
        "product 4",
        NA,
        "product 1",
        "product 2",
        "product 3",
        "product 4",
        NA
      ),
      price = c(1, 2, 3, 4, 10, 3, 4, 5, 6, 18)
    ),
    row.names = c(NA, -10L),
    class = c("tbl_df", "tbl", "data.frame")
  )

library(hashmap)

dat_clean <- tidyr::fill(dat[!is.na(dat[["Items"]]), ], 1)

list_of_dicts <- lapply(split(dat_clean, dat_clean[[1]]), function(d) {
  hashmap(d[["Items"]], d[["price"]])
})

list_of_dicts
# $`category 1`
# ## (character) => (numeric)
# ## [product 1] => [+1.000000]
# ## [product 3] => [+3.000000]
# ## [product 4] => [+4.000000]
# ## [product 2] => [+2.000000]
#
# $`category 2`
# ## (character) => (numeric)
# ## [product 1] => [+3.000000]
# ## [product 3] => [+5.000000]
# ## [product 4] => [+6.000000]
# ## [product 2] => [+4.000000]

# get totals:
lapply(list_of_dicts, function(dict) {
  sum(dict$values())
})
# $`category 1`
# [1] 10
#
# $`category 2`
# [1] 18
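
A single price can then be looked up by key; assuming the `[[` lookup provided by the hashmap package, that is simply:

list_of_dicts[["category 1"]][["product 2"]]
# [1] 2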

Does R have 'dict' as in Python or 'map' as in C++?

Yes, it does, and it is called a list.

> x <- list(a=1, b="foo", c=c(1,1,2,3,5))
> x
$a
[1] 1

$b
[1] "foo"

$c
[1] 1 1 2 3 5

In Python it is called dict, for what it's worth.

Comparing key, value pairs equivalent in R

You can do the following (it takes into account a possibly different ordering of the two lists):

> unlist(list_two[names(list_one)])!=unlist(list_one)
a b c
FALSE FALSE TRUE
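
The two lists aren't shown in the question; a pair consistent with the output above would look something like this (the values here are an assumption):

> list_one <- list(a = 1, b = "x", c = 3)
> list_two <- list(b = "x", c = 4, a = 1)   # same keys, different order; only 'c' differs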

Python equivalent of R list()

The Python documentation at https://docs.python.org/3/tutorial/introduction.html implies that you can create recursive structures ("recursive" being the proper R term for structures that can have a tree-like character) of varying types with the "[" operator:

>>> a = ['a', 'b', 'c']
>>> n = [1, 2, 3]
>>> x = [a, n]
>>> x
[['a', 'b', 'c'], [1, 2, 3]]

I'm just an R guy but that would seem to imply that Python's "list" data-type strongly resembles R's list type.

To get named "recursive" structures, it appears one needs to use a "dictionary" (created with flanking "{" and "}").

>>> x = {'a':a, 'n':n}
>>> x
{'a': ['a', 'b', 'c'], 'n': [1, 2, 3]}

It appears that Python requires names for its dictionary entries while R allows both named and unnamed entries in a list.
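
As a quick R-side illustration of that last point, a list can mix named and unnamed entries, something a dict cannot do directly:

> x <- list(a = 1, "unnamed", b = 3)
> names(x)
[1] "a" ""  "b"

Back in Python, dictionaries also nest freely: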

>>> x = {'a':a, 'n':n, 'z':[1,2,3], 'zz':{'s':[4,5,6], 'd':['t','y']} }
>>> x
{'a': ['a', 'b', 'c'], 'n': [1, 2, 3], 'z': [1, 2, 3], 'zz': {'s': [4, 5, 6], 'd': ['t', 'y']}}

Accessing items in a Python dict resembles item access in R:

>>> x['zz']
{'s': [4, 5, 6], 'd': ['t', 'y']}
>>> x['zz']['s']
[4, 5, 6]
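
The R-side counterpart of that access, using a named list, is nearly a mirror image (a small sketch):

> x <- list(a = c('a', 'b', 'c'), n = c(1, 2, 3), z = c(1, 2, 3),
+           zz = list(s = c(4, 5, 6), d = c('t', 'y')))
> x[['zz']]
$s
[1] 4 5 6

$d
[1] "t" "y"

> x[['zz']][['s']]
[1] 4 5 6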

How to create a dictionary and insert key with a list of value in R?

You could use the hash package for this task:

library(hash)

h <- hash()

for (word in file) {                    # 'file' and dosomecalculation() come from the question
  key <- dosomecalculation(word)
  if (!has.key(key, h)) {
    h[[key]] <- list()                  # start an empty list for a new key
  }
  h[[key]] <- append(h[[key]], word)    # then append the word
}

Using [[ for indexing (e.g. h[["foo"]]) will then return the corresponding list.
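
Once the loop has run, the keys can be listed with keys() from the same package and a single entry retrieved with [[ (the key name below is purely illustrative):

keys(h)          # all keys currently in the hash
h[["somekey"]]   # the list of words accumulated under "somekey"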

How to call Python method from R reticulate

As pointed out in Type Conversions, Python's dict objects become named lists in R. So, to access the equivalent of "dictionary keys" in R you would use names:

```{r}
names(py$fruits)
```
## [1] "melon" "apple" "banana"

You may choose to convert the result back to a dict-like object using reticulate::dict(). The resulting object would then function as you want:

```{r}
reticulate::dict( py$fruits )
```
## {'melon': 7, 'apple': 53, 'banana': None}

```{r}
reticulate::dict( py$fruits )$keys()
```
## ['melon', 'apple', 'banana']

dictionary and list comprehension in R

We're talking about speed, so let's do some benchmarking:
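
The benchmark uses foo, l1, l2 and bar from the original question, which aren't shown here; a minimal setup along these lines (the exact definitions are assumptions) makes it reproducible:

foo <- function(x) x^2   # assumed: any vectorised function will do
l1 <- 1:26               # values to store
l2 <- letters            # names to use as "keys"
bar <- list()            # target list for the loop approach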

library(microbenchmark)
microbenchmark(op = {for (i in 1:length(l1)) bar[l2[i]] <- foo(l1[i])},
               lapply = setNames(lapply(l1, foo), l2),
               vectorised = setNames(as.list(foo(l1)), l2))

Unit: microseconds
       expr   min    lq     mean median     uq    max neval
         op 7.982 9.122 10.81052  9.693 10.548 36.206   100
     lapply 5.987 6.557  7.73159  6.842  7.270 55.877   100
 vectorised 4.561 5.132  6.72526  5.417  5.987 80.964   100

But these small values don't mean much, so I pumped up the vector length to 10,000 where you'll really see a difference:

l <- 10000
l1 <- seq_len(l)
l2 <- sample(letters, l, replace = TRUE)

microbenchmark(op = {bar <- list(); for (i in 1:length(l1)) bar[l2[i]] <- foo(l1[i])},
               lapply = setNames(lapply(l1, foo), l2),
               vectorised = setNames(as.list(foo(l1)), l2),
               times = 100)

Unit: microseconds
       expr       min        lq       mean     median        uq       max neval
         op 30122.865 33325.788 34914.8339 34769.8825 36721.428 41515.405   100
     lapply 13526.397 14446.078 15217.5309 14829.2320 15351.933 19241.767   100
 vectorised   199.559   259.997   349.0544   296.9155   368.614  3189.523   100

But tacking onto what everyone else said, it doesn't have to be a list. If you remove the list requirement:

microbenchmark(setNames(foo(l1), l2))

Unit: microseconds
                  expr    min      lq     mean  median     uq      max neval
 setNames(foo(l1), l2) 22.522 23.8045 58.06888 25.0875 48.322 1427.417   100

