Is There a Logical Way to Think About List Indexing


A simple analogy is to think of a list as a train. Each car in the train is carrying stuff. If you remove two cars, you have a train with two fewer cars. If you remove all but one car, it is still a train with a single car.

  • Reducing the size of the train or reorganizing the order of the cars can be achieved through the [] (subsetting) function.
  • To examine the contents of a particular car, you have to open the doors, which is achieved through [[]] (though $ may also be used with a named list). I refer to this as the extraction function, though I'm not sure if this is a widely used term.

In your example, mylist[2] is a sublist of mylist containing one element. You can verify this with length(mylist[2]). Provided that the arguments are valid, the [ function returns a list with as many elements as there are in the numeric or character vector passed to [. Most often, though, we are interested in examining the contents of a list item. This is achieved with the [[ function. For example, mylist[[2]] is the contents of mylist[2], which itself is a list containing multiple elements. To see this, try length(mylist[[2]]).

Because [ can be thought of as a list subsetting function and [[ as a list element extraction function, mylist[1:2] and mylist[c(1,2)] return a sublist (which is equivalent to mylist in this case), whereas mylist[[1:2]] and mylist[[c(1,2)]] return a "subscript out of bounds" error. (Given a vector, [[ indexes recursively, so mylist[[c(1,2)]] is read as mylist[[1]][[2]], which does not exist here.) It is only possible to extract one list element at a time (i.e., per function call).
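The other answers on this page use Python, so here is a rough Python analogy of the same distinction. This is not R code: it is a sketch in which slicing plays the role of [ and plain indexing plays the role of [[. The list `train` and its contents are made up for illustration.

```python
# Hypothetical three-car "train"; the contents are invented for this example.
train = ["coal", ["crate", "barrel"], 42]

sub_train = train[1:2]  # slicing keeps the container: a list holding one car
cargo = train[1]        # plain indexing opens the car and returns its contents

print(len(sub_train))   # 1
print(len(cargo))       # 2
print(sub_train[0] == cargo)  # True
```

As with mylist[2] and mylist[[2]] in R, the two results have different lengths because one is still a train and the other is the cargo.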

@richard-scriven alerted me to a link on a Hadley Wickham twitter post providing an additional analogy of a nested list in the form of photographs.

With a fairly simple list structure, str is a great way to get an idea of the list contents. In this example, the output of str(mylist[2]) and str(mylist[[2]]) provides additional insight into their differing data structures.

In general, a list is agnostic to its contents, so that a single list may contain other lists, data.frames, matrices, and atomic vectors as separate elements. As @joran joked in his comment, this is where the train analogy gets stretched, maybe a little too much. However, once you are comfortable with the first level of a list, additional nested lists behave in the same way. (Maybe the nested lists are boxes carried inside of the train car?)

Side Note:

One of my favorite functions for examining lists and data.frames (which are lists of atomic vectors of a common length) is the str function. I regularly use it after reading in a .csv, .dta, or other file to examine the list structure. A common hurdle in debugging code, for users learning R as well as experienced users, is keeping in mind what data structure they are working with, what data structure a function needs as an argument, and what data structure it returns. str, together with typeof and class, makes an excellent suite of tools for addressing this problem.

This answer benefits from comments from @42, @nicola, @joran, @jogo, and @richard-scriven.

Why list indexing [-4:0] not working while [0:4] works?

l[-3:0] tries to slice from the third element from the end towards index 0. For a five-element list that is the same as l[2:0], which selects nothing because the start index is greater than the stop index and the default step is +1.

l[-3:] can be read as l[-3:len(l)], so l[2:5] for a five-element list. It slices from the third element from the end through the end of the list, which returns the last three elements.

You would need l[-3:0:-1] for a slice ending at 0 to return anything, but that is mind-boggling slicing which I try to avoid: [1,2,3,4,5][-3:0:-1] gives [3, 2], because the negative step also reverses the slice "orientation" to backwards instead of forwards, and the element at index 0 is still excluded.
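The three cases can be checked directly. This is a small sketch that assumes a five-element list named `l`; `slice.indices` is used only to show how Python resolves the negative start for a list of that length.

```python
l = [1, 2, 3, 4, 5]  # example list, assumed to have five elements

empty = l[-3:0]         # same as l[2:0]: start is past stop, nothing is selected
tail = l[-3:]           # same as l[2:5]: the last three elements
backwards = l[-3:0:-1]  # walks from index 2 down to, but not including, index 0

# slice.indices(length) returns the resolved (start, stop, step)
resolved = slice(-3, 0).indices(len(l))

print(empty)      # []
print(tail)       # [3, 4, 5]
print(backwards)  # [3, 2]
print(resolved)   # (2, 0, 1)
```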

Choosing the best index according to condition among three lists with different range of values in Python

Here's the data you have:

d = {'a':[600,150,820,500,400], 'b':[0.99,1.0,0.75,0.96,0.97], 'c':[(100,105),(50,40),(500,480),(200,190),(120,110)]}
a_thresh = 200
b_thresh = 0.95

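This is one way of solving it, making just one pass over the lists in the dictionary:

from operator import mul

list_len = len(d['a'])
found_i = None  # index of the best qualifying entry so far (None until one is found)
for i in range(list_len):
    # an entry qualifies if it meets both thresholds; among qualifying
    # entries, keep the one whose 'c' tuple has the largest product
    if (d['a'][i] >= a_thresh and d['b'][i] >= b_thresh and
            (found_i is None or mul(*d['c'][i]) > mul(*d['c'][found_i]))):
        found_i = i
print(found_i)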

Output:

3

You can do this without importing and using the mul() function, of course. It only serves to make the loop condition a little more compact: mul() just multiplies the two parts of a tuple. To do this without mul(), replace mul(*d['c'][i]) > mul(*d['c'][found_i]) with the longer expression d['c'][i][0]*d['c'][i][1] > d['c'][found_i][0]*d['c'][found_i][1].
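For comparison, here is a sketch of an alternative that avoids mul() entirely by filtering first and then calling max() with a key. It is not the approach described above: it makes two passes instead of one, and the names `candidates` and `best_i` are made up for this example. The data is repeated so the block runs on its own.

```python
d = {'a': [600, 150, 820, 500, 400],
     'b': [0.99, 1.0, 0.75, 0.96, 0.97],
     'c': [(100, 105), (50, 40), (500, 480), (200, 190), (120, 110)]}
a_thresh = 200
b_thresh = 0.95

# Indices whose 'a' and 'b' values both meet their thresholds
candidates = [i for i in range(len(d['a']))
              if d['a'][i] >= a_thresh and d['b'][i] >= b_thresh]

# Among those, pick the index whose 'c' tuple has the largest product;
# default=None covers the case where no entry qualifies
best_i = max(candidates,
             key=lambda i: d['c'][i][0] * d['c'][i][1],
             default=None)

print(candidates)  # [0, 3, 4]
print(best_i)      # 3
```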

Fastest way to check if a value exists in a list


7 in a

This is the clearest and fastest way to do it.

You can also consider using a set, but constructing that set from your list may take more time than the faster membership testing will save. The only way to be certain is to benchmark well. (This also depends on what operations you require.)
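A minimal way to run such a benchmark is with the standard-library timeit module. The sketch below uses made-up data (a list of 10,000 integers) and looks up the last element, which is the slowest case for a list. The timings it prints depend on your machine and data, so treat them as indicative only.

```python
import timeit

a = list(range(10_000))  # hypothetical data; substitute your own list
target = 9_999           # last element: the slowest lookup for a list

a_set = set(a)           # built once up front

# Time 200 membership tests against the list and against the set
list_time = timeit.timeit(lambda: target in a, number=200)
set_time = timeit.timeit(lambda: target in a_set, number=200)

# Average cost of building the set, which must be paid before any set lookup
build_time = timeit.timeit(lambda: set(a), number=20) / 20

print(f"list lookups: {list_time:.6f} s")
print(f"set lookups:  {set_time:.6f} s")
print(f"set build:    {build_time:.6f} s per build")
```

If you only test membership once or twice, the build cost can outweigh the savings; if you test many times against the same data, the set usually pays for itself.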

How to find all occurrences of an element in a list

You can use a list comprehension with enumerate:

indices = [i for i, x in enumerate(my_list) if x == "whatever"]

The iterator enumerate(my_list) yields pairs (index, item) for each item in the list. Using i, x as the loop variable targets unpacks these pairs into the index i and the list item x. We filter down to all x that match our criterion, and select the indices i of these elements.
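Wrapped in a small helper, the comprehension above might look like the sketch below. The function name `find_all` and the sample list are made up for illustration.

```python
def find_all(items, target):
    """Return the indices of every element of items equal to target."""
    return [i for i, x in enumerate(items) if x == target]

my_list = ["a", "whatever", "b", "whatever"]  # made-up sample data

print(find_all(my_list, "whatever"))  # [1, 3]
print(find_all(my_list, "missing"))   # []
```

An element that does not occur gives an empty list rather than raising an error, unlike my_list.index().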


