Find All Occurrences of a Key in Nested Dictionaries and Lists

Find all occurrences of a key in nested dictionaries and lists

I found this Q/A very interesting, since it provides several different solutions for the same problem. I took all these functions and tested them with a complex dictionary object. I had to take two functions out of the test, because they had to many fail results and they did not support returning lists or dicts as values, which i find essential, since a function should be prepared for almost any data to come.

So i pumped the other functions in 100.000 iterations through the timeit module and output came to following result:

0.11 usec/pass on gen_dict_extract(k,o)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - -
6.03 usec/pass on find_all_items(k,o)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - -
0.15 usec/pass on findkeys(k,o)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - -
1.79 usec/pass on get_recursively(k,o)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - -
0.14 usec/pass on find(k,o)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - -
0.36 usec/pass on dict_extract(k,o)
- - - - - - - - - - - - - - - - - - - - - - - - - - - - -

All functions had the same needle to search for ('logging') and the same dictionary object, which is constructed like this:

o = { 'temparature': '50', 
      'logging': {
        'handlers': {
          'console': {
            'formatter': 'simple', 
            'class': 'logging.StreamHandler', 
            'stream': 'ext://sys.stdout', 
            'level': 'DEBUG'
          }
        },
        'loggers': {
          'simpleExample': {
            'handlers': ['console'], 
            'propagate': 'no', 
            'level': 'INFO'
          },
         'root': {
           'handlers': ['console'], 
           'level': 'DEBUG'
         }
       }, 
       'version': '1', 
       'formatters': {
         'simple': {
           'datefmt': "'%Y-%m-%d %H:%M:%S'", 
           'format': '%(asctime)s - %(name)s - %(levelname)s - %(message)s'
         }
       }
     }, 
     'treatment': {'second': 5, 'last': 4, 'first': 4},   
     'treatment_plan': [[4, 5, 4], [4, 5, 4], [5, 5, 5]]
}

All functions delivered the same result, but the time differences are dramatic! The function gen_dict_extract(k,o) is my function adapted from the functions here, actually it is pretty much like the find function from Alfe, with the main difference, that i am checking if the given object has iteritems function, in case strings are passed during recursion:

# python 2
def gen_dict_extract(key, var):
    if hasattr(var,'iteritems'): # hasattr(var,'items') for python 3
        for k, v in var.iteritems(): # var.items() for python 3
            if k == key:
                yield v
            if isinstance(v, dict):
                for result in gen_dict_extract(key, v):
                    yield result
            elif isinstance(v, list):
                for d in v:
                    for result in gen_dict_extract(key, d):
                        yield result

So this variant is the fastest and safest of the functions here. And find_all_items is incredibly slow and far off the second slowest get_recursivley while the rest, except dict_extract, is close to each other. The functions fun and keyHole only work if you are looking for strings.

Interesting learning aspect here :)

How to find all occurrence of a key in nested dict, but also keep track of the outer dict key value?

This should work regardless of how deep your nesting is (up to the stack limit, at any rate). The request for keeping track of the dict's key is a little awkward--I used a tuple to return the pair. Note that if the found value is in the outermost dictionary, it won't be in the tuple format.

def recursive_lookup(key, d):
    if key in d:
        return d[key]

    for k, v in d.items():
        if isinstance(v, dict):
            result = recursive_lookup(key, v)

            if result:
                return k, result

print(recursive_lookup('label', data))

Output:

('item2', 'label2')

Here's a version that's a little messier (I'm not crazy about an inner function, but at least the accumulator list isn't a parameter and isn't global) but will return a list of all found items nested up to the stack limit, excepting the outermost keys:

def recursive_lookup(key, d):
    def _lookup(key, d):
        if key in d:
            return d[key]

        for k, v in d.items():
            if isinstance(v, dict):
                result = _lookup(key, v)

                if result:
                    accumulator.append((k, result))

    accumulator = []
    _lookup(key, d)
    return accumulator

Output:

[('item3', 'label3'), ('item2', 'label2')]

This can be easily modified if you want to output a dict--replace accumulator = [] with accumulator = {} and accumulator.append((k, result)) with accumulator[k] = result, but this might be awkward to work with, and you can't store duplicate key entries.

As for your final question, the reason you're getting None is because the inner loop returns after checking the first item whether it found something or not. Since label is in the second location of the items() array, it never gets looked at.

Find all occurrences of a key in nested dictionaries and lists - with path

After some experimenting I found the below code to solve my problem.

def gen_dict_location_extract(key, value, path=None):
    if path is None:
        path = []

    if hasattr(value, "items"):
        for k, v in value.items():
            if k == key:  # recursive exit point
                if len(path) > 0:
                    yield (v, path)
                else:   # handling root keys
                    yield (v, None)                 

            if isinstance(v, dict):
                path_copy = path.copy()
                # it is important to do a copy of the path for recursive calls
                # so every iteration has its own path object
                path_copy.append(k)
                yield from gen_dict_location_extract(key, v, path_copy)
            elif isinstance(v, list):
                yield from gen_dict_location_extract(key, v, path)

def call_gen_dict_location_extract(key, enumerable):                    
    results = []
    for result in gen_dict_location_extract(key, enumerable):
        results.append(result)
    return results

To test the code you would execute it like this:

call_gen_dict_location_extract("level", o)

Changing all occurrences of a key in nested dictionaries and lists

Assuming that:

every bakery_items value has an iscake value
you want to inpect every list element in the dictionary to see if there's nested dictionaries in there

Then your code could be (assuming your dictionary is source):

source = { }  # your dictionary here, with typos like missing commas fixed

def filter_for_cake(d):
    return {
        key: {
            k: v for k, v in value.items() if v['iscake'] == 'yes'
        } if key == 'bakery_items' else
        [
            x if not isinstance(x, dict) else filter_for_cake(x)
            for x in value
        ] if isinstance(value, list) else value
        for key, value in d.items()
    }

print(filter_for_cake(source))

Some explanation:

the function iterates over an entire dictionary
for each value that has 'bakery_items' as a key, it assumes it is a dictionary and filters it down to only those elements that have their 'iscake' set to 'yes'
for the other values, if the value is a list, it generates a copy, but passes each dictionary to filter_for_cake again (recursion)
for the remaining values, it just leaves them in as is

How to find a value of a key in a nested dictionary with lists?

Since you do not know how deep inside the value is, it is prob advisable to use a recursive function to iterate through all the layers till it's found. I used DFS below.

def search(ld, find):
    if(type(ld)==list):
        for i in ld:
            if(type(i)==list or type(i)==dict):
                result=search(i, find)
                if(result!=None): return result
    elif(type(ld)==dict):
        try:
            return ld[find]
        except(KeyError):
            for i in ld:
                if(type(ld[i])==list or type(ld[i])):
                    result=search(ld[i], find)
                    if(result!=None): return result
    else:
        return None

test_dict1 = {
    "blah":"blah",
    "alerts": [{"test1":"1", "test":"2"}],
    "foo": {
        "foo":"bar",
        "foo1": [{"test3":"3"}]
        }}

print(search(test_dict1, "test3"))

Find path of occurrences of a key, value pair in nested dictionaries and lists

You should use yield to create an iterator. It would make the code simpler and more efficient (Especially if you're not going to always go through all occurrences). To only find the first one, you can use the next function.

def findKeys(d,key,value):
    if key in d and d[key] == value: yield [d["name"]]
    subLevels = ( (a,v) for a,vl in d.items() if isinstance(vl,list) for v in vl )
    for attrib,subDict in subLevels:
        if not isinstance(subDict,dict):continue
        for path in findKeys(subDict,key,value):
            yield [d["name"]]+path

output:

for path in findKeys(d,"type","A7240XM"):
    print(path)

['/', 'md', 'level0', 'level1', 'level2', 'something1']
['/', 'md', 'level0', 'level1', 'level2', 'something2']
['/', 'md', 'level0', 'level1', 'ng', 'someother1']
['/', 'md', 'level0', 'level1', 'ng', 'findME']
['/', 'md', 'level0', 'level1', 'be', 'some123']
['/', 'md', 'level0', 'level1', 'be', 'some321']

]

next(findKeys(d,"name","some123"))

['/', 'md', 'level0', 'level1', 'be', 'some123']

Extract fields in a dynamically nested dictionary in an ordered manner

def dict_value(val: dict):
    for key, item in val.items():
        if type(item) == dict:
            dict_value(item)
        else:
            print(item)

dict_value(data)

How to find all instances of a substring inside a nested dict that could contain more lists or lists of dicts

Lists are mutable. Easiest is to replace by index. you can get this with enumerate

def dict_extract(self, search_str: str, d: dict) -> None:
        ...
                   for indx, item in enumerate(v):
                        if isinstance(item, dict):
                            self.dict_extract(search_str, item)
                        if isinstance(item, str):
                            v[indx] = <your new value>

You might have an easier time manipulating the dictionary as a string instead of as a Python structure?

The function below illustrates a straightforward replace. You could extend it with a regex that replaces everything between quotes " instead of just old.

import json
def replace_terms_in_dict(old, new, _dict):
    string = json.dumps(_dict)
    string = string.replace(old, new)
    _dict = json.loads(string)
    return _dict