How to Parse a YAML File in Python

How can I parse a YAML file in Python?

The easiest pure-Python method, with no dependency on C headers, is PyYAML, which can be installed via pip install pyyaml:

#!/usr/bin/env python

import yaml

with open("example.yaml", "r") as stream:
    try:
        print(yaml.safe_load(stream))
    except yaml.YAMLError as exc:
        print(exc)

And that's it. A plain yaml.load() function also exists, but yaml.safe_load() should always be preferred to avoid introducing the possibility of arbitrary code execution. So unless you explicitly need arbitrary object serialization/deserialization, use safe_load.
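As a minimal sketch of why safe_load is the safer default, the snippet below feeds it a document carrying a Python-specific tag that asks the loader to call os.system. The sample strings and the helper name try_safe_load are made up for illustration; safe_load refuses the tag with a YAMLError instead of running anything.

```python
import yaml

# A document that asks the loader to call os.system while constructing the data.
malicious = "!!python/object/apply:os.system ['echo pwned']"


def try_safe_load(text):
    """Return the parsed data, or the exception raised by safe_load (hypothetical helper)."""
    try:
        return yaml.safe_load(text)
    except yaml.YAMLError as exc:
        return exc


rejected = try_safe_load(malicious)
plain = try_safe_load("name: example\nports: [80, 443]")

print(type(rejected).__name__)  # an error object, not the result of a command
print(plain)
```

Plain data such as strings, numbers, lists and mappings loads normally; only tags that would construct arbitrary Python objects are rejected.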

Note that the PyYAML project supports versions up through the YAML 1.1 specification. If YAML 1.2 specification support is needed, see ruamel.yaml.
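One practical consequence of following YAML 1.1 is that PyYAML resolves unquoted words such as yes and NO as booleans, which the YAML 1.2 core schema does not. The sketch below uses invented keys and values to show the effect and the usual workaround of quoting the value.

```python
import yaml

text = """\
unquoted: yes
quoted: "yes"
country: NO
"""

data = yaml.safe_load(text)
print(data)
# The unquoted yes/NO become booleans; the quoted value stays a string.
```

If a value such as a country code must stay a string, quote it in the YAML file.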

You could also use oyaml, a drop-in replacement for PyYAML that keeps the keys of your YAML file in the same order you wrote them.

Parsing yaml file with --- in python

Your input is composed of multiple YAML documents. For that you will need yaml.load_all() or better yet yaml.safe_load_all(). (The latter will not construct arbitrary Python objects outside of data-like structures such as list/dict.)

import yaml

with open('temp.yaml') as f:
    # safe_load_all() is lazy, so consume it while the file is still open
    temp = list(yaml.safe_load_all(f))

As hinted at by the error message, yaml.load() is strict about accepting only a single YAML document.

Note that safe_load_all() returns a generator of Python objects which you'll need to iterate over.

>>> gen = yaml.safe_load_all(f)
>>> next(gen)
{'name': 'first', 'cmp': [{'Some': 'first', 'top': {'top_rate': 16000, 'audio_device': 'pulse'}}]}
>>> next(gen)
{'name': 'second', 'components': [{'name': 'second', 'parameters': {'always_on': True, 'timeout': 200000}}]}
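A self-contained sketch of the same idea, using an inline two-document string (invented for illustration) instead of a file. It shows safe_load rejecting the multi-document stream and safe_load_all yielding one object per document.

```python
import yaml

text = """\
name: first
---
name: second
"""

# safe_load() accepts exactly one document, so this stream is rejected.
try:
    yaml.safe_load(text)
    single_error = None
except yaml.YAMLError as exc:
    single_error = exc

# safe_load_all() returns a generator; list() consumes it fully.
docs = list(yaml.safe_load_all(text))
for index, doc in enumerate(docs):
    print(index, doc["name"])
```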

How to parse yaml file with string values

Is the quote part of the data, or just its representation? If it's part of the data, you'll have to indicate that in the yaml.

data:
  key1: |
    "Value1"
  key2: Value2

Note that enclosing quotes are optional on YAML string values, which means they must be explicitly included if they are to be part of the string data itself.

# these two documents are identical
data:
- this
- that
- the other
---
data:
- "this"
- "that"
- "the other"
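To see the difference in loaded data, here is a small sketch with invented keys. The plain and double-quoted values load as the same string, while the literal block scalar keeps the quote characters (and a trailing newline) as part of the data.

```python
import yaml

text = """\
plain: this
quoted: "this"
literal: |
  "this"
"""

data = yaml.safe_load(text)
print(repr(data["plain"]))    # quotes were only representation
print(repr(data["quoted"]))   # same string as the plain value
print(repr(data["literal"]))  # quotes are part of the data here
```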

How to parse YAML file correctly?

You must specify a constructor for the OpenCV data type that you are trying to load, because it doesn't exist by default in PyYAML:

import yaml

def meta_constructor(loader, node):
    return loader.construct_mapping(node)

yaml.add_constructor(u'tag:yaml.org,2002:opencv-matrix', meta_constructor)

with open(file_name, 'r') as stream:
    data_loaded = yaml.load(stream, Loader=yaml.Loader)

print(data_loaded)

Output:

{'flow': {'rows': 256, 'cols': 256, 'dt': '2f', 'data': [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, '...']}}
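If you would rather not use the full yaml.Loader, one possible variation is to register the same constructor on a SafeLoader subclass. This is only a sketch: the class name OpenCVLoader, the function name and the tiny inline document are assumptions made for illustration, not part of the original answer.

```python
import yaml


class OpenCVLoader(yaml.SafeLoader):
    """SafeLoader subclass (hypothetical name) that also accepts !!opencv-matrix."""


def construct_opencv_matrix(loader, node):
    # Treat the tagged node as an ordinary mapping; deep=True builds nested values now.
    return loader.construct_mapping(node, deep=True)


# Registering on the subclass leaves the stock SafeLoader untouched.
OpenCVLoader.add_constructor("tag:yaml.org,2002:opencv-matrix", construct_opencv_matrix)

text = """\
flow: !!opencv-matrix
  rows: 2
  cols: 2
  dt: "f"
  data: [0.0, 1.0, 2.0, 3.0]
"""

data = yaml.load(text, Loader=OpenCVLoader)
print(data["flow"]["rows"], data["flow"]["data"])
```

The design choice here is isolation: only code that explicitly passes Loader=OpenCVLoader accepts the extra tag.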

Parsing a YAML file in Python, and accessing the data?

Since PyYAML's yaml.load() function parses YAML documents to native Python data structures, you can just access items by key or index. Using the example from the question you linked:

import yaml
with open('tree.yaml', 'r') as f:
    # safe_load avoids the Loader argument that yaml.load() requires in PyYAML 6+
    doc = yaml.safe_load(f)

To access branch1 text you would use:

txt = doc["treeroot"]["branch1"]
print(txt)
# branch1 text

because, in your YAML document, the value of the branch1 key is under the treeroot key.
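A short sketch of key and index access on loaded data. The inline document below is an assumed stand-in for tree.yaml, with an extra list added to show indexing; dict.get is used for a key that may be absent.

```python
import yaml

# Assumed stand-in for tree.yaml; the real file may look different.
text = """\
treeroot:
  branch1: branch1 text
  branch2:
    - leaf a
    - leaf b
"""

doc = yaml.safe_load(text)

branch1 = doc["treeroot"]["branch1"]         # mapping access by key
second_leaf = doc["treeroot"]["branch2"][1]  # sequence access by index
missing = doc["treeroot"].get("branch3", "no such branch")  # default for absent key

print(branch1, second_leaf, missing, sep=" | ")
```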

How to parse/read a YAML file into a Python object?

If your YAML file looks like this:

# tree format
treeroot:
    branch1:
        name: Node 1
        branch1-1:
            name: Node 1-1
    branch2:
        name: Node 2
        branch2-1:
            name: Node 2-1

And you've installed PyYAML like this:

pip install PyYAML

And the Python code looks like this:

import yaml
with open('tree.yaml') as f:
    # use safe_load instead of load
    dataMap = yaml.safe_load(f)

The variable dataMap now contains a dictionary with the tree data. If you print dataMap using PrettyPrint, you will get something like:

{
    'treeroot': {
        'branch1': {
            'branch1-1': {
                'name': 'Node 1-1'
            },
            'name': 'Node 1'
        },
        'branch2': {
            'branch2-1': {
                'name': 'Node 2-1'
            },
            'name': 'Node 2'
        }
    }
}

So, now we have seen how to get data into our Python program. Saving data is just as easy:

with open('newtree.yaml', "w") as f:
    yaml.dump(dataMap, f)
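As a rough sketch of dumping, the example below serializes a small dictionary (invented here) to a string with yaml.safe_dump and loads it back to check that nothing was lost. It assumes PyYAML 5.1 or later, where sort_keys=False keeps the insertion order of keys instead of sorting them.

```python
import yaml

data_map = {
    "treeroot": {
        "branch2": {"name": "Node 2"},
        "branch1": {"name": "Node 1"},
    }
}

# Without a stream argument, safe_dump returns the YAML text as a string.
text = yaml.safe_dump(data_map, sort_keys=False, default_flow_style=False)
print(text)

# Loading the dumped text gives back an equal dictionary.
round_tripped = yaml.safe_load(text)
```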

You have a dictionary, and now you have to convert it to a Python object:

class Struct:
    def __init__(self, **entries):
        self.__dict__.update(entries)

Then you can use:

>>> args = dataMap  # your YAML dictionary
>>> s = Struct(**args)
>>> s
<__main__.Struct object at 0x01D6A738>
>>> s...

and follow "Convert Python dict to object".
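Struct only converts the top level, so nested mappings remain plain dictionaries. One possible way to get attribute access at every level is a recursive conversion with types.SimpleNamespace; the function name to_namespace and the sample dictionary below are made up for this sketch.

```python
from types import SimpleNamespace


def to_namespace(value):
    """Recursively turn dicts (with string keys) into SimpleNamespace objects."""
    if isinstance(value, dict):
        return SimpleNamespace(**{key: to_namespace(item) for key, item in value.items()})
    if isinstance(value, list):
        return [to_namespace(item) for item in value]
    return value  # scalars are returned unchanged


# Stand-in for a dictionary returned by yaml.safe_load().
config = {"server": {"host": "localhost", "ports": [80, 443]}, "debug": True}

ns = to_namespace(config)
print(ns.server.host, ns.server.ports[1], ns.debug)
```

Keys that are not valid Python identifiers, such as branch1-1, are still stored but can only be reached with getattr(ns, "branch1-1").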

For more information, see pyyaml.org.

Parsing yaml file format in python

data is a list (its elements specified by - in YAML). The list containing the dictionaries you seem to be interested in is thus data[5]; you can see it is another list from the additional level of - items. Specifically, data[5][0] is a dictionary (specified by <key>: items in YAML):

{'Buffer': 0, 'AggressivePerfMode': 1, 'AssertFree0ElementMultiple': 1, 'AssertFree1ElementMultiple': 1}

and data[5][0]["Buffer"] is 0.
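When it is unclear how deeply a value is nested, a small walker can print the index path of every leaf. The sketch below uses a shortened, invented document rather than the original file, and the helper name walk is hypothetical.

```python
import yaml

# Shortened stand-in: a list whose second element is itself a list of mappings.
text = """\
- header
- - Buffer: 0
    AggressivePerfMode: 1
"""


def walk(node, path="data"):
    """Yield (path, value) pairs for every scalar leaf in the loaded structure."""
    if isinstance(node, dict):
        for key, value in node.items():
            yield from walk(value, f"{path}[{key!r}]")
    elif isinstance(node, list):
        for index, value in enumerate(node):
            yield from walk(value, f"{path}[{index}]")
    else:
        yield path, node


data = yaml.safe_load(text)
leaves = list(walk(data))
for path, value in leaves:
    print(path, "=", value)
```

Each printed path can be pasted back into Python as an expression to reach that value.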

How can I parse YAML with TAGs?

If you just need to inspect the tags, the corresponding loaded
dict and list subclasses preserve
their tag in the .tag attribute (this might change, so pin the version of ruamel.yaml you use):

import sys
import ruamel.yaml

yaml_str = """\
steps:
  - !<!entry>
    id: Entry-1
    actions: []
  - !<!replybuttons>
    id: ReplyButtons-langcheck
    footer: ''
  - !<!input>
    id: Input-langcheck
    var: Input-1
  - !<!logic>
    id: LangCheck-Logic
    entries:
      - condition: !<!equals>
          var: Input-langcheck
          isCaseSensitive: false
"""

yaml = ruamel.yaml.YAML()
data = yaml.load(yaml_str)
print('id', data['steps'][1]['id'])
print('tag', data['steps'][1].tag.value)

which gives:

id ReplyButtons-langcheck
tag !replybuttons

Your first attempt didn't work because your tags are special: the <> marks them as
verbatim tags, which in this case are necessary
to allow a tag starting with an exclamation mark. So when the YAML contains !<abc>
you register !abc with add_constructor (and I think you can leave out the !), and when your YAML contains !<!abc> you register !abc.
The parser strips the <> from these verbatim tags, which is why the printed tag
doesn't contain them after loading.

While writing this I noticed that the round-trip parser doesn't check whether a tag needs
to be written verbatim. So if you dump the loaded data, you get non-verbatim tags,
which don't load the same way. So
if you need to update these files, you should get the classes registered (let me know
if that doesn't work out).
Recursively walking over the data structure and rewriting the tags to compensate for this bug
will not work, as the <> gets escaped while dumping.

Unable to parse yaml file into python

I believe the problem here is with your YAML file. It should have been:

name: nick # YAML allows comments
things:
  - chair
  - table
  - sofa:
      color: gray
      age: 2

YAML depends a lot on indentation so keep that in mind.
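To illustrate how much indentation matters, the sketch below loads two invented snippets that differ only in how far one line is indented. Both are valid YAML, but they produce different structures.

```python
import yaml

# color is indented deeper than sofa, so it becomes a child of sofa.
nested = """\
things:
  - sofa:
      color: gray
"""

# color is aligned with sofa, so it becomes a sibling key and sofa has no value.
flat = """\
things:
  - sofa:
    color: gray
"""

nested_data = yaml.safe_load(nested)
flat_data = yaml.safe_load(flat)
print(nested_data)
print(flat_data)
```

Because neither version raises an error, a wrong indent can go unnoticed until the data is accessed.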

Parse a yaml file in a for loop and create new yaml files - non-specific tag - Python

If you want to process a YAML file's structure and do not care about the content, you do not need to construct native Python objects. Instead, use the node graph:

import yaml, sys
from yaml.nodes import SequenceNode, MappingNode
from yaml.resolver import BaseResolver

input = """
elements:
  - !Element
    name: element1
    gnc_script:
      - '*fcc_cores_1'
      - '*setup'
    relationship:
      - projectx
      - projectxy
      - projectxyt
      # indirect related on
      - projectxyz1

  - !Element
    name: element2
    gnc_script:
      - '*fcc_cores_1'
    relationship:
      - projectx
      - projectxy
      - projectxyt
      # indirect related on
      - projectxyz1

  - !Element
    name: element3
    gnc_script:
      - '*fcc_cores_1'
      - '*setup'
    relationship:
      - projectx
      - projectxy
      - projectxyt
      # indirect related on
      - projectxyz1
"""

node = yaml.compose(input)

visited = set()

def output(node):
    yaml.serialize(node, sys.stdout)
    sys.stdout.write("...\n")

def visit(node):
    if node in visited: return
    if node.tag == "!Element":
        node.tag = BaseResolver.DEFAULT_MAPPING_TAG
        output(node)
    visited.add(node)
    if isinstance(node, SequenceNode):
        for child in node.value:
            visit(child)
    elif isinstance(node, MappingNode):
        for k,v in node.value:
            visit(k); visit(v)

visit(node)

This outputs:

name: element1
gnc_script:
- '*fcc_cores_1'
- '*setup'
relationship:
- projectx
- projectxy
- projectxyt
- projectxyz1
...
name: element2
gnc_script:
- '*fcc_cores_1'
relationship:
- projectx
- projectxy
- projectxyt
- projectxyz1
...
name: element3
gnc_script:
- '*fcc_cores_1'
- '*setup'
relationship:
- projectx
- projectxy
- projectxyt
- projectxyz1
...

This code includes its input and outputs to stdout for demonstration purposes; just rewrite output(node) to create the files you want and replace input with the file you want to process. For example, this would write them in separate files; requires an existing scalar value for the key name in the element:

def output(node):
    name = next(x[1].value for x in node.value if x[0].value == "name")
    with open("{0}.yaml".format(name), "w") as f:
        yaml.serialize(node, f)

As you can see, comments in the input are not part of the output. This is because they are thrown away by PyYAML's parser. There's no easy way to fix this with PyYAML; you could try ruamel which tries to preserve comments but I don't know its API.


