How can I parse a YAML file in Python
The easiest pure-Python option, with no dependency on C headers, is PyYAML (documentation), which can be installed via pip install pyyaml:
#!/usr/bin/env python
import yaml

with open("example.yaml", "r") as stream:
    try:
        print(yaml.safe_load(stream))
    except yaml.YAMLError as exc:
        print(exc)
And that's it. A plain yaml.load() function also exists, but yaml.safe_load() should always be preferred, because it avoids the possibility of arbitrary code execution. So unless you explicitly need arbitrary object serialization/deserialization, use safe_load.
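To make the difference concrete, here is a minimal sketch of safe_load() refusing a document that carries a Python-specific tag. The input string is a made-up example, not something from the question.

```python
import yaml

# Hypothetical untrusted input: the tag asks the loader to call os.system.
untrusted = "!!python/object/apply:os.system ['echo pwned']"

try:
    yaml.safe_load(untrusted)
    rejected = False
except yaml.YAMLError as exc:
    # safe_load has no constructor for python/* tags, so it raises
    # instead of running anything.
    rejected = True
    print("rejected:", exc)
```

Because the failure surfaces as a yaml.YAMLError, the same except clause used for ordinary syntax errors also covers this case.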
Note that the PyYAML project supports versions up through the YAML 1.1 specification. If YAML 1.2 support is needed, see ruamel.yaml as noted in this answer.
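One visible consequence of YAML 1.1 is how unquoted yes/no values are resolved. The short sketch below uses an invented document to show what PyYAML returns for them.

```python
import yaml

# PyYAML follows YAML 1.1, where unquoted yes/no are booleans.
data = yaml.safe_load("enabled: yes\ncountry: no\nquoted: 'no'\n")
print(data)
```

Quoting the value, as in the third key, is the usual way to keep it a string.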
You could also use oyaml, a drop-in replacement for PyYAML that keeps the keys of your YAML file in their original order.
Parsing yaml file with --- in python
Your input is composed of multiple YAML documents. For that you will need yaml.load_all() or, better yet, yaml.safe_load_all(). (The latter will not construct arbitrary Python objects outside of data-like structures such as list/dict.)
import yaml

with open('temp.yaml') as f:
    temp = yaml.safe_load_all(f)
As hinted at by the error message, yaml.load() is strict about accepting only a single YAML document.
Note that safe_load_all() returns a generator of Python objects, which you'll need to iterate over.
>>> gen = yaml.safe_load_all(f)
>>> next(gen)
{'name': 'first', 'cmp': [{'Some': 'first', 'top': {'top_rate': 16000, 'audio_device': 'pulse'}}]}
>>> next(gen)
{'name': 'second', 'components': [{'name': 'second', 'parameters': {'always_on': True, 'timeout': 200000}}]}
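As a self-contained sketch of that iteration, the snippet below loads two invented documents from a string. The generator is lazy, so when reading from a file it should be consumed (for example with list()) while the file is still open.

```python
import yaml

# Two hypothetical documents separated by "---".
multi_doc = """\
name: first
---
name: second
"""

# list() drains the generator; with a file, do this inside the "with" block.
docs = list(yaml.safe_load_all(multi_doc))
for doc in docs:
    print(doc["name"])
```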
How to parse yaml file with string values
Is the quote part of the data, or just its representation? If it's part of the data, you'll have to indicate that in the yaml.
data:
  key1: |
    "Value1"
  key2: Value2
Note that enclosing quotes are optional on YAML string values, which means they must be explicitly included if they are to be part of the string data itself.
# these two documents are identical
data:
- this
- that
- the other
---
data:
- "this"
- "that"
- "the other"
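The same point can be checked from Python. This is a small sketch with invented keys: the first two loads give identical data, while the third keeps the double quotes because they sit inside a single-quoted scalar.

```python
import yaml

plain = yaml.safe_load("key: Value1")
quoted = yaml.safe_load('key: "Value1"')
# Wrapping the double-quoted text in single quotes keeps the inner quotes as data.
literal = yaml.safe_load("key: '\"Value1\"'")

print(plain == quoted)
print(literal["key"])
```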
How to parse YAML file correctly?
You must specify a constructor for the OpenCV data type that you are trying to load, because PyYAML does not provide one by default:
import yaml

def meta_constructor(loader, node):
    return loader.construct_mapping(node)

yaml.add_constructor(u'tag:yaml.org,2002:opencv-matrix', meta_constructor)

with open(file_name, 'r') as stream:
    data_loaded = yaml.load(stream, Loader=yaml.Loader)
    print(data_loaded)
Output:
{'flow': {'rows': 256, 'cols': 256, 'dt': '2f', 'data': [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, '...']}}
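If the file is not fully trusted, the same idea can be applied to a SafeLoader subclass instead of the full Loader. The sketch below is an assumption-laden variant, not the code from the answer: the !matrix tag, the MatrixLoader class, and the inline document are all hypothetical.

```python
import yaml

class MatrixLoader(yaml.SafeLoader):
    """SafeLoader subclass so the extra tag is not registered globally."""

def matrix_constructor(loader, node):
    # Build an ordinary dict from the tagged mapping node.
    return loader.construct_mapping(node, deep=True)

# "!matrix" is a hypothetical tag used only for this illustration.
MatrixLoader.add_constructor("!matrix", matrix_constructor)

doc = """\
flow: !matrix
  rows: 2
  cols: 2
  data: [0.0, 0.0, 0.0, 0.0]
"""
data = yaml.load(doc, Loader=MatrixLoader)
print(data["flow"]["rows"])
```

Registering on a subclass keeps the stock SafeLoader unchanged for the rest of the program.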
Parsing a YAML file in Python, and accessing the data?
Since PyYAML's yaml.load() function parses YAML documents to native Python data structures, you can just access items by key or index. Using the example from the question you linked:
import yaml

with open('tree.yaml', 'r') as f:
    # safe_load is used here because yaml.load() requires an explicit Loader in PyYAML 6+
    doc = yaml.safe_load(f)
To access branch1 text you would use:
txt = doc["treeroot"]["branch1"]
print(txt)
"branch1 text"
because, in your YAML document, the value of the branch1 key is under the treeroot key.
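A runnable version of that lookup might look like the sketch below. The inline document is a hypothetical stand-in for tree.yaml, since the question's file is not reproduced here.

```python
import yaml

# Hypothetical stand-in for tree.yaml.
doc = yaml.safe_load("""\
treeroot:
  branch1: branch1 text
  branch2: branch2 text
""")

txt = doc["treeroot"]["branch1"]
# .get() returns a default instead of raising KeyError for a missing key.
missing = doc["treeroot"].get("branch3", "no such branch")
print(txt, "|", missing)
```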
How to parse/read a YAML file into a Python object?
If your YAML file looks like this:
# tree format
treeroot:
    branch1:
        name: Node 1
        branch1-1:
            name: Node 1-1
    branch2:
        name: Node 2
        branch2-1:
            name: Node 2-1
And you've installed PyYAML like this:
pip install PyYAML
And the Python code looks like this:
import yaml

with open('tree.yaml') as f:
    # use safe_load instead of load
    dataMap = yaml.safe_load(f)
The variable dataMap now contains a dictionary with the tree data. If you print dataMap using PrettyPrint, you will get something like:
{
    'treeroot': {
        'branch1': {
            'branch1-1': {
                'name': 'Node 1-1'
            },
            'name': 'Node 1'
        },
        'branch2': {
            'branch2-1': {
                'name': 'Node 2-1'
            },
            'name': 'Node 2'
        }
    }
}
So, now we have seen how to get data into our Python program. Saving data is just as easy:
with open('newtree.yaml', "w") as f:
    yaml.dump(dataMap, f)
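To check that dumping and loading are symmetric without touching the filesystem, a small round-trip sketch can help. It uses safe_dump(), the counterpart of safe_load(), and a shortened, invented dictionary.

```python
import yaml

data_map = {"treeroot": {"branch1": {"name": "Node 1"}}}

# With no stream argument, safe_dump returns the YAML text as a string.
text = yaml.safe_dump(data_map, default_flow_style=False)
restored = yaml.safe_load(text)
print(text)
```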
You have a dictionary, and now you have to convert it to a Python object:
class Struct:
    def __init__(self, **entries):
        self.__dict__.update(entries)
Then you can use:
>>> args = your YAML dictionary
>>> s = Struct(**args)
>>> s
<__main__.Struct instance at 0x01D6A738>
>>> s...
and follow "Convert Python dict to object".
For more information, see pyyaml.org.
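The Struct class above only converts the top level of the dictionary. As one possible alternative (a swapped-in technique, not the answer's own code), nested dictionaries can be converted recursively with the standard library's types.SimpleNamespace. The helper name to_namespace and the sample data are hypothetical.

```python
from types import SimpleNamespace

def to_namespace(value):
    """Recursively turn dicts into attribute-style objects (hypothetical helper)."""
    if isinstance(value, dict):
        return SimpleNamespace(**{k: to_namespace(v) for k, v in value.items()})
    if isinstance(value, list):
        return [to_namespace(item) for item in value]
    return value

# Shortened stand-in for the dictionary loaded from tree.yaml.
tree = to_namespace({"treeroot": {"branch1": {"name": "Node 1"}}})
print(tree.treeroot.branch1.name)
```

Keys that are not valid Python identifiers, such as branch1-1, would still be stored but need getattr() to read.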
Parsing yaml file format in python
data is a list (its elements are specified by - in YAML). The dictionaries you seem to be interested in are in data[5], which is itself a list, as you can see from the additional level of - items. Specifically, data[5][0] is a dictionary (specified by <key>: items in YAML):
{'Buffer': 0, 'AggressivePerfMode': 1, 'AssertFree0ElementMultiple': 1, 'AssertFree1ElementMultiple': 1}
and data[5][0]["Buffer"] is 0.
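The mapping from YAML nesting to Python indexing can be sketched with a shortened, hypothetical input. Here the nested list sits at index 2 rather than 5, because the question's full file is not reproduced.

```python
import yaml

# Hypothetical, shortened input: a list whose last element is itself a list of dicts.
doc = """\
- first
- second
-
  - Buffer: 0
    AggressivePerfMode: 1
"""
data = yaml.safe_load(doc)
inner = data[2]
print(type(data).__name__, type(inner).__name__, type(inner[0]).__name__)
print(inner[0]["Buffer"])
```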
How can I parse YAML with TAGs?
If you just need to inspect the tags, the corresponding loaded dict and list subclasses preserve their tag in the .tag attribute (this might change, so pin the version of ruamel.yaml you use):
import sys
import ruamel.yaml

yaml_str = """\
steps:
  - !<!entry>
    id: Entry-1
    actions: []
  - !<!replybuttons>
    id: ReplyButtons-langcheck
    footer: ''
  - !<!input>
    id: Input-langcheck
    var: Input-1
  - !<!logic>
    id: LangCheck-Logic
    entries:
      - condition: !<!equals>
          var: Input-langcheck
          isCaseSensitive: false
"""

yaml = ruamel.yaml.YAML()
data = yaml.load(yaml_str)
print('id', data['steps'][1]['id'])
print('tag', data['steps'][1].tag.value)
which gives:
id ReplyButtons-langcheck
tag !replybuttons
Your first attempt didn't work because your tags are special: the <> marks them as verbatim tags, which in this case are necessary to allow a tag starting with an exclamation mark. So when the YAML contains !<abc> you register !abc with add_constructor (and I think you can leave out the !), and when your YAML contains !<!abc> you register !abc.
The parser strips the <> for these verbatim tags, which is why the printed tag doesn't contain them after loading.
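The registration described above can also be tried with PyYAML, whose add_constructor works along the same lines. This is only a sketch under that assumption: it uses PyYAML rather than ruamel.yaml, and the TagLoader class and entry_constructor function are hypothetical names.

```python
import yaml

class TagLoader(yaml.SafeLoader):
    """SafeLoader subclass so the registration stays local to this example."""

def entry_constructor(loader, node):
    # Return the mapping plus the tag exactly as the loader reports it.
    return {"tag": node.tag, "value": loader.construct_mapping(node)}

# The YAML below writes the tag verbatim as !<!entry>; it is registered as !entry.
TagLoader.add_constructor("!entry", entry_constructor)

doc = """\
steps:
  - !<!entry>
    id: Entry-1
"""
data = yaml.load(doc, Loader=TagLoader)
print(data["steps"][0])
```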
While writing this I noticed that the round-trip parser doesn't check whether a tag needs to be written verbatim. So if you dump the loaded data, you get non-verbatim tags, which don't load the same way. So if you need to update these files, you should get the classes registered (let me know if that doesn't work out).
Recursively walking over the data structure and rewriting the tags to compensate for this bug will not work, as the <> gets escaped while dumping.
Unable to parse yaml file into python
I believe the problem here is with your YAML file. It should have been:
name: nick # YAML allows comments
things:
  - chair
  - table
  - sofa:
      color: gray
      age: 2
YAML depends a lot on indentation so keep that in mind.
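To illustrate how much indentation matters, the sketch below loads two invented variants of the sofa entry that differ only in how far color and age are indented.

```python
import yaml

nested = yaml.safe_load("""\
things:
  - sofa:
      color: gray
      age: 2
""")

# Same text, but color/age are indented only as far as "sofa".
flat = yaml.safe_load("""\
things:
  - sofa:
    color: gray
    age: 2
""")
print(nested["things"][0])
print(flat["things"][0])
```

In the second variant, color and age become siblings of sofa, and sofa itself has no value.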
Parse a yaml file in a for loop and create new yaml files - non-specific tag - Python
If you want to process a YAML file's structure and do not care about the content, you do not need to construct native Python objects. Instead, use the node graph:
import yaml, sys
from yaml.nodes import SequenceNode, MappingNode
from yaml.resolver import BaseResolver

input = """
elements:
  - !Element
    name: element1
    gnc_script:
      - '*fcc_cores_1'
      - '*setup'
    relationship:
      - projectx
      - projectxy
      - projectxyt
      # indirect related on
      - projectxyz1
  - !Element
    name: element2
    gnc_script:
      - '*fcc_cores_1'
    relationship:
      - projectx
      - projectxy
      - projectxyt
      # indirect related on
      - projectxyz1
  - !Element
    name: element3
    gnc_script:
      - '*fcc_cores_1'
      - '*setup'
    relationship:
      - projectx
      - projectxy
      - projectxyt
      # indirect related on
      - projectxyz1
"""

node = yaml.compose(input)
visited = set()

def output(node):
    yaml.serialize(node, sys.stdout)
    sys.stdout.write("...\n")

def visit(node):
    if node in visited: return
    if node.tag == "!Element":
        node.tag = BaseResolver.DEFAULT_MAPPING_TAG
        output(node)
    visited.add(node)
    if isinstance(node, SequenceNode):
        for child in node.value:
            visit(child)
    elif isinstance(node, MappingNode):
        for k,v in node.value:
            visit(k); visit(v)

visit(node)
This outputs:
name: element1
gnc_script:
- '*fcc_cores_1'
- '*setup'
relationship:
- projectx
- projectxy
- projectxyt
- projectxyz1
...
name: element2
gnc_script:
- '*fcc_cores_1'
relationship:
- projectx
- projectxy
- projectxyt
- projectxyz1
...
name: element3
gnc_script:
- '*fcc_cores_1'
- '*setup'
relationship:
- projectx
- projectxy
- projectxyt
- projectxyz1
...
This code includes its input and outputs to stdout for demonstration purposes; just rewrite output(node) to create the files you want and replace input with the file you want to process. For example, this would write them in separate files; it requires an existing scalar value for the key name in the element:
def output(node):
    name = next(x[1].value for x in node.value if x[0].value == "name")
    with open("{0}.yaml".format(name), "w") as f:
        yaml.serialize(node, f)
As you can see, comments in the input are not part of the output. This is because they are thrown away by PyYAML's parser. There's no easy way to fix this with PyYAML; you could try ruamel.yaml, which tries to preserve comments, but I don't know its API.
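For readers unfamiliar with the node graph, the following sketch shows what yaml.compose() returns for a tiny, invented input. It only inspects the nodes and does not construct Python objects.

```python
import yaml
from yaml.nodes import MappingNode, SequenceNode

root = yaml.compose("""\
elements:
  - !Element
    name: element1  # this comment is discarded by the parser
""")

# A MappingNode's value is a list of (key_node, value_node) pairs.
key_node, value_node = root.value[0]
element = value_node.value[0]
print(key_node.value, type(value_node).__name__, element.tag)
```

The tag is available on the node itself, which is why the visit() function above can match on node.tag without registering a constructor.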