Parse a Yaml File

How can I parse a YAML file in Python?

The easiest and purest method that does not rely on C headers is PyYAML, which can be installed via pip install pyyaml:

#!/usr/bin/env python

import yaml

with open("example.yaml", "r") as stream:
    try:
        print(yaml.safe_load(stream))
    except yaml.YAMLError as exc:
        print(exc)

And that's it. A plain yaml.load() function also exists, but yaml.safe_load() should always be preferred, because loading untrusted input with an unrestricted loader can lead to arbitrary code execution. So unless you explicitly need arbitrary object serialization/deserialization, use safe_load.
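To make the difference concrete, here is a minimal sketch, assuming PyYAML is installed. The helper name rejected_by_safe_load and the sample document are made up for illustration. It shows that safe_load refuses a document carrying a python/* tag instead of calling the function the tag names.

```python
import yaml


def rejected_by_safe_load(text):
    """Return True if yaml.safe_load refuses to construct the document."""
    try:
        yaml.safe_load(text)
    except yaml.YAMLError:
        return True
    return False


# Hypothetical input: the tag asks the loader to call a Python function.
suspicious = "!!python/object/apply:os.getcwd []"

print(rejected_by_safe_load(suspicious))    # SafeLoader has no constructor for this tag
print(rejected_by_safe_load("key: value"))  # plain data loads normally
```

The same document handed to an unrestricted loader would try to run the named function, which is why safe_load is the sensible default.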

Note that PyYAML supports the YAML specification only up through version 1.1. If you need YAML 1.2 support, see ruamel.yaml.

You could also use oyaml, a drop-in replacement for PyYAML that keeps mappings in the same order as in your YAML file (see the Snyk report for oyaml).

Parsing yaml file with --- in python

Your input is composed of multiple YAML documents. For that you will need yaml.load_all() or better yet yaml.safe_load_all(). (The latter will not construct arbitrary Python objects outside of data-like structures such as list/dict.)

import yaml

with open('temp.yaml') as f:
    # Materialize the generator while the file is still open.
    temp = list(yaml.safe_load_all(f))

As hinted at by the error message, yaml.load() is strict about accepting only a single YAML document.

Note that safe_load_all() returns a generator of Python objects which you'll need to iterate over.

>>> gen = yaml.safe_load_all(f)
>>> next(gen)
{'name': 'first', 'cmp': [{'Some': 'first', 'top': {'top_rate': 16000, 'audio_device': 'pulse'}}]}
>>> next(gen)
{'name': 'second', 'components': [{'name': 'second', 'parameters': {'always_on': True, 'timeout': 200000}}]}
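As a self-contained sketch of the same idea, assuming PyYAML is installed, the snippet below loads two documents from an in-memory string instead of a file. The sample text and the helper name load_documents are illustrative only.

```python
import yaml

# Two YAML documents separated by `---` (made-up sample data).
text = """\
name: first
---
name: second
"""


def load_documents(stream):
    """Collect every document produced by safe_load_all into a list."""
    return list(yaml.safe_load_all(stream))


docs = load_documents(text)
for index, doc in enumerate(docs):
    print(index, doc["name"])
```

Wrapping the generator in list() is convenient for small inputs. For large streams you may prefer to iterate lazily.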

How to parse YAML file correctly?

You must specify a constructor for the OpenCV data type that you are trying to load, because it doesn't exist by default in PyYAML:

import yaml

def meta_constructor(loader, node):
    return loader.construct_mapping(node)

yaml.add_constructor(u'tag:yaml.org,2002:opencv-matrix', meta_constructor)

with open(file_name, 'r') as stream:
    data_loaded = yaml.load(stream, Loader=yaml.Loader)

print(data_loaded)

Output:

{'flow': {'rows': 256, 'cols': 256, 'dt': '2f', 'data': [0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, '...']}}
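If you would rather not use the unrestricted loader, one possible variation is to register the constructor on a SafeLoader subclass. This is only a sketch: the class name MatrixLoader and the tiny sample document are assumptions, not taken from the question.

```python
import yaml


class MatrixLoader(yaml.SafeLoader):
    """SafeLoader subclass, so the extra tag is not registered globally."""


def matrix_constructor(loader, node):
    # Build the tagged node as an ordinary dict.
    return loader.construct_mapping(node, deep=True)


MatrixLoader.add_constructor("tag:yaml.org,2002:opencv-matrix", matrix_constructor)

# Hypothetical sample; `!!opencv-matrix` expands to the tag registered above.
text = """\
flow: !!opencv-matrix
  rows: 2
  cols: 2
  dt: f
  data: [0.0, 0.0, 0.0, 0.0]
"""

data = yaml.load(text, Loader=MatrixLoader)
print(data["flow"]["rows"], data["flow"]["data"])
```

Only the one extra tag is accepted this way. Everything else is still limited to what SafeLoader allows.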

How can I parse a YAML file from a Linux shell script?

My use case may or may not be quite the same as what this original post was asking, but it's definitely similar.

I need to pull in some YAML as bash variables. The YAML will never be more than one level deep.

YAML looks like so:

KEY:                value
ANOTHER_KEY: another_value
OH_MY_SO_MANY_KEYS: yet_another_value
LAST_KEY: last_value

The desired output looks like this:

KEY="value"
ANOTHER_KEY="another_value"
OH_MY_SO_MANY_KEYS="yet_another_value"
LAST_KEY="last_value"

I achieved the output with this line:

sed -e 's/:[^:\/\/]/="/g;s/$/"/g;s/ *=/=/g' file.yaml > file.sh
  • s/:[^:\/\/]/="/g finds a : followed by any character other than : or / (normally the space after the key) and replaces both with =", so :// in URLs is left alone
  • s/$/"/g appends " to the end of each line
  • s/ *=/=/g removes all spaces before =
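If Python is available on the machine, the same one-level conversion can be sketched with the standard library alone. The function name yaml_to_shell is made up for illustration. It assumes every line is a flat KEY: value pair and does not handle nested YAML, quoted scalars, or inline comments.

```python
import shlex


def yaml_to_shell(text):
    """Convert flat `KEY: value` lines into shell variable assignments."""
    assignments = []
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and full-line comments
        # Split on the first colon only, so URLs in the value survive.
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a key/value line
        # shlex.quote adds quotes only when the value needs them.
        assignments.append(f"{key.strip()}={shlex.quote(value.strip())}")
    return "\n".join(assignments)


sample = "KEY:                value\nURL: http://example.com/a b"
print(yaml_to_shell(sample))
```

Unlike the sed line, this strips any padding after the colon and quotes values with shlex.quote, so the quoting style in the output differs from the double-quoted form shown above.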

How to parse yaml file with string values

Is the quote part of the data, or just its representation? If it's part of the data, you'll have to indicate that in the yaml.

data:
  key1: |
    "Value1"
  key2: Value2

Note that enclosing quotes are optional on YAML string values, which means they must be explicitly included if they are to be part of the string data itself.

# these two documents are identical
data:
- this
- that
- the other
---
data:
- "this"
- "that"
- "the other"
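A quick way to check this yourself, assuming PyYAML is installed, is to load the variants and compare the results. The sample strings below are made up for illustration.

```python
import yaml

# Quotes around a scalar are only syntax: both forms load as the same string.
plain = yaml.safe_load("key: Value1")
quoted = yaml.safe_load('key: "Value1"')

# To keep the double quotes as data, wrap them in single quotes.
with_quotes = yaml.safe_load("key: '\"Value1\"'")

print(plain == quoted)
print(with_quotes["key"])
```

The block scalar form shown earlier (key1: |) also keeps the quotes, but it adds a trailing newline to the value.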

Unable to parse yaml file into python

I believe the problem is in your YAML file; it should have been:

name: nick # YAML allows comments
things:
  - chair
  - table
  - sofa:
      color: gray
      age: 2

YAML depends a lot on indentation so keep that in mind.
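To see how much the indentation matters, here is a small sketch, assuming PyYAML is installed. It loads two made-up variants that differ only in how far color and age are indented.

```python
import yaml

# color/age indented deeper than "sofa": they become sofa's properties.
nested_text = """\
things:
  - chair
  - sofa:
      color: gray
      age: 2
"""

# color/age aligned with "sofa": they become sibling keys, and sofa is empty.
flat_text = """\
things:
  - chair
  - sofa:
    color: gray
    age: 2
"""

nested = yaml.safe_load(nested_text)["things"][1]
flat = yaml.safe_load(flat_text)["things"][1]

print(nested)
print(flat)
```

Both variants are valid YAML, so the parser raises no error in either case. Only the resulting structure differs.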

Parsing yaml file format in python

data is a list (its elements are marked by - in YAML). The list containing the dictionaries you seem to be interested in is data[5]; you can tell it is another list from the second level of - items. Specifically, data[5][0] is a dictionary (specified by <key>: items in YAML):

{'Buffer': 0, 'AggressivePerfMode': 1, 'AssertFree0ElementMultiple': 1, 'AssertFree1ElementMultiple': 1}

and data[5][0]["Buffer"] is 0.
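The same kind of navigation can be tried on a smaller, made-up document (this is not the file from the question). The sketch assumes PyYAML is installed.

```python
import yaml

# Hypothetical sample: a top-level list whose third element is itself a list
# containing one dictionary.
text = """\
- first
- second
- - Buffer: 0
    AggressivePerfMode: 1
"""

data = yaml.safe_load(text)

print(type(data).__name__)        # outer sequence
print(type(data[2]).__name__)     # nested sequence
print(type(data[2][0]).__name__)  # mapping inside the nested sequence
print(data[2][0]["Buffer"])
```

Printing type(...) at each level is a simple way to find out whether the next step needs an index or a key.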

How do I parse a YAML file in Ruby?

Maybe I'm missing something, but why try to parse the file? Why not just load the YAML and examine the object(s) that result?

If your sample YAML is in some.yml, then this:

require 'yaml'
thing = YAML.load_file('some.yml')
puts thing.inspect

gives me

{"javascripts"=>[{"fo_global"=>["lazyload-min", "holla-min"]}]}

Parsing Yaml file

Both your schema and your yaml were wrong. Main reasons:

  • You should have nested structs, not Vec.
  • Your YAML types were not accurate: for example, True is a string while true is a bool, and 8000 is not a String but "8000" is.

use std::fs::File;
use serde_yaml; // 0.8.23
use serde::{Serialize, Deserialize};

#[derive(Debug, Serialize, Deserialize)]
struct ColorStruct {
    fatal: String,
    error: String,
    warn: String,
    info: String,
    debug: String,
    trace: String
}

#[derive(Debug, Serialize, Deserialize)]
struct LoggingStruct {
    use_color: bool,
    log_color: ColorStruct,
    log_output: String,
    file_location: String
}

#[derive(Debug, Serialize, Deserialize)]
struct RocketStruct {
    mount_location: String,
    port: String
}

#[derive(Debug, Serialize, Deserialize)]
struct Config {
    default_verbosity: i32,
    logging: LoggingStruct,
    rocket: RocketStruct
}

fn main(){
    // The YAML lines start at column 0 because they are inside a raw string.
    let yamlFile = r#"default_verbosity: 0
logging:
  use_color: true
  log_color:
    fatal: "Red"
    error: "Red"
    warn: "Red"
    info: "Green"
    debug: "Blue"
    trace: "Yellow"
  log_output: "file"
  file_location: "example.log"
rocket:
  mount_location: "/"
  port: "8000""#;
    let myYaml: Config = serde_yaml::from_str(yamlFile).unwrap();
}


If you really want to use Vec as part of your original schema, you would need some changes:

  • ColorStruct should probably be an enum; if not, keep it as a struct like the others.
  • Your YAML needs to provide the data correctly too, so that it matches those types.

#[derive(Debug, Serialize, Deserialize)]
enum ColorStruct {
    fatal(String),
    error(String),
    warn(String),
    info(String),
    debug(String),
    trace(String),
}

...

    let yamlFile = r#"default_verbosity: 0
logging: [
  {
    log_output: "file",
    file_location: "example.log",
    use_color: true,
    log_color: [
      { fatal: "Red" },
      { error: "Red" },
      { warn: "Red" },
      { info: "Green" },
      { debug: "Blue" },
      { trace: "Yellow" }
    ]
  }
]

rocket: [
  {
    mount_location: "/",
    port: "8000"
  }
]"#;

...
