YAML Indentation for Array in Hash

YAML indentation for array in hash

Both ways are valid, as far as I can tell:

require 'yaml'

YAML.load(%q{---
1:
- 1
- 2
- 3
})
# => {1=>[1, 2, 3]}

YAML.load(%q{---
1:
  - 1
  - 2
  - 3
})
# => {1=>[1, 2, 3]}

It's not clear why you think there should be spaces before the hyphens. If you think this is a violation of the spec, please explain how.

Why isn't there indentation for the array?

There's no need for indentation before the hyphens, and it's simpler not to add any.
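As a quick check of the point above, here is a minimal sketch using Ruby's standard `yaml` library. It dumps a hash containing an array and compares the result with a hand-indented version; the variable names are illustrative only.

```ruby
require 'yaml'

data = { 1 => [1, 2, 3] }

# Ruby's emitter writes the hyphens at the same column as the parent key.
emitted = data.to_yaml
puts emitted

# A hand-written variant with the hyphens indented by two spaces.
indented = "---\n1:\n  - 1\n  - 2\n  - 3\n"

# Both documents load to the same Ruby structure.
same = YAML.load(emitted) == YAML.load(indented)
puts same
```

Either layout can be fed back to `YAML.load`, so the choice is purely stylistic.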

YAML read should be a Hash, not an Array. What's wrong?

The error is occurring in the OpenStruct initializer.

When you call Hash#each and give it a block with arity = 1, the block gets an array like [key, value]. Then you're passing that to OpenStruct.new, which results in an error since you can't initialize an OpenStruct with an Array.

What I think you want is:

listing.each do |key, value|
  # key => value uses the block variable; { key: value } would create the literal symbol :key
  items << OpenStruct.new(key => value)
end

Alternatively, the YAML file could be:

- item1: label1
- item2: label2

and I believe the code would work as is.

The YAML file you have is deserialized to:

{ "item1" => "label1", "item2" => "label2" }

whereas the one I've described would be:

[{ "item1" => "label1" }, { "item2" => "label2" }]
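The difference between the two block shapes can be seen in a short sketch. It assumes the YAML content from the question is available as a string (inlined here for illustration) and passes `key => value` so that the hash key comes from the block variable.

```ruby
require 'yaml'
require 'ostruct'

# Inlined stand-in for the YAML file discussed above.
listing = YAML.load("item1: label1\nitem2: label2\n")

# With a single block parameter, Hash#each yields [key, value] arrays.
pairs = []
listing.each { |pair| pairs << pair }

# With two block parameters, the pair is destructured, and each
# OpenStruct is built from a one-entry Hash.
items = listing.map { |key, value| OpenStruct.new(key => value) }

puts pairs.inspect
puts items.first.item1
```

Note that the keys loaded from this YAML are strings; OpenStruct accepts them and exposes them as methods.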

Is it possible to specify formatting options for to_yaml in ruby?

This ugly hack seems to do the trick...

class Array
  def to_yaml_style
    :inline
  end
end

Browsing through Ruby's source, I can't find any option I could pass to achieve the same. The default options are described in lib/yaml/constants.rb.

How to parse YAML data into a custom Bash data array/hash structure?

I have decided to use a combination of the following:

  • a hacked version of Yay:

    • with added support for simple lists
    • fixes for multiple indentation levels
  • a hacked version of this yaml parser:

    • with prefix stuff borrowed from Yay, for consistency
function yaml_to_vars {
  # find input file
  for f in "$1" "$1.yay" "$1.yml"
  do
    [[ -f "$f" ]] && input="$f" && break
  done
  [[ -z "$input" ]] && exit 1

  # use given dataset prefix or imply from file name
  [[ -n "$2" ]] && local prefix="$2" || {
    local prefix=$(basename "$input"); prefix=${prefix%.*}; prefix="${prefix//-/_}_";
  }

  local s='[[:space:]]*' w='[a-zA-Z0-9_]*' fs=$(echo @|tr @ '\034')
  # read the file located above ("$input") rather than the raw argument,
  # so that calls without a file extension also work
  sed -ne "s|,$s\]$s\$|]|" \
      -e ":1;s|^\($s\)\($w\)$s:$s\[$s\(.*\)$s,$s\(.*\)$s\]|\1\2: [\3]\n\1 - \4|;t1" \
      -e "s|^\($s\)\($w\)$s:$s\[$s\(.*\)$s\]|\1\2:\n\1 - \3|;p" "$input" | \
  sed -ne "s|,$s}$s\$|}|" \
      -e ":1;s|^\($s\)-$s{$s\(.*\)$s,$s\($w\)$s:$s\(.*\)$s}|\1- {\2}\n\1 \3: \4|;t1" \
      -e "s|^\($s\)-$s{$s\(.*\)$s}|\1-\n\1 \2|;p" | \
  sed -ne "s|^\($s\):|\1|" \
      -e "s|^\($s\)-$s[\"']\(.*\)[\"']$s\$|\1$fs$fs\2|p" \
      -e "s|^\($s\)-$s\(.*\)$s\$|\1$fs$fs\2|p" \
      -e "s|^\($s\)\($w\)$s:$s[\"']\(.*\)[\"']$s\$|\1$fs\2$fs\3|p" \
      -e "s|^\($s\)\($w\)$s:$s\(.*\)$s\$|\1$fs\2$fs\3|p" | \
  awk -F$fs '{
    indent = length($1)/2;
    vname[indent] = $2;
    for (i in vname) {if (i > indent) {delete vname[i]; idx[i]=0}}
    if(length($2)== 0){ vname[indent]= ++idx[indent] };
    if (length($3) > 0) {
      vn=""; for (i=0; i<indent; i++) { vn=(vn)(vname[i])("_")}
      printf("%s%s%s=\"%s\"\n", "'$prefix'",vn, vname[indent], $3);
    }
  }'
}

yay_parse() {

  # find input file
  for f in "$1" "$1.yay" "$1.yml"
  do
    [[ -f "$f" ]] && input="$f" && break
  done
  [[ -z "$input" ]] && exit 1

  # use given dataset prefix or imply from file name
  [[ -n "$2" ]] && local prefix="$2" || {
    local prefix=$(basename "$input"); prefix=${prefix%.*}; prefix=${prefix//-/_};
  }

  echo "unset $prefix; declare -g -a $prefix;"

  local s='[[:space:]]*' w='[a-zA-Z0-9_]*' fs=$(echo @|tr @ '\034')
  #sed -n -e "s|^\($s\)\($w\)$s:$s\"\(.*\)\"$s\$|\1$fs\2$fs\3|p" \
  # -e "s|^\($s\)\($w\)$s:$s\(.*\)$s\$|\1$fs\2$fs\3|p" "$input" |
  # read the file located above ("$input") rather than the raw argument
  sed -ne "s|,$s\]$s\$|]|" \
      -e ":1;s|^\($s\)\($w\)$s:$s\[$s\(.*\)$s,$s\(.*\)$s\]|\1\2: [\3]\n\1 - \4|;t1" \
      -e "s|^\($s\)\($w\)$s:$s\[$s\(.*\)$s\]|\1\2:\n\1 - \3|;p" "$input" | \
  sed -ne "s|,$s}$s\$|}|" \
      -e ":1;s|^\($s\)-$s{$s\(.*\)$s,$s\($w\)$s:$s\(.*\)$s}|\1- {\2}\n\1 \3: \4|;t1" \
      -e "s|^\($s\)-$s{$s\(.*\)$s}|\1-\n\1 \2|;p" | \
  sed -ne "s|^\($s\):|\1|" \
      -e "s|^\($s\)-$s[\"']\(.*\)[\"']$s\$|\1$fs$fs\2|p" \
      -e "s|^\($s\)-$s\(.*\)$s\$|\1$fs$fs\2|p" \
      -e "s|^\($s\)\($w\)$s:$s[\"']\(.*\)[\"']$s\$|\1$fs\2$fs\3|p" \
      -e "s|^\($s\)\($w\)$s:$s\(.*\)$s\$|\1$fs\2$fs\3|p" | \
  awk -F$fs '{
    indent = length($1)/2;
    key = $2;
    value = $3;

    # No prefix or parent for the top level (indent zero)
    root_prefix = "'$prefix'_";
    if (indent == 0) {
      prefix = ""; parent_key = "'$prefix'";
    } else {
      prefix = root_prefix; parent_key = keys[indent-1];
    }

    keys[indent] = key;

    # remove keys left behind if prior row was indented more than this row
    for (i in keys) {if (i > indent) {delete keys[i]}}

    # if we have a value
    if (length(value) > 0) {

      # set values here

      # if the "key" is missing, make array indexed, not assoc..

      if (length(key) == 0) {
        # array item has no key, only a value..
        # so, if we didnt already unset the assoc array
        if (unsetArray == 0) {
          # unset the assoc array here
          printf("unset %s%s; ", prefix, parent_key);
          # switch the flag, so we only unset once, before adding values
          unsetArray = 1;
        }
        # array was unset, has no key, so add item using indexed array syntax
        printf("%s%s+=(\"%s\");\n", prefix, parent_key, value);

      } else {
        # array item has key and value, add item using assoc array syntax
        printf("%s%s[%s]=\"%s\";\n", prefix, parent_key, key, value);
      }

    } else {

      # declare arrays here

      # reset this flag for each new array we work on...
      unsetArray = 0;

      # if item has no key, declare indexed array
      if (length(key) == 0) {
        # indexed
        printf("unset %s%s; declare -g -a %s%s;\n", root_prefix, key, root_prefix, key);

      # if item has numeric key, declare indexed array
      } else if (key ~ /^[[:digit:]]/) {
        printf("unset %s%s; declare -g -a %s%s;\n", root_prefix, key, root_prefix, key);

      # else (item has a string for a key), declare associative array
      } else {
        printf("unset %s%s; declare -g -A %s%s;\n", root_prefix, key, root_prefix, key);
      }

      # set root level values here

      if (indent > 0) {
        # add to associative array
        printf("%s%s[%s]+=\"%s%s\";\n", prefix, parent_key , key, root_prefix, key);
      } else {
        # add to indexed array
        printf("%s%s+=( \"%s%s\");\n", prefix, parent_key , root_prefix, key);
      }

    }
  }'
}

# helper to load yay data file
yay() {
  # yaml_to_vars "$@" ## uncomment to debug (prints data to stdout)
  eval $(yaml_to_vars "$@")

  # yay_parse "$@" ## uncomment to debug (prints data to stdout)
  eval $(yay_parse "$@")
}

Using the code above, when products.yml contains:

product1:
  name: Foo
  price: 100
product2:
  name: Bar
  price: 200

the parser can be called like so:

source path/to/yml-parser.sh
yay products.yml

And it generates and then evaluates this code:

products_product1_name="Foo"
products_product1_price="100"
products_product2_name="Bar"
products_product2_price="200"
unset products;
declare -g -a products;
unset products_product1;
declare -g -A products_product1;
products+=( "products_product1");
products_product1[name]="Foo";
products_product1[price]="100";
unset products_product2;
declare -g -A products_product2;
products+=( "products_product2");
products_product2[name]="Bar";
products_product2[price]="200";

So, I get the following Bash arrays and variables:

declare -a products=([0]="products_product1" [1]="products_product2")
declare -A products_product1=([price]="100" [name]="Foo" )
declare -A products_product2=([price]="200" [name]="Bar" )

And in my templating system, I can now access this yml data like so:

{{#foreach product in products}}
Name: {{product.name}}
Price: {{product.price}}
{{/foreach}}

:)

Another example:

File site.yml

meta_info:
  title: My cool blog
  domain: foo.github.io
author1:
  name: bob
  url: /author/bob
author2:
  name: jane
  url: /author/jane
header_links:
  link1:
    title: About
    url: about.html
  link2:
    title: Contact Us
    url: contactus.html
js_deps:
  cashjs: cashjs
  jets: jets
Foo:
  - one
  - two
  - three

Produces:

declare -a site=([0]="site_meta_info" [1]="site_author1" [2]="site_author2" [3]="site_header_links" [4]="site_js_deps" [5]="site_Foo")
declare -A site_meta_info=([title]="My cool blog" [domain]="foo.github.io" )
declare -A site_author1=([url]="/author/bob" [name]="bob" )
declare -A site_author2=([url]="/author/jane" [name]="jane" )
declare -A site_header_links=([link1]="site_link1" [link2]="site_link2" )
declare -A site_link1=([url]="about.html" [title]="About" )
declare -A site_link2=([url]="contactus.html" [title]="Contact Us" )
declare -A site_js_deps=([cashjs]="cashjs" [jets]="jets" )
declare -a site_Foo=([0]="one" [1]="two" [2]="three")

In my templates, I can access site_header_links like so:

{{#foreach link in site_header_links}}
* {{link.title}} - {{link.url}}
{{/foreach}}

and site_Foo (a dash-notation, or simple list) like so:

{{#site_Foo}}
* {{.}}
{{/site_Foo}}

Extending Hash and (de)serializing from/to yaml

As Mladen Jablanović's answer shows, you can override to_yaml. You could add an array named 'attributes', taking care to escape that name if the hash already has a key with that name (and to escape the escaped name in turn, and so on). However, you need some knowledge of the internals to make this work: out.map(tag_uri, to_yaml_style) and its variations are nontrivial and not well documented, so the sources of the various Ruby interpreters are your best bet.

Unfortunately, you also need to override the deserialization process, and how to reuse existing code there is almost completely undocumented. As this answer shows, you would need to add a to_yaml_type and register the deserialization code with YAML::add_domain_type. From there, you are pretty much on your own: you need to write half a YAML parser to parse the YAML string and convert it into your object.

It's possible to figure it out, but the easier solution, that I implemented last time I wanted this, was to just make the Hash an attribute of my object, instead of extending Hash. And later I realized I wasn't actually implementing a subclass of Hash anyway. That something is storing key-value pairs doesn't necessarily mean it is a Hash. If you implement :[], :[]= and each, you usually get a long way towards being able to treat an object as if it is a Hash.
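The composition approach described above could look roughly like the following sketch. The class name `Settings`, the `attributes` array, and the `from_yaml` helper are hypothetical names chosen for illustration; the idea is only that the object holds a Hash, exposes `[]`, `[]=` and `each`, and serializes itself as a plain Hash so that no custom YAML tags are needed.

```ruby
require 'yaml'

# Hypothetical wrapper: holds a Hash instead of subclassing Hash.
class Settings
  include Enumerable

  attr_reader :attributes

  def initialize(data = {}, attributes = [])
    @data = data
    @attributes = attributes
  end

  def [](key)
    @data[key]
  end

  def []=(key, value)
    @data[key] = value
  end

  # Enumerable builds map, select, to_a, etc. on top of each.
  def each(&block)
    @data.each(&block)
  end

  # Serialize as a plain Hash with two well-known keys.
  def to_yaml(*args)
    { 'attributes' => @attributes, 'data' => @data }.to_yaml(*args)
  end

  # Rebuild the object from the plain Hash written by to_yaml.
  def self.from_yaml(text)
    raw = YAML.load(text)
    new(raw['data'], raw['attributes'])
  end
end

settings = Settings.new({ 'color' => 'red' }, ['frozen'])
settings['size'] = 3
copy = Settings.from_yaml(settings.to_yaml)
puts copy['color']
```

Because the extra array lives under its own top-level key, there is no name clash with keys in the wrapped Hash and nothing needs escaping.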

Ruby hash based on indentation

Yeah, you can either implement a method to parse hashes from strings using indentation as delimiters, or, as @AJcodez suggested:

require 'psych'
require 'yaml'

yash = <<EOT # type hashes like this
---
:a:
- 1
- :b:
    :c: 3
    :d: 4
  :e:
    :f: qwe
EOT

hash = YAML.load yash
# => {:a=>[1, {:b=>{:c=>3, :d=>4}, :e=>{:f=>"qwe"}}]}
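For the first option mentioned above, writing your own indentation-based parser, a very small sketch might look like this. It is not a YAML parser: it assumes every line is either `key:` (opening a nested hash) or `key: value`, that indentation uses spaces, and that all values stay strings. The method name `parse_indented` is made up for illustration.

```ruby
# Minimal indentation-driven parser for "key:" and "key: value" lines.
def parse_indented(text)
  root = {}
  stack = [[-1, root]] # pairs of [indent, hash opened at that indent]

  text.each_line do |line|
    next if line.strip.empty?

    indent = line[/\A */].size
    key, value = line.strip.split(/:\s*/, 2)

    # Close every hash that is at the same or a deeper indent level.
    stack.pop while stack.last[0] >= indent
    parent = stack.last[1]

    if value.nil? || value.empty?
      parent[key] = {}
      stack.push([indent, parent[key]])
    else
      parent[key] = value
    end
  end

  root
end

sample = "a:\n  b: 1\n  c:\n    d: 2\ne: 3\n"
parsed = parse_indented(sample)
puts parsed.inspect
```

For anything beyond this narrow format (lists, quoting, typed values), loading the text as YAML is the more robust route.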

