How to Emit Comments in a Yaml Document Using Psych

Adding comment to YAML programmatically

require 'yaml'

str = <<-eol
root:
  label: 'Test'
  account: 'Account'
  add: 'Add'
  local_folder: 'Local folder'
  remote_folder: 'Remote folder'
  status: 'Status'
  subkey: 'Some value'
eol

h = YAML.load(str)
h["root"]["local_folder"] = h["root"]["local_folder"] + " !Test comment"
h["root"]["subkey"] = h["root"]["subkey"] + " !Test comment"

puts h.to_yaml

# >> ---
# >> root:
# >>   label: Test
# >>   account: Account
# >>   add: Add
# >>   local_folder: Local folder !Test comment
# >>   remote_folder: Remote folder
# >>   status: Status
# >>   subkey: Some value !Test comment

Edit: a more programmatic version:

require 'yaml'

str = <<-eol
root:
  label: 'Test'
  account: 'Account'
  add: 'Add'
  local_folder: 'Local folder'
  remote_folder: 'Remote folder'
  status: 'Status'
  subkey: 'Some value'
eol

h = YAML.load(str)
%w(local_folder subkey).each {|i| h["root"][i] = h["root"][i] + " !Test comment" }

puts h.to_yaml

# >> ---
# >> root:
# >>   label: Test
# >>   account: Account
# >>   add: Add
# >>   local_folder: Local folder !Test comment
# >>   remote_folder: Remote folder
# >>   status: Status
# >>   subkey: Some value !Test comment
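Note that in both versions the " !Test comment" text is appended to the string value, so it is data rather than a YAML comment, and any parser will hand it back as part of the value. The sketch below (assuming Ruby with the bundled Psych) shows that round trip, and contrasts it with a real # comment added by editing the emitted text after dumping; that post-processing step is an illustration, not a Psych feature.

```ruby
require 'yaml'

# The answer's approach: the marker text becomes part of the string value.
marked = { "root" => { "local_folder" => "Local folder" } }
marked["root"]["local_folder"] += " !Test comment"
reloaded = YAML.load(marked.to_yaml)
p reloaded["root"]["local_folder"]   # => "Local folder !Test comment"

# A real YAML comment: append " # ..." to the emitted line after dumping.
plain_yaml = { "root" => { "local_folder" => "Local folder" } }.to_yaml
commented = plain_yaml.sub(/^ *local_folder: .*$/) { |line| "#{line} # Test comment" }
puts commented
p YAML.load(commented)["root"]["local_folder"]   # => "Local folder"
```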

Can Ruby's YAML module be used to embed comments?

You can do a string replacement on the emitted YAML at each place where you want a comment:

require 'yaml'

source = {
  :client => 'host4.example.com',
  :server => '192.168.222.222',
}.to_yaml

substitution_list = {
  /:client:/ => "# hostname or IP address of client\n:client:",
  /:server:/ => "# hostname or IP address of server\n:server:"
}

substitution_list.each do |pattern, replacement|
  source.gsub!(pattern, replacement)
end

puts source

output:

--- 
# hostname or IP address of client
:client: host4.example.com
# hostname or IP address of server
:server: 192.168.222.222
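The patterns above match ":client:" anywhere in the text, including inside a value. A slightly more defensive sketch, assuming top-level string keys: anchor each pattern to the start of a line so only the key itself matches, then reload the result to confirm the comments leave the data unchanged. The helper name add_key_comments is hypothetical.

```ruby
require 'yaml'

# Hypothetical helper: put a "# comment" line above each listed top-level key.
# In Ruby, ^ matches at the start of any line, and top-level keys are not
# indented, so nested keys or values containing the same text are left alone.
def add_key_comments(yaml, comments)
  comments.reduce(yaml) do |text, (key, comment)|
    text.sub(/^#{Regexp.escape(key)}:/) { |match| "# #{comment}\n#{match}" }
  end
end

config = { "client" => "host4.example.com", "server" => "192.168.222.222" }
annotated = add_key_comments(config.to_yaml, {
  "client" => "hostname or IP address of client",
  "server" => "hostname or IP address of server"
})
puts annotated

# Comments are ignored when loading, so the data round-trips unchanged.
p YAML.load(annotated) == config
```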

Use of --- in yaml

In YAML, --- is the end of directives marker.

A YAML document may begin with a number of YAML directives (currently, two directives are defined, %YAML and %TAG). Since a text node (for example) can also start with a % character, there needs to be a way to distinguish between directives and text. This is achieved using the end of directives marker --- which signals the end of the directives and the beginning of the document.

Since directives are allowed to be empty, --- can also serve as a document separator.

YAML also has an end of document marker, ... (three dots). However, it is not often used, because an end of directives marker / document separator also implies the end of the previous document. You need it if you want to have multiple documents with directives within the same stream, or when you want to indicate that a document is finished without necessarily starting a new one (e.g. when significant time may pass between the end of one document and the start of the next).

Many YAML emitters, Psych included, always emit an end of directives marker at the beginning of each document. This lets you concatenate multiple documents into a single stream without any additional processing.
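A short sketch of both markers with Psych, assuming Ruby 2.6 or later: YAML.dump_stream writes several documents, each starting with ---, and YAML.load_stream reads them back as one Ruby object per document. The hand-written stream also uses a %YAML directive and the ... end of document marker.

```ruby
require 'yaml'

# Each document in the emitted stream starts with its own "---" marker.
stream = YAML.dump_stream({ "name" => "first" }, { "name" => "second" })
puts stream

# Reading a stream back gives one Ruby object per document.
p YAML.load_stream(stream)

# A directive must come before "---"; "..." ends a document explicitly.
text = <<~YAML
  %YAML 1.1
  ---
  name: first
  ...
  ---
  name: second
YAML
p YAML.load_stream(text)
```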

In a line such as --- !ruby/object:MyClass, the other half, !ruby/object:MyClass, is a tag. A tag gives a type to the following node. In YAML, every node has a type, even if it is implicit. You can also write the tag explicitly; for example, text nodes have the type (tag) !!str. This can be useful in certain circumstances, for example here:

!!str 2018-10-31

This tells YAML that 2018-10-31 is text, not a date.
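A minimal Psych illustration, assuming Ruby 2.6 or later (where YAML.safe_load accepts the permitted_classes: keyword): without a tag, the scalar is resolved to a Date, while !!str keeps it as text. Date has to be listed explicitly because safe_load refuses to build it otherwise.

```ruby
require 'yaml'
require 'date'

# Untagged: Psych's scalar scanner resolves the value to a Date.
untagged = YAML.safe_load("released: 2018-10-31", permitted_classes: [Date])
p untagged["released"].class   # => Date

# Tagged with !!str: the value stays a String.
tagged = YAML.safe_load("released: !!str 2018-10-31")
p tagged["released"].class     # => String
```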

!ruby/object:MyClass is a tag used by Psych to indicate that the node is a serialized Ruby Object which is an instance of class MyClass. This way, when deserializing the document, Psych knows what class to instantiate and how to treat the node.
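A small sketch of that round trip, using a hypothetical MyClass: dumping an instance produces the !ruby/object:MyClass tag, and loading it back revives an instance of the same class. Psych 4 and later refuse arbitrary classes in YAML.load, so the sketch lists MyClass in permitted_classes for safe_load (available since Ruby 2.6).

```ruby
require 'yaml'

# Hypothetical class used only for this illustration.
class MyClass
  attr_reader :name

  def initialize(name)
    @name = name
  end
end

yaml = YAML.dump(MyClass.new("example"))
puts yaml   # the document starts with "--- !ruby/object:MyClass"

# The tag tells Psych which class to instantiate; the mapping's keys
# become the object's instance variables.
restored = YAML.safe_load(yaml, permitted_classes: [MyClass])
p restored.class   # => MyClass
p restored.name    # => "example"
```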

How to emit YAML in Ruby expanding aliases

The only way I've found to do this is to perform a deep clone of the object being dumped to YAML. This is because Psych places anchors and aliases based on object identity: when the same object appears more than once, it is written in full the first time and as an alias afterwards. A deep clone produces objects that are equal but have different identities, so each occurrence is written out in full.

There are many ways to perform a deep clone, including library support, or writing your own helper function -- I'll leave that as an exercise for the reader.
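A minimal sketch of the hand-written option, covering only Hash, Array and String (any other object is returned as-is, which is an assumption of this sketch). The common Marshal.load(Marshal.dump(obj)) idiom does not help here, because Marshal also preserves shared references, so the copy would still produce aliases. The helper name deep_copy is hypothetical.

```ruby
require 'yaml'

# Hypothetical helper: copy every occurrence separately, so no two
# positions in the result share the same object.
def deep_copy(obj)
  case obj
  when Hash then obj.each_with_object({}) { |(k, v), h| h[deep_copy(k)] = deep_copy(v) }
  when Array then obj.map { |e| deep_copy(e) }
  when String then obj.dup
  else obj
  end
end

shared = { "host" => "example.com", "port" => 8080 }
config = { "primary" => shared, "backup" => shared }

puts config.to_yaml             # "backup" is emitted as an alias of "primary"
puts deep_copy(config).to_yaml  # both mappings are written out in full
```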

Is it possible to emit valid YAML with anchors / references disabled using Ruby or Python?

I found this related ticket on the PyYAML website (http://pyyaml.org/ticket/91); it looks like anchors can be disabled by using a custom dumper along the lines of:

import yaml

class ExplicitDumper(yaml.SafeDumper):
    """
    A dumper that will never emit aliases.
    """

    def ignore_aliases(self, data):
        return True

So, for example, the following outputs can be achieved using the standard dumper and the new explicit dumper (this is a Python 2 session; 1L is a long literal, and Python 3 has no L suffix):

>>> yaml.dump([1L, 1L])
"[&id001 !!python/long '1', *id001]\n"

>>> yaml.dump([1L, 1L], Dumper=ExplicitDumper)
'[1, 1]\n'

You can customise further properties to ensure pretty-printing etc. in the yaml.dump(...) call.

YAML data exchange issues between Perl and Ruby

According to the YAML 1.1 spec, 1:16 is an integer in sexagesimal (base 60) format.

See also http://yaml.org/type/int.html, which says:

Using “:” allows expressing integers in base 60, which is convenient for time and angle values.

The YAML parser included in Ruby, Psych, recognises this format and converts the value into an integer (wrongly: 1:16 should be 76, but the Psych code seems to assume that all such values will be in the form a:b:c even though the regex doesn't enforce that). The Perl emitter (at least YAML::XS, which I tested) doesn't recognise this format, so it doesn't quote the string when writing the file. YAML::XS does recognise and quote some integers, but not all. YAML::XS also doesn't recognise many other formats (e.g. dates) that Psych does.
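For comparison, the intended reading of a sexagesimal integer is ordinary base-60 place value: each colon-separated part is worth 60 times the part to its right, so 1:16 is 1*60 + 16 = 76, and 190:20:30 is 190*3600 + 20*60 + 30 = 685230. A small sketch of that arithmetic (sexagesimal_to_i is a hypothetical helper, not part of Psych):

```ruby
# Hypothetical helper: interpret a YAML 1.1 sexagesimal integer such as
# "1:16" or "-190:20:30" using base-60 place value.
def sexagesimal_to_i(str)
  sign = str.start_with?("-") ? -1 : 1
  parts = str.sub(/\A[-+]/, "").delete("_").split(":")
  sign * parts.reduce(0) { |total, part| total * 60 + Integer(part, 10) }
end

p sexagesimal_to_i("1:16")       # => 76
p sexagesimal_to_i("190:20:30")  # => 685230
p sexagesimal_to_i("-1:16")      # => -76
```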

(It appears that the sexagesimal format has been removed from the YAML 1.2 spec.)

Psych allows a good deal of flexibility in its parsing; YAML.load_file is just a simple interface for the common use cases.

You could use the parse methods of Psych to create a tree representation of the yaml, then convert this into a Ruby data structure using a custom ScalarScanner (which is the object that converts strings of certain formats to the appropriate Ruby type):

require 'yaml'

class MyScalarScanner < Psych::ScalarScanner
  def tokenize(string)
    # This is the same regexp as Psych uses to detect base 60 ints:
    return string if string =~ /^[-+]?[0-9][0-9_]*(:[0-5]?[0-9])+$/
    super
  end
end

# Recent Psych versions require a ClassLoader for both the scanner and the visitor.
class_loader = Psych::ClassLoader.new
tree = YAML.parse_file('test.yaml')
foo = Psych::Visitors::ToRuby.new(MyScalarScanner.new(class_loader), class_loader).accept(tree)

This is basically the same process that occurs when you use YAML.load_file, except that it uses the customised scanner class.
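A self-contained way to try the custom scanner without a test.yaml file is to parse a string with YAML.parse instead of YAML.parse_file. This sketch assumes a recent Psych, where ScalarScanner.new and Visitors::ToRuby.new both take a Psych::ClassLoader; the keys duration and count are illustrative. Note that the plain ClassLoader is unrestricted, so this path behaves like an unsafe load and should only be used on trusted input.

```ruby
require 'yaml'

# Keeps base-60 integers such as "1:16" as plain strings (same regexp as above).
class MyScalarScanner < Psych::ScalarScanner
  SEXAGESIMAL_INT = /^[-+]?[0-9][0-9_]*(:[0-5]?[0-9])+$/

  def tokenize(string)
    return string if string.match?(SEXAGESIMAL_INT)
    super
  end
end

doc = "duration: 1:16\ncount: 42\n"
class_loader = Psych::ClassLoader.new
scanner = MyScalarScanner.new(class_loader)
tree = YAML.parse(doc)
data = Psych::Visitors::ToRuby.new(scanner, class_loader).accept(tree)
p data   # "duration" stays a String; "count" is still converted to an Integer
```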

A similar alternative would be to open up ScalarScanner and replace the tokenize method with the customised one. This would allow you to use the simpler load_file interface, but with the usual caveats about monkey patching classes:

class Psych::ScalarScanner
  alias :orig_tokenize :tokenize
  def tokenize string
    return string if string =~ /^[-+]?[0-9][0-9_]*(:[0-5]?[0-9])+$/
    orig_tokenize string
  end
end

foo = YAML.load_file 'test.yaml'

Note that these examples only take into consideration values with a format like 1:16. Depending on what your Perl program is emitting you may need to override other patterns too. One in particular that you might want to look at is sexagesimal floats (e.g. 1:16.44).
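As a sketch of extending the same idea to sexagesimal floats, the module below also leaves values like 1:16.44 as strings. The float pattern is modeled on the one Psych has used for base-60 floats (an assumption worth checking against your Psych version), and it is applied with Module#prepend instead of alias, a swapped-in alternative to the monkey patch above that lets super reach the original tokenize. Like any monkey patch, it affects every YAML load in the process; the module name is hypothetical.

```ruby
require 'yaml'

# Hypothetical module: prepended so that `super` calls Psych's original tokenize.
module KeepSexagesimalAsString
  SEXAGESIMAL_INT   = /^[-+]?[0-9][0-9_]*(:[0-5]?[0-9])+$/
  SEXAGESIMAL_FLOAT = /^[-+]?[0-9][0-9_]*(:[0-5]?[0-9])+\.[0-9_]*$/

  def tokenize(string)
    return string if string.match?(SEXAGESIMAL_INT) || string.match?(SEXAGESIMAL_FLOAT)
    super
  end
end

Psych::ScalarScanner.prepend(KeepSexagesimalAsString)

data = YAML.load("duration: 1:16\nangle: 1:16.44\nratio: 0.5\n")
p data   # both base-60 values stay Strings; "ratio" is still a Float
```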
