How to Emit YAML in Ruby Expanding Aliases

How to emit YAML in Ruby expanding aliases

The only way I've found to do this is to perform a deep clone of the object being dumped to YAML. This is because the YAML emitter decides where to place anchors and aliases based on object identity: if you clone or dup an object, the copy is equal to the original but has a different identity, so it is written out in full rather than as an alias.

There are many ways to perform a deep clone, including library support, or writing your own helper function -- I'll leave that as an exercise for the reader.
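As a rough illustration of such a helper, here is a minimal sketch. The name expand_aliases is hypothetical, and it assumes the data consists only of hashes, arrays, strings and simple scalars, with no cyclic references (a cycle would recurse forever). Note that a Marshal round trip would not help here, because Marshal preserves shared references inside the object graph.

```ruby
require 'yaml'

# Hypothetical helper: copy every occurrence of a container or string
# separately, so that no object appears twice in the structure being dumped.
# Assumes plain hashes, arrays, strings and scalars, and no cycles.
def expand_aliases(obj)
  case obj
  when Hash
    obj.each_with_object({}) do |(key, value), copy|
      copy[expand_aliases(key)] = expand_aliases(value)
    end
  when Array
    obj.map { |item| expand_aliases(item) }
  when String
    obj.dup
  else
    obj # integers, symbols, nil, true/false are never aliased
  end
end

shared = { 'adapter' => 'postgres' }
data   = { 'development' => shared, 'test' => shared }

puts YAML.dump(data)                 # 'test' is emitted as an alias
puts YAML.dump(expand_aliases(data)) # both mappings are written out in full
```

The copy compares equal to the original with ==, but every nested object has its own identity, which is all the emitter looks at.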

Is it possible to emit valid YAML with anchors / references disabled using Ruby or Python?

I found this related ticket on the PyYAML website (http://pyyaml.org/ticket/91). It looks like anchors can be disabled by using a custom dumper along the lines of:

import yaml

class ExplicitDumper(yaml.SafeDumper):
    """
    A dumper that will never emit aliases.
    """

    def ignore_aliases(self, data):
        return True

So, for example, the following outputs can be achieved using the standard dumper and the new explicit dumper (this is a Python 2 session, hence the 1L long literals):

>>> yaml.dump([1L, 1L])
"[&id001 !!python/long '1', *id001]\n"

>>> yaml.dump([1L, 1L], Dumper=ExplicitDumper)
'[1, 1]\n'

You can customise further properties to ensure pretty-printing etc. in the yaml.dump(...) call.

Read and write YAML files without destroying anchors and aliases

The problem here is that anchors and aliases in YAML are a serialization detail, so they aren’t part of the data after it’s been parsed, and the original anchor name isn’t known when writing the data back out to YAML. In order to keep the anchor names when round-tripping, you need to store them somewhere when parsing so that they are available later when serializing. In Ruby any object can have instance variables associated with it, so an easy way to achieve this is to store the anchor name in an instance variable of the object in question.
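To make that point concrete, here is a small sketch (the sample document and variable names are made up for illustration). It shows that the anchor name is available on the parsed node tree, but is gone once the data has been converted to plain Ruby objects, so Psych has to invent a new name when dumping. The respond_to? check is only there because newer Psych versions reject aliases in YAML.load by default.

```ruby
require 'yaml'

source = <<~DOC
  base: &defaults
    adapter: postgres
  development: *defaults
DOC

# The node tree produced by the parser still knows the anchor name.
root     = YAML.parse(source).root   # a Psych::Nodes::Mapping
anchored = root.children[1]          # the mapping under 'base'
puts anchored.anchor

# After conversion to Ruby objects, the alias is simply the same Hash again.
data = YAML.respond_to?(:unsafe_load) ? YAML.unsafe_load(source) : YAML.load(source)
puts data['base'].equal?(data['development'])

# Dumping uses a generated anchor name, not the original one.
puts YAML.dump(data)
```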

Continuing from the example in the earlier question, for hashes we can change our redefined revive_hash method so that if the hash is an anchor then, as well as recording the anchor name in the @st variable so later aliases can be recognised, we also add it as an instance variable on the hash.

class ToRubyNoMerge < Psych::Visitors::ToRuby
  def revive_hash hash, o
    if o.anchor
      @st[o.anchor] = hash
      hash.instance_variable_set "@_yaml_anchor_name", o.anchor
    end

    o.children.each_slice(2) { |k,v|
      key = accept(k)
      hash[key] = accept(v)
    }
    hash
  end
end

Note that this only affects YAML mappings that are anchors. If you want other types to keep their anchor names, you’ll need to look at psych/visitors/to_ruby.rb and make sure the name is added in all cases. Most types can be included by overriding register, but there are a couple of others; search for @st.

Now that the hash has the desired anchor name associated with it, you need to make Psych use it instead of the object id when serializing it. This can be done by subclassing YAMLTree. When YAMLTree processes an object, it first checks to see if that object has been seen already, and emits an alias for it if it has. For any new objects, it records that it has seen the object in case it needs to create an alias later. The object_id is used as the key in this, so you need to override those two methods to check for the instance variable, and use that instead if it exists:

class MyYAMLTree < Psych::Visitors::YAMLTree

  # check to see if this object has been seen before
  def accept target
    if anchor_name = target.instance_variable_get('@_yaml_anchor_name')
      if @st.key? anchor_name
        oid         = anchor_name
        node        = @st[oid]
        anchor      = oid.to_s
        node.anchor = anchor
        return @emitter.alias anchor
      end
    end

    # accept is a pretty big method, call super to avoid copying
    # it all here. super will handle the cases when it's an object
    # that's been seen but doesn't have '@_yaml_anchor_name' set
    super
  end

  # record object for future, using '@_yaml_anchor_name' rather
  # than object_id if it exists
  def register target, yaml_obj
    anchor_name = target.instance_variable_get('@_yaml_anchor_name') || target.object_id
    @st[anchor_name] = yaml_obj
    yaml_obj
  end
end

Now you can use it like this (unlike the previous question, you don’t need to create a custom emitter in this case):

builder = MyYAMLTree.new
builder << data

tree = builder.tree

puts tree.yaml # returns a string

# alternatively write direct to file
# ('w' creates or truncates the file; 'r+' would fail if it did not exist):
File.open('a_file.yml', 'w') do |f|
  tree.yaml f
end

YAML data exchange issues between Perl and Ruby

According to the YAML 1.1 spec, 1:16 is an integer in sexagesimal (base 60) format.

See also http://yaml.org/type/int.html, which says:

Using “:” allows expressing integers in base 60, which is convenient for time and angle values.

The YAML parser included in Ruby, Psych, recognises this format and converts the value into an integer (wrongly: 1:16 should be 76, i.e. 1 × 60 + 16; the Psych code seems to assume that all such values will be in the form a:b:c, but the regex doesn’t enforce that). The Perl emitter (at least YAML::XS, which I tested) doesn’t recognise this format, so it doesn’t quote the string when writing the file. YAML::XS does recognise and quote some integers, but not all. YAML::XS also doesn’t recognise many other formats (e.g. dates) that Psych does.

(It appears that the sexagesimal format has been removed from the YAML 1.2 spec.)
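For reference, the base 60 arithmetic the YAML 1.1 spec describes can be sketched as follows. The helper name sexagesimal_to_i is hypothetical and this is not how Psych implements it; it simply treats each colon-separated group as one base 60 digit, most significant first.

```ruby
# Hypothetical helper illustrating YAML 1.1 base 60 integers.
# Each colon-separated group is one base 60 digit; underscores are ignored.
def sexagesimal_to_i(text)
  sign   = text.start_with?('-') ? -1 : 1
  digits = text.sub(/\A[-+]/, '').delete('_').split(':').map(&:to_i)
  sign * digits.inject(0) { |total, digit| total * 60 + digit }
end

puts sexagesimal_to_i('1:16')      # 1 * 60 + 16
puts sexagesimal_to_i('190:20:30') # 190 * 3600 + 20 * 60 + 30
```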

Psych allows a good deal of flexibility in its parsing; YAML.load_file is just a simple interface for the common use cases.

You could use the parse methods of Psych to create a tree representation of the YAML, then convert this into a Ruby data structure using a custom ScalarScanner (which is the object that converts strings of certain formats to the appropriate Ruby type):

require('yaml')

class MyScalarScanner < Psych::ScalarScanner
  def tokenize string
    #this is the same regexp as Psych uses to detect base 60 ints:
    return string if string =~ /^[-+]?[0-9][0-9_]*(:[0-5]?[0-9])+$/
    super
  end
end

tree = YAML::parse_file 'test.yaml'
foo = Psych::Visitors::ToRuby.new(MyScalarScanner.new).accept tree

This is basically the same process that occurs when you use YAML.load_file, except that it uses the customised scanner class.
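Newer Psych releases expect a class loader to be passed to both ScalarScanner.new and ToRuby.new. A sketch of the same idea under that assumption follows; the class name KeepSexagesimalScanner and the inline sample document are made up for illustration, and the exact constructor signatures may differ between Psych versions.

```ruby
require 'yaml'

# Hypothetical scanner: leave base 60 looking values as plain strings.
class KeepSexagesimalScanner < Psych::ScalarScanner
  SEXAGESIMAL = /\A[-+]?[0-9][0-9_]*(:[0-5]?[0-9])+\z/

  def tokenize(string)
    return string if string =~ SEXAGESIMAL
    super
  end
end

# Assumption: this Psych version takes a class loader in both constructors.
class_loader = Psych::ClassLoader.new
scanner      = KeepSexagesimalScanner.new(class_loader)

tree = YAML.parse("time: 1:16\ncount: 42\n")
data = Psych::Visitors::ToRuby.new(scanner, class_loader).accept(tree)

p data # 'time' stays a string, 'count' is still converted to an integer
```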

A similar alternative would be to open up ScalarScanner and replace the tokenize method with the customised one. This would allow you to use the simpler load_file interface, but with the usual caveats about monkey patching classes:

class Psych::ScalarScanner
  alias :orig_tokenize :tokenize
  def tokenize string
    return string if string =~ /^[-+]?[0-9][0-9_]*(:[0-5]?[0-9])+$/
    orig_tokenize string
  end
end

foo = YAML.load_file 'test.yaml'

Note that these examples only take into consideration values with a format like 1:16. Depending on what your Perl program is emitting you may need to override other patterns too. One in particular that you might want to look at is sexagesimal floats (e.g. 1:16.44).

How can I include a YAML file inside another?

No, standard YAML does not include any kind of "import" or "include" statement.

How do I deserialize classes in Psych?

The Psych maintainer has implemented the serialization and deserialization of classes and modules. It's now in Ruby!


