In Python, How to Load YAML Mappings as OrderedDicts

In Python, how can you load YAML mappings as OrderedDicts?

Note: there is a library based on the following answer, Phynix/yamlloader, which also implements the CLoader and CDumper variants.

I doubt very much that this is the best way to do it, but this is the way I came up with, and it does work. Also available as a gist.

import yaml
import yaml.constructor

try:
    # included in standard lib from Python 2.7
    from collections import OrderedDict
except ImportError:
    # try importing the backported drop-in replacement
    # it's available on PyPI
    from ordereddict import OrderedDict

class OrderedDictYAMLLoader(yaml.Loader):
    """
    A YAML loader that loads mappings into ordered dictionaries.
    """

    def __init__(self, *args, **kwargs):
        yaml.Loader.__init__(self, *args, **kwargs)

        self.add_constructor(u'tag:yaml.org,2002:map', type(self).construct_yaml_map)
        self.add_constructor(u'tag:yaml.org,2002:omap', type(self).construct_yaml_map)

    def construct_yaml_map(self, node):
        data = OrderedDict()
        yield data
        value = self.construct_mapping(node)
        data.update(value)

    def construct_mapping(self, node, deep=False):
        if isinstance(node, yaml.MappingNode):
            self.flatten_mapping(node)
        else:
            raise yaml.constructor.ConstructorError(None, None,
                'expected a mapping node, but found %s' % node.id, node.start_mark)

        mapping = OrderedDict()
        for key_node, value_node in node.value:
            key = self.construct_object(key_node, deep=deep)
            try:
                hash(key)
            except TypeError as exc:
                raise yaml.constructor.ConstructorError('while constructing a mapping',
                    node.start_mark, 'found unacceptable key (%s)' % exc, key_node.start_mark)
            value = self.construct_object(value_node, deep=deep)
            mapping[key] = value
        return mapping
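For comparison, a shorter variant of the same idea might look like the sketch below. It is not the code from the gist: the helper name ordered_load and the local OrderedLoader subclass are hypothetical, and it builds on yaml.SafeLoader so that the stock loaders stay untouched.

```python
import yaml
from collections import OrderedDict

MAP_TAG = 'tag:yaml.org,2002:map'  # default tag PyYAML resolves for mappings


def ordered_load(stream, Loader=yaml.SafeLoader):
    """Load YAML, building an OrderedDict for every mapping (hypothetical helper)."""

    class OrderedLoader(Loader):
        # Local subclass, so registering a constructor does not leak
        # into the Loader class that was passed in.
        pass

    def construct_ordered_mapping(loader, node):
        loader.flatten_mapping(node)  # resolve "<<" merge keys first
        return OrderedDict(loader.construct_pairs(node))

    OrderedLoader.add_constructor(MAP_TAG, construct_ordered_mapping)
    return yaml.load(stream, Loader=OrderedLoader)


doc = "z: 1\ny: 2\nx:\n  b: 1\n  a: 2\n"
data = ordered_load(doc)
print(list(data))       # top-level keys in file order
print(list(data['x']))  # nested keys in file order
```

Because the constructor is registered on a throwaway subclass, other code that calls yaml.safe_load() keeps getting plain dicts.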

Can PyYAML dump dict items in non-alphabetical order?

There's probably a better workaround, but I couldn't find anything in the documentation or the source.


Python 2 (see comments)

I subclassed OrderedDict and made it return a list of unsortable items:

import yaml
from collections import OrderedDict

class UnsortableList(list):
    def sort(self, *args, **kwargs):
        pass

class UnsortableOrderedDict(OrderedDict):
    def items(self, *args, **kwargs):
        return UnsortableList(OrderedDict.items(self, *args, **kwargs))

yaml.add_representer(UnsortableOrderedDict, yaml.representer.SafeRepresenter.represent_dict)

And it seems to work:

>>> d = UnsortableOrderedDict([
...     ('z', 0),
...     ('y', 0),
...     ('x', 0)
... ])
>>> yaml.dump(d, default_flow_style=False)
'z: 0\ny: 0\nx: 0\n'

Python 3 or 2 (see comments)

You can also write a custom representer, but I don't know if you'll run into problems later on, as I stripped out some style checking code from it:

import yaml

from collections import OrderedDict

def represent_ordereddict(dumper, data):
    value = []

    for item_key, item_value in data.items():
        node_key = dumper.represent_data(item_key)
        node_value = dumper.represent_data(item_value)

        value.append((node_key, node_value))

    return yaml.nodes.MappingNode(u'tag:yaml.org,2002:map', value)

yaml.add_representer(OrderedDict, represent_ordereddict)

But with that, you can use the native OrderedDict class.
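If dropping the style checking is a worry, one possible alternative is to delegate to the dumper's own represent_mapping. The sketch below assumes PyYAML's current behaviour, where represent_mapping only sorts objects that have an items() method, so handing it the items view keeps insertion order. That is an implementation detail, not a documented guarantee. The OrderedDumper class name is hypothetical.

```python
import yaml
from collections import OrderedDict


def represent_ordereddict(dumper, data):
    # Passing data.items() (a view, not a dict) keeps represent_mapping
    # from sorting the keys; its flow/block style handling stays intact.
    return dumper.represent_mapping('tag:yaml.org,2002:map', data.items())


class OrderedDumper(yaml.SafeDumper):
    """Hypothetical dumper subclass so SafeDumper itself is not modified."""


OrderedDumper.add_representer(OrderedDict, represent_ordereddict)

d = OrderedDict([('z', 0), ('y', 0), ('x', 0)])
text = yaml.dump(d, Dumper=OrderedDumper, default_flow_style=False)
print(text)
```

On PyYAML 5.1 or later with Python 3.7+, passing sort_keys=False to yaml.dump() with a plain dict is another option that avoids a custom representer altogether.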

How to maintain order of insertion of keys when loading file into yaml?

The YAML specification explicitly states that mapping keys have no order. In a file, however, they do. A simple way to solve this is to replace PyYAML with ruamel.yaml (disclaimer: I am the author of that package, which is a superset of PyYAML) and use round_trip_load(). It gives you ordered dictionaries without the hassle of the sequence of single-item mappings that you need in order to specify ordered dicts the "official" way.

import ruamel.yaml

yaml_str = """\
1:
  name: apple
  price: 5
3:
  name: orange
  price: 6
2:
  name: pear
  price: 2
"""

data = ruamel.yaml.round_trip_load(yaml_str)
for key in data:
    print(key)

gives

1
3
2

BTW, PyYAML doesn't sort by the keys when loading; that ordering is just a side effect of calculating hashes and inserting the integer keys 1, 2, 3 into Python dicts.
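On newer interpreters the hash-ordering effect goes away. The sketch below assumes Python 3.7 or later, where the built-in dict preserves insertion order; since PyYAML fills each mapping in document order, plain safe_load() then returns the keys as they appear in the file.

```python
import yaml

yaml_str = """\
1:
  name: apple
  price: 5
3:
  name: orange
  price: 6
2:
  name: pear
  price: 2
"""

# Assumes Python 3.7+: dict keeps the order in which keys were inserted.
data = yaml.safe_load(yaml_str)
keys = list(data)
print(keys)
```

This only covers loading into memory. Comments and formatting are still lost, which is where a round-trip loader remains useful.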

PyYAML: Control ordering of items called by yaml.load()

The YAML spec clearly says that the key order within a mapping is a "representation detail" that cannot be relied on. So your settings file is already invalid if it's relying on the mapping order, and you'd be much better off using valid YAML, if at all possible.

Of course YAML is extensible, and there's nothing stopping you from adding an "ordered mapping" type to your settings files. For example:

!omap setting1:
  name: [item,item]
  name1: text
!omap anothersetting2:
  name: [item,item]
  !omap sub_setting:
    name :[item,item]

You didn't mention which yaml module you're using. There is no such module in the standard library, and there are at least two packages just on PyPI that provide modules with that name. However, I'm going to guess it's PyYAML, because as far as I know that's the most popular.

The extension described above is easy to parse with PyYAML. See http://pyyaml.org/ticket/29:

import yaml

def omap_constructor(loader, node):
    return loader.construct_pairs(node)
yaml.add_constructor(u'!omap', omap_constructor)

Now, instead of:

{'anothersetting2': {'name': ['item', 'item'],
                     'sub_setting': 'name :[item,item]'},
 'setting1': {'name': ['item', 'item'], 'name1': 'text'}}

You'll get this:

(('anothersetting2', (('name', ['item', 'item']),
                      ('sub_setting', ('name, [item,item]'),))),
 ('setting1', (('name', ['item', 'item']), ('name1', 'text'))))

Of course this gives you a tuple of key-value tuples, but you can easily write a construct_ordereddict and get an OrderedDict instead. You can also write a representer that stores OrderedDict objects as !omaps, if you need to output as well as input.
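A minimal sketch of such a constructor and representer pair might look like this. It assumes the local !omap tag is applied to the mapping value itself (as in "settings: !omap"), and the OmapLoader and OmapDumper class names are hypothetical; they exist only so that the stock SafeLoader and SafeDumper are left unmodified.

```python
import yaml
from collections import OrderedDict


class OmapLoader(yaml.SafeLoader):
    """Hypothetical loader subclass that understands the local !omap tag."""


class OmapDumper(yaml.SafeDumper):
    """Hypothetical dumper subclass that writes OrderedDict as !omap."""


def construct_ordereddict(loader, node):
    # construct_pairs returns (key, value) tuples in document order.
    return OrderedDict(loader.construct_pairs(node, deep=True))


def represent_ordereddict(dumper, data):
    # Passing items() rather than the dict avoids the key sorting
    # that represent_mapping applies to dict-like objects.
    return dumper.represent_mapping('!omap', data.items())


OmapLoader.add_constructor('!omap', construct_ordereddict)
OmapDumper.add_representer(OrderedDict, represent_ordereddict)

doc = """\
settings: !omap
  zeta: 1
  alpha: 2
"""
data = yaml.load(doc, Loader=OmapLoader)
print(data['settings'])

# Dump and reload to check that the order survives a round trip.
round_tripped = yaml.load(yaml.dump(data, Dumper=OmapDumper), Loader=OmapLoader)
print(round_tripped == data)
```

Note that this local !omap tag is not the same thing as the standard !!omap type, which is written as a sequence of single-key mappings.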

If you really want to hook PyYAML to make it use an OrderedDict instead of a dict for default mappings, it's pretty easy to do if you're already working directly on parser objects, but more difficult if you want to stick with the high-level convenience methods. Fortunately, the above-linked ticket has an implementation you can use. Just remember that you're not using real YAML anymore, but a variant, so any other software that deals with your files can, and likely will, break.

Dicts become OrderedDicts with yaml files

By default, PyYAML writes out collection leaf nodes in flow style and everything else in block style.

If you don't want that, i.e. want everything to be block-style, use safe_dump(data, default_flow_style=False):

import sys
import yaml

strikes = {'User1': {'name': 'name1', 'id': '001', 'strikes': 1}, 'User2': {'name': 'name2', 'id': '002', 'strikes': 3}}

yaml.safe_dump(strikes, sys.stdout, default_flow_style=False)

gives:

User1:
  id: '001'
  name: name1
  strikes: 1
User2:
  id: '002'
  name: name2
  strikes: 3

There is no reason to use yaml.dump() instead of yaml.safe_dump() (and I definitely hope you are not using yaml.load() instead of yaml.safe_load()).
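To see the two styles side by side, here is a small sketch. The default value of default_flow_style has differed between PyYAML releases, so it is passed explicitly in both calls; the shortened strikes data is only for illustration.

```python
import yaml

strikes = {'User1': {'name': 'name1', 'strikes': 1}}

# default_flow_style=None: leaf collections are written in flow style.
mixed = yaml.safe_dump(strikes, default_flow_style=None)

# default_flow_style=False: everything is written in block style.
block = yaml.safe_dump(strikes, default_flow_style=False)

print(mixed)
print(block)
```

Both strings load back to the same data; only the presentation differs.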

Preserving order of dictionary while using ruamel.yaml

You should really not be using the old PyYAML API that sorts keys when dumping.

Instantiate a YAML instance and use its dump method:

import ruamel.yaml

yaml = ruamel.yaml.YAML()
yaml.dump(data, stream)

