In Python, how can you load YAML mappings as OrderedDicts?
Note: there is a library based on the following answer, which also implements the CLoader and CDumper variants: Phynix/yamlloader
I doubt very much that this is the best way to do it, but this is the way I came up with, and it does work. Also available as a gist.
import yaml
import yaml.constructor

try:
    # included in standard lib from Python 2.7
    from collections import OrderedDict
except ImportError:
    # try importing the backported drop-in replacement
    # it's available on PyPI
    from ordereddict import OrderedDict


class OrderedDictYAMLLoader(yaml.Loader):
    """
    A YAML loader that loads mappings into ordered dictionaries.
    """

    def __init__(self, *args, **kwargs):
        yaml.Loader.__init__(self, *args, **kwargs)

        self.add_constructor(u'tag:yaml.org,2002:map', type(self).construct_yaml_map)
        self.add_constructor(u'tag:yaml.org,2002:omap', type(self).construct_yaml_map)

    def construct_yaml_map(self, node):
        data = OrderedDict()
        yield data
        value = self.construct_mapping(node)
        data.update(value)

    def construct_mapping(self, node, deep=False):
        if isinstance(node, yaml.MappingNode):
            self.flatten_mapping(node)
        else:
            raise yaml.constructor.ConstructorError(None, None,
                'expected a mapping node, but found %s' % node.id, node.start_mark)

        mapping = OrderedDict()
        for key_node, value_node in node.value:
            key = self.construct_object(key_node, deep=deep)
            try:
                hash(key)
            except TypeError as exc:  # "except TypeError, exc" is a syntax error on Python 3
                raise yaml.constructor.ConstructorError('while constructing a mapping',
                    node.start_mark, 'found unacceptable key (%s)' % exc, key_node.start_mark)
            value = self.construct_object(value_node, deep=deep)
            mapping[key] = value
        return mapping
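For comparison, here is a shorter sketch of the same idea that registers a constructor on a SafeLoader subclass instead of overriding construct_mapping. The names OrderedSafeLoader and construct_ordered_mapping are made up for this example and are not part of PyYAML, and unlike the loader above it does not produce a YAML-specific error for unhashable keys.

```python
from collections import OrderedDict

import yaml


class OrderedSafeLoader(yaml.SafeLoader):
    """Hypothetical loader name; subclassing leaves yaml.SafeLoader itself untouched."""


def construct_ordered_mapping(loader, node):
    # Resolve any "<<" merge keys first, then keep the pairs in document order.
    loader.flatten_mapping(node)
    return OrderedDict(loader.construct_pairs(node))


# Register the constructor for the default mapping tag on the subclass only.
OrderedSafeLoader.add_constructor(u'tag:yaml.org,2002:map', construct_ordered_mapping)

text = "zeta: 1\nmid: {b: 2, a: 3}\nalpha: 4\n"
data = yaml.load(text, Loader=OrderedSafeLoader)

print(list(data))         # top-level keys in document order
print(list(data["mid"]))  # nested mappings go through the same constructor
```

Because the constructor is attached to a subclass, code elsewhere that calls yaml.safe_load() keeps getting plain dicts.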
Can PyYAML dump dict items in non-alphabetical order?
There's probably a better workaround, but I couldn't find anything in the documentation or the source.
Python 2 (see comments)
I subclassed OrderedDict and made it return a list of unsortable items:
import yaml
from collections import OrderedDict


class UnsortableList(list):
    def sort(self, *args, **kwargs):
        # deliberately a no-op, so PyYAML's sorting leaves the order alone
        pass


class UnsortableOrderedDict(OrderedDict):
    def items(self, *args, **kwargs):
        return UnsortableList(OrderedDict.items(self, *args, **kwargs))


yaml.add_representer(UnsortableOrderedDict, yaml.representer.SafeRepresenter.represent_dict)
And it seems to work:
>>> d = UnsortableOrderedDict([
...     ('z', 0),
...     ('y', 0),
...     ('x', 0)
... ])
>>> yaml.dump(d, default_flow_style=False)
'z: 0\ny: 0\nx: 0\n'
Python 3 or 2 (see comments)
You can also write a custom representer, but I don't know if you'll run into problems later on, as I stripped out some style checking code from it:
import yaml
from collections import OrderedDict


def represent_ordereddict(dumper, data):
    value = []

    for item_key, item_value in data.items():
        node_key = dumper.represent_data(item_key)
        node_value = dumper.represent_data(item_value)

        value.append((node_key, node_value))

    return yaml.nodes.MappingNode(u'tag:yaml.org,2002:map', value)


yaml.add_representer(OrderedDict, represent_ordereddict)
But with that, you can use the native OrderedDict class.
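If a recent PyYAML is an option, the workarounds above may not be needed at all: as far as I know, yaml.dump() and yaml.safe_dump() accept a sort_keys argument since PyYAML 5.1. A minimal sketch, assuming PyYAML 5.1 or newer and Python 3.7+ (where plain dicts keep insertion order):

```python
import yaml

d = {'z': 0, 'y': 0, 'x': 0}

# sort_keys=False asks the dumper to keep the mapping's own iteration order
# instead of sorting the keys (assumes PyYAML >= 5.1).
text = yaml.safe_dump(d, default_flow_style=False, sort_keys=False)
print(text)
```

On older PyYAML versions the sort_keys argument is not available, so one of the representer-based approaches above is still needed there.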
How to maintain order of insertion of keys when loading file into yaml?
The YAML specification explicitly states that mapping keys have no order. In a file, however, they do. If you want a simple way to solve this, replace PyYAML with ruamel.yaml (disclaimer: I am the author of that package, which is a superset of PyYAML) and use round_trip_load(). It will give you ordered dictionaries without the hassle of the sequence of single-item mappings that you need for specifying ordered dicts the "official" way.
import ruamel.yaml

yaml_str = """\
1:
  name: apple
  price: 5
3:
  name: orange
  price: 6
2:
  name: pear
  price: 2
"""

data = ruamel.yaml.round_trip_load(yaml_str)
for key in data:
    print(key)
gives
1
3
2
BTW, PyYAML doesn't sort by the keys when loading; that ordering is just a side effect of calculating hashes and inserting the integer keys 1, 2, 3 into Python dicts.
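On Python 3.7 and later a plain dict also preserves insertion order, and as far as I can tell PyYAML fills the dict in document order, so the same input can be checked with PyYAML alone. A small sketch under that assumption (it relies on dict behavior, not on any guarantee from the YAML spec):

```python
import yaml

yaml_str = """\
1:
  name: apple
  price: 5
3:
  name: orange
  price: 6
2:
  name: pear
  price: 2
"""

# safe_load returns ordinary dicts; on Python 3.7+ they remember insertion order.
data = yaml.safe_load(yaml_str)
keys = list(data)
print(keys)
```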
PyYAML : Control ordering of items called by yaml.load()
The YAML spec clearly says that the key order within a mapping is a "representation detail" that cannot be relied on. So your settings file is already invalid if it relies on the order of the mapping, and you'd be much better off using valid YAML, if at all possible.
Of course YAML is extensible, and there's nothing stopping you from adding an "ordered mapping" type to your settings files. For example:
!omap setting1:
  name: [item,item]
  name1: text
!omap anothersetting2:
  name: [item,item]
  !omap sub_setting:
    name :[item,item]
You didn't mention which yaml module you're using. There is no such module in the standard library, and there are at least two packages just on PyPI that provide modules with that name. However, I'm going to guess it's PyYAML, because as far as I know that's the most popular.
The extension described above is easy to parse with PyYAML. See http://pyyaml.org/ticket/29:
def omap_constructor(loader, node):
    return loader.construct_pairs(node)

yaml.add_constructor(u'!omap', omap_constructor)
Now, instead of:
{'anothersetting2': {'name': ['item', 'item'],
'sub_setting': 'name :[item,item]'},
'setting1': {'name': ['item', 'item'], 'name1': 'text'}}
You'll get this:
(('anothersetting2', (('name', ['item', 'item']),
('sub_setting', ('name, [item,item]'),))),
('setting1', (('name', ['item', 'item']), ('name1', 'text'))))
Of course this gives you a tuple of key-value tuples, but you can easily write a construct_ordereddict and get an OrderedDict instead. You can also write a representer that stores OrderedDict objects as !omaps, if you need to output as well as input.
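A rough sketch of what such a constructor and representer pair could look like. The names OmapLoader, OmapDumper, construct_ordereddict and represent_ordereddict are illustrative, not PyYAML API, and the example puts the !omap tag after the key (on the mapping value), which is my assumption about where the tag has to go for the constructor to receive a mapping node:

```python
from collections import OrderedDict

import yaml


class OmapLoader(yaml.SafeLoader):
    """Hypothetical loader subclass that understands the local !omap tag."""


class OmapDumper(yaml.SafeDumper):
    """Hypothetical dumper subclass that writes OrderedDict as !omap."""


def construct_ordereddict(loader, node):
    # construct_pairs returns the (key, value) pairs in document order.
    return OrderedDict(loader.construct_pairs(node))


def represent_ordereddict(dumper, data):
    # Passing a list of pairs rather than a dict keeps represent_mapping
    # from sorting the keys.
    return dumper.represent_mapping(u'!omap', list(data.items()))


OmapLoader.add_constructor(u'!omap', construct_ordereddict)
OmapDumper.add_representer(OrderedDict, represent_ordereddict)

text = "settings: !omap\n  zeta: 1\n  alpha: 2\n"
data = yaml.load(text, Loader=OmapLoader)
dumped = yaml.dump(data, Dumper=OmapDumper, default_flow_style=False)
print(dumped)
```

Loading the dumped text again with OmapLoader should give back an equal structure, which is a quick way to check that the two functions agree with each other.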
If you really want to hook PyYAML to make it use an OrderedDict instead of a dict for default mappings, it's pretty easy to do if you're already working directly on parser objects, but more difficult if you want to stick with the high-level convenience methods. Fortunately, the above-linked ticket has an implementation you can use. Just remember that you're not using real YAML anymore, but a variant, so any other software that deals with your files can, and likely will, break.
Dicts become OrderedDicts with yaml files
By default, PyYAML (before version 5.1) writes out composite leaf nodes in flow style and the rest in block style.
If you don't want that, i.e. you want everything to be block style, use safe_dump(data, default_flow_style=False):
import sys
import yaml
strikes = {'User1': {'name': 'name1', 'id': '001', 'strikes': 1}, 'User2': {'name': 'name2', 'id': '002', 'strikes': 3}}
yaml.safe_dump(strikes, sys.stdout, default_flow_style=False)
gives:
User1:
  id: '001'
  name: name1
  strikes: 1
User2:
  id: '002'
  name: name2
  strikes: 3
There is no reason to use yaml.dump() instead of yaml.safe_dump() (and I definitely hope you are not using yaml.load() instead of yaml.safe_load()).
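To illustrate what the safe variants buy you, here is a small sketch showing that safe_load() refuses a Python-specific tag instead of constructing the object. The document string is made up for the example and uses a harmless tuple tag:

```python
import yaml
from yaml.constructor import ConstructorError

document = "!!python/tuple [1, 2]"

try:
    yaml.safe_load(document)
    rejected = False
except ConstructorError:
    # The safe loader has no constructor registered for Python-specific tags.
    rejected = True

print(rejected)
```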
Preserving order of dictionary while using ruamel.yaml
You should really not be using the old PyYAML API that sorts keys when dumping.
Instantiate a YAML instance and use its dump method:
import ruamel.yaml

yaml = ruamel.yaml.YAML()
yaml.dump(data, stream)