How to Force To_Yaml to Output Long Strings in Literal Block Style

Any yaml libraries in Python that support dumping of long strings as block literals or folded blocks?

import yaml

class folded_unicode(unicode): pass
class literal_unicode(unicode): pass

def folded_unicode_representer(dumper, data):
    return dumper.represent_scalar(u'tag:yaml.org,2002:str', data, style='>')
def literal_unicode_representer(dumper, data):
    return dumper.represent_scalar(u'tag:yaml.org,2002:str', data, style='|')

yaml.add_representer(folded_unicode, folded_unicode_representer)
yaml.add_representer(literal_unicode, literal_unicode_representer)

data = {
    'literal': literal_unicode(
        u'by hjw ___\n'
        ' __ /.-.\\\n'
        ' / )_____________\\\\ Y\n'
        ' /_ /=== == === === =\\ _\\_\n'
        '( /)=== == === === == Y \\\n'
        ' `-------------------( o )\n'
        ' \\___/\n'),
    'folded': folded_unicode(
        u'It removes all ordinary curses from all equipped items. '
        'Heavy or permanent curses are unaffected.\n')}

print yaml.dump(data)

The result:

folded: >
  It removes all ordinary curses from all equipped items. Heavy or permanent curses
  are unaffected.
literal: |
  by hjw ___
   __ /.-.\
   / )_____________\\ Y
   /_ /=== == === === =\ _\_
  ( /)=== == === === == Y \
   `-------------------( o )
   \___/

This is Python 2 code. For completeness, one should also have str implementations, but I'm going to be lazy :-)

How do I break a string in YAML over multiple lines?

Use YAML folded style. The indentation of each line is ignored, and a line break is inserted at the end.

Key: >
  This is a very long sentence
  that spans several lines in the YAML
  but which will be rendered as a string
  with only a single carriage return appended to the end.

http://symfony.com/doc/current/components/yaml/yaml_format.html

You can use the "block chomping indicator" to eliminate the trailing line break, as follows:

Key: >-
  This is a very long sentence
  that spans several lines in the YAML
  but which will be rendered as a string
  with NO carriage returns.

In either case, each line break is replaced by a space.

There are other control tools available as well (for controlling indentation for example).

See https://yaml-multiline.info/
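
As a quick check of the two chomping behaviours described above, here is a minimal Python sketch, assuming PyYAML is installed. The key names `clipped` and `stripped` are made up for illustration and do not come from the original answer.

```python
import yaml

# Two folded scalars: ">" keeps one trailing line break, ">-" strips it.
DOC = """\
clipped: >
  This is a very long sentence
  that spans several lines.
stripped: >-
  This is a very long sentence
  that spans several lines.
"""

loaded = yaml.safe_load(DOC)

# Inner line breaks are folded into single spaces in both cases.
print(repr(loaded["clipped"]))   # 'This is a very long sentence that spans several lines.\n'
print(repr(loaded["stripped"]))  # 'This is a very long sentence that spans several lines.'
```

The only difference between the two values is the final newline, which is what the `-` indicator removes.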

How to dump strings in YAML using literal scalar style?

require 'psych'

# Construct an AST
visitor = Psych::Visitors::YAMLTree.new({})
visitor << DATA.read
ast = visitor.tree

# Find all scalars and modify their formatting
ast.grep(Psych::Nodes::Scalar).each do |node|
  node.plain = false
  node.quoted = true
  node.style = Psych::Nodes::Scalar::LITERAL
end

begin
  # Call the `yaml` method on the ast to convert to yaml
  puts ast.yaml
rescue
  # The `yaml` method was introduced in later versions, so fall back to
  # constructing a visitor
  Psych::Visitors::Emitter.new($stdout).accept ast
end

__END__
{
  "page": 1,
  "results": [
    "item", "another"
  ],
  "total_pages": 0
}

How can I control what scalar form PyYAML uses for my data?

Based on the answer to "Any yaml libraries in Python that support dumping of long strings as block literals or folded blocks?":

import yaml
from collections import OrderedDict

class quoted(str):
    pass

def quoted_presenter(dumper, data):
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='"')
yaml.add_representer(quoted, quoted_presenter)

class literal(str):
    pass

def literal_presenter(dumper, data):
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')
yaml.add_representer(literal, literal_presenter)

def ordered_dict_presenter(dumper, data):
    return dumper.represent_dict(data.items())
yaml.add_representer(OrderedDict, ordered_dict_presenter)

d = OrderedDict(short=quoted("Hello"), long=literal("Line1\nLine2\nLine3\n"))

print(yaml.dump(d))

Output

short: "Hello"
long: |
  Line1
  Line2
  Line3

yaml.dump adding unwanted newlines in multiline strings

If that is the only thing going into your YAML file then you can dump with the option default_style='|' which gives you block style literal for all of your scalars (probably not what you want).
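
To see why this is probably not what you want, here is a small sketch of the `default_style='|'` option, assuming PyYAML under Python 3. The sample dictionary is hypothetical. Expect every scalar value to be written as a block literal, and non-string values such as the integer are likely to pick up an explicit tag.

```python
import yaml

# Hypothetical sample data, not taken from the question.
data = {"a": 1, "b": "hello world", "c": "line one\nline two\n"}

# default_style applies to every scalar in the document, not just long strings.
text = yaml.safe_dump(data, default_style="|")
print(text)

# The style is purely presentational: loading gives back the same values.
loaded = yaml.safe_load(text)
print(loaded == data)
```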

Your string contains no special characters (which would need \ escaping and double quotes), but because of the newlines PyYAML decides to represent it single-quoted. In single-quoted style, a double newline is the way to represent a single newline that occurred in the string being represented. This gets "undone" on loading, but is indeed not very readable.
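
The default behaviour described above can be reproduced with a short sketch, assuming PyYAML under Python 3. The sample string is made up for illustration.

```python
import yaml

value = "first line\nsecond line\n"  # hypothetical multi-line string

# With no custom representer, PyYAML falls back to a single-quoted scalar.
text = yaml.safe_dump({"c": value})
print(text)  # each newline in the value shows up as a blank line

# Loading undoes the doubling and returns the original string.
loaded = yaml.safe_load(text)
print(loaded["c"] == value)
```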

If you want to get the block style literals on an individual basis, you can do multiple things:

  • adapt the Representer to output all strings with embedded newlines using the literal scalar block style (assuming they don't need \ escaping of special characters, which will force double quotes)

    import sys
    import yaml

    x = u"""\
    -----BEGIN RSA PRIVATE KEY-----
    MIIEogIBAAKCAQEA6oySC+8/N9VNpk0gJS7Gk8vn9sYN7FhjpAQnoHRqTN/Oaiyx
    xk2AleP2vXpojA/DHldT1JO+o3j56AHD+yfNFFeYvgWKDY35g49HsZZhbyCEAB45
    ...
    """

    yaml.SafeDumper.org_represent_str = yaml.SafeDumper.represent_str

    def repr_str(dumper, data):
        if '\n' in data:
            return dumper.represent_scalar(u'tag:yaml.org,2002:str', data, style='|')
        return dumper.org_represent_str(data)

    yaml.add_representer(str, repr_str, Dumper=yaml.SafeDumper)

    yaml.safe_dump(dict(a=1, b='hello world', c=x), sys.stdout)

  • make a subclass of str that has its own special representer:

    import sys
    import yaml

    class PSS(str):
    pass

    x = PSS("""\
    -----BEGIN RSA PRIVATE KEY-----
    MIIEogIBAAKCAQEA6oySC+8/N9VNpk0gJS7Gk8vn9sYN7FhjpAQnoHRqTN/Oaiyx
    xk2AleP2vXpojA/DHldT1JO+o3j56AHD+yfNFFeYvgWKDY35g49HsZZhbyCEAB45
    ...
    """)

    def pss_representer(dumper, data):
        style = '|'
        # if sys.version_info < (3,) and not isinstance(data, unicode):
        #     data = unicode(data, 'ascii')
        tag = u'tag:yaml.org,2002:str'
        return dumper.represent_scalar(tag, data, style=style)

    yaml.add_representer(PSS, pss_representer, Dumper=yaml.SafeDumper)

    yaml.safe_dump(dict(a=1, b='hello world', c=x), sys.stdout)

  • use ruamel.yaml:

    import sys
    from ruamel.yaml import YAML
    from ruamel.yaml.scalarstring import PreservedScalarString as pss

    x = pss("""\
    -----BEGIN RSA PRIVATE KEY-----
    MIIEogIBAAKCAQEA6oySC+8/N9VNpk0gJS7Gk8vn9sYN7FhjpAQnoHRqTN/Oaiyx
    xk2AleP2vXpojA/DHldT1JO+o3j56AHD+yfNFFeYvgWKDY35g49HsZZhbyCEAB45
    ...
    """)

    yaml = YAML()

    yaml.dump(dict(a=1, b='hello world', c=x), sys.stdout)

All of these give:

a: 1
b: hello world
c: |
  -----BEGIN RSA PRIVATE KEY-----
  MIIEogIBAAKCAQEA6oySC+8/N9VNpk0gJS7Gk8vn9sYN7FhjpAQnoHRqTN/Oaiyx
  xk2AleP2vXpojA/DHldT1JO+o3j56AHD+yfNFFeYvgWKDY35g49HsZZhbyCEAB45
  ...

Please note that it is not necessary to specify default_flow_style=False as the literal scalars can only appear in block style.
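
The caveat in the first option (strings that need escaping are forced into double quotes) is easy to trip over. The sketch below illustrates it, assuming PyYAML under Python 3. `LiteralDumper` and `represent_multiline_str` are hypothetical names, and subclassing `SafeDumper` instead of patching it is a design choice of this sketch, not something from the answer above.

```python
import yaml

class LiteralDumper(yaml.SafeDumper):
    """Hypothetical SafeDumper subclass, so the global SafeDumper stays untouched."""

def represent_multiline_str(dumper, data):
    # Request literal block style only for strings with embedded newlines.
    if "\n" in data:
        return dumper.represent_scalar("tag:yaml.org,2002:str", data, style="|")
    return dumper.represent_scalar("tag:yaml.org,2002:str", data)

LiteralDumper.add_representer(str, represent_multiline_str)

clean = "line one\nline two\n"
trailing = "line one \nline two\n"  # note the space before the first newline

clean_text = yaml.dump({"c": clean}, Dumper=LiteralDumper)
trailing_text = yaml.dump({"c": trailing}, Dumper=LiteralDumper)

print(clean_text)     # emitted as a literal block ("c: |")
print(trailing_text)  # falls back to a double-quoted scalar with \n escapes
```

The requested style is only a hint to the emitter. A space directly before a line break cannot be written in a block scalar by PyYAML, so stripping trailing whitespace from each line before dumping is one way to keep the literal style.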

Change the scalar style used for all multi-line strings when serialising a dynamic model using YamlDotNet

To answer my own question, I've now worked out how to do this by deriving from the ChainedEventEmitter class and overriding void Emit(ScalarEventInfo eventInfo, IEmitter emitter). See code sample below.

public class MultilineScalarFlowStyleEmitter : ChainedEventEmitter
{
    public MultilineScalarFlowStyleEmitter(IEventEmitter nextEmitter)
        : base(nextEmitter) { }

    public override void Emit(ScalarEventInfo eventInfo, IEmitter emitter)
    {
        if (typeof(string).IsAssignableFrom(eventInfo.Source.Type))
        {
            string value = eventInfo.Source.Value as string;
            if (!string.IsNullOrEmpty(value))
            {
                bool isMultiLine = value.IndexOfAny(new char[] { '\r', '\n', '\x85', '\x2028', '\x2029' }) >= 0;
                if (isMultiLine)
                    eventInfo = new ScalarEventInfo(eventInfo.Source)
                    {
                        Style = ScalarStyle.Literal
                    };
            }
        }

        nextEmitter.Emit(eventInfo, emitter);
    }
}

