How to dump strings in YAML using literal scalar style?
require 'psych'
# Construct an AST
visitor = Psych::Visitors::YAMLTree.new({})
visitor << DATA.read
ast = visitor.tree
# Find all scalars and modify their formatting
ast.grep(Psych::Nodes::Scalar).each do |node|
  node.plain = false
  node.quoted = true
  node.style = Psych::Nodes::Scalar::LITERAL
end
begin
  # Call the `yaml` method on the ast to convert to yaml
  puts ast.yaml
rescue
  # The `yaml` method was introduced in later versions, so fall back to
  # constructing a visitor
  Psych::Visitors::Emitter.new($stdout).accept ast
end
__END__
{
"page": 1,
"results": [
"item", "another"
],
"total_pages": 0
}
Any yaml libraries in Python that support dumping of long strings as block literals or folded blocks?
import yaml
class folded_unicode(unicode): pass
class literal_unicode(unicode): pass
def folded_unicode_representer(dumper, data):
    return dumper.represent_scalar(u'tag:yaml.org,2002:str', data, style='>')
def literal_unicode_representer(dumper, data):
    return dumper.represent_scalar(u'tag:yaml.org,2002:str', data, style='|')
yaml.add_representer(folded_unicode, folded_unicode_representer)
yaml.add_representer(literal_unicode, literal_unicode_representer)
data = {
'literal':literal_unicode(
u'by hjw ___\n'
' __ /.-.\\\n'
' / )_____________\\\\ Y\n'
' /_ /=== == === === =\\ _\\_\n'
'( /)=== == === === == Y \\\n'
' `-------------------( o )\n'
' \\___/\n'),
'folded': folded_unicode(
u'It removes all ordinary curses from all equipped items. '
'Heavy or permanent curses are unaffected.\n')}
print yaml.dump(data)
The result:
folded: >
  It removes all ordinary curses from all equipped items. Heavy or permanent curses
  are unaffected.
literal: |
  by hjw ___
  __ /.-.\
  / )_____________\\ Y
  /_ /=== == === === =\ _\_
  ( /)=== == === === == Y \
  `-------------------( o )
  \___/
For completeness, one should also have str implementations, but I'm going to be lazy :-)
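A `str`-based version for Python 3 might look like the sketch below. The class and function names (`FoldedStr`, `LiteralStr`, `represent_folded`, `represent_literal`) are made up for illustration, and the representers are registered on `yaml.SafeDumper` so that `safe_dump` picks them up.

```python
import yaml


class FoldedStr(str):
    """Marker type (hypothetical name): dump as a folded block scalar."""


class LiteralStr(str):
    """Marker type (hypothetical name): dump as a literal block scalar."""


def represent_folded(dumper, data):
    return dumper.represent_scalar("tag:yaml.org,2002:str", data, style=">")


def represent_literal(dumper, data):
    return dumper.represent_scalar("tag:yaml.org,2002:str", data, style="|")


yaml.add_representer(FoldedStr, represent_folded, Dumper=yaml.SafeDumper)
yaml.add_representer(LiteralStr, represent_literal, Dumper=yaml.SafeDumper)

doc = {
    "folded": FoldedStr(
        "It removes all ordinary curses from all equipped items. "
        "Heavy or permanent curses are unaffected.\n"
    ),
    "literal": LiteralStr("line one\nline two\n"),
}
dumped = yaml.safe_dump(doc)
print(dumped)
```

Plain `str` values elsewhere in the data are unaffected, because the representers are only registered for the two marker subclasses.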
How can I control what scalar form PyYAML uses for my data?
Based on Any yaml libraries in Python that support dumping of long strings as block literals or folded blocks?
import yaml
from collections import OrderedDict
class quoted(str):
    pass
def quoted_presenter(dumper, data):
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='"')
yaml.add_representer(quoted, quoted_presenter)
class literal(str):
    pass
def literal_presenter(dumper, data):
    return dumper.represent_scalar('tag:yaml.org,2002:str', data, style='|')
yaml.add_representer(literal, literal_presenter)
def ordered_dict_presenter(dumper, data):
    return dumper.represent_dict(data.items())
yaml.add_representer(OrderedDict, ordered_dict_presenter)
d = OrderedDict(short=quoted("Hello"), long=literal("Line1\nLine2\nLine3\n"))
print(yaml.dump(d))
Output
short: "Hello"
long: |
  Line1
  Line2
  Line3
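If the `OrderedDict` is only there to keep the key order, a plain `dict` may be enough on newer setups. This is a small sketch assuming PyYAML 5.1 or later (which accepts `sort_keys`) and Python 3.7+ (where `dict` preserves insertion order); the variable names are illustrative.

```python
import yaml

# A plain dict keeps insertion order on Python 3.7+.
record = {"short": "Hello", "long": "World"}

# sort_keys=False asks PyYAML not to sort the mapping keys alphabetically.
dumped = yaml.safe_dump(record, sort_keys=False)
print(dumped)
```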
How to format a string in YAML dump?
First of all, what you present as the output you would like to get
is not a representation of the data that you provide. Since
the multi-line string in that data starts with a newline, the block
style literal scalar for it requires a block indentation indicator and a newline at the start:
address_pattern_template: |2

  ^ #the beginning of the address string (e.g. interface number)
  .
  .
  .
But it doesn't make sense (to me at least) to have these patterns
start with a newline, so I'll leave that out in the following.
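The indentation indicator can also be observed with PyYAML. The sketch below assumes a hypothetical `LiteralStr` marker class and a shortened two-line pattern, not the original data.

```python
import yaml


class LiteralStr(str):
    """Marker type (hypothetical name) for strings dumped in literal style."""


def represent_literal(dumper, data):
    return dumper.represent_scalar("tag:yaml.org,2002:str", data, style="|")


yaml.add_representer(LiteralStr, represent_literal, Dumper=yaml.SafeDumper)

# The string starts with a newline, so the emitter has to state the
# indentation explicitly in the block scalar header.
pattern = LiteralStr("\n^ start of the address\n$ end of the address\n")
dumped = yaml.safe_dump({"address_pattern_template": pattern})
print(dumped)

reloaded = yaml.safe_load(dumped)["address_pattern_template"]
```

Dropping the leading newline from `pattern` makes the indicator unnecessary, which is the simplification used in the rest of this answer.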
If you don't know where the multi-line strings are in your data structure, but you can
convert it in place before dumping, then you can use ruamel.yaml.scalarstring.walk_tree:
import sys
import ruamel.yaml
data = dict(a=[1, 2, 3, dict(
address_pattern_template="""\
^ #the beginning of the address string (e.g. interface number)
(?P<junkbefore> #capturing the junk before the address
\D? #an optional non-digit character
.*? #any characters (non-greedy) up to the address
)
(?P<address> #capturing the pure address
{pure_address_pattern}
)
(?P<junkafter> #capturing the junk after the address
\D? #an optional non-digit character
.* #any characters (greedy) up to the end of the string
)
$ #the end of the input address string
"""
)])
yaml = ruamel.yaml.YAML()
ruamel.yaml.scalarstring.walk_tree(data)
yaml.dump(data, sys.stdout)
which gives:
a:
- 1
- 2
- 3
- address_pattern_template: |
    ^ #the beginning of the address string (e.g. interface number)
    (?P<junkbefore> #capturing the junk before the address
    \D? #an optional non-digit character
    .*? #any characters (non-greedy) up to the address
    )
    (?P<address> #capturing the pure address
    {pure_address_pattern}
    )
    (?P<junkafter> #capturing the junk after the address
    \D? #an optional non-digit character
    .* #any characters (greedy) up to the end of the string
    )
    $ #the end of the input address string
walk_tree will replace the multi-line strings with LiteralScalarString instances, which behave for most purposes like normal strings.
If that in-place transform is not acceptable, you can do a deepcopy of
data first and then apply walk_tree on the copy. If that is not acceptable
because of memory constraints, then you have to provide an alternative representer for strings
that checks during representation whether you have a multi-line string. Preferably you do that
in a subclass of the Representer:
import sys
import ruamel.yaml
# data defined as before
class MyRepresenter(ruamel.yaml.representer.RoundTripRepresenter):
    def represent_str(self, data):
        style = '|' if '\n' in data else None
        return self.represent_scalar(u'tag:yaml.org,2002:str', data, style=style)
MyRepresenter.add_representer(str, MyRepresenter.represent_str)
yaml = ruamel.yaml.YAML()
yaml.Representer = MyRepresenter
yaml.dump(data, sys.stdout)
which gives the same output as the previous example.
yaml.dump adding unwanted newlines in multiline strings
If that is the only thing going into your YAML file then you can dump with the option default_style='|'
which gives you block style literal for all of your scalars (probably not what you want).
Your string contains no special characters (that would need \
escaping and double quotes), so because of the newlines PyYAML decides to represent it single quoted. In single quoted style a double newline is the way to represent a single newline that occurred in the string being represented. This gets "undone" on loading, but is indeed not very readable.
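Both behaviours can be compared with a short stand-in string. This is only an illustration; the exact line layout of the output may differ between PyYAML versions.

```python
import yaml

text = "line one\nline two\n"

# Default behaviour: the multi-line string is emitted single quoted,
# with each newline written as an empty line inside the quotes.
default_dump = yaml.safe_dump({"c": text})
print(default_dump)

# default_style='|' requests literal style for every scalar,
# including non-string values such as the integer.
literal_dump = yaml.safe_dump({"a": 1, "c": text}, default_style="|")
print(literal_dump)
```

Both documents load back to the same data; the difference is purely in how readable the file is.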
If you want to get the block style literals on an individual basis, you can do multiple things:
- adapt the Representer to output all strings with embedded newlines using the literal scalar block style (assuming they don't need \ escaping of special characters, which will force double quotes):
import sys
import yaml
x = u"""\
-----BEGIN RSA PRIVATE KEY-----
MIIEogIBAAKCAQEA6oySC+8/N9VNpk0gJS7Gk8vn9sYN7FhjpAQnoHRqTN/Oaiyx
xk2AleP2vXpojA/DHldT1JO+o3j56AHD+yfNFFeYvgWKDY35g49HsZZhbyCEAB45
...
"""
yaml.SafeDumper.org_represent_str = yaml.SafeDumper.represent_str
def repr_str(dumper, data):
    if '\n' in data:
        return dumper.represent_scalar(u'tag:yaml.org,2002:str', data, style='|')
    return dumper.org_represent_str(data)
yaml.add_representer(str, repr_str, Dumper=yaml.SafeDumper)
yaml.safe_dump(dict(a=1, b='hello world', c=x), sys.stdout)
- make a subclass of string that has its own special representer:
import sys
import yaml
class PSS(str):
    pass
x = PSS("""\
-----BEGIN RSA PRIVATE KEY-----
MIIEogIBAAKCAQEA6oySC+8/N9VNpk0gJS7Gk8vn9sYN7FhjpAQnoHRqTN/Oaiyx
xk2AleP2vXpojA/DHldT1JO+o3j56AHD+yfNFFeYvgWKDY35g49HsZZhbyCEAB45
...
""")
def pss_representer(dumper, data):
    style = '|'
    # if sys.version_info < (3,) and not isinstance(data, unicode):
    #     data = unicode(data, 'ascii')
    tag = u'tag:yaml.org,2002:str'
    return dumper.represent_scalar(tag, data, style=style)
yaml.add_representer(PSS, pss_representer, Dumper=yaml.SafeDumper)
yaml.safe_dump(dict(a=1, b='hello world', c=x), sys.stdout)
- use ruamel.yaml:
import sys
from ruamel.yaml import YAML
from ruamel.yaml.scalarstring import PreservedScalarString as pss
x = pss("""\
-----BEGIN RSA PRIVATE KEY-----
MIIEogIBAAKCAQEA6oySC+8/N9VNpk0gJS7Gk8vn9sYN7FhjpAQnoHRqTN/Oaiyx
xk2AleP2vXpojA/DHldT1JO+o3j56AHD+yfNFFeYvgWKDY35g49HsZZhbyCEAB45
...
""")
yaml = YAML()
yaml.dump(dict(a=1, b='hello world', c=x), sys.stdout)
All of these give:
a: 1
b: hello world
c: |
  -----BEGIN RSA PRIVATE KEY-----
  MIIEogIBAAKCAQEA6oySC+8/N9VNpk0gJS7Gk8vn9sYN7FhjpAQnoHRqTN/Oaiyx
  xk2AleP2vXpojA/DHldT1JO+o3j56AHD+yfNFFeYvgWKDY35g49HsZZhbyCEAB45
  ...
Please note that it is not necessary to specify default_flow_style=False
as the literal scalars can only appear in block style.
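The link between literal scalars and block style can be checked by forcing flow style. This is a sketch assuming a hypothetical `LiteralStr` marker class; with `default_flow_style=True` PyYAML cannot use `|` and falls back to a quoted scalar.

```python
import yaml


class LiteralStr(str):
    """Marker type (hypothetical name) for strings dumped in literal style."""


def represent_literal(dumper, data):
    return dumper.represent_scalar("tag:yaml.org,2002:str", data, style="|")


yaml.add_representer(LiteralStr, represent_literal, Dumper=yaml.SafeDumper)

doc = {"c": LiteralStr("line one\nline two\n")}

# Without forcing a style, the literal scalar appears inside a block mapping.
block_dump = yaml.safe_dump(doc)
print(block_dump)

# Inside a flow mapping the literal style is not available.
flow_dump = yaml.safe_dump(doc, default_flow_style=True)
print(flow_dump)
```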
Change the scalar style used for all multi-line strings when serialising a dynamic model using YamlDotNet
To answer my own question, I've now worked out how to do this by deriving from the ChainedEventEmitter class and overriding void Emit(ScalarEventInfo eventInfo, IEmitter emitter). See code sample below.
public class MultilineScalarFlowStyleEmitter : ChainedEventEmitter
{
    public MultilineScalarFlowStyleEmitter(IEventEmitter nextEmitter)
        : base(nextEmitter) { }
    public override void Emit(ScalarEventInfo eventInfo, IEmitter emitter)
    {
        if (typeof(string).IsAssignableFrom(eventInfo.Source.Type))
        {
            string value = eventInfo.Source.Value as string;
            if (!string.IsNullOrEmpty(value))
            {
                bool isMultiLine = value.IndexOfAny(new char[] { '\r', '\n', '\x85', '\x2028', '\x2029' }) >= 0;
                if (isMultiLine)
                    eventInfo = new ScalarEventInfo(eventInfo.Source)
                    {
                        Style = ScalarStyle.Literal
                    };
            }
        }
        nextEmitter.Emit(eventInfo, emitter);
    }
}
Can I control the formatting of multiline strings?
If you load, then dump, your expected output, you'll see that ruamel.yaml can actually preserve the block style literal scalar.
import sys
import ruamel.yaml
yaml_str = """\
hello.py: |
  import sys
  sys.stdout.write("hello world")
"""
yaml = ruamel.yaml.YAML()
data = yaml.load(yaml_str)
yaml.dump(data, sys.stdout)
as this gives again the loaded input:
hello.py: |
  import sys
  sys.stdout.write("hello world")
To find out how it does that you should inspect the type of your multi-line string:
print(type(data['hello.py']))
which prints:
<class 'ruamel.yaml.scalarstring.LiteralScalarString'>
and that should point you in the right direction:
from ruamel.yaml import YAML
from ruamel.yaml.scalarstring import LiteralScalarString
import sys, textwrap
def LS(s):
    return LiteralScalarString(textwrap.dedent(s))
yaml = YAML()
yaml.dump({
    'hello.py': LS("""\
        import sys
        sys.stdout.write("hello world")
    """)
}, sys.stdout)
which also outputs what you want:
hello.py: |
  import sys
  sys.stdout.write("hello world")
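The `textwrap.dedent` call in `LS` is what allows the embedded script to be indented along with the surrounding Python code. A standard-library-only sketch of that step (the function name `make_script` is made up for illustration):

```python
import textwrap


def make_script():
    # The text is indented to match the surrounding code; dedent removes
    # the common leading whitespace and blanks the whitespace-only last line.
    source = """\
        import sys
        sys.stdout.write("hello world")
    """
    return textwrap.dedent(source)


script = make_script()
print(script, end="")
```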
Convert YAML multi-line values to folded block scalar style?
The class ScalarString is a base class for LiteralScalarString; it has no representer, as you found out. You should just make/keep this a Python string, as that deals with special characters appropriately (quoting strings that need to be quoted to conform to the YAML specification).
Assuming you have input like this:
- 1
- abc: |
    this is a short string scalar with a newline
    in it
- "there are also a multiline\nsequence element\nin this file\nand it is longer"
You probably want to do something like:
from ruamel.yaml import YAML
from ruamel.yaml.scalarstring import LiteralScalarString, preserve_literal
def walk_tree(base):
    from ruamel.yaml.compat import string_types
    def test_wrap(v):
        v = v.replace('\r\n', '\n').replace('\r', '\n').strip()
        return v if len(v) < 72 else preserve_literal(v)
    if isinstance(base, dict):
        for k in base:
            v = base[k]
            if isinstance(v, string_types) and '\n' in v:
                base[k] = test_wrap(v)
            else:
                walk_tree(v)
    elif isinstance(base, list):
        for idx, elem in enumerate(base):
            if isinstance(elem, string_types) and '\n' in elem:
                base[idx] = test_wrap(elem)
            else:
                walk_tree(elem)
yaml = YAML()
with open("input.yaml", "r") as fi:
    data = yaml.load(fi)
walk_tree(data)
with open("output.yaml", "w") as fo:
    yaml.dump(data, fo)
to get output:
- 1
- abc: "this is a short string scalar with a newline\nin it"
- |-
  there are also a multiline
  sequence element
  in this file
  and it is longer
Some notes:
- Use of LiteralScalarString is preferred over PreservedScalarString. The latter name is a remnant from the time it was the only preserved string type.
- You probably had no sequence elements that were strings, as you did not import preserve_literal, although it was still used in the copied code.
- I factored out the "wrapping" code into test_wrap, used by both value and element wrapping; the max line length for that was set at 72 characters.
- The value data[1]['abc'] loads as LiteralScalarString. If you want to preserve existing literal style string scalars, you should test for those before testing on type string_types.
- I used the new API with an instance of YAML().
- You might have to set the width attribute to something like 1000 (yaml.width = 1000), to prevent automatic line wrapping, if you increase 72 in the example to above the default of 80.