Strange \n in base64 encoded string in Ruby

Edit: Since I wrote this answer Base64.strict_encode64() was added, which does not add newlines.


The docs are somewhat confusing: the b64encode method is supposed to add a newline after every 60th character, and the example for the encode64 method actually uses the b64encode method.

It seems the pack("m") method for the Array class used by encode64 also adds the newlines. I would consider it a design bug that this is not optional.

You could either remove the newlines yourself or, if you're using Rails, use the encode64s method from ActiveSupport::CoreExtensions::Base64::Encoding.
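As a rough sketch of the "remove the newlines yourself" route, the snippet below strips them with String#delete and, as an alternative, passes the "m0" directive to pack, which asks for output without line feeds. The variable names and the sample input are made up for illustration.

```ruby
require 'base64'

# Sample input long enough to force line wrapping (illustrative only).
data = 'x' * 100

wrapped  = Base64.encode64(data)   # contains "\n" after every 60 encoded characters
stripped = wrapped.delete("\n")    # remove the newlines yourself
packed   = [data].pack('m0')       # "m0" tells pack not to add line feeds

puts wrapped.count("\n")           # number of newlines encode64 inserted
puts stripped == packed            # both approaches give the same string
```

On Ruby versions that have it, Base64.strict_encode64 should produce the same string as the two newline-free variants here.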

How to avoid writing \n to file

The line feed is generated as part of the encoding process of Base64.encode64.

encode64(bin)

Returns the Base64-encoded version of bin. This method complies with
RFC 2045. Line feeds are added to every 60 encoded characters.

require 'base64'
Base64.encode64("Now is the time for all good coders\nto learn Ruby")

Generates:

Tm93IGlzIHRoZSB0aW1lIGZvciBhbGwgZ29vZCBjb2RlcnMKdG8gbGVhcm4g
UnVieQ==

There is also Base64.strict_encode64 which doesn't add any line feeds.

strict_encode64(bin)

Returns the Base64-encoded version of bin. This method complies with
RFC 4648. No line feeds are added.

data = "testkjadlskjflkasjfwe08u185u12w3oqew,dmf.smxa"
Base64.encode64 data
#=> "dGVzdGtqYWRsc2tqZmxrYXNqZndlMDh1MTg1dTEydzNvcWV3LGRtZi5zbXhh\n"
Base64.strict_encode64 data
#=> "dGVzdGtqYWRsc2tqZmxrYXNqZndlMDh1MTg1dTEydzNvcWV3LGRtZi5zbXhh"

Once the line feeds are gone from the encoded string, none are written to the file either.
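A minimal sketch of the file case, assuming the encoded string is written as-is with no extra line terminator. The temporary file and the sample data are placeholders, not part of the original question.

```ruby
require 'base64'
require 'tempfile'

# Placeholder data, long enough that encode64 would have wrapped it.
data = 'Now is the time for all good coders to learn Ruby' * 3

# Tempfile.create with a block returns the block's value and removes the file afterwards.
contents = Tempfile.create('encoded') do |file|
  file.write(Base64.strict_encode64(data))  # write, not puts, so no trailing "\n" is added
  file.flush
  File.read(file.path)
end

puts contents.include?("\n")                    # false: no line feeds reached the file
puts Base64.strict_decode64(contents) == data   # true: the data round-trips
```

Note that puts would append its own newline, so write (or print) is the safer call when the file must contain only the encoded characters.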

check if string is base64

You can use something like this. It is not very performant, but it only accepts strings that round-trip exactly through strict Base64, so you will not get false positives:

require 'base64'

def base64?(value)
  value.is_a?(String) && Base64.strict_encode64(Base64.decode64(value)) == value
end

Using strict_encode64 rather than encode64 prevents Ruby from inserting newlines when the string is long, which would otherwise make the comparison fail.
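A few sample calls may make the behaviour clearer. The method is repeated so the snippet runs on its own, and the inputs are invented for illustration. Note the last case: a plain word whose characters all belong to the Base64 alphabet and whose length is a multiple of 4 is itself valid Base64, so it passes.

```ruby
require 'base64'

def base64?(value)
  value.is_a?(String) && Base64.strict_encode64(Base64.decode64(value)) == value
end

puts base64?('dGVzdA==')      # true:  canonical encoding of "test"
puts base64?("dGVzdA==\n")    # false: the trailing newline does not survive the round trip
puts base64?('not base64!')   # false: contains characters outside the alphabet
puts base64?(nil)             # false: not a String
puts base64?('test')          # true:  "test" happens to be well-formed Base64
```

So the check answers "is this string in strict Base64 form?", not "was this string produced by encoding something meaningful?".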

What are padded Base64 encoded strings and how can I generate them in Ruby?

The padding they are talking about is actually part of Base64 itself. It's the "=" and "==" at the end. Base64 encodes groups of 3 bytes into 4 encoded characters, so a final group that is short gets padded. If your input data has length n:

  • n % 3 = 0 => no padding
  • n % 3 = 1 => "==" at the end for padding
  • n % 3 = 2 => "=" at the end for padding

No need for you to change your code.
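To see the padding rule in action, here is a small sketch that encodes inputs of length 1, 2 and 3. The inputs are arbitrary sample strings.

```ruby
require 'base64'

one   = Base64.strict_encode64('a')    # n % 3 == 1
two   = Base64.strict_encode64('ab')   # n % 3 == 2
three = Base64.strict_encode64('abc')  # n % 3 == 0

puts one    # => YQ==
puts two    # => YWI=
puts three  # => YWJj
```

The standard encoder adds the padding on its own, which is why no change to the calling code is needed.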

base64 encode length parameter

RFC 4648: The Base16, Base32, and Base64 Data Encodings has this to say:

3.3. Interpretation of Non-Alphabet Characters in Encoded Data

[...]

Implementations MUST reject the encoded data if it contains
characters outside the base alphabet when interpreting base-encoded
data, unless the specification referring to this document explicitly
states otherwise. Such specifications may instead state, as MIME
does, that characters outside the base encoding alphabet should
simply be ignored when interpreting data ("be liberal in what you
accept"). Note that this means that any adjacent carriage return/
line feed (CRLF) characters constitute "non-alphabet characters" and
are ignored.

So the newlines are fine and pretty much everything will ignore them even if they're not strictly compliant with RFC 4648.

Also, the fine manual has this to say:

encode64(bin)

Returns the Base64-encoded version of bin. This method complies with RFC 2045. Line feeds are added to every 60 encoded charactors [sic].

So the 60 character line length is intentional and specified. If you want strict RFC 4648 Base64 (i.e. no newlines), then there is strict_encode64:

strict_encode64(bin)

Returns the Base64-encoded version of bin. This method complies with RFC 4648. No line feeds are added.

So you can say Base64.strict_encode64(val) to get the output you're looking for.

And for reference, here's the relevant section of RFC 2045:

6.8. Base64 Content-Transfer-Encoding

[...]

The encoded output stream must be represented in lines of no more
than 76 characters each. All line breaks or other characters not
found in Table 1 must be ignored by decoding software.

So the 60 character line length is somewhat arbitrary but compliant with RFC 2045 since 60 < 76.
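The difference between the two modes can be checked directly. This is a sketch with an arbitrary 200-byte input: it measures the line lengths that encode64 produces and shows that the lenient decoder accepts the wrapped form while the strict decoder rejects it.

```ruby
require 'base64'

original = 'a' * 200                       # arbitrary sample input
encoded  = Base64.encode64(original)

# Length of each output line, without its trailing "\n".
line_lengths = encoded.lines.map { |line| line.chomp.length }
puts line_lengths.inspect                  # => [60, 60, 60, 60, 28]

# decode64 ignores the line feeds, in the spirit of RFC 2045.
puts Base64.decode64(encoded) == original  # => true

# strict_decode64 follows RFC 4648 and raises on the line feeds.
strict_rejected = begin
  Base64.strict_decode64(encoded)
  false
rescue ArgumentError
  true
end
puts strict_rejected                       # => true
```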

Base64 decode gives different results in Java and Ruby

The result of decoding base64 is binary data. You shouldn't really try to print it as if it were text.

Without knowing Ruby, I'd expect the result of calling Base64.decode64 to be some sort of byte array... and that could be converted into text in any number of ways.

Look at the bytes of what's returned to find out whether or not it's correct.
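For what it's worth, here is a sketch of inspecting the bytes on the Ruby side. The input string is an invented example, and the signed view assumes the Java side prints a byte[] with its usual signed byte values.

```ruby
require 'base64'

decoded = Base64.decode64('yv66vg==')   # invented sample input

puts decoded.encoding == Encoding::BINARY   # true: the result is a binary (ASCII-8BIT) String
puts decoded.bytes.inspect                  # => [202, 254, 186, 190]
puts decoded.unpack1('H*')                  # => cafebabe

# Java bytes are signed, so the same data would show up as negative numbers there.
signed = decoded.unpack('c*')
puts signed.inspect                         # => [-54, -2, -70, -66]
```

Comparing the hex dump or the byte lists, rather than the printed text, is the reliable way to tell whether both languages decoded the same data.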

(It's unfortunate that as far as I can see, the documentation for Base64.decode64 gives examples of exactly the kind of thing you're doing - treating the result of a base64 decode operation as text. It's not clear what type of data is actually returned. This sort of thing is why I still like statically typed languages...)

Is it ok to remove newline in Base64 encoding

Breaking a base64 encoded string into multiple lines has been necessary for many old programs that couldn't handle long lines. Programs written in Java can usually handle long lines since they don't need to do the memory management themselves. As long as your lines are shorter than 64 million characters there should be no problem.

And since you don't need the newlines, you shouldn't generate them at all, if possible.

String replace with strange characters when attributed to a HASH

There is no “error in codification.”

"Caixa Econ\xC3\xB4mica Federal" == "Caixa Econômica Federal"
#⇒ true

For some reason Ruby uses this representation when printing out the hash (I cannot reproduce it, though), but in a nutshell the string you see is fine.
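One plausible explanation, and it is only an assumption since the original output could not be reproduced, is that the string is tagged with a binary encoding. Printing a hash calls inspect on its values, and inspect escapes non-ASCII bytes of a binary string as \xNN. A sketch:

```ruby
utf8   = "Caixa Econ\u00F4mica Federal"  # the same text as "Caixa Econômica Federal"
binary = utf8.b                           # copy with identical bytes, tagged as binary

puts binary.inspect                       # => "Caixa Econ\xC3\xB4mica Federal"
puts binary.bytes == utf8.bytes           # => true: nothing was corrupted

# If this is the cause, re-tagging the bytes as UTF-8 restores the usual display.
restored = binary.dup.force_encoding('UTF-8')
puts restored == utf8                     # => true
```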

Some base64 usage in ruby. Is this a good use case?

You’re looking at the table backwards. When encoding 'food', you’re first converting the text to bytes with some encoding, like UTF-8:

'food'.encoding
=> #<Encoding:UTF-8>

'food'.bytes
=> [102, 111, 111, 100]

Then the bits of those bytes are separated into 6-bit groups:

[102, 111, 111, 100].map {|b| sprintf('%08b', b) }
=> ["01100110", "01101111", "01101111", "01100100"]

_.join.scan /.{1,6}/
=> ["011001", "100110", "111101", "101111", "011001", "00"]

_.map {|g| g.to_i(2) }
=> [25, 38, 61, 47, 25, 0]

These are the numbers you would look up in the table to get the base64-encoded letters. In other words: the f in food and the f in the table are unrelated.
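To finish the walk-through, here is a hand-rolled sketch that performs the table lookup and adds the padding. The method name manual_base64 is made up for illustration; in real code you would simply call the Base64 module.

```ruby
require 'base64'

# Illustrative only: encodes a string the long way to show the table lookup.
def manual_base64(str)
  alphabet = [*'A'..'Z', *'a'..'z', *'0'..'9', '+', '/']

  bits   = str.bytes.map { |b| format('%08b', b) }.join
  # Split into 6-bit groups; the last group is right-padded with zero bits.
  groups = bits.scan(/.{1,6}/).map { |g| g.ljust(6, '0') }

  encoded = groups.map { |g| alphabet[g.to_i(2)] }.join
  # Pad with "=" so the output length is a multiple of 4.
  encoded + '=' * ((4 - encoded.length % 4) % 4)
end

puts manual_base64('food')                                   # => Zm9vZA==
puts manual_base64('food') == Base64.strict_encode64('food') # => true
```

The indices 25, 38, 61, 47, 25, 0 from above map to Z, m, 9, v, Z, A, and the two "=" signs fill the last 4-character block.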

With that out of the way: base64 is used to convert arbitrary bytes to text for situations when text is what’s required. You probably don’t need to base64-encode any braces, because HTTP can handle arbitrary bytes in request and response bodies just fine; you will, however, run into trouble trying to JSON-encode arbitrary bytes in Ruby. This is where base64 encoding comes in handy – you can encode the values of payload before serializing it.

encoded_payload = {
  sensitive_information: Base64.encode64(payload[:sensitive_information]),
  iv: Base64.encode64(payload[:iv]),
}

Now JSON.dump will work fine.
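A round-trip sketch of the same idea follows. The payload contents are invented binary values, and strict_encode64 is used here instead of encode64 only to keep newlines out of the JSON strings; either works.

```ruby
require 'base64'
require 'json'

# Hypothetical payload holding raw bytes that are not valid UTF-8 text.
payload = {
  sensitive_information: "\x00\xFF\x10\x80".b,
  iv: "\xDE\xAD\xBE\xEF".b,
}

# Encode every value to Base64 text before serializing.
encoded_payload = payload.transform_values { |v| Base64.strict_encode64(v) }
json = JSON.dump(encoded_payload)
puts json

# The receiver parses the JSON and decodes each value back to bytes.
decoded = JSON.parse(json).transform_values { |v| Base64.strict_decode64(v) }
puts decoded['iv'] == payload[:iv]   # => true
```

JSON.parse returns string keys by default, which is why the lookup on the decoded side uses 'iv' rather than :iv.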
