What Is the Best Diff Library in Ruby

What is the best Diff library in Ruby?

I looked around and couldn't find an existing gem or library that offered a convenient way to generate diff-style output from Ruby.

I just released diffy, which does what I want. It's a lightweight wrapper around diff that lets you generate text or HTML diffs from two strings without a lot of fuss. I hope others find it useful. It's in use on wiff.me for anyone who wants to preview the HTML output.

diff a ruby string or array

diff.rb is what you want. It was originally published at http://users.cybercity.dk/~dsl8950/ruby/diff.html and is still available via the Internet Archive:

http://web.archive.org/web/20140421214841/http://users.cybercity.dk:80/~dsl8950/ruby/diff.html

What is the best (word or character)-based diff algorithm out there?

Here is a ruby gem that does diffing of strings: http://rubydoc.info/gems/diff-lcs/1.1.3/frames

Beforehand, I just ran the following in irb:

require 'rubygems'
require 'diff/lcs'
require 'diff/lcs/array'
require 'diff/lcs/string'


So, writing the logic to insert inline markers for removed and inserted text becomes trivial thanks to the 2D array of changes that diff-lcs returns.

Though I'm not sure if this is the best way.
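
As a rough illustration of the marker logic described above, here is a minimal sketch. It does not call diff-lcs itself: Change is a stand-in Struct with the same three fields that the change objects carry (action, position, element), and inline_markup and the [-x-] / {+x+} marker syntax are hypothetical names chosen for this example. It assumes that "-" positions index the old string and "+" positions index the new one.

```ruby
# Stand-in for the change objects (assumption: same three fields).
Change = Struct.new(:action, :position, :element)

# Walk the old string once, wrapping removed elements in [-...-]
# and inserted elements in {+...+}. Hypothetical helper, not part of diff-lcs.
def inline_markup(old, hunks)
  changes = hunks.flatten(1)
  removed = changes.select { |c| c.action == "-" }.map(&:position)
  added = changes.select { |c| c.action == "+" }
                 .each_with_object({}) { |c, h| h[c.position] = c.element }

  out = String.new
  old_pos = 0 # index into the old string
  new_pos = 0 # index into the new string being described
  loop do
    if old_pos < old.length && removed.include?(old_pos)
      out << "[-#{old[old_pos]}-]"
      old_pos += 1
    elsif added.key?(new_pos)
      out << "{+#{added[new_pos]}+}"
      new_pos += 1
    elsif old_pos < old.length
      out << old[old_pos] # unchanged element, present in both strings
      old_pos += 1
      new_pos += 1
    else
      break
    end
  end
  out
end

# Hand-built changes describing 'ab cd' -> 'a- c_'.
hunks = [
  [Change.new("-", 1, "b"), Change.new("+", 1, "-")],
  [Change.new("-", 4, "d"), Change.new("+", 4, "_")]
]
puts inline_markup("ab cd", hunks)
# prints: a[-b-]{+-+} c[-d-]{+_+}
```

Swapping the marker strings for HTML tags such as del and ins would give an HTML-flavoured variant of the same idea.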

What is the general format of Ruby diff-lcs diff output?

You might have better luck with a better example. If you do this:

Diff::LCS.diff('ab cd', 'a- c_')

Then the output looks like this (with the noise removed):

[
  [
    <@action="-", @position=1, @element="b">,
    <@action="+", @position=1, @element="-">
  ], [
    <@action="-", @position=4, @element="d">,
    <@action="+", @position=4, @element="_">
  ]
]

If we look at Diff::LCS.diff('ab cd ef', 'a- c_ e+'), then we'd get three inner arrays instead of two.

What possible reason could there be for this? There are three operations in a diff:

  1. Add a string.
  2. Remove a string.
  3. Change a string.

A change is really just a combination of removes and adds, so we're left with just remove and add as the fundamental operations; these line up with the @action values quite nicely. However, when humans look at diffs, we want to see a change as a distinct operation: we want to see that b has become -. The "remove b, add -" version is an implementation detail.

If all we had was this:

[
  <@action="-", @position=1, @element="b">,
  <@action="+", @position=1, @element="-">,
  <@action="-", @position=4, @element="d">,
  <@action="+", @position=4, @element="_">
]

then you'd have to figure out which +/- pairs were really changes and which were separate additions and removals.

So the inner arrays map the two fundamental operations (add, remove) to the three operations (add, remove, change) that humans want to see.
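
The mapping described above can be sketched in a few lines. This is only an illustration under stated assumptions: Change is a stand-in Struct mirroring the @action, @position and @element fields shown in the output, classify_hunk is a hypothetical helper name, and the hunks are built by hand rather than produced by the library.

```ruby
# Stand-in for the change objects in the output above (assumption).
Change = Struct.new(:action, :position, :element)

# Turn one inner array of +/- entries into the operation a human would name.
# Returns [operation, removed_text, added_text]. Hypothetical helper.
def classify_hunk(hunk)
  removed = hunk.select { |c| c.action == "-" }.map(&:element)
  added   = hunk.select { |c| c.action == "+" }.map(&:element)

  if !removed.empty? && !added.empty?
    [:change, removed.join, added.join]
  elsif !removed.empty?
    [:remove, removed.join, nil]
  else
    [:add, nil, added.join]
  end
end

# Hand-built hunks: two changes, then a hunk that only adds an element.
hunks = [
  [Change.new("-", 1, "b"), Change.new("+", 1, "-")],
  [Change.new("-", 4, "d"), Change.new("+", 4, "_")],
  [Change.new("+", 6, "x")]
]

summary = hunks.map { |hunk| classify_hunk(hunk) }
summary.each { |op, old_text, new_text| puts "#{op}: #{old_text.inspect} -> #{new_text.inspect}" }
```

A hunk that holds both "-" and "+" entries is reported as a change, which is exactly the grouping that the flat list would have lost.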

You might want to examine the structure of the outputs from these as well:

  • Diff::LCS.diff('ab cd', 'a- x c_')
  • Diff::LCS.diff('ab', 'abx')
  • Diff::LCS.diff('ab', 'xbx')

I think an explicit change @action for Diff::LCS::Change would be better, but at least the inner arrays let you group the individual additions and removals into higher-level edits.

File Comparison in Ruby Program

I believe that you may use this library: https://github.com/samg/diffy

ruby difference engine

The "standard" solution is Austin Ziegler's diff-lcs library, which, as the name implies, implements a longest common subsequence algorithm: more precisely, the LCS algorithm by McIlroy and Hunt. This library is a port of Mario I. Wolczko's Smalltalk implementation of the McIlroy-Hunt algorithm from 1993 as well as the Algorithm::Diff Perl library.

Unfortunately, when this answer was written there hadn't been a release since 2004. That wouldn't be so bad, since the McIlroy-Hunt algorithm hasn't changed since 1976, but String handling changed significantly in Ruby 1.9.
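
To make the longest common subsequence idea concrete, here is a small self-contained sketch. Note that it uses the textbook dynamic-programming table, not the McIlroy-Hunt algorithm and not the library's code; the method name lcs is just a label chosen for this example.

```ruby
# Textbook dynamic-programming LCS over two strings (illustrative only;
# this is NOT the McIlroy-Hunt algorithm mentioned above).
def lcs(a, b)
  # table[i][j] = length of the LCS of a[0, i] and b[0, j]
  table = Array.new(a.length + 1) { Array.new(b.length + 1, 0) }
  a.each_char.with_index do |ca, i|
    b.each_char.with_index do |cb, j|
      table[i + 1][j + 1] =
        ca == cb ? table[i][j] + 1 : [table[i][j + 1], table[i + 1][j]].max
    end
  end

  # Walk back from the bottom-right corner to recover one LCS.
  out = []
  i = a.length
  j = b.length
  while i > 0 && j > 0
    if a[i - 1] == b[j - 1]
      out.unshift(a[i - 1])
      i -= 1
      j -= 1
    elsif table[i - 1][j] >= table[i][j - 1]
      i -= 1
    else
      j -= 1
    end
  end
  out.join
end

puts lcs("ab cd", "a- c_").inspect
# prints: "a c"
```

Everything outside the common subsequence is what a diff reports as removed (present only in the first string) or added (present only in the second).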

difflib on Ruby

After some research, I suggest using amatch or SimMetrics (with JRuby) and manually implementing the get_close_matches method. Both libraries offer implementations of many string similarity algorithms.
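
A hand-rolled get_close_matches might look roughly like the sketch below. To stay dependency-free it uses a plain Ruby Levenshtein distance as the similarity measure instead of amatch or SimMetrics, so the scores will not match Python's difflib, which uses a different ratio. The n: 3 and cutoff: 0.6 defaults mirror difflib's; the levenshtein and similarity helpers are hypothetical names for this example.

```ruby
# Classic two-row Levenshtein edit distance (stand-in similarity measure;
# a library such as amatch could be swapped in here).
def levenshtein(a, b)
  prev = (0..b.length).to_a
  a.each_char.with_index(1) do |ca, i|
    curr = [i]
    b.each_char.with_index(1) do |cb, j|
      cost = ca == cb ? 0 : 1
      curr << [prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost].min
    end
    prev = curr
  end
  prev.last
end

# Normalise the distance to a score between 0.0 and 1.0.
def similarity(a, b)
  longest = [a.length, b.length].max
  return 1.0 if longest.zero?
  1.0 - levenshtein(a, b).to_f / longest
end

# Return up to n candidates scoring at least cutoff, best match first.
def get_close_matches(word, candidates, n: 3, cutoff: 0.6)
  candidates
    .map { |candidate| [similarity(word, candidate), candidate] }
    .select { |score, _| score >= cutoff }
    .sort_by { |score, candidate| [-score, candidate] }
    .first(n)
    .map { |_, candidate| candidate }
end

p get_close_matches("aple", ["apple", "orange"])
# prints: ["apple"]
```

Replacing the body of similarity is the only change needed to try a different metric.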


