How to Highlight The Differences Between Subsequent Lines in a File

How to highlight the differences between subsequent lines in a file?

I wrote a Python script for this purpose that utilizes difflib.SequenceMatcher:

#!/usr/bin/python3

from difflib import SequenceMatcher
from itertools import tee
from sys import stdin

def pairwise(iterable):
"""s -> (s0,s1), (s1,s2), (s2, s3), ...

https://docs.python.org/3/library/itertools.html#itertools-recipes
"""
a, b = tee(iterable)
next(b, None)
return zip(a, b)

def color(c, s):
"""Wrap string s in color c.

Based on http://stackoverflow.com/a/287944/1916449
"""
try:
lookup = {'r':'\033[91m', 'g':'\033[92m', 'b':'\033[1m'}
return lookup[c] + str(s) + '\033[0m'
except KeyError:
return s

def diff(a, b):
"""Returns a list of paired and colored differences between a and b."""
for tag, i, j, k, l in SequenceMatcher(None, a, b).get_opcodes():
if tag == 'equal': yield 2 * [color('w', a[i:j])]
if tag in ('delete', 'replace'): yield color('r', a[i:j]), ''
if tag in ('insert', 'replace'): yield '', color('g', b[k:l])

if __name__ == '__main__':
for a, b in pairwise(stdin):
print(*map(''.join, zip(*diff(a, b))), sep='')

Example input.txt:

108  finished   /tmp/ts-out.5KS8bq   0       435.63/429.00/6.29 ./eval.exe -z 30
107 finished /tmp/ts-out.z0tKmX 0 456.10/448.36/7.26 ./eval.exe -z 30
110 finished /tmp/ts-out.wrYCrk 0 0.00/0.00/0.00 tail -n 1
111 finished /tmp/ts-out.HALY18 0 460.65/456.02/4.47 ./eval.exe -z 30
112 finished /tmp/ts-out.6hdkH5 0 292.26/272.98/19.12 ./eval.exe -z 1000
113 finished /tmp/ts-out.eFBgoG 0 837.49/825.82/11.34 ./eval.exe -z 10

Output of cat input.txt | ./linediff.py:

linediff output

Linux differences between consecutive lines

I'm not sure if I got you correctly, but the following awk script should work:

awk '{if(NR>1){print _n-$4};_n=$4}' your.file

Output:

23
24
21
21
22
25
18
21
43
21

You don't need the other programs in the pipe. Just:

awk '/\/dev\/vda/ {if(c++>0){print _n-$4};_n=$4}' src/checkout-plugin/a.txt 

will be enough. The regex on start of the awk scripts tells awk to apply the following block only to lines which match the pattern. A side effect is that NR can't be used anymore to detect the "second line" in which the calculation starts. I introduced a custome counter c for that purpose.

Also note that awk will remove the M on it's own, because the column has been used in a numeric calculation.

showing differences within a line in diff output

I don't know if this is sufficiently command line for your purpose, but vimdiff can do this (even does colour). See for example the image in this related question.

Compare two files line by line and generate the difference in another file

diff(1) is not the answer, but comm(1) is.

NAME
comm - compare two sorted files line by line

SYNOPSIS
comm [OPTION]... FILE1 FILE2

...

-1 suppress lines unique to FILE1

-2 suppress lines unique to FILE2

-3 suppress lines that appear in both files

So

comm -2 -3 file1 file2 > file3

The input files must be sorted. If they are not, sort them first. This can be done with a temporary file, or...

comm -2 -3 <(sort file1) <(sort file2) > file3

provided that your shell supports process substitution (bash does).

Fast way of finding lines in one file that are not in another?

You can achieve this by controlling the formatting of the old/new/unchanged lines in GNU diff output:

diff --new-line-format="" --unchanged-line-format=""  file1 file2

The input files should be sorted for this to work. With bash (and zsh) you can sort in-place with process substitution <( ):

diff --new-line-format="" --unchanged-line-format="" <(sort file1) <(sort file2)

In the above new and unchanged lines are suppressed, so only changed (i.e. removed lines in your case) are output. You may also use a few diff options that other solutions don't offer, such as -i to ignore case, or various whitespace options (-E, -b, -v etc) for less strict matching.


Explanation

The options --new-line-format, --old-line-format and --unchanged-line-format let you control the way diff formats the differences, similar to printf format specifiers. These options format new (added), old (removed) and unchanged lines respectively. Setting one to empty "" prevents output of that kind of line.

If you are familiar with unified diff format, you can partly recreate it with:

diff --old-line-format="-%L" --unchanged-line-format=" %L" \
--new-line-format="+%L" file1 file2

The %L specifier is the line in question, and we prefix each with "+" "-" or " ", like diff -u
(note that it only outputs differences, it lacks the --- +++ and @@ lines at the top of each grouped change).
You can also use this to do other useful things like number each line with %dn.


The diff method (along with other suggestions comm and join) only produce the expected output with sorted input, though you can use <(sort ...) to sort in place. Here's a simple awk (nawk) script (inspired by the scripts linked-to in Konsolebox's answer) which accepts arbitrarily ordered input files, and outputs the missing lines in the order they occur in file1.

# output lines in file1 that are not in file2
BEGIN { FS="" } # preserve whitespace
(NR==FNR) { ll1[FNR]=$0; nl1=FNR; } # file1, index by lineno
(NR!=FNR) { ss2[$0]++; } # file2, index by string
END {
for (ll=1; ll<=nl1; ll++) if (!(ll1[ll] in ss2)) print ll1[ll]
}

This stores the entire contents of file1 line by line in a line-number indexed array ll1[], and the entire contents of file2 line by line in a line-content indexed associative array ss2[]. After both files are read, iterate over ll1 and use the in operator to determine if the line in file1 is present in file2. (This will have have different output to the diff method if there are duplicates.)

In the event that the files are sufficiently large that storing them both causes a memory problem, you can trade CPU for memory by storing only file1 and deleting matches along the way as file2 is read.

BEGIN { FS="" }
(NR==FNR) { # file1, index by lineno and string
ll1[FNR]=$0; ss1[$0]=FNR; nl1=FNR;
}
(NR!=FNR) { # file2
if ($0 in ss1) { delete ll1[ss1[$0]]; delete ss1[$0]; }
}
END {
for (ll=1; ll<=nl1; ll++) if (ll in ll1) print ll1[ll]
}

The above stores the entire contents of file1 in two arrays, one indexed by line number ll1[], one indexed by line content ss1[]. Then as file2 is read, each matching line is deleted from ll1[] and ss1[]. At the end the remaining lines from file1 are output, preserving the original order.

In this case, with the problem as stated, you can also divide and conquer using GNU split (filtering is a GNU extension), repeated runs with chunks of file1 and reading file2 completely each time:

split -l 20000 --filter='gawk -f linesnotin.awk - file2' < file1

Note the use and placement of - meaning stdin on the gawk command line. This is provided by split from file1 in chunks of 20000 line per-invocation.

For users on non-GNU systems, there is almost certainly a GNU coreutils package you can obtain, including on OSX as part of the Apple Xcode tools which provides GNU diff, awk, though only a POSIX/BSD split rather than a GNU version.

How to diff two lines in an open file in vim?

A quick and dirty solution is to just select both lines and sort them while removing duplicates:

  • select lines
  • ":sort u"
  • if only one line remains, both were equal
  • if both remain, there most be some difference

An undo recovers everything again.

find difference between two text files with one item per line

You can try

grep -f file1 file2

or

grep -v -F -x -f file1 file2

Find the Difference between two rows and Highlight the difference then loop through all Rows with the Same Code

Try:

FindVariance Macro

For j=2 to Range("A1").End(xlDown).Row-1
i=j+1
Rows(j & ":" & i).ColumnDifferences(Range("A" & i)).Offset(1,0).Select
With Selection.Interior
.Pattern = xlSolid
.PatternColorIndex = xlAutomatic
.Color = 15773696
.TintAndShade = 0
.PatternTintAndShade = 0
End With
j=j+1
Next j

End Sub


Related Topics



Leave a reply



Submit