How to Compare 3 Files Together (To See What Is in Common Between Them)

How can I compare 3 files together (to see what is in common between them)?

If you simply want to print the pairs (column 1 + column 2) that are common to all 3 files, and you can rely on the fact that a pair is unique within a file, you could do it this way:

awk '{print $1" "$2}' a b c | sort | uniq -c | awk '{if ($1==3){print $2" "$3}}'

This works with an arbitrary number of files, as long as you adjust the count in the last command.

Here's what it does:

  1. print the first 2 columns of all files and sort them (awk '{print $1" "$2}' a b c | sort)
  2. count the number of duplicate entries (uniq -c)
  3. if the duplicate count equals the number of files, we found a match: print it.

If you're doing this often, you can express it as a bash function (and drop it in your .bashrc) that parametrises the file count.

function common_pairs {
    # Quote "$@" and "$#" so filenames with spaces are passed through safely
    awk '{print $1" "$2}' "$@" | sort | uniq -c | awk -v numf="$#" '{if ($1==numf){print $2" "$3}}';
}

Call it with any number of files you want: common_pairs file1 file2 file3 fileN
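The same pair-counting idea can be sketched in Python for readers who prefer it; the function name is my own, and it makes the same assumptions as the awk version (whitespace-separated columns, each pair unique within a single file):

```python
from collections import Counter

def common_pairs(*paths):
    """Return (col1, col2) pairs that appear in every file.

    Assumes whitespace-separated columns and, as in the awk pipeline,
    that a pair occurs at most once within any single file.
    """
    counts = Counter()
    for path in paths:
        with open(path) as f:
            for line in f:
                fields = line.split()
                if len(fields) >= 2:
                    counts[(fields[0], fields[1])] += 1
    # A pair seen exactly len(paths) times is present in all files.
    return [pair for pair, n in counts.items() if n == len(paths)]
```

As with the shell function, call it with any number of files: common_pairs('a', 'b', 'c').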

Compare two different files line by line in python

This solution reads both files in one pass, excludes blank lines, and prints common lines regardless of their position in the file:

with open('some_file_1.txt', 'r') as file1:
    with open('some_file_2.txt', 'r') as file2:
        same = set(file1).intersection(file2)

same.discard('\n')

with open('some_output_file.txt', 'w') as file_out:
    for line in same:
        file_out.write(line)
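The same approach extends to any number of files, since set.intersection accepts multiple iterables. This is a sketch; the helper name and signature are my own:

```python
def common_lines(*paths, out_path=None):
    """Intersect the line sets of any number of files (hypothetical helper)."""
    sets = []
    for path in paths:
        with open(path) as f:
            sets.append(set(f))
    same = set.intersection(*sets)
    same.discard('\n')  # drop blank lines, as in the two-file version
    if out_path is not None:
        with open(out_path, 'w') as f_out:
            f_out.writelines(sorted(same))
    return same
```

For three files this would be common_lines('some_file_1.txt', 'some_file_2.txt', 'some_file_3.txt', out_path='some_output_file.txt').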

DIFF utility works for 2 files. How to compare more than 2 files at a time?

Displaying 10 files side-by-side and highlighting differences can be easily done with Diffuse. Simply specify all files on the command line like this:

diffuse 1.txt 2.txt 3.txt 4.txt 5.txt 6.txt 7.txt 8.txt 9.txt 10.txt

Visual Studio Code - is there a Compare feature like that plugin for Notepad ++?

You can compare files from the explorer either from the working files section or the folder section. You can also trigger the global compare action from the command palette.

  1. Open a folder containing the files you need to compare.
  2. Select the two files using SHIFT.
  3. Right-click and choose "Compare Selected".

Comparing 3 different .csv files in R and extracting the common data between them to a new .csv

You could use dplyr to perform all operations in one pipe.

If you are looking for observations that are present in ALL the .csv files, use an inner join:

library(dplyr)
library(magrittr)

read.csv("first.csv") %>%
  inner_join(read.csv("second.csv")) %>%
  inner_join(read.csv("third.csv")) %>%
  write.csv("fourth.csv", quote = F, row.names = F)

If you are looking for observations present in ANY of the data frames, use a full join:

read.csv("first.csv") %>%
  full_join(read.csv("second.csv")) %>%
  full_join(read.csv("third.csv")) %>%
  write.csv("fourth.csv", quote = F, row.names = F)
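For comparison, the same inner/full join logic can be sketched in Python with pandas (the helper name is my own; like dplyr's joins, pandas merges on the columns the frames share by default):

```python
import pandas as pd

def join_csvs(paths, how="inner"):
    """Join any number of CSVs on their shared columns (hypothetical helper).

    how="inner" keeps rows present in ALL files; how="outer" keeps rows
    present in ANY file, mirroring dplyr's inner_join / full_join.
    """
    frames = [pd.read_csv(p) for p in paths]
    result = frames[0]
    for frame in frames[1:]:
        result = result.merge(frame, how=how)
    return result
```

Writing the result out mirrors the R pipe: join_csvs(["first.csv", "second.csv", "third.csv"]).to_csv("fourth.csv", index=False).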

