Compare Two DataTables to Determine Rows in One But Not the Other

Compare two DataTables to determine rows in one but not the other

Would I have to iterate through each row of each DataTable to check whether they are the same?

Seeing as you've loaded the data from a CSV file, you're not going to have any indexes or anything, so at some point, something is going to have to iterate through every row, whether it be your code, or a library, or whatever.

Anyway, this is an algorithms question, which is not my specialty, but my naive approach would be as follows:

1: Can you exploit any properties of the data? Are all the rows in each table unique, and can you sort them both by the same criteria? If so, you can do this:

  • Sort both tables by their ID (using something like a quicksort). If they're already sorted then you win big.
  • Step through both tables at once, skipping over any gaps in IDs in either table. A matched ID means the record is in both tables; an ID that appears on only one side is a row the other table is missing.

This lets you do it in two sorts plus one pass, so if my big-O notation is correct, it'd be (whatever-sort-time) + O(m+n), which is pretty good. With a comparison sort that works out to O(m log m + n log n) overall.
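The sort-then-step idea above can be sketched roughly as follows. This is only an illustration on plain lists of integer IDs, not on DataTables; the class and method names (SortedIdDiff, Compare) are made up for the example, and it assumes IDs are unique within each list.

```csharp
using System;
using System.Collections.Generic;

public static class SortedIdDiff
{
    // Returns the IDs found only in a and the IDs found only in b.
    // Assumption: IDs are unique within each list. Both lists are sorted in place.
    public static (List<int> OnlyInA, List<int> OnlyInB) Compare(List<int> a, List<int> b)
    {
        a.Sort(); // O(m log m)
        b.Sort(); // O(n log n)

        var onlyInA = new List<int>();
        var onlyInB = new List<int>();
        int i = 0, j = 0;

        // One merge-style pass over both sorted lists: O(m + n).
        while (i < a.Count && j < b.Count)
        {
            if (a[i] == b[j]) { i++; j++; }             // present in both tables
            else if (a[i] < b[j]) onlyInA.Add(a[i++]);  // b has a gap here
            else onlyInB.Add(b[j++]);                   // a has a gap here
        }

        // Anything left over in either list has no counterpart.
        while (i < a.Count) onlyInA.Add(a[i++]);
        while (j < b.Count) onlyInB.Add(b[j++]);

        return (onlyInA, onlyInB);
    }
}
```

For real DataTables you would sort the rows by the key column and compare the key values in the same way.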

(Revision: this is the approach that ΤΖΩΤΖΙΟΥ describes.)

2: An alternative approach, which may be more or less efficient depending on how big your data is:

  • Run through table 1, and for each row, stick its ID (or computed hashcode, or some other unique ID for that row) into a dictionary (or hashtable if you prefer to call it that).
  • Run through table 2, and for each row, see if the ID (or hashcode etc.) is present in the dictionary. You're exploiting the fact that dictionaries have really fast lookup: O(1) on average. This step will be really fast, but you'll have paid the price of doing all those dictionary inserts (also O(1) each on average) and of holding table 1's keys in memory.
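A minimal sketch of the two passes just described, using a HashSet instead of a dictionary since only the keys matter. The names HashedDiff and OnlyInSecond, and the keySelector parameter, are invented for this example; it assumes the key type has sensible Equals/GetHashCode behaviour (int, string and so on).

```csharp
using System;
using System.Collections.Generic;

public static class HashedDiff
{
    // Returns the rows of `second` whose key never occurs in `first`.
    public static List<TRow> OnlyInSecond<TRow, TKey>(
        IEnumerable<TRow> first,
        IEnumerable<TRow> second,
        Func<TRow, TKey> keySelector)
    {
        // Pass 1: remember every key seen in the first table.
        var seen = new HashSet<TKey>();
        foreach (var row in first)
            seen.Add(keySelector(row));

        // Pass 2: keep the rows whose key was never seen.
        var missing = new List<TRow>();
        foreach (var row in second)
        {
            if (!seen.Contains(keySelector(row)))
                missing.Add(row);
        }

        return missing;
    }
}
```

Calling it a second time with the arguments swapped gives the rows that exist only in the first table.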

I'd be really interested to see what people with better knowledge of algorithms than myself come up with for this one :-)

Compare two DataTables and select the rows that are not present in second table

You can use LINQ. Enumerable.Except in particular helps to find the IDs in TableA that are not in TableB:

// IDs that occur in TableA but not in TableB
var idsNotInB = TableA.AsEnumerable().Select(r => r.Field<int>("id"))
    .Except(TableB.AsEnumerable().Select(r => r.Field<int>("id")));
// Join back to TableA to get the full rows (CopyToDataTable throws if there are none)
DataTable TableC = (from row in TableA.AsEnumerable()
                    join id in idsNotInB
                    on row.Field<int>("id") equals id
                    select row).CopyToDataTable();

You can also use Where, but it'll be less efficient because it rescans TableB for every row of TableA:

DataTable TableC = TableA.AsEnumerable()
    .Where(ra => !TableB.AsEnumerable()
        .Any(rb => rb.Field<int>("id") == ra.Field<int>("id")))
    .CopyToDataTable();
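If you prefer the Where form but want to avoid rescanning the second table, one option is to collect its IDs into a HashSet first. The sketch below is a possible variant, not part of the original answer: the helper name RowsOnlyInA and the idColumn parameter are made up, and it assumes the ID column is a non-null int. It also guards the empty case, since CopyToDataTable throws an InvalidOperationException when the source contains no rows.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class DataTableDiff
{
    // Rows of tableA whose ID value does not occur in tableB.
    public static DataTable RowsOnlyInA(DataTable tableA, DataTable tableB, string idColumn = "id")
    {
        // Build the lookup once instead of scanning tableB for every row.
        var idsInB = new HashSet<int>(
            tableB.AsEnumerable().Select(r => r.Field<int>(idColumn)));

        var rows = tableA.AsEnumerable()
            .Where(r => !idsInB.Contains(r.Field<int>(idColumn)))
            .ToList();

        // CopyToDataTable throws on an empty sequence, so fall back to
        // an empty table with the same schema as tableA.
        return rows.Count > 0 ? rows.CopyToDataTable() : tableA.Clone();
    }
}
```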

Comparing two DataTables in C# and finding new, matching and non-matching records

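You can try the LINQ methods that are available for enumerable types, such as Intersect and Except. Pass DataRowComparer.Default so that rows are compared by their column values; without it, DataRow objects are compared by reference, and rows from two different tables never match. Here is an example of doing this.

// Get matching rows from the two tables
IEnumerable<DataRow> matchingRows = table1.AsEnumerable().Intersect(table2.AsEnumerable(), DataRowComparer.Default);

// Get rows that are present in table2 but not in table1
IEnumerable<DataRow> rowsNotInTable1 = table2.AsEnumerable().Except(table1.AsEnumerable(), DataRowComparer.Default);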
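To get all three buckets (matching, only in the first table, only in the second) in one place, something like the following sketch could work. The names RowSetCompare and Classify are hypothetical. It compares rows by column values via DataRowComparer.Default and assumes both tables have the same columns; note that Intersect and Except return distinct rows, so duplicates within a table are collapsed.

```csharp
using System.Collections.Generic;
using System.Data;
using System.Linq;

public static class RowSetCompare
{
    public static (List<DataRow> Matching, List<DataRow> OnlyInFirst, List<DataRow> OnlyInSecond)
        Classify(DataTable first, DataTable second)
    {
        // Value-based comparison of rows, column by column.
        var comparer = DataRowComparer.Default;
        var a = first.AsEnumerable();
        var b = second.AsEnumerable();

        return (a.Intersect(b, comparer).ToList(),  // rows present in both
                a.Except(b, comparer).ToList(),     // rows missing from second
                b.Except(a, comparer).ToList());    // rows new in second
    }
}
```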

Comparing two DataTables to determine if it is modified

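Using a foreach loop within another foreach is an N × N comparison, which you don't need here.

Comparing the first row with the first row of the other table, the second with the second, and so on, using the Zip extension method, works well for this case. It assumes both tables hold the same rows in the same order; Zip stops at the end of the shorter table.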

DataTable original;
DataTable modified;

// your stuff

// Pair up rows by position and flag the ones whose values differ
modified = modified.AsEnumerable().Zip<DataRow, DataRow, DataRow>(original.AsEnumerable(), (DataRow modif, DataRow orig) =>
{
    if (!orig.ItemArray.SequenceEqual<object>(modif.ItemArray))
    {
        // SetModified only works on rows whose RowState is Unchanged
        modif.SetModified();
    }
    return modif;
}).CopyToDataTable<DataRow>();

