Best way to remove duplicate entries from a data table
Remove Duplicates
public DataTable RemoveDuplicateRows(DataTable dTable, string colName)
{
    Hashtable hTable = new Hashtable();
    ArrayList duplicateList = new ArrayList();

    // Add each unique value to the hashtable (stored as the key of a key/value
    // pair), and collect every row whose value has already been seen in the
    // duplicate list.
    foreach (DataRow drow in dTable.Rows)
    {
        if (hTable.Contains(drow[colName]))
            duplicateList.Add(drow);
        else
            hTable.Add(drow[colName], string.Empty);
    }

    // Remove the collected duplicate rows from the DataTable.
    foreach (DataRow dRow in duplicateList)
        dTable.Rows.Remove(dRow);

    // The DataTable now contains only unique records.
    return dTable;
}
See the links below:
http://www.dotnetspider.com/resources/4535-Remove-duplicate-records-from-table.aspx
http://www.dotnetspark.com/kb/94-remove-duplicate-rows-value-from-datatable.aspx
For removing duplicates in a single column:
http://dotnetguts.blogspot.com/2007/02/removing-duplicate-records-from.html
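A more modern equivalent of the Hashtable/ArrayList approach above is a LINQ `GroupBy` that keeps the first row per value. This is a minimal self-contained sketch (table and column names are made up for illustration); note that `CopyToDataTable` throws on an empty sequence:

```csharp
using System;
using System.Data;
using System.Linq;

class Program
{
    // Keeps the first row for each distinct value in colName.
    public static DataTable RemoveDuplicateRows(DataTable table, string colName)
    {
        return table.AsEnumerable()
                    .GroupBy(r => r[colName])
                    .Select(g => g.First())
                    .CopyToDataTable();
    }

    static void Main()
    {
        var dt = new DataTable();
        dt.Columns.Add("Name", typeof(string));
        dt.Rows.Add("Alice");
        dt.Rows.Add("Bob");
        dt.Rows.Add("Alice"); // duplicate value in the "Name" column

        DataTable result = RemoveDuplicateRows(dt, "Name");
        Console.WriteLine(result.Rows.Count); // 2
    }
}
```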
What is the best way to remove duplicates from a datatable?
You can use LINQ to DataSets. Something like this:
// Fill the DataSet.
DataSet ds = new DataSet();
ds.Locale = CultureInfo.InvariantCulture;
FillDataSet(ds);

List<DataRow> rows = new List<DataRow>();
DataTable contact = ds.Tables["Contact"];

// Get 100 rows from the Contact table.
IEnumerable<DataRow> query = (from c in contact.AsEnumerable()
                              select c).Take(100);
DataTable contactsTableWith100Rows = query.CopyToDataTable();

// Add the 100 rows to the list.
foreach (DataRow row in contactsTableWith100Rows.Rows)
    rows.Add(row);

// Create duplicates by adding the same 100 rows to the list again.
foreach (DataRow row in contactsTableWith100Rows.Rows)
    rows.Add(row);

DataTable table =
    System.Data.DataTableExtensions.CopyToDataTable<DataRow>(rows);

// Find the unique contacts in the table.
IEnumerable<DataRow> uniqueContacts =
    table.AsEnumerable().Distinct(DataRowComparer.Default);

Console.WriteLine("Unique contacts:");
foreach (DataRow uniqueContact in uniqueContacts)
{
    Console.WriteLine(uniqueContact.Field<Int32>("ContactID"));
}
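The key point is that `DataRowComparer.Default` compares rows by column values, not by object reference, which is why the duplicated rows above collapse back to 100 unique contacts. A minimal self-contained check (the column name here is made up):

```csharp
using System;
using System.Data;
using System.Linq;

class Program
{
    static void Main()
    {
        var dt = new DataTable();
        dt.Columns.Add("ContactID", typeof(int));
        dt.Rows.Add(1);
        dt.Rows.Add(2);
        dt.Rows.Add(1); // same values as the first row, but a different DataRow object

        // DataRowComparer.Default treats rows with equal column values as equal.
        var unique = dt.AsEnumerable().Distinct(DataRowComparer.Default);
        Console.WriteLine(unique.Count()); // 2
    }
}
```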
What's The Best Way To Remove Duplicate Rows Based On A Row Item In A DataTable With VB.Net?
Assuming you want to delete the rows from the original table rather than create a new one, you can use LINQ to find the rows to delete and then delete them in a separate step, since LINQ is for querying data, not modifying it.
Dim indicesOfRowsToDelete = dt.AsEnumerable _
    .Select(Function(r, n) New With { Key r, Key n }) _
    .GroupBy(Function(rn) New With { Key .OrderNumber = rn.r.Field(Of String)("OrderNumber"), Key .RequestType = rn.r.Field(Of String)("RequestType") }) _
    .Where(Function(rg) rg.Key.RequestType = "Cancel Order") _
    .SelectMany(Function(rg) rg.Skip(1).Select(Function(rn) rn.n)) _
    .OrderByDescending(Function(n) n)

For Each n In indicesOfRowsToDelete
    dt.Rows(n).Delete()
Next
Here is the C# version of the same code:
var indicesOfRowsToDelete = dt.AsEnumerable()
    .Select((r, n) => new { r, n })
    .GroupBy(rn => new { OrderNumber = rn.r.Field<string>("OrderNumber"), RequestType = rn.r.Field<string>("RequestType") })
    .Where(rg => rg.Key.RequestType == "Cancel Order")
    .SelectMany(rg => rg.Skip(1).Select(rn => rn.n))
    .OrderByDescending(n => n);

foreach (var n in indicesOfRowsToDelete)
    dt.Rows[n].Delete();
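One subtlety worth knowing: `DataRow.Delete()` only marks an unchanged row as `Deleted`; the row stays in the `Rows` collection until `AcceptChanges()` is called (or a data adapter commits the changes). A small self-contained demonstration, with made-up column data:

```csharp
using System;
using System.Data;

class Program
{
    static void Main()
    {
        var dt = new DataTable();
        dt.Columns.Add("OrderNumber", typeof(string));
        dt.Rows.Add("A1");
        dt.Rows.Add("A1");
        dt.AcceptChanges(); // commit the adds so both rows are in the Unchanged state

        dt.Rows[1].Delete();              // only marks the row as Deleted
        Console.WriteLine(dt.Rows.Count); // 2 -- the marked row is still present

        dt.AcceptChanges();               // physically removes marked rows
        Console.WriteLine(dt.Rows.Count); // 1
    }
}
```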
However, since your posted solution creates a new table with the desired rows, here is a LINQ query that builds a new DataTable in C#:
var newDT = dt.AsEnumerable()
    .GroupBy(r => new { OrderNumber = r.Field<string>("OrderNumber"), RequestType = r.Field<string>("RequestType") })
    .SelectMany(rg => rg.Key.RequestType == "Cancel Order"
        ? rg.Take(1) // or other selection process
        : rg)
    .CopyToDataTable();
Removal of Duplicate Rows from Data table Based on Multiple columns
Method 3:
After trying the two methods above, I found this LINQ approach very useful:
dt_Barcode = dt_Barcode.AsEnumerable()
    .GroupBy(r => new { ItemID = r.Field<Int64>("ItemID"), PackTypeID = r.Field<Int32>("PackTypeID") })
    .Select(g => g.First())
    .CopyToDataTable();
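The composite-key `GroupBy` pattern can be verified on a small table. This is a self-contained sketch using the same two column names; the sample data is made up:

```csharp
using System;
using System.Data;
using System.Linq;

class Program
{
    static void Main()
    {
        var dt = new DataTable();
        dt.Columns.Add("ItemID", typeof(long));
        dt.Columns.Add("PackTypeID", typeof(int));
        dt.Rows.Add(1L, 10);
        dt.Rows.Add(1L, 10); // duplicate (ItemID, PackTypeID) pair -> dropped
        dt.Rows.Add(1L, 20); // same ItemID, different PackTypeID -> kept

        // Group on the pair of columns and keep the first row of each group.
        DataTable deduped = dt.AsEnumerable()
            .GroupBy(r => new { ItemID = r.Field<long>("ItemID"),
                                PackTypeID = r.Field<int>("PackTypeID") })
            .Select(g => g.First())
            .CopyToDataTable();

        Console.WriteLine(deduped.Rows.Count); // 2
    }
}
```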
Best way to remove duplicates from DataTable depending on column values
I think this can be done more efficiently. Your code copies the DataSet once with DataSet importedDataCopy = importedData.Copy();, then copies it again into a dictionary, and then deletes the unnecessary rows from the dictionary. I would rather remove the unnecessary rows in a single pass. What about something like this:
private void mergeduplicate(DataSet importedData)
{
    Dictionary<String, DataRow> systems = new Dictionary<String, DataRow>();
    int i = 0;
    while (i < importedData.Tables[0].Rows.Count)
    {
        DataRow dr = importedData.Tables[0].Rows[i];
        String systemName = dr["Computer Name"].ToString();
        if (!systems.ContainsKey(systemName))
        {
            systems.Add(systemName, dr);
            i++; // no row was removed, so advance to the next row
        }
        else
        {
            // Existing date is the date in the dictionary.
            DateTime existing = Validation.ConvertStringIntoDateTime(systems[systemName]["date"].ToString());
            // Candidate date is the date of the current DataRow.
            DateTime candidate = Validation.ConvertStringIntoDateTime(dr["date"].ToString());
            // If the candidate date is newer than the existing date, replace the
            // existing DataRow in the dictionary and remove it from the table;
            // otherwise remove the candidate row.
            if (DateTime.Compare(existing, candidate) < 0)
            {
                importedData.Tables[0].Rows.Remove(systems[systemName]);
                systems[systemName] = dr;
            }
            else
            {
                importedData.Tables[0].Rows.Remove(dr);
            }
            // A removal shifted the remaining rows left by one, so do not
            // advance i here: index i now holds the next unprocessed row.
        }
    }
}
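If building a new table is acceptable, the same "keep the most recent row per machine" rule can also be written as a short LINQ query. This is a self-contained sketch with made-up data, where DateTime.Parse stands in for the Validation.ConvertStringIntoDateTime helper used above:

```csharp
using System;
using System.Data;
using System.Globalization;
using System.Linq;

class Program
{
    static void Main()
    {
        var dt = new DataTable();
        dt.Columns.Add("Computer Name", typeof(string));
        dt.Columns.Add("date", typeof(string));
        dt.Rows.Add("PC1", "2015-01-01");
        dt.Rows.Add("PC2", "2015-03-01");
        dt.Rows.Add("PC1", "2015-02-01"); // newer duplicate of PC1

        // Group by machine name and keep only the row with the latest date.
        DataTable merged = dt.AsEnumerable()
            .GroupBy(r => r.Field<string>("Computer Name"))
            .Select(g => g.OrderByDescending(r =>
                DateTime.Parse(r.Field<string>("date"), CultureInfo.InvariantCulture)).First())
            .CopyToDataTable();

        Console.WriteLine(merged.Rows.Count); // 2
    }
}
```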
How to remove all duplicated rows in data.table in r
We group by 'ID', get a logical index with duplicated on 'Date', and negate it so that all the unique elements are TRUE; then we use .I to get the row index, extract the index column 'V1', and use that to subset 'dt'.
dt[dt[, .I[!(duplicated(Date)|duplicated(Date, fromLast=TRUE))], ID]$V1]
# Date ID INC
#1: 201505 500 80
#2: 201504 600 50
Or another option would be to group by 'Date' and 'ID', and if the number of rows is one (.N == 1), return the Subset of Data.table (.SD).
dt[, if(.N==1) .SD, .(Date, ID)]
# Date ID INC
#1: 201504 600 50
#2: 201505 500 80
Or, as @Frank mentioned, we can use a data.table/base R combo:
dt[ave(seq(.N), Date, ID, FUN = function(x) length(x) == 1L)]
Filtering out duplicated/non-unique rows in data.table
For v1.9.8+ (released November 2016)
From ?unique.data.table: by default all columns are used (which is consistent with ?unique.data.frame):
unique(dt)
V1 V2
1: A B
2: A C
3: A D
4: B A
5: C D
6: E F
7: G G
Or use the by argument to get unique combinations of specific columns (the role keys previously played):
unique(dt, by = "V2")
V1 V2
1: A B
2: A C
3: A D
4: B A
5: E F
6: G G
Prior to v1.9.8
From ?unique.data.table, it is clear that calling unique on a data.table only works on the key. This means you have to set the key to all columns before calling unique.
library(data.table)
dt <- data.table(
V1=LETTERS[c(1,1,1,1,2,3,3,5,7,1)],
V2=LETTERS[c(2,3,4,2,1,4,4,6,7,2)]
)
Calling unique with one column as the key:
setkey(dt, "V2")
unique(dt)
V1 V2
[1,] B A
[2,] A B
[3,] A C
[4,] A D
[5,] E F
[6,] G G
Which way is faster to delete duplicate rows in sql?
Not having a primary key on your table is generally a bad idea. Here is one way to delete duplicates; which record is retained within each group of 23 identical column values is arbitrary:
WITH cte AS (
SELECT *, ROW_NUMBER() OVER (PARTITION BY col1, col2, col3, ..., col22, col23
ORDER BY (SELECT NULL)) rn
FROM yourTable
)
DELETE
FROM cte
WHERE rn > 1;