Filter duplicate rows based on a field
Probably the easiest way is to use ROW_NUMBER() with PARTITION BY:
SELECT * FROM (
    SELECT b.*,
           ROW_NUMBER() OVER (PARTITION BY BillID ORDER BY Lang) AS num
    FROM Bills b
    WHERE Account = 'abcd'
) tbl
WHERE num = 1
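As a sanity check, the pattern can be run against an in-memory SQLite database (window functions require SQLite >= 3.25; the Bills table and its sample rows below are invented for illustration):

```python
import sqlite3

# Minimal sketch of the ROW_NUMBER()/PARTITION BY dedup pattern above,
# using an in-memory SQLite database. Table and data are invented.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE Bills (BillID INTEGER, Lang TEXT, Account TEXT);
    INSERT INTO Bills VALUES
        (1, 'en', 'abcd'),
        (1, 'fr', 'abcd'),   -- duplicate BillID; 'en' sorts first
        (2, 'de', 'abcd'),
        (3, 'en', 'wxyz');   -- different account, filtered out
""")
rows = conn.execute("""
    SELECT BillID, Lang FROM (
        SELECT b.*,
               ROW_NUMBER() OVER (PARTITION BY BillID ORDER BY Lang) AS num
        FROM Bills b
        WHERE Account = 'abcd'
    ) tbl
    WHERE num = 1
    ORDER BY BillID          -- added only for deterministic output
""").fetchall()
print(rows)  # [(1, 'en'), (2, 'de')] -- one row per BillID
```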
SQL - filter duplicate rows based on a value in a different column
In the absence of further information, the two queries below assume that you want to resolve duplicate position values by keeping the row with the larger (maximum) user value in the first case, or the smaller (minimum) user value in the second case.
First query:
SELECT t1.*
FROM yourTable t1
INNER JOIN
(
    SELECT position, MAX(user) AS max_user
    FROM yourTable
    GROUP BY position
) t2
    ON t1.position = t2.position AND
       t1.user = t2.max_user
Second query:
SELECT t1.*
FROM yourTable t1
INNER JOIN
(
    SELECT position, MIN(user) AS min_user
    FROM yourTable
    GROUP BY position
) t2
    ON t1.position = t2.position AND
       t1.user = t2.min_user
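The first (MAX) query can be verified against an in-memory SQLite database; the yourTable rows below are invented sample data:

```python
import sqlite3

# Hedged sketch of the MAX-join dedup query above, run in SQLite.
conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE yourTable (position TEXT, user INTEGER);
    INSERT INTO yourTable VALUES
        ('dev', 3), ('dev', 7),   -- duplicate position: keep user 7
        ('ops', 5);
""")
rows = conn.execute("""
    SELECT t1.*
    FROM yourTable t1
    INNER JOIN (
        SELECT position, MAX(user) AS max_user
        FROM yourTable
        GROUP BY position
    ) t2
      ON t1.position = t2.position AND t1.user = t2.max_user
    ORDER BY t1.position           -- added only for deterministic output
""").fetchall()
print(rows)  # [('dev', 7), ('ops', 5)]
```

Swapping MAX for MIN (and max_user for min_user) gives the second query's behavior.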
Remove duplicate rows based on field in a select query with PostgreSQL?
Use DISTINCT ON:
SELECT DISTINCT ON (contenthash)
id,
contenthash,
filesize,
to_timestamp(timecreated)::DATE
FROM mdl_files
ORDER BY contenthash, timecreated, id;
DISTINCT ON is a Postgres extension that returns exactly one row for each unique combination of the expressions in parentheses. The specific row returned is the first one encountered according to the ORDER BY clause.
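Since DISTINCT ON is Postgres-only, a rough pandas analogue can illustrate the same idea: sort by the full ordering, then keep the first row per key. The data frame below is invented sample data standing in for mdl_files:

```python
import pandas as pd

# Rough pandas analogue of DISTINCT ON (contenthash) ... ORDER BY
# contenthash, timecreated, id: sort, then keep the first row per hash.
df = pd.DataFrame({
    "id":          [3, 1, 2],
    "contenthash": ["aaa", "aaa", "bbb"],
    "timecreated": [200, 100, 150],
})
first_per_hash = (
    df.sort_values(["contenthash", "timecreated", "id"])
      .drop_duplicates(subset="contenthash", keep="first")
)
print(first_per_hash["id"].tolist())  # [1, 2] -- earliest row per hash
```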
Remove duplicate rows based on values from one column
You can create a temporary table, called #newtable in the example below. The # prefix is important: it is what makes it a local temporary table in SQL Server (not everyone explains this).
The example below may prove useful to others, as it includes WHERE conditions that most examples online omit:
-- First create your temp table
SELECT CONVERT(DATE,a.ins_timestamp) AS 'Date',
a.Prod_code,
a.Curr_boxes,
a.Label_barcode,
b.From_ord_no,
NULL AS To_ord_no,
CASE
WHEN a.From_batch >= a.To_batch THEN a.From_batch
WHEN a.To_batch >= a.From_batch THEN a.To_batch
ELSE a.From_batch
END AS 'Batch',
a.Weight,
'IN' AS 'Direction'
INTO #newtable
FROM a
JOIN b ON a.Label_barcode = b.Label_barcode
WHERE (a.ins_timestamp BETWEEN ? AND ?) AND (a.To_batch = ?) AND (a.From_batch = 0) AND (a.Type = 'Consumption') AND (a.To_status <> 'STOCK') AND (b.From_status = 'PORDER')
-- Now we insert the second query into the already created table
INSERT INTO #newtable
SELECT CONVERT(DATE,b.ins_timestamp) AS 'Date',
b.Prod_code,
b.Curr_boxes,
b.Label_barcode,
NULL AS From_ord_no,
NULL AS To_ord_no,
CASE
WHEN b.From_batch >= b.To_batch THEN b.From_batch
WHEN b.To_batch >= b.From_batch THEN b.To_batch
ELSE b.From_batch
END AS 'Batch',
b.Weight,
'IN' AS 'Direction'
FROM b
WHERE (b.From_batch = 0) AND (b.Type = 'Consumption') AND (b.ins_timestamp BETWEEN ? AND ?) AND (b.To_batch = ?) AND (b.To_status <> 'STOCK')
-- Now we can select whatever we want from our temp table
SELECT Date,
Prod_code,
Curr_boxes,
Label_barcode,
max(From_ord_no) From_ord_no,
To_ord_no,
Batch,
Weight,
Direction
FROM #newtable
GROUP BY Date,
Prod_code,
Curr_boxes,
Label_barcode,
To_ord_no,
Batch,
Weight,
Direction
Find All Unique rows based on single column and exclude all duplicate rows
As you probably realised, unique and duplicated don’t quite do what you need, because they essentially retain all distinct values and just collapse “multiple copies” of those values.
For your first question, you can group_by the column that you’re interested in, and then retain just those groups (via filter) which have more than one row:
mtcars %>%
group_by(mpg) %>%
filter(length(mpg) > 1) %>%
ungroup()
This example selects all rows for which the mpg value is duplicated. This works because, when applied to groups, dplyr operations such as filter work on each group individually. This means that length(mpg) in the above code will return the length of the mpg column vector of each group, separately.
To invert the logic, it’s enough to invert the filtering condition:
mtcars %>%
group_by(mpg) %>%
filter(length(mpg) == 1) %>%
ungroup()
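For readers working in pandas rather than dplyr, the same group-and-filter idea can be sketched like this (a small invented frame stands in for mtcars):

```python
import pandas as pd

# Pandas analogue of the dplyr pattern above: keep rows whose mpg value
# appears more than once. Data is invented for illustration.
df = pd.DataFrame({"mpg": [21.0, 21.0, 22.8, 18.7], "cyl": [6, 6, 4, 8]})

# transform("size") broadcasts each group's row count back to every row,
# playing the role of length(mpg) inside each dplyr group.
dup_mpg = df[df.groupby("mpg")["mpg"].transform("size") > 1]
print(dup_mpg["mpg"].tolist())     # [21.0, 21.0]

# Inverting the condition keeps the rows with a unique mpg instead.
unique_mpg = df[df.groupby("mpg")["mpg"].transform("size") == 1]
print(unique_mpg["mpg"].tolist())  # [22.8, 18.7]
```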
Remove duplicate rows based on a value in a field
Seems pretty straightforward to extract these values:
select a,
       min(b) b
from t
group by a;
Fiddle for example: http://sqlfiddle.com/#!9/bc4c9/3
You should be able to adapt a removal method from this.
Filter and display all duplicated rows based on multiple columns in Pandas
The following code works by adding keep = False:
df = df[df.duplicated(subset = ['month', 'year'], keep = False)]
df = df.sort_values(by=['name', 'month', 'year'], ascending = False)
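A small runnable demo of the keep=False behavior, on invented data: keep=False marks every row of a duplicated (month, year) pair, not just the later copies.

```python
import pandas as pd

# Invented sample data: rows 'a' and 'b' share the same (month, year).
df = pd.DataFrame({
    "name":  ["a", "b", "c"],
    "month": [1, 1, 2],
    "year":  [2020, 2020, 2020],
})
df = df[df.duplicated(subset=["month", "year"], keep=False)]
df = df.sort_values(by=["name", "month", "year"], ascending=False)
print(df["name"].tolist())  # ['b', 'a'] -- both duplicated rows survive
```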
Remove duplicate rows based on specific criteria with pandas
First create a mask that separates duplicate from non-duplicate rows based on Id, then concatenate the non-duplicate slice with the duplicate rows whose Sales, Rent, and Rate values are not all zero.
>>> duplicateMask = df.duplicated('Id', keep=False)
>>> pd.concat([df.loc[duplicateMask & df[['Sales', 'Rent', 'Rate']].ne(0).any(axis=1)],
df[~duplicateMask]])
Id Name Sales Rent Rate
0 40808 A2 0 43 340
1 17486 DV 491 0 346
4 27977 A-M 0 0 94
6 80210 M-1 0 0 -37
7 15545 M-2 0 0 -17
10 53549 A-M8 0 0 50
12 66666 MK 0 0 0
Remove duplicate rows based on conditional matching in another column
I think the following solution will help you:
library(dplyr)
df %>%
group_by(county, mid) %>%
mutate(duplicate = n() > 1) %>%
filter(!duplicate | (duplicate & kpi == "B")) %>%
select(-duplicate)
# A tibble: 71 x 3
# Groups: county, mid [71]
county mid kpi
<chr> <chr> <chr>
1 Athens 1.1 A
2 Athens 1.2 A
3 Athens 1.3 A
4 Athens 1.4 A
5 Athens 1.5 A
6 Athens 1.6 A
7 Athens 2.1.1 A
8 Athens 2.1.2 A
9 Athens 2.1.3 A
10 Athens 2.1.4 A
# ... with 61 more rows
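The same logic can be sketched in pandas on an invented miniature of the data frame: within each (county, mid) group, keep all rows if the group is unique, otherwise keep only the kpi == "B" row.

```python
import pandas as pd

# Hedged pandas analogue of the dplyr solution above; data is invented.
df = pd.DataFrame({
    "county": ["Athens", "Athens", "Athens"],
    "mid":    ["1.1", "1.1", "1.2"],
    "kpi":    ["A", "B", "A"],
})

# n() > 1 in dplyr becomes a broadcast group size in pandas.
dup = df.groupby(["county", "mid"])["kpi"].transform("size") > 1
result = df[~dup | (df["kpi"] == "B")]
print(result["kpi"].tolist())  # ['B', 'A'] -- duplicate 'A' row dropped
```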