Remove all unique rows
Another option:
subset(df,duplicated(col1) | duplicated(col1, fromLast=TRUE))
How to drop unique rows in a pandas dataframe?
Solutions for select all duplicated rows:
You can use duplicated
with subset and parameter keep=False
for select all duplicates:
df = df[df.duplicated(subset=['A','B'], keep=False)]
print (df)
A B C
1 foo 1 A
2 foo 1 B
Solution with transform
:
df = df[df.groupby(['A', 'B'])['A'].transform('size') > 1]
print (df)
A B C
1 foo 1 A
2 foo 1 B
A bit modified solutions for select all unique rows:
#invert boolean mask by ~
df = df[~df.duplicated(subset=['A','B'], keep=False)]
print (df)
A B C
0 foo 0 A
3 bar 1 A
df = df[df.groupby(['A', 'B'])['A'].transform('size') == 1]
print (df)
A B C
0 foo 0 A
3 bar 1 A
How can I remove all duplicates so that NONE are left in a data frame?
This will extract the rows which appear only once (assuming your data frame is named df
):
df[!(duplicated(df) | duplicated(df, fromLast = TRUE)), ]
How it works: The function duplicated
tests whether a line appears at least for the second time starting at line one. If the argument fromLast = TRUE
is used, the function starts at the last line.
Boths boolean results are combined with |
(logical 'or') into a new vector which indicates all lines appearing more than once. The result of this is negated using !
thereby creating a boolean vector indicating lines appearing only once.
Removing duplicate rows in Notepad++
Notepad++ with the TextFX plugin can do this, provided you wanted to sort by line, and remove the duplicate lines at the same time.
To install the TextFX in the latest release of Notepad++ you need to download it from here: https://sourceforge.net/projects/npp-plugins/files/TextFX
The TextFX plugin used to be included in older versions of Notepad++, or be possible to add from the menu by going to Plugins -> Plugin Manager -> Show Plugin Manager -> Available tab -> TextFX -> Install
. In some cases it may also be called TextFX Characters
, but this is the same thing.
The check boxes and buttons required will now appear in the menu under: TextFX -> TextFX Tools
.
Make sure "sort outputs only unique..." is checked. Next, select a block of text (Ctrl+A to select the entire document). Finally, click "sort lines case sensitive" or "sort lines case insensitive"
how do I remove rows with duplicate values of columns in pandas data frame?
Using drop_duplicates
with subset
with list of columns to check for duplicates on and keep='first'
to keep first of duplicates.
If dataframe
is:
df = pd.DataFrame({'Column1': ["'cat'", "'toy'", "'cat'"],
'Column2': ["'bat'", "'flower'", "'bat'"],
'Column3': ["'xyz'", "'abc'", "'lmn'"]})
print(df)
Result:
Column1 Column2 Column3
0 'cat' 'bat' 'xyz'
1 'toy' 'flower' 'abc'
2 'cat' 'bat' 'lmn'
Then:
result_df = df.drop_duplicates(subset=['Column1', 'Column2'], keep='first')
print(result_df)
Result:
Column1 Column2 Column3
0 'cat' 'bat' 'xyz'
1 'toy' 'flower' 'abc'
Remove duplicated rows
just isolate your data frame to the columns you need, then use the unique function :D
# in the above example, you only need the first three columns
deduped.data <- unique( yourdata[ , 1:3 ] )
# the fourth column no longer 'distinguishes' them,
# so they're duplicates and thrown out.
Get all unique values in a JavaScript array (remove duplicates)
With JavaScript 1.6 / ECMAScript 5 you can use the native filter
method of an Array in the following way to get an array with unique values:
function onlyUnique(value, index, self) {
return self.indexOf(value) === index;
}
// usage example:
var a = ['a', 1, 'a', 2, '1'];
var unique = a.filter(onlyUnique);
console.log(unique); // ['a', 1, 2, '1']
Related Topics
Standard Eval with Ggplot2 Without 'Aes_String()'
Plotly - Different Colours for Different Surfaces
Error Using T.Test() in R - Not Enough 'Y' Observations
R - Check If String Contains Dates Within Specific Date Range
Uri Routing for Shinydashboard Using Shiny.Router
Scraping JavaScript Generated Data
Multiplying Combinations of a List of Lists in R
How to Place an Identical Smooth on Each Facet of a Ggplot2 Object
Error in Running Factor() on a Column of a Data Frame
How to Filter Rows Based on the Previous Row and Keep Previous Row Using Dplyr
Error When Mapping in Ggmap with API Key (403 Forbidden)
How to Paste List of Items in R
R - Identify Consecutive Sequences
Compare Two Columns Element-Wise
My Group by Doesn't Appear to Be Working in Disk Frames
How to Extend the 'Summary' Function to Include Sd, Kurtosis and Skew