Removing duplicates from a String in Java
Convert the string to an array of char, and store it in a LinkedHashSet. That will preserve your ordering and remove duplicates. Something like:
import java.util.LinkedHashSet;
import java.util.Set;

String string = "aabbccdefatafaz";
char[] chars = string.toCharArray();
// LinkedHashSet keeps insertion order and ignores repeated additions
Set<Character> charSet = new LinkedHashSet<Character>();
for (char c : chars) {
    charSet.add(c);
}
StringBuilder sb = new StringBuilder();
for (Character character : charSet) {
    sb.append(character);
}
System.out.println(sb.toString()); // prints "abcdeftz"
How to remove duplicate chars in a string?
It seems from your example that you want to remove REPEATED SEQUENCES of characters, not duplicate chars across the whole string. So this is what I'm solving here.
You can use a regular expression. Not sure how horribly inefficient it is, but it works.
>>> import re
>>> phrase = str("oo rarato roeroeu aa rouroupa dodo rerei dde romroma")
>>> re.sub(r'(.+?)\1+', r'\1', phrase)
'o rato roeu a roupa do rei de roma'
How this substitution proceeds down the string:
oo -> o
" " -> " "
rara -> ra
to -> to
" " -> " "
roeroe -> roe
etc.
Edit: Works for the other example string which should not be modified:
>>> phrase = str("Barbara Bebe com Bernardo")
>>> re.sub(r'(.+?)\1+', r'\1', phrase)
'Barbara Bebe com Bernardo'
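If the substitution above needs to be reused, it can be wrapped in a small function. This is only a minimal sketch: the name collapse_repeats is hypothetical, and the pattern is the same one used above. Keep in mind that `.` does not match newlines by default, so a repeated sequence that spans a line break is left alone.

```python
import re

# Same pattern as above: a shortest-possible group followed by
# one or more immediate copies of itself.
_REPEAT = re.compile(r'(.+?)\1+')

def collapse_repeats(text):
    """Collapse immediately repeated sequences, e.g. 'rara' -> 'ra' (hypothetical helper)."""
    return _REPEAT.sub(r'\1', text)

print(collapse_repeats("oo rarato roeroeu aa rouroupa dodo rerei dde romroma"))
print(collapse_repeats("Barbara Bebe com Bernardo"))
```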
remove duplicate character in string and make unique string
Do you like one-liners? Minimal code can be very efficient. For example, with a Set and spread syntax (note that .sort() also puts the characters in sorted order; drop it to keep first-occurrence order):
const remDup = e => [...new Set(e)].sort().join("");
console.log(remDup("Rasikawef dfv dd"));
How to remove duplicate letters in string in PHP
I tend to avoid regex when possible. Here, I'd just split all the letters into one big array and then use array_unique() to de-duplicate:
$unique = array_unique(str_split(implode('', $elem)));
That gives you an array of the unique characters, one character per array element. If you'd prefer those as a string, just implode the array:
$unique = implode('', array_unique(str_split(implode('', $elem))));
Removing duplicates in each item in a list of strings
You can use set combined with a list comprehension if you don't want the letter ordering to be preserved:
list1 = ['AAB', 'CAA', 'ADA']
list1 = [''.join(set(l)) for l in list1]
print(list1)
Or use OrderedDict if you want the ordering to be preserved:
from collections import OrderedDict
list1 = ['AAB', 'CAA', 'ADA']
list1 = [''.join(OrderedDict.fromkeys(l).keys()) for l in list1]
print(list1)
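On Python 3.7 and later, the built-in dict also preserves insertion order, so a plain dict can stand in for OrderedDict here. A minimal sketch of that variant follows; the helper name dedupe_keep_order is made up for illustration.

```python
def dedupe_keep_order(word):
    """Keep the first occurrence of each character (hypothetical helper)."""
    # dict.fromkeys keeps keys in first-seen order on Python 3.7+.
    return ''.join(dict.fromkeys(word))

list1 = ['AAB', 'CAA', 'ADA']
print([dedupe_keep_order(w) for w in list1])  # ['AB', 'CA', 'AD']
```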
Removing duplicate characters in a String array
Your problem is that you are checking each character against all the characters after it. Imagine an array with no duplicates; once you get to the last character, you have printed out all the characters before it. But when j = array.length - 1, then k = array.length, and the second for loop does not run at all, so your last character will never be printed.
Your code would always fail to print the last element correctly. The only case in which it would be correct is if your last element is a duplicate of a previous element, but not of the second-to-last element.
Try this code instead:
outerloop:
for (int j = 0; j < array.length; j++) {
    for (int k = 0; k < j; k++) {
        if (array[j] == array[k]) {
            continue outerloop;
        }
    }
    System.out.print(array[j] + " ");
}
The premise of the code is that it loops through each of the characters. If it matches any of the previous characters, the code skips that element and continues to the next one.
EDIT: Looks like you edited the question for a sorted array instead. That means if the last element is a duplicate, it will be a duplicate of the element before it so we don't have to worry about the corner case in my previous block of code.
for (int j = 0; j < array.length; j++) {
    for (int k = j + 1; k < array.length; k++) {
        if (array[j] == array[k]) {
            continue;
        }
        System.out.print(array[j] + " ");
        j = k;
    }
}
System.out.print(array[array.length - 1] + " ");
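The sorted-array reasoning can be seen in a more compact form: because equal elements are adjacent, each element only needs to be compared with the last one kept. The following is a rough Python sketch of that idea, not a translation of the Java above. The function name unique_sorted is hypothetical, and unlike the loop above it also handles an empty input.

```python
def unique_sorted(chars):
    """Return the distinct items of an already-sorted sequence (hypothetical helper)."""
    result = []
    for c in chars:
        # In sorted input, a duplicate can only equal the previously kept item.
        if not result or result[-1] != c:
            result.append(c)
    return result

print(' '.join(unique_sorted(['a', 'a', 'b', 'c', 'c'])))  # a b c
```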
Removing specific duplicated characters from a string in Python
(Big edit: oops, I missed that you only want to de-duplicate certain characters and not others. Retrofitting solutions...)
I assume you have a string that represents all the characters you want to de-duplicate. Let's call it to_remove, and say that it's equal to "_.-". So only underscores, periods, and hyphens will be de-duplicated.
You could use a regex to match multiple successive repeats of a character, and replace them with a single character.
>>> import re
>>> to_remove = "_.-"
>>> s = "Hello... _my name -- is __Alex"
>>> pattern = "(?P<char>[" + re.escape(to_remove) + "])(?P=char)+"
>>> re.sub(pattern, r"\1", s)
'Hello. _my name - is _Alex'
Quick breakdown:
- ?P<char> assigns the symbolic name char to the first group.
- We put to_remove inside the character matching set, []. It's necessary to call re.escape because hyphens and other characters may have special meaning inside the set otherwise.
- (?P=char) refers back to the character matched by the named group "char".
- The + matches one or more repetitions of that character.
So in aggregate, this means "match any character from to_remove that appears more than once in a row". The second argument to sub, r"\1", then replaces that match with the first group, which is only one character long.
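Putting the pieces of the breakdown together, the pattern can be built once and reused. This is a minimal sketch, assuming to_remove is a non-empty string of single characters; the function name squeeze_chars is hypothetical.

```python
import re

def squeeze_chars(s, to_remove):
    """Collapse runs of any character listed in to_remove (hypothetical helper)."""
    # Assumes to_remove is non-empty; an empty character set would not
    # form a valid pattern.
    pattern = re.compile("(?P<char>[" + re.escape(to_remove) + "])(?P=char)+")
    # Replace each run with the single character captured by the named group.
    return pattern.sub(r"\g<char>", s)

print(squeeze_chars("Hello... _my name -- is __Alex", "_.-"))
```

Here the replacement uses \g<char> instead of \1; both refer to the same group, since the named group is also group 1.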
Alternative approach: write a generator expression that keeps every character unless it equals the character preceding it and is in to_remove.
>>> "".join(s[i] for i in range(len(s)) if i == 0 or not (s[i-1] == s[i] and s[i] in to_remove))
'Hello. _my name - is _Alex'
Alternative approach #2: use groupby to identify consecutive identical character groups, then join the values together, using to_remove membership testing to decide how many values should be added.
>>> import itertools
>>> "".join(k if k in to_remove else "".join(v) for k,v in itertools.groupby(s, lambda c: c))
'Hello. _my name - is _Alex'
Alternative approach #3: call re.sub once for each member of to_remove. A bit expensive if to_remove contains a lot of characters.
>>> for c in to_remove:
...     s = re.sub(rf"({re.escape(c)})\1+", r"\1", s)
...
>>> s
'Hello. _my name - is _Alex'
How to remove duplicated characters from a string?
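You can use a generator expression and join, like this. Note that it drops every character that occurs more than once, rather than keeping one copy of it: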
>>> x = 'Hello'
>>> ''.join(c for c in x if x.count(c) == 1)
'Heo'
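Calling x.count(c) inside the loop rescans the string for every character. A rough sketch of two variants follows, both with hypothetical function names: the first does the same filtering with collections.Counter so the string is counted only once, and the second keeps the first occurrence of each character instead of dropping repeated characters entirely.

```python
from collections import Counter

def drop_repeated(x):
    """Keep only characters that occur exactly once in x (hypothetical helper)."""
    counts = Counter(x)  # count every character in a single pass
    return ''.join(c for c in x if counts[c] == 1)

def keep_first(x):
    """Keep the first occurrence of each character (hypothetical helper)."""
    seen = set()
    out = []
    for c in x:
        if c not in seen:
            seen.add(c)
            out.append(c)
    return ''.join(out)

print(drop_repeated('Hello'))  # Heo
print(keep_first('Hello'))     # Helo
```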
How to remove duplicate characters from a string in Swift
Edit/update: Swift 4.2 or later
You can use a set to filter your duplicated characters:
let str = "bookkeeper"
var set = Set<Character>()
let squeezed = str.filter{ set.insert($0).inserted }
print(squeezed) // "bokepr"
Or as an extension on RangeReplaceableCollection, which will also extend String and Substring:
extension RangeReplaceableCollection where Element: Hashable {
    var squeezed: Self {
        var set = Set<Element>()
        return filter { set.insert($0).inserted }
    }
}
let str = "bookkeeper"
print(str.squeezed) // "bokepr"
print(str[...].squeezed) // "bokepr"