Removing Duplicate Characters from a String

Removing duplicates from a String in Java

Convert the string to a char array and add each character to a LinkedHashSet, which preserves insertion order and discards duplicates. Something like:

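import java.util.LinkedHashSet;
import java.util.Set;

String string = "aabbccdefatafaz";

// LinkedHashSet keeps first-insertion order and ignores repeated additions.
char[] chars = string.toCharArray();
Set<Character> charSet = new LinkedHashSet<>();
for (char c : chars) {
    charSet.add(c);
}

// Rebuild a string from the unique characters.
StringBuilder sb = new StringBuilder();
for (Character character : charSet) {
    sb.append(character);
}
System.out.println(sb.toString()); // abcdeftz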

How to remove duplicate chars in a string?

It seems from your example that you want to remove REPEATED SEQUENCES of characters, not duplicate chars across the whole string. So this is what I'm solving here.

You can use a regular expression. I'm not sure how inefficient it is, but it works.

>>> import re
>>> phrase = str("oo rarato roeroeu aa rouroupa dodo rerei dde romroma")
>>> re.sub(r'(.+?)\1+', r'\1', phrase)
'o rato roeu a roupa do rei de roma'

How the substitution proceeds along the string:

oo -> o
" " -> " " (no repeat, unchanged)
rara -> ra
to -> to (unchanged)
" " -> " " (unchanged)
roeroe -> roe

and so on.

Edit: It also works for the other example string, which should not be modified:

>>> phrase = str("Barbara Bebe com Bernardo")
>>> re.sub(r'(.+?)\1+', r'\1', phrase)
'Barbara Bebe com Bernardo'
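
As a minimal sketch, the same substitution can be wrapped in a reusable function with a precompiled pattern. The names REPEATED_RUN and collapse_repeats are hypothetical and not part of the original answer. One caveat worth noting: the pattern also collapses legitimately doubled letters, so "book" becomes "bok".

```python
import re

# Lazily capture the shortest unit, then require it to repeat at least once.
REPEATED_RUN = re.compile(r"(.+?)\1+")


def collapse_repeats(text: str) -> str:
    """Replace each run of an immediately repeated substring with one copy."""
    return REPEATED_RUN.sub(r"\1", text)


print(collapse_repeats("oo rarato roeroeu aa rouroupa dodo rerei dde romroma"))
print(collapse_repeats("Barbara Bebe com Bernardo"))
print(collapse_repeats("book"))  # bok (doubled letters are collapsed too)
```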

remove duplicate character in string and make unique string

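Do you like one-liners? Minimal code can be very efficient. With a Set and spread syntax:

const remDup = e => [...new Set(e)].sort().join("");
console.log(remDup("Rasikawef dfv dd")); // " Radefiksvw"

Note that sort() reorders the remaining characters by code unit; drop it if you want to keep them in first-occurrence order.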

How to remove duplicate letters in string in PHP

I tend to avoid regex when possible. Here, I'd just split all the letters into one big array and then use array_unique() to de-duplicate:

$unique = array_unique(str_split(implode('', $elem)));

That gives you an array of the unique characters, one character per array element. If you'd prefer those as a string, just implode the array:

$unique = implode('', array_unique(str_split(implode('', $elem))));

Removing duplicates in each item in a list of strings

You can use set combined with a list comprehension if you don't need to preserve the letter order (a set's iteration order is arbitrary):

list1 = ['AAB', 'CAA', 'ADA']
list1 = [''.join(set(word)) for word in list1]
print(list1)

Or use OrderedDict if you want the ordering to be preserved:

from collections import OrderedDict
list1 = ['AAB', 'CAA', 'ADA']
list1 = [''.join(OrderedDict.fromkeys(word)) for word in list1]
print(list1)
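
On Python 3.7 and later, plain dicts also preserve insertion order, so the import can be dropped. A minimal sketch of that variant (the name deduped is just for illustration):

```python
list1 = ['AAB', 'CAA', 'ADA']

# dict.fromkeys keeps the first occurrence of each character, in insertion
# order (guaranteed for plain dicts from Python 3.7 onward).
deduped = [''.join(dict.fromkeys(word)) for word in list1]
print(deduped)  # ['AB', 'CA', 'AD']
```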

Removing duplicate characters in a String array

Your problem is that you are checking each character against all the characters after it. Imagine an array with no duplicates; once you get to the last character, you have printed out all the characters before it. But when j = array.length - 1, then k = array.length, and the second for loop does not run at all, and your last character will never be printed.

Your code would always fail to print the last element correctly. The only case in which it would be correct is if your last element is a duplicate of a previous element, but not of the second-to-last element.

Try this code instead:

outerloop:
for (int j = 0; j < array.length; j++) {
    for (int k = 0; k < j; k++) {
        if (array[j] == array[k]) {
            continue outerloop;
        }
    }
    System.out.print(array[j] + " ");
}

The premise of the code is that it loops through each of the characters. If it matches any of the previous characters, the code skips that element and continues to the next one.

EDIT: Looks like you edited the question for a sorted array instead. That means if the last element is a duplicate, it will be a duplicate of the element before it so we don't have to worry about the corner case in my previous block of code.

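// Assumes a non-empty, sorted array, so duplicates are always adjacent.
for (int j = 0; j < array.length; j++) {
    for (int k = j + 1; k < array.length; k++) {
        if (array[j] == array[k]) {
            continue; // still inside the current run of equal values
        }
        // array[k] starts a new run: print the finished one and jump ahead.
        System.out.print(array[j] + " ");
        j = k;
    }
}
// The final run is never printed inside the loops, so print it here.
System.out.print(array[array.length - 1] + " ");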
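
For sorted input, the same idea reduces to comparing each element with the last one kept. The following is a minimal Python sketch of that simplification, not the answerer's Java code; unique_sorted is a hypothetical helper name. It also handles an empty input, which the trailing print above does not.

```python
def unique_sorted(items):
    """Return the distinct values of an already sorted sequence, in order."""
    result = []
    for item in items:
        # In sorted input duplicates are adjacent, so one comparison suffices.
        if not result or result[-1] != item:
            result.append(item)
    return result


print(' '.join(unique_sorted(['a', 'a', 'b', 'c', 'c'])))  # a b c
print(unique_sorted([]))  # []
```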

Removing specific duplicated characters from a string in Python

(Big edit: oops, I missed that you only want to de-duplicate certain characters and not others. Retrofitting solutions...)

I assume you have a string that represents all the characters you want to de-duplicate. Let's call it to_remove, and say that it's equal to "_.-". So only underscores, periods, and hyphens will be de-duplicated.

You could use a regex to match multiple successive repeats of a character, and replace them with a single character.

>>> import re
>>> to_remove = "_.-"
>>> s = "Hello... _my name -- is __Alex"
>>> pattern = "(?P<char>[" + re.escape(to_remove) + "])(?P=char)+"
>>> re.sub(pattern, r"\1", s)
'Hello. _my name - is _Alex'

Quick breakdown:

  • ?P<char> assigns the symbolic name char to the first group.
  • we put to_remove inside the character matching set, []. It's necessary to call re.escape because hyphens and other characters may have special meaning inside the set otherwise.
  • (?P=char) refers back to the character matched by the named group "char".
  • The + matches one or more repetitions of that character.

So in aggregate, this means "match any character from to_remove that appears more than once in a row". The second argument to sub, r"\1", then replaces that match with the first group, which is only one character long.
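
The pieces described above can be packaged as a small function. This is only a sketch: squeeze is a hypothetical name, and the guard for an empty character list is an added assumption, since an empty set "[]" would not compile as a pattern.

```python
import re


def squeeze(text: str, chars: str) -> str:
    """Collapse consecutive repeats of any character listed in `chars`."""
    if not chars:
        return text  # nothing to de-duplicate; avoids building an empty set
    # Named group "char" captures one listed character; (?P=char)+ then
    # requires that same character to repeat at least once more.
    pattern = re.compile("(?P<char>[" + re.escape(chars) + "])(?P=char)+")
    return pattern.sub(r"\g<char>", text)


print(squeeze("Hello... _my name -- is __Alex", "_.-"))  # Hello. _my name - is _Alex
print(squeeze("aa--bb", "-"))  # aa-bb
```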


Alternative approach: write a generator expression that skips a character only when it is in to_remove and equals the character immediately before it.

>>> "".join(s[i] for i in range(len(s)) if i == 0 or not (s[i-1] == s[i] and s[i] in to_remove))
'Hello. _my name - is _Alex'

Alternative approach #2: use groupby to identify groups of consecutive identical characters, then join the values together, using to_remove membership testing to decide how many values should be added.

>>> import itertools
>>> "".join(k if k in to_remove else "".join(v) for k,v in itertools.groupby(s, lambda c: c))
'Hello. _my name - is _Alex'

Alternative approach #3: call re.sub once for each member of to_remove. A bit expensive if to_remove contains a lot of characters.

>>> for c in to_remove:
...     s = re.sub(rf"({re.escape(c)})\1+", r"\1", s)
...
>>> s
'Hello. _my name - is _Alex'

How to remove duplicated characters from a string?

You can use a generator expression and join. Note that this drops every character that occurs more than once, rather than keeping one copy of each:

>>> x = 'Hello'
>>> ''.join(c for c in x if x.count(c) == 1)
'Heo'
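
Calling x.count(c) rescans the whole string for every character, which is quadratic in the string length. A minimal sketch of a linear alternative using collections.Counter follows, together with a variant that keeps the first occurrence of each character instead; the variable names are illustrative only.

```python
from collections import Counter

x = 'Hello'

# Count every character once, then keep only those that occur exactly once.
counts = Counter(x)
only_singletons = ''.join(c for c in x if counts[c] == 1)
print(only_singletons)  # Heo

# To keep the first occurrence of each character instead (Python 3.7+):
first_occurrences = ''.join(dict.fromkeys(x))
print(first_occurrences)  # Helo
```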

How to remove duplicate characters from a string in Swift

Edit/update: Swift 4.2 or later

You can use a set to filter your duplicated characters:

let str = "bookkeeper"
var set = Set<Character>()
let squeezed = str.filter{ set.insert($0).inserted }

print(squeezed) // "bokepr"

Or as an extension on RangeReplaceableCollection, which covers both String and Substring:

extension RangeReplaceableCollection where Element: Hashable {
    var squeezed: Self {
        var set = Set<Element>()
        return filter{ set.insert($0).inserted }
    }
}

let str = "bookkeeper"
print(str.squeezed) // "bokepr"
print(str[...].squeezed) // "bokepr"

