Removing Duplicates from a String in Java

Removing duplicates from a String in Java

Convert the string to a char array and add each character to a LinkedHashSet. That preserves the original ordering and removes duplicates. Something like:

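import java.util.LinkedHashSet;
import java.util.Set;

String string = "aabbccdefatafaz";

// A LinkedHashSet ignores repeated elements and keeps first-seen order
char[] chars = string.toCharArray();
Set<Character> charSet = new LinkedHashSet<>();
for (char c : chars) {
    charSet.add(c);
}

// Rebuild the string from the unique characters
StringBuilder sb = new StringBuilder();
for (Character character : charSet) {
    sb.append(character);
}
System.out.println(sb.toString()); // abcdeftz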
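The same idea can also be written with a stream, since distinct() keeps the first occurrence of each element in encounter order. This is only a sketch: the class and method names are made up for illustration, and it works on UTF-16 code units, so characters outside the Basic Multilingual Plane are not handled.

```java
import java.util.stream.Collectors;

class DistinctChars {
    // Hypothetical helper: removes repeated characters, keeping the first occurrence of each.
    static String removeDuplicateChars(String input) {
        return input.chars()                              // IntStream of UTF-16 code units
                .distinct()                               // drops repeats, keeps encounter order
                .mapToObj(c -> String.valueOf((char) c))  // back to one-character strings
                .collect(Collectors.joining());
    }

    public static void main(String[] args) {
        System.out.println(removeDuplicateChars("aabbccdefatafaz")); // prints "abcdeftz"
    }
}
```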

How to remove duplicate words in a string, for example: "my name is this and that this and that" - the output would be "my name is this and that"

This seems to be the easier way:

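// needs java.util.Arrays, java.util.LinkedHashSet, java.util.List and java.util.Set
String source = "my name is this and that this and that";
List<String> arr = Arrays.asList(source.split("\\s"));
Set<String> distincts = new LinkedHashSet<>(arr);
String result = String.join(" ", distincts);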

The same thing rewritten using Java 8 streams:

public void duplicateRemover() {
    String source = "my name is this and that this and that";
    List<String> distincts = Arrays.stream(source.split("\\s")).distinct().collect(Collectors.toList());
    String result = String.join(" ", distincts);
    System.out.println(result);
}
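One thing to watch with split("\\s") is that two spaces in a row produce an empty token, which then ends up in the result as an extra space. A possible variation, assuming runs of whitespace should count as one separator, is to trim first and split on "\\s+". The class and method names here are illustrative only.

```java
import java.util.Arrays;
import java.util.stream.Collectors;

class DistinctWords {
    // Hypothetical helper: keeps the first occurrence of each word,
    // treating any run of whitespace as a single separator.
    static String removeDuplicateWords(String source) {
        return Arrays.stream(source.trim().split("\\s+"))
                .distinct()
                .collect(Collectors.joining(" "));
    }

    public static void main(String[] args) {
        // Note the double spaces in the input
        System.out.println(removeDuplicateWords("my  name is this and that this  and that"));
        // prints "my name is this and that"
    }
}
```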

How to remove duplicate values in string which has delimiters

You could use a LinkedHashSet to preserve insertion order. Once you have split the String on "||", just add the delimiters back when reconstructing the String.

String s = "||HelpDesk||IT Staff||IT Staff||Admin||Audit||HelpDesk||";
Set<String> set = new LinkedHashSet<>(Arrays.asList(s.split(Pattern.quote("||"))));
String noDup = "||";
for (String st : set) {
    if (st.isEmpty()) continue; // skip the empty String produced by the leading "||"
    noDup += st + "||";
}

Or using the Java 8 Stream API:

String noDup = "||" +
    Arrays.stream(s.split(Pattern.quote("||")))
          .distinct()
          .filter(st -> !st.isEmpty()) // we need to remove the empty String produced by the split
          .collect(Collectors.joining("||")) + "||";

Both approaches yield the same result (||HelpDesk||IT Staff||Admin||Audit||).
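If the delimiter can vary, the stream version could be wrapped in a small helper. The sketch below is an assumption about how that might look, not part of the original answer: the names DelimitedDedup and dedupe are invented. Pattern.quote is kept because split takes a regex and "|" is a regex metacharacter. The three-argument Collectors.joining adds the leading and trailing delimiter.

```java
import java.util.Arrays;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class DelimitedDedup {
    // Hypothetical helper: removes duplicate values from a string that is
    // wrapped in and separated by the given delimiter.
    static String dedupe(String s, String delimiter) {
        return Arrays.stream(s.split(Pattern.quote(delimiter))) // quote: treat the delimiter literally
                .filter(part -> !part.isEmpty())                // drop the empty token before the first delimiter
                .distinct()                                     // keep first occurrences, in order
                .collect(Collectors.joining(delimiter, delimiter, delimiter));
    }

    public static void main(String[] args) {
        String s = "||HelpDesk||IT Staff||IT Staff||Admin||Audit||HelpDesk||";
        System.out.println(dedupe(s, "||")); // prints "||HelpDesk||IT Staff||Admin||Audit||"
    }
}
```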

Remove duplicates from a list of String Array

You can use the toMap collector to provide a custom keyMapper function which serves as a uniqueness test, then simply use the values of the map as your result.

For your uniqueness test, I think it makes more sense to use index 1 (the userID) instead of index 0 (the userName). However, if you wish to change it back, use arr[0] instead of arr[1] below:

List<String[]> userList = new ArrayList<>();
userList.add(new String[]{"George", "123"});
userList.add(new String[]{"George", "123"});
userList.add(new String[]{"George", "456"});
List<String[]> userListNoDupes = new ArrayList<>(userList.stream()
    .collect(Collectors.toMap(arr -> arr[1], Function.identity(), (a, b) -> a)).values());
for (String[] user : userListNoDupes) {
    System.out.println(Arrays.toString(user));
}

Output:

[George, 123]

[George, 456]
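Note that the three-argument toMap makes no promise about the iteration order of the map it returns, so the order of the result is not guaranteed to match the input. If the original order matters, one option is the four-argument overload with LinkedHashMap::new as the map supplier. A minimal sketch, with illustrative class and method names:

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.function.Function;
import java.util.stream.Collectors;

class UniqueUsers {
    // Hypothetical helper: keeps the first entry seen for each userID (index 1),
    // in the order the entries appear in the input list.
    static List<String[]> distinctByUserId(List<String[]> users) {
        return new ArrayList<>(users.stream()
                .collect(Collectors.toMap(
                        arr -> arr[1],              // uniqueness key: the userID
                        Function.identity(),        // value: the whole array
                        (first, second) -> first,   // on a key clash, keep the earlier entry
                        LinkedHashMap::new))        // map that remembers insertion order
                .values());
    }

    public static void main(String[] args) {
        List<String[]> userList = Arrays.asList(
                new String[]{"George", "456"},
                new String[]{"George", "123"},
                new String[]{"George", "456"});
        for (String[] user : distinctByUserId(userList)) {
            System.out.println(Arrays.toString(user));
        }
        // prints [George, 456] then [George, 123]
    }
}
```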

Removing duplicates words from a string

You can use a regex to do this for you. Sample code:

String regex = "\\b(\\w+)\\b\\s*(?=.*\\b\\1\\b)";
input = input.replaceAll(regex, "");

  1. \b Matches a word boundary position between a word character and a non-word character or position (start / end of string).
  2. (\w+) Captures one or more word characters (alphanumeric & underscore) as group #1.
  3. \b Matches a word boundary position between a word character and a non-word character or position (start / end of string).
  4. \s Matches any whitespace character (spaces, tabs, line breaks).
  5. * Matches 0 or more of the preceding token.
  6. (?= Starts a positive lookahead: the enclosed pattern must match after the main expression, but is not included in the result.
  7. . Matches any character except line breaks.
  8. \1 Matches the same text as capture group #1 in step 2.

Note: It is important to use word boundaries here to avoid matching partial words.
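To see why the boundaries matter, here is a small comparison. The second pattern is a made-up variant without \b, included only for contrast, and the class and method names are illustrative. Without the boundaries, "is" is treated as a duplicate because it also occurs inside "this".

```java
class WordBoundaryDemo {
    // Pattern from the answer above: only whole words count as duplicates.
    static final String WITH_BOUNDARIES = "\\b(\\w+)\\b\\s*(?=.*\\b\\1\\b)";
    // Hypothetical variant without \b, shown only to illustrate the problem.
    static final String WITHOUT_BOUNDARIES = "(\\w+)\\s*(?=.*\\1)";

    static String strip(String input, String regex) {
        return input.replaceAll(regex, "");
    }

    public static void main(String[] args) {
        System.out.println(strip("is this", WITH_BOUNDARIES));    // prints "is this"
        System.out.println(strip("is this", WITHOUT_BOUNDARIES)); // prints "this"
    }
}
```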

Here's a link to a regex demo and explanation: RegexDemo

Remove duplicate values from a string in java

This does it in one line:

// static, so that it can be called from main
public static String deDup(String s) {
    return new LinkedHashSet<String>(Arrays.asList(s.split("-"))).toString().replaceAll("(^\\[|\\]$)", "").replace(", ", "-");
}

public static void main(String[] args) {
    System.out.println(deDup("Bangalore-Chennai-NewYork-Bangalore-Chennai"));
}

Output:

Bangalore-Chennai-NewYork

Notice that the order is preserved :)

Key points are:

  • split("-") gives us the different values as an array
  • Arrays.asList() turns the array into a List
  • LinkedHashSet preserves uniqueness and insertion order - it does all the work of giving us the unique values, which are passed via the constructor
  • the toString() of a Set (as for any Collection) is [element1, element2, ...]
  • the final replace commands remove the "punctuation" from the toString()

This solution requires the values to not contain the character sequence ", " - a reasonable requirement for such terse code.
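If that restriction is a problem, a possible alternative on Java 8 or later is to hand the LinkedHashSet straight to String.join, which avoids parsing the toString() output altogether. This is a sketch of that idea rather than the answer's own code, and the class name is invented.

```java
import java.util.Arrays;
import java.util.LinkedHashSet;

class JoinDedup {
    // Same split + LinkedHashSet approach, but joined directly,
    // so values containing ", " come through unchanged.
    static String deDup(String s) {
        return String.join("-", new LinkedHashSet<>(Arrays.asList(s.split("-"))));
    }

    public static void main(String[] args) {
        System.out.println(deDup("Bangalore-Chennai-NewYork-Bangalore-Chennai")); // prints "Bangalore-Chennai-NewYork"
        System.out.println(deDup("a, b-c-a, b"));                                 // prints "a, b-c"
    }
}
```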

Java 8 Update!

Of course it's 1 line:

public String deDup(String s) {
    return Arrays.stream(s.split("-")).distinct().collect(Collectors.joining("-"));
}

Regex update!

If you don't care about preserving order (i.e. it's OK to delete the first occurrence of a duplicate and keep the last one):

public String deDup(String s) {
    return s.replaceAll("(\\b\\w+\\b)-(?=.*\\b\\1\\b)", "");
}
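To make the ordering difference concrete, here is the regex version run on the sample input used earlier. The wrapper class is only there so the snippet runs on its own. Each value that appears again later in the string is removed, so the last occurrences survive.

```java
class RegexDeDup {
    // Removes every value that occurs again later in the string,
    // so the last occurrence of each value is the one that is kept.
    static String deDup(String s) {
        return s.replaceAll("(\\b\\w+\\b)-(?=.*\\b\\1\\b)", "");
    }

    public static void main(String[] args) {
        System.out.println(deDup("Bangalore-Chennai-NewYork-Bangalore-Chennai"));
        // prints "NewYork-Bangalore-Chennai"
    }
}
```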

Java 8 remove duplicate strings irrespective of case from a list

Taking your question literally, to “remove duplicate strings irrespective of case from a list”, you may use

// just for constructing a sample list
String str = "Kobe Is is The the best player In in Basketball basketball game .";
List<String> list = new ArrayList<>(Arrays.asList(str.split("\\s")));

// the actual operation
TreeSet<String> seen = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
list.removeIf(s -> !seen.add(s));

// just for debugging
System.out.println(String.join(" ", list));
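If you would rather not modify the original list, the same TreeSet trick can be used as a stream filter. Treat this as a sketch: the names are illustrative, and because the filter relies on a stateful predicate it is only suitable for sequential streams.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Set;
import java.util.TreeSet;
import java.util.stream.Collectors;

class CaseInsensitiveDedup {
    // Hypothetical helper: returns a new list with the first spelling of each
    // word, comparing words case-insensitively. The input list is left untouched.
    static List<String> distinctIgnoreCase(List<String> words) {
        Set<String> seen = new TreeSet<>(String.CASE_INSENSITIVE_ORDER);
        return words.stream()
                .filter(seen::add)             // add returns false for words already seen
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        String str = "Kobe Is is The the best player In in Basketball basketball game .";
        List<String> result = distinctIgnoreCase(Arrays.asList(str.split("\\s")));
        System.out.println(String.join(" ", result));
        // prints "Kobe Is The best player In Basketball game ."
    }
}
```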

