Java Array, Finding Duplicates

Java Array, Finding Duplicates

On the nose answer..

duplicates=false;
for (j=0;j<zipcodeList.length;j++)
for (k=j+1;k<zipcodeList.length;k++)
if (k!=j && zipcodeList[k] == zipcodeList[j])
duplicates=true;

Edited to switch .equals() back to == since I read somewhere you're using int, which wasn't clear in the initial question. Also to set k=j+1, to halve execution time, but it's still O(n2).

A faster (in the limit) way

Here's a hash based approach. You gotta pay for the autoboxing, but it's O(n) instead of O(n2). An enterprising soul would go find a primitive int-based hash set (Apache or Google Collections has such a thing, methinks.)

boolean duplicates(final int[] zipcodelist)
{
Set<Integer> lump = new HashSet<Integer>();
for (int i : zipcodelist)
{
if (lump.contains(i)) return true;
lump.add(i);
}
return false;
}

Bow to HuyLe

See HuyLe's answer for a more or less O(n) solution, which I think needs a couple of add'l steps:

static boolean duplicates(final int[] zipcodelist)
{
final int MAXZIP = 99999;
boolean[] bitmap = new boolean[MAXZIP+1];
java.util.Arrays.fill(bitmap, false);
for (int item : zipcodeList)
if (!bitmap[item]) bitmap[item] = true;
else return true;
}
return false;
}

Or Just to be Compact

static boolean duplicates(final int[] zipcodelist)
{
final int MAXZIP = 99999;
boolean[] bitmap = new boolean[MAXZIP+1]; // Java guarantees init to false
for (int item : zipcodeList)
if (!(bitmap[item] ^= true)) return true;
return false;
}

Does it Matter?

Well, so I ran a little benchmark, which is iffy all over the place, but here's the code:

import java.util.BitSet;

class Yuk
{
static boolean duplicatesZero(final int[] zipcodelist)
{
boolean duplicates=false;
for (int j=0;j<zipcodelist.length;j++)
for (int k=j+1;k<zipcodelist.length;k++)
if (k!=j && zipcodelist[k] == zipcodelist[j])
duplicates=true;

return duplicates;
}

static boolean duplicatesOne(final int[] zipcodelist)
{
final int MAXZIP = 99999;
boolean[] bitmap = new boolean[MAXZIP + 1];
java.util.Arrays.fill(bitmap, false);
for (int item : zipcodelist) {
if (!(bitmap[item] ^= true))
return true;
}
return false;
}

static boolean duplicatesTwo(final int[] zipcodelist)
{
final int MAXZIP = 99999;

BitSet b = new BitSet(MAXZIP + 1);
b.set(0, MAXZIP, false);
for (int item : zipcodelist) {
if (!b.get(item)) {
b.set(item, true);
} else
return true;
}
return false;
}

enum ApproachT { NSQUARED, HASHSET, BITSET};

/**
* @param args
*/
public static void main(String[] args)
{
ApproachT approach = ApproachT.BITSET;

final int REPS = 100;
final int MAXZIP = 99999;

int[] sizes = new int[] { 10, 1000, 10000, 100000, 1000000 };
long[][] times = new long[sizes.length][REPS];

boolean tossme = false;

for (int sizei = 0; sizei < sizes.length; sizei++) {
System.err.println("Trial for zipcodelist size= "+sizes[sizei]);
for (int rep = 0; rep < REPS; rep++) {
int[] zipcodelist = new int[sizes[sizei]];
for (int i = 0; i < zipcodelist.length; i++) {
zipcodelist[i] = (int) (Math.random() * (MAXZIP + 1));
}
long begin = System.currentTimeMillis();
switch (approach) {
case NSQUARED :
tossme ^= (duplicatesZero(zipcodelist));
break;
case HASHSET :
tossme ^= (duplicatesOne(zipcodelist));
break;
case BITSET :
tossme ^= (duplicatesTwo(zipcodelist));
break;

}
long end = System.currentTimeMillis();
times[sizei][rep] = end - begin;

}
long avg = 0;
for (int rep = 0; rep < REPS; rep++) {
avg += times[sizei][rep];
}
System.err.println("Size=" + sizes[sizei] + ", avg time = "
+ avg / (double)REPS + "ms");
}
}

}

With NSQUARED:

Trial for size= 10
Size=10, avg time = 0.0ms
Trial for size= 1000
Size=1000, avg time = 0.0ms
Trial for size= 10000
Size=10000, avg time = 100.0ms
Trial for size= 100000
Size=100000, avg time = 9923.3ms

With HashSet

Trial for zipcodelist size= 10
Size=10, avg time = 0.16ms
Trial for zipcodelist size= 1000
Size=1000, avg time = 0.15ms
Trial for zipcodelist size= 10000
Size=10000, avg time = 0.0ms
Trial for zipcodelist size= 100000
Size=100000, avg time = 0.16ms
Trial for zipcodelist size= 1000000
Size=1000000, avg time = 0.0ms

With BitSet

Trial for zipcodelist size= 10
Size=10, avg time = 0.0ms
Trial for zipcodelist size= 1000
Size=1000, avg time = 0.0ms
Trial for zipcodelist size= 10000
Size=10000, avg time = 0.0ms
Trial for zipcodelist size= 100000
Size=100000, avg time = 0.0ms
Trial for zipcodelist size= 1000000
Size=1000000, avg time = 0.0ms

BITSET Wins!

But only by a hair... .15ms is within the error for currentTimeMillis(), and there are some gaping holes in my benchmark. Note that for any list longer than 100000, you can simply return true because there will be a duplicate. In fact, if the list is anything like random, you can return true WHP for a much shorter list. What's the moral? In the limit, the most efficient implementation is:

 return true;

And you won't be wrong very often.

How to find duplicate elements in array using for each loop

Try using the below logic which compares every element with all other element in the array, if any duplicate is found,it stops the execution to continue futher

for(int i = 0; i < a.length;i++) {
for (int j = i + 1 ; j < a.length; j++) {
if (a[i] == a[j]) {
System.out.println("Found duplicate");
return;
}
}
}
System.out.println("No duplicate Found");

How to find the duplicates in an array in O(1) time in Java?

It is mathematically impossible to find duplicates in O(1). You have to examine all N elements of the array at least once to test if each one is a duplicate. That is at least N operations, so the lower bound on the complexity is O(N).

Hint: you can do it in O(N) if you use (say) a HashSet to record each value that you have already seen. The snag is that a HashSet is a space-hungry data structure.


Please provide suggestions/alternate methods to sort an array, I have encountered this problem many times.

The simply way to sort an array of integers is to use Arrays::sort(int[]). That will be O(NlogN).

It is theoretically possible to sort an integer array in better than O(NlogN), but only if you can place a bound on the range of the integer. Lookup up counting sort. The complexity is O(max(N, R) where R is the difference between the smallest and largest numbers. The catch is that O(R) could be much larger than O(N) ... depending on the inputs.

But if you know that M is likely to be less than NlogN, you can use a variant of count sorting and use only O(M) bits of extra space to de-duplicate the array in O(max(M, N)). (I will leave you to figure out the details.)

How to efficiently remove duplicates from an array without using Set

Since this question is still getting a lot of attention, I decided to answer it by copying this answer from Code Review.SE:

You're following the same philosophy as the bubble sort, which is
very, very, very slow. Have you tried this?:

  • Sort your unordered array with quicksort. Quicksort is much faster
    than bubble sort (I know, you are not sorting, but the algorithm you
    follow is almost the same as bubble sort to traverse the array).

  • Then start removing duplicates (repeated values will be next to each
    other). In a for loop you could have two indices: source and
    destination. (On each loop you copy source to destination unless they
    are the same, and increment both by 1). Every time you find a
    duplicate you increment source (and don't perform the copy).
    @morgano

Problem finding duplicate numbers in array, is my technique bad?

Here's the way I would do it: (comments for educational purposes, would probably not have them in production code.)

public String duplicate(int[] numbers) {

// holds the items we've encountered more than once.
// TreeSet<> keeps things in sorted order for us.
final SortedSet<Integer> duplicates = new TreeSet<>();

// keeps track of items we've encountered.
final Set<Integer> encountered = new HashSet<>();

// iterate over every number
for (final int number : numbers) {
// Add the item to encountered. Set.add() will return true if
// the element is new to the set.
if (!encountered.add(number)) {
// Since the element wasn't new, ensure this item exists in the duplicates collection.
duplicates.add(number);
}
}

return duplicates.toString();
}

How to check the duplicate object in the array

You should address then following things in your design/code:

  1. Since you are creating a player using createPlayer(String userName, String familyName, String givenName), you should make the constructor, NimPlayer(String userName,String surName, String givenName) private so that it can not be called from outside of the class, NimPlayer. Also, declare createPlayer as static so that it doesn't need a NimPlayer object to be called on.
  2. You need to have a static counter to keep track of the number of players and check the value of this counter before adding a new player to playerList.
  3. You should also check the size of the resulting array after inputName.split(","). Similarly, you should check the size of the returned array from splitName before you access any element from it.

Given below is the code incorporating the points mentioned above:

import java.util.Scanner;

class NimPlayer {
private String userName;
private String familyName;
private String givenName;
//...
// public getters and setters of userName, familyName, and givenName
//...
private static int counter = 0;
private static NimPlayer[] playerList = new NimPlayer[10];

private NimPlayer(String userName, String familyName, String givenName) {
this.userName = userName;
this.familyName = familyName;
this.givenName = givenName;
}

public static void createPlayer(String userName, String familyName, String givenName) {
if (counter < 10) {
playerList[counter++] = new NimPlayer(userName, familyName, givenName);
} else {
System.out.println("The list is full.");
}
}

public static int getCounter() {
return counter;
}

public static NimPlayer[] getPlayers() {
return playerList;
}
}

public class Main {
public static void main(String[] args) {
Scanner in = new Scanner(System.in);
while (true) {
System.out.print('$');
String commandInput = in.next();
if (commandInput.equals("addplayer")) {
String inputName = in.nextLine();
String[] name = splitName(inputName);
if (name != null && name.length == 3) {
NimPlayer.createPlayer(name[0], name[1], name[2]);
}
} else {
break;
}
}
for (int i = 0; i < NimPlayer.getCounter(); i++) {
System.out.println(NimPlayer.getPlayers()[i].getUserName());
}
}

public static String[] splitName(String inputName) {
String[] splittedLine = inputName.split(",");
String[] name = null;
if (splittedLine.length == 3) {
String userName = splittedLine[0].trim();
String familyName = splittedLine[1].trim();
String givenName = splittedLine[2].trim();
name = new String[3];
name[0] = userName;
name[1] = familyName;
name[2] = givenName;
}
return name;
}
}

I didn't understand your another question:

For checking if there is the duplicate "userName" in the set of the
"inputName", I loop through the objects in an array. How can I access
the "userName" in this situation?

Find duplicate values in an array in java

You are in the right way, I have just updated your method, I hope that you will understand what was your mistake:

public void findDupicateInArray(int[] a) {
int count=0;
for(int j=0;j<a.length;j++) {
for(int k =j+1;k<a.length;k++) {
if(a[j]==a[k]) {
count++;
}
}
if(count==1)
System.out.println(a[j]);
count = 0;
}
}

Nevertheless, this will make your code running correctly, and that does not mean you have written the optimal code.

Find duplicate element occur more than two times from an array in java

You want to iterate through your array only one time. If all you want is duplicates, you can do this simply by keeping track of any value you have seen before using an ArrayList:

int[] data = {5, 6, 1, 6, 9, 5, 2, 1, 5};

System.out.println(Arrays.toString(data));

ArrayList<Integer> seenBeforeList = new ArrayList<>();
for(int index = 0; index < data.length; index++){
int value = data[index];
if(seenBeforeList.contains(value)){
System.out.println("Duplicate Element : " + value);
System.out.println("Index of that duplicate element : " + index);
} else {
seenBeforeList.add(value);
}
}

Output:

[5, 6, 1, 6, 9, 5, 2, 1, 5]
Duplicate Element : 6
Index of that duplicate element : 3
Duplicate Element : 5
Index of that duplicate element : 5
Duplicate Element : 1
Index of that duplicate element : 7
Duplicate Element : 5
Index of that duplicate element : 8

If you want to group by value, then it would make more sense to use a HashMap, storing the value as the key, and the indexes as the value. Then simply iterate through the HashMap.



Related Topics



Leave a reply



Submit