Count Number of Unique Characters in a String

How can I find the number of unique characters in a string?

This method runs in O(n^2) time. An O(n) version is possible, though slightly more involved.

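#include <stdbool.h>
#include <string.h>

int CountUniqueCharacters(const char* str){
    int count = 0;
    size_t length = strlen(str);  // computed once rather than on every iteration

    for (size_t i = 0; i < length; i++){
        // Has str[i] already appeared at an earlier position?
        bool appears = false;
        for (size_t j = 0; j < i; j++){
            if (str[j] == str[i]){
                appears = true;
                break;
            }
        }

        if (!appears){
            count++;
        }
    }

    return count;
}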

The method iterates over all the characters in the string. For each one, it checks whether the same character already appeared earlier in the string. If it did not, this is the character's first occurrence, so the count is incremented.
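One way to reach the O(n) running time mentioned above is to remember which characters have already been seen, so that each check takes constant time on average. The following is an illustrative sketch in Python rather than C, not the answer's own code; the function name is an assumption.

```python
def count_unique_characters(text: str) -> int:
    """Count distinct characters in a single pass over text."""
    seen = set()   # characters encountered so far
    count = 0
    for ch in text:
        if ch not in seen:   # average O(1) membership test
            seen.add(ch)
            count += 1
    return count


print(count_unique_characters("hello"))  # h, e, l, o
```

The set replaces the inner loop of the quadratic version, trading a little extra memory for the faster lookup.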

Counting unique characters in a string

You can call System.out.println(countUniqueCharacters(s)); in the main method to print the return value of your method. Code placed after a return statement is unreachable, so you cannot add more code there. I ran it for you and the output is 12, so there also seems to be something wrong with your algorithm.

    int uniqueCharsCount = countUniqueCharacters(s);
    System.out.println("The number of unique chars is " + uniqueCharsCount);

Output: 12

Your algorithm:

Your code only checks whether each char already occurs earlier in the string. You also need to check whether it occurs anywhere after the current index. You can fix this by changing your if condition to if (i != lowerCase.indexOf(characters[i]) || i != lowerCase.lastIndexOf(characters[i]))

Output of the fixed version: 3 (n, h, r)
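The original method is not shown here, so the following is only a sketch of the idea behind the corrected condition, written in Python: a character is counted only when its first and last positions in the lowercased string coincide, meaning it occurs exactly once. The function name and the sample input are illustrative assumptions.

```python
def count_chars_occurring_once(text: str) -> int:
    """Count characters that appear exactly once, ignoring case."""
    lower = text.lower()
    count = 0
    for i, ch in enumerate(lower):
        # find() gives the first index and rfind() the last one;
        # both equal i only if ch occurs nowhere else in the string.
        if lower.find(ch) == i and lower.rfind(ch) == i:
            count += 1
    return count


print(count_chars_occurring_once("Hello"))  # h, e and o occur once
```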

Count number of unique characters in a string

There is no direct or easy way to do this in MySQL. You may need to write a stored function that checks for every character you expect in the data. Here is an example for digits only, which could be extended to all characters in a stored function:

mysql> select * from test ;
+------------+
| val        |
+------------+
| 11111111   |
| 111222222  |
| 1113333222 |
+------------+


select
    val,
    sum(case when locate('1', val) > 0 then 1 else 0 end)
    + sum(case when locate('2', val) > 0 then 1 else 0 end)
    + sum(case when locate('3', val) > 0 then 1 else 0 end)
    + sum(case when locate('4', val) > 0 then 1 else 0 end) as occurrence
from test
group by val;


+------------+------------+
| val        | occurrence |
+------------+------------+
| 11111111   |          1 |
| 111222222  |          2 |
| 1113333222 |          3 |
+------------+------------+

Or, if you have enough time, create a lookup table with all the characters you can think of. The query then shrinks to a couple of lines:

mysql> select * from test ;
+------------+
| val        |
+------------+
| 11111111   |
| 111222222  |
| 1113333222 |
+------------+
3 rows in set (0.00 sec)

mysql> select * from look_up ;
+------+------+
| id   | val  |
+------+------+
|    1 | 1    |
|    2 | 2    |
|    3 | 3    |
|    4 | 4    |
+------+------+
4 rows in set (0.00 sec)

select
    t1.val,
    sum(case when locate(t2.val, t1.val) > 0 then 1 else 0 end) as occ
from test t1, (select * from look_up) t2
group by t1.val;

+------------+------+
| val        | occ  |
+------------+------+
| 11111111   |    1 |
| 111222222  |    2 |
| 1113333222 |    3 |
+------------+------+
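For comparison, the lookup-table idea can be sketched outside the database. The Python below is only an illustration, assuming the lookup set holds the digits 1 to 4 as in look_up; the function name is hypothetical. For each value it counts how many lookup characters occur in it, which is what the sum over locate() does.

```python
def count_lookup_hits(value: str, lookup: str = "1234") -> int:
    """Count how many characters from lookup occur at least once in value."""
    return sum(1 for ch in lookup if ch in value)


for val in ["11111111", "111222222", "1113333222"]:
    print(val, count_lookup_hits(val))
```

As with the SQL version, a character that is missing from the lookup set is never counted, so the lookup has to cover everything that can appear in the data.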

Counting total number of unique characters for Python string

  1. Your function get_user_input() has no return value.
  2. get_user_input.count('A') refers to the function object instead of calling it, so it will not work without parentheses. Correct: get_user_input().count('A')
  3. As mentioned in the comments, get_user_input() should be called only once, because every call collects the user's input again via the input() function.

This should work:

    def count_bases():
        char_list = get_user_input()
        for char in 'ATCG':
            char_count = char_list.count(char)
            if char_count < 1:
                print(char + " not found")
            else:
                print(char + " count: " + str(char_count))

    def get_user_input():
        one = input("Please enter DNA bases: ")
        two = list(one)
        return two
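If the sequence is passed in as an argument instead of being read inside the function, the counting becomes easy to test without typing input. A minimal sketch using collections.Counter from the standard library; the name count_bases_in and the 'ATCG' default are assumptions for illustration.

```python
from collections import Counter


def count_bases_in(sequence, bases="ATCG"):
    """Return a dict mapping each base to how often it occurs in sequence."""
    counts = Counter(sequence)
    # A Counter returns 0 for characters that never occurred.
    return {base: counts[base] for base in bases}


print(count_bases_in("AATCX"))
```

The Counter walks the sequence once, whereas calling count() per base scans it once for each base.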

Counting unique characters in a String given by the user

It is extremely easy :)

public static int countUniqueCharacters(String input) {
    // One flag per possible char value (0 through Character.MAX_VALUE inclusive)
    boolean[] isItThere = new boolean[Character.MAX_VALUE + 1];
    for (int i = 0; i < input.length(); i++) {
        isItThere[input.charAt(i)] = true;
    }

    int count = 0;
    for (int i = 0; i < isItThere.length; i++) {
        if (isItThere[i]) {
            count++;
        }
    }

    return count;
}

Example for input "aab"

The first for loop runs three times, once per character.

The value of "a" is 97, so the first iteration sets isItThere[97] to true. The second "a" does the same: isItThere[97] is set to true again, which changes nothing.

Then comes "b", whose char value is 98, so isItThere[98] is set to true.

The second for loop walks through the whole isItThere array and increments count for every true entry. In this case only isItThere[97] and isItThere[98] are true, so count is incremented twice and the method returns 2.
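The same walkthrough can be reproduced in a few lines of Python. This is a sketch of the flag-table technique, not a translation of the Java method: the table is sized with sys.maxunicode so that any Python character fits, and the function name is an assumption.

```python
import sys


def mark_characters(text: str) -> list:
    """Return a flag table where table[code point] is True if it occurs in text."""
    table = [False] * (sys.maxunicode + 1)
    for ch in text:
        table[ord(ch)] = True   # setting an already-true flag changes nothing
    return table


table = mark_characters("aab")
marked = [i for i, flag in enumerate(table) if flag]
print(marked)        # code points that were seen
print(len(marked))   # number of unique characters
```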

Pandas number of unique/distinct characters in a string

No lambda:

>>> df['phone'].apply(set)
0 {0, 1, 2}
1 {1, 2}
2 {7, 1, 2}
3 {5, 6, 4, 2}
Name: phone, dtype: object

and

>>> df['phone'].apply(set).apply(len)
0 3
1 2
2 3
3 4
Name: phone, dtype: int64

Note: as @mozway correctly pointed out, the double apply is slower than a single apply with a lambda or, better still, a dedicated function. But if you want to store both the set of unique digits and its size, use one .apply(set) for the former and .apply(len) on the result for the latter.
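Storing both results could look like the sketch below. The phone strings and the column names digits and n_unique are hypothetical, chosen only for illustration; they are not the data behind the output shown earlier.

```python
import pandas as pd

# Hypothetical phone strings, for illustration only.
df = pd.DataFrame({"phone": ["0120", "1212", "7112", "4562"]})

# Keep the set of distinct digits, then derive its size from the stored set.
df["digits"] = df["phone"].apply(set)
df["n_unique"] = df["digits"].apply(len)

print(df[["phone", "n_unique"]])
```

Here the second apply reuses the stored sets, so the sets are built only once.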

Timing

import numpy as np
import pandas as pd

n = 1_000_000
df = pd.DataFrame({'phone': np.random.randint(1e4, 1e9, size=n).astype(int).astype(str)})

%timeit df['phone'].apply(set).apply(len)
# 1.17 s ± 1.76 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

%timeit df['phone'].apply(lambda x: len(set(x)))
# 738 ms ± 4.87 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

def nu(x):
    return len(set(x))

%timeit df['phone'].apply(nu)
# 698 ms ± 1.2 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

