Count Consecutive Characters

Count consecutive characters

A solution "that way", with only basic statements:

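word = "100011010"  # try word = "1" for the single-character case
count = 1
length = ""
if len(word) > 1:
    for i in range(1, len(word)):
        if word[i-1] == word[i]:
            count += 1
        else:
            # the run just ended: record it and start counting the next one
            length += word[i-1] + " repeats " + str(count) + ", "
            count = 1
    # record the final run
    length += "and " + word[i] + " repeats " + str(count)
else:
    # a single character is one run of length 1 (assumes word is not empty)
    length += word[0] + " repeats " + str(count)
print(length)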

Output :

'1 repeats 1, 0 repeats 3, 1 repeats 2, 0 repeats 1, 1 repeats 1, and 0 repeats 1'
#'1 repeats 1'

Find count of each consecutive characters

Regular expressions to the rescue?

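using System;
using System.Linq;
using System.Text.RegularExpressions;

var myString = "aaaabbccaa";

// (\w) captures one character; \1* matches any immediate repeats of it
var pattern = @"(\w)\1*";
var regExp = new Regex(pattern);
var matches = regExp.Matches(myString);

// Cast<Match>() is needed on .NET Framework, where MatchCollection is non-generic
var tab = matches.Cast<Match>().Select(x => String.Format("{0}{1}", x.Value.First(), x.Value.Length));
var result = String.Join("", tab); // "a4b2c2a2"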
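
For comparison, the same backreference pattern can be sketched in Python. This is a minimal sketch, not part of the original answer; the function name `run_lengths` is made up for illustration. Note that `\w` skips non-word characters such as spaces, so runs of those are not reported.

```python
import re

def run_lengths(text):
    """Return each run as its character followed by its length."""
    # (\w) captures one word character; \1* extends the match over its repeats.
    return "".join(
        f"{m.group(1)}{len(m.group(0))}"
        for m in re.finditer(r"(\w)\1*", text)
    )

print(run_lengths("aaaabbccaa"))  # a4b2c2a2
```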

How to count consecutive characters?

Here is a solution, if you are interested in the while-loop mechanics:

l = 'aaaabbBBccaazzZZZzzzertTTyyzaaaAA'
output = ''

index = 0
while index < len(l):
    incr = index
    count = 1
    output += l[incr]
    while incr < len(l)-1 and l[incr]==l[incr+1]:
        count += 1
        incr += 1
        index += 1
    output += str(count)
    index += 1

print(output)

How to count instances of consecutive letters in a string in Python 3?

This is possible with itertools.groupby:

from itertools import groupby

x = 'EOOOEOEE'

res = sum(len(list(j)) > 1 for i, j in groupby(x) if i == 'O') # 1
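
To see what the one-liner is counting, it can help to list the runs that `groupby` produces first. The sketch below is illustrative only; the helper names `runs` and `count_repeated_runs` are assumptions, not part of the original answer.

```python
from itertools import groupby

def runs(text):
    """List (character, run_length) pairs, one per consecutive run."""
    return [(char, len(list(group))) for char, group in groupby(text)]

def count_repeated_runs(text, target):
    """Count the runs of `target` that are longer than one character."""
    return sum(1 for char, length in runs(text) if char == target and length > 1)

print(runs("EOOOEOEE"))
# [('E', 1), ('O', 3), ('E', 1), ('O', 1), ('E', 2)]
print(count_repeated_runs("EOOOEOEE", "O"))  # 1
```

Only the run `OOO` qualifies; the single `O` later in the string is a run of length 1 and is not counted.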

How can I quickly count the maximum number of consecutive single characters in a string?

Remarkable speed improvements can be made with a dynamic regex. We store the longest run found so far in a variable, then search for that run followed by one or more additional 1s. The theory is that we only need to look for runs longer than the one we already have.

I used a solution that looks like this:

sub hack {
    my $match = "";                         # original search string
    while ($string =~ /(${match}1+)/g) {    # search for $match plus 1 or more 1s
        $match = $1;                        # when found, change to new match
    }
    length $match;                          # return max length
}

I compared it to the original method described by the OP, with the following result:

use strict;
use warnings;
use Benchmark ':all';

my $string = '0100100101111011010010101101101110101011111111101010100100100001011101010100' x 10_000;

cmpthese(-1, {
    org  => sub { my $max = 0; while ($string =~ /(1+)/g) { my $len = length($1); if ($max < $len) { $max = $len } } },
    hack => sub { my $match = ""; while ($string =~ /(${match}1+)/g) { $match = $1; } length $match }
});

Output:

       Rate    org   hack
org  7.31/s     --   -99%
hack 1372/s 18669%     --

That seems astonishingly high: about 19,000% faster. It makes me think I've made a mistake, but I can't think what that would be. Maybe I am missing something in the regex engine internals, but if it holds, this would be quite an improvement on the original solution.
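
One way to gain confidence in the result is to check that both approaches return the same answer. The sketch below ports the two ideas to Python purely for that cross-check; the function names are hypothetical, and nothing here says the Perl speed-up carries over to Python's `re` module.

```python
import re

def max_run_plain(text, char="1"):
    """Baseline: measure every run of `char` and keep the longest."""
    pattern = re.escape(char) + "+"
    return max((len(m.group()) for m in re.finditer(pattern, text)), default=0)

def max_run_growing(text, char="1"):
    """Only search for runs strictly longer than the best one found so far."""
    best = ""
    pos = 0
    while True:
        # Pattern: the current best run followed by at least one more `char`.
        pattern = re.compile(re.escape(best) + re.escape(char) + "+")
        m = pattern.search(text, pos)
        if m is None:
            return len(best)
        best = m.group()
        pos = m.end()  # resume scanning after the run just found

sample = "0110111010" * 1000
print(max_run_plain(sample), max_run_growing(sample))  # 3 3
```

Each successful search consumes a whole run, so the next search always resumes at a character that is not `char`, and shorter runs are skipped without being captured.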

Code that takes a string and recognizes the number of consecutive letters

Here is one way. You only need a single loop per string: the inner loop does the work, and the outer loop simply supplies test cases.

  • assign the first character
  • and set count to 1 for that character
  • then iterate until adjacent characters are different
  • append count if > 1 and append the different character
  • set count to 0 for next run.

String[] data = { "uuuuuuhhhaaajqqq",
        "hhhttrew", "abbcccddddeeeeeffffffggggggg" };

for (String s : data) {
    String result = "" + s.charAt(0);
    int count = 1;
    for (int i = 1; i < s.length(); i++) {
        if (s.charAt(i - 1) != s.charAt(i)) {
            result += count <= 1 ? "" : count;
            result += s.charAt(i);
            count = 0;
        }
        count++;
        if (i == s.length() - 1) {
            result += count <= 1 ? "" : count;
        }
    }
    System.out.printf("%-15s <-- %s%n", result, s);
}

prints

u6h3a3jq3       <-- uuuuuuhhhaaajqqq
h3t2rew         <-- hhhttrew
ab2c3d4e5f6g7   <-- abbcccddddeeeeeffffffggggggg
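
The same encoding steps can be sketched in Python. This is an illustrative sketch rather than a translation of the answer's code; the function name `encode` is an assumption, and unlike the Java version it returns an empty string for empty input.

```python
def encode(text):
    """Run-length encode `text`, omitting the count for single characters."""
    if not text:
        return ""
    parts = []
    current = text[0]  # character of the run being counted
    count = 1
    for char in text[1:]:
        if char == current:
            count += 1
        else:
            # The run ended: emit it, then start counting the new character.
            parts.append(current + (str(count) if count > 1 else ""))
            current, count = char, 1
    # Emit the final run.
    parts.append(current + (str(count) if count > 1 else ""))
    return "".join(parts)

for s in ("uuuuuuhhhaaajqqq", "hhhttrew", "abbcccddddeeeeeffffffggggggg"):
    print(f"{encode(s):<15} <-- {s}")
```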

In a comment (now deleted) you had enquired how to reverse the process. This is one way to do it.

  • allocate a StringBuilder to hold the result.
  • initialize count and currentChar
  • as the string is processed,
    • save a character to currentChar
    • then while the next char(s) is a digit, build the count
  • if the count is still 0, then the next character was not a digit, so set count to 1 and copy currentChar to the buffer once
  • otherwise, use the computed length.

String[] encoded =
        { "u6h3a3jq3", "h3t2rew", "ab2c3d4e5f6g7" };

for (String s : encoded) {

    StringBuilder sb = new StringBuilder();
    int count = 0;
    char currentChar = '\0';
    // assumes the input contains only letters and digits
    for (int i = 0; i < s.length();) {
        if (Character.isLetter(s.charAt(i))) {
            currentChar = s.charAt(i++);
        }
        // build a multi-digit count from the digits that follow the letter
        while (i < s.length()
                && Character.isDigit(s.charAt(i))) {
            count = count * 10 + s.charAt(i++) - '0';
        }
        count = count == 0 ? 1 : count;
        // String.repeat requires Java 11 or later
        sb.append(Character.toString(currentChar)
                .repeat(count));
        count = 0;
    }
    System.out.println(s + " --> " + sb);
}

prints

u6h3a3jq3 --> uuuuuuhhhaaajqqq
h3t2rew --> hhhttrew
ab2c3d4e5f6g7 --> abbcccddddeeeeeffffffggggggg
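
For comparison, decoding can also be sketched in Python. Note that this version swaps the explicit character loop for a regular expression that pairs each letter with its optional count; the function name `decode` is an assumption, and the input is assumed to contain only ASCII letters and digits, as in the examples.

```python
import re

def decode(encoded):
    """Expand each letter by its optional trailing count (default 1)."""
    # Each match is (letter, digits); digits is '' when no count follows.
    pairs = re.findall(r"([A-Za-z])(\d*)", encoded)
    return "".join(letter * int(count or 1) for letter, count in pairs)

for s in ("u6h3a3jq3", "h3t2rew", "ab2c3d4e5f6g7"):
    print(s, "-->", decode(s))
```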

Python: Count the consecutive characters at the beginning of a string

If you strip the characters from the beginning, then you are left with a shorter string and can subtract its length from the original, giving you the number of characters removed.

return len(s) - len(s.lstrip(target))


Note: the code you showed will immediately return 0 if the first character does not match target. If you want to check whether the first character, whatever it is, repeats, you don't need target at all and can just use s[0].
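
Put together as complete functions, the idea might look like the sketch below. The function names are hypothetical, and `target` is assumed to be a single character: `str.lstrip` treats its argument as a set of characters, so a multi-character `target` would strip any mix of those characters.

```python
def count_leading(s, target):
    """Count how many times the single character `target` repeats at the start of `s`."""
    return len(s) - len(s.lstrip(target))

def count_leading_first(s):
    """Count how many times the first character of `s` repeats at the start."""
    return count_leading(s, s[0]) if s else 0

print(count_leading("aaabca", "a"))   # 3
print(count_leading("baaa", "a"))     # 0
print(count_leading_first("zzza"))    # 3
```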


