Remove repeating character
Use backreferences:
echo preg_replace("/(.)\\1+/", "$1", "cakkke");
Output:
cake
Explanation:
(.) captures any character.
\\1 is a backreference to the first capture group (the . above, in this case).
+ makes the backreference match at least once (so that the pattern matches aa, aaa, aaaa, but not a).
Replacing the match with $1 replaces the complete matched text (kkk, in this case) with the first capture group (k, in this case).
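For comparison, here is a minimal sketch of the same backreference technique in Python's standard re module. The function name collapse_repeats is only an illustrative choice. Note that with a raw string the backslash does not need to be doubled as it does inside PHP's double-quoted string.

```python
import re

def collapse_repeats(text: str) -> str:
    # (.) captures one character; \1+ matches one or more further copies of it.
    # The whole run is replaced by the captured character (group 1).
    return re.sub(r"(.)\1+", r"\1", text)

print(collapse_repeats("cakkke"))  # cake
```

As in PHP, . does not match a newline by default, so a run of repeated newlines is left alone unless you pass re.DOTALL.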
How to remove repeating letter in a dataframe?
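You may try this:
df["Col"] = df["Col"].str.replace(u"h{4,}", "", regex=True)
where {4,} sets the minimum number of consecutive h characters to match (4 in my case). Note that since pandas 2.0, str.replace treats the pattern as a literal string unless you pass regex=True.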
Before:
   Col
0  hello, I'm today hh hhhh hhhhhhhhhhhhhhh
1  Hello World
After:
   Col
0  hello, I'm today hh
1  Hello World
I used a Unicode string literal, since you mentioned you are working with tweets (in Python 3 the u prefix is redundant, because all str objects are Unicode).
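If the repeated character is not always h, the pattern can be generalized with a backreference. The sketch below is a hypothetical extension, not part of the original answer: it deletes any run of 4 or more identical characters, and it uses plain re on a list of strings so it runs without pandas. The name drop_long_runs is illustrative.

```python
import re

# (.) captures a character and \1{3,} requires at least three more copies,
# so the pattern matches a run of 4+ identical characters.
LONG_RUN = re.compile(r"(.)\1{3,}")

def drop_long_runs(text: str) -> str:
    """Delete every run of 4+ identical characters entirely."""
    return LONG_RUN.sub("", text)

rows = ["hello, I'm today hh hhhh hhhhhhhhhhhhhhh", "Hello World"]
for row in rows:
    print(repr(drop_long_runs(row)))
```

On a DataFrame the same pattern should work as df["Col"].str.replace(r"(.)\1{3,}", "", regex=True). Like the original answer, this leaves the surrounding spaces in place, so you may want to chain .str.strip() or collapse double spaces afterwards.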
How to remove repeated\same characters in a sequence from a string using C#?
As I understand it, you want to eliminate duplicates only when they occur in a consecutive sequence. You could achieve this as follows:
Using List<char>
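// Needs: using System.Collections.Generic; using System.Linq; (for Last())
var str = "aaaabbcccghbcccciippppkkllk";
var nonDuplicates = new List<char>();
foreach (var element in str.ToCharArray())
{
    // Keep the character only if it differs from the last one kept
    if (nonDuplicates.Count == 0 || nonDuplicates.Last() != element)
        nonDuplicates.Add(element);
}
var result = new string(nonDuplicates.ToArray());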
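The same idea (compare each character with the last one kept) ports directly to Python. This is a sketch, with a Python list standing in for List<char> and str.join building the final string; the function name is illustrative.

```python
def remove_consecutive_duplicates(s: str) -> str:
    kept = []  # plays the role of nonDuplicates in the C# version
    for ch in s:
        # Keep ch only if it differs from the last character kept
        if not kept or kept[-1] != ch:
            kept.append(ch)
    return "".join(kept)

print(remove_consecutive_duplicates("aaaabbcccghbcccciippppkkllk"))  # abcghbcipklk
```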
Update
In response to a comment, I have updated the answer with two more solutions and benchmarked all three. The results are shown below.
Using String Append
var str = "aaaabbcccghbcccciippppkkllk";
var strResult = string.Empty;
foreach (var element in str.ToCharArray())
{
    // Strings are immutable, so each interpolation allocates a new string
    if (strResult.Length == 0 || strResult[strResult.Length - 1] != element)
        strResult = $"{strResult}{element}";
}
Using StringBuilder
// Needs: using System.Text;
var str = "aaaabbcccghbcccciippppkkllk";
var strResult = new StringBuilder();
foreach (var element in str.ToCharArray())
{
    // Append only when the character differs from the last one appended
    if (strResult.Length == 0 || strResult[strResult.Length - 1] != element)
        strResult.Append(element);
}
var result = strResult.ToString();
Benchmark Results
Method | Mean | Error | StdDev | Median |
------------------- |-----------:|----------:|-----------:|-----------:|
UsingList | 809.7 ns | 11.975 ns | 11.202 ns | 806.5 ns |
UsingStringAppend | 1,738.0 ns | 39.269 ns | 109.467 ns | 1,697.2 ns |
UsingStringBuilder | 201.6 ns | 1.960 ns | 1.834 ns | 201.1 ns |
As the results show, the StringBuilder approach is about 4 times faster than the List approach (201.6 ns vs 809.7 ns), and the string append approach is the slowest.
Input
aaaabbcccghbcccciippppkkllk
Output
abcghbcipklk
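As a different technique than the loops above, Python's standard library can express "one character per run" directly with itertools.groupby, which groups consecutive equal items. A minimal sketch (the name squeeze is illustrative):

```python
from itertools import groupby

def squeeze(s: str) -> str:
    # groupby yields one (key, group) pair per run of equal consecutive characters;
    # keeping only the keys leaves one character per run.
    return "".join(key for key, _ in groupby(s))

print(squeeze("aaaabbcccghbcccciippppkkllk"))  # abcghbcipklk
```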
How can I remove repeated characters in a string with R?
I did not think very carefully about this, but here is a quick solution using backreferences in regular expressions:
gsub('([[:alpha:]])\\1+', '\\1', 'Buenaaaaaaaaa Suerrrrte')
# [1] "Buena Suerte"
The parentheses () capture a letter first, \\1 refers to that letter, and + means to match it once or more. Putting these pieces together, the pattern matches a letter repeated two or more times.
[[:alpha:]] matches letters only. To include other characters as well, replace [[:alpha:]] with a regex matching whatever you wish to include.
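Python's re module has no POSIX classes such as [[:alpha:]], but [^\W\d_] (a word character that is neither a digit nor an underscore) works as a "letter" class. A sketch of the same letters-only collapse, with an illustrative function name:

```python
import re

# [^\W\d_] = a letter; \1+ = one or more further copies of that same letter
LETTER_RUN = re.compile(r"([^\W\d_])\1+")

def collapse_letter_runs(text: str) -> str:
    return LETTER_RUN.sub(r"\1", text)

print(collapse_letter_runs("Buenaaaaaaaaa Suerrrrte"))  # Buena Suerte
print(collapse_letter_runs("Room 1000"))  # Rom 1000 (digits are not letters)
```

The second call shows both the restriction (the zeros are untouched) and the usual caveat: genuine double letters such as the oo in Room are collapsed too.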
Remove repeating characters from sentence but retain the words meaning
You can combine regex and NLP here: iterate over all words in the string, and whenever a word contains three or more identical consecutive letters, reduce each such run to two letters and then run an automatic spellchecker to fix the spelling.
Here is example Python code:
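import re
from textblob import Word
# A letter ([^\W\d_]) followed by two or more copies of itself
rx = re.compile(r'([^\W\d_])\1{2,}')
# For each word: if it has such a run, shrink runs to two letters and spell-correct the word
fix = lambda m: str(Word(rx.sub(r'\1\1', m.group())).correct()) if rx.search(m.group()) else m.group()
# tweet holds the input text from the question
print(re.sub(r'[^\W\d_]+', fix, tweet))
# => "I'm so happy about offline school"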
The code uses the TextBlob library, but you may use any spellchecking library you like.
Note that ([^\W\d_])\1{2,} matches three or more identical consecutive letters, and [^\W\d_]+ matches one or more letters (that is, a whole word).
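To see what the regex step alone contributes before any spellchecking, here is a dependency-free sketch of just the "shrink runs of 3+ to two letters" part. The sample sentence is hypothetical, and the function name is illustrative.

```python
import re

RUN_OF_3_PLUS = re.compile(r"([^\W\d_])\1{2,}")

def shrink_runs(text: str) -> str:
    # Reduce every run of 3+ identical letters to exactly two letters;
    # legitimate doubles such as "ff" in "offline" are left alone.
    return RUN_OF_3_PLUS.sub(r"\1\1", text)

print(shrink_runs("I'm so happpppy about offline schoooool"))
# I'm so happy about offline school
```

Shrinking to two rather than one is what keeps words like happy and school intact. A word such as sooo would become soo, which is the kind of case the spellchecker step is there to fix.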
How to remove duplicate chars in a string?
It seems from your example that you want to remove REPEATED SEQUENCES of characters, not duplicate characters across the whole string, so that is what I'm solving here.
You can use a regular expression. I'm not sure how inefficient it is, but it works:
>>> import re
>>> phrase = str("oo rarato roeroeu aa rouroupa dodo rerei dde romroma")
>>> re.sub(r'(.+?)\1+', r'\1', phrase)
'o rato roeu a roupa do rei de roma'
How this substitution proceeds down the string:
oo -> o
" " -> " "
rara -> ra
to -> to
" " -> " "
roeroe -> roe
etc.
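The walk-through above can be reproduced with re.finditer, which reports each non-overlapping match together with the group that replaces it. This is a small tracing sketch using the phrase from the answer:

```python
import re

phrase = "oo rarato roeroeu aa rouroupa dodo rerei dde romroma"
# (.+?) lazily captures the shortest chunk that is immediately repeated;
# \1+ then consumes all of its consecutive repeats.
for m in re.finditer(r"(.+?)\1+", phrase):
    print(repr(m.group(0)), "->", repr(m.group(1)))
```

As the dde -> de step shows, the pattern cannot tell a stutter from a genuine double letter, so a word like coffee would become cofe.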
Edit: It also works for the other example string, which should not be modified:
>>> phrase = str("Barbara Bebe com Bernardo")
>>> re.sub(r'(.+?)\1+', r'\1', phrase)
'Barbara Bebe com Bernardo'
Regex remove repeated characters from a string by javascript
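Use a lookahead meaning "this character, followed by anything and then this same character". It removes every character that occurs again later in the string, so only the last occurrence of each character is kept:
var str = "aaabbbccccabbbbcccccc";
console.log(str.replace(/(.)(?=.*\1)/g, "")); // "abc"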
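The same lookahead works in Python's re module. The sketch below also contrasts it with keeping the first occurrence instead, using dict.fromkeys (dicts preserve insertion order); both function names are illustrative.

```python
import re

def keep_last_occurrence(s: str) -> str:
    # Drop a character whenever the same character appears again later;
    # the lookahead (?=.*\1) checks this without consuming anything.
    return re.sub(r"(.)(?=.*\1)", "", s)

def keep_first_occurrence(s: str) -> str:
    # dict.fromkeys keeps each character's first appearance, in order
    return "".join(dict.fromkeys(s))

print(keep_last_occurrence("aaabbbccccabbbbcccccc"))  # abc
print(keep_last_occurrence("abca"), keep_first_occurrence("abca"))  # bca abc
```

The two only agree when the order of last occurrences matches the order of first occurrences, as it does for the string in the answer. For multi-line input, pass re.DOTALL so that .* can cross newlines.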
Remove characters which repeat more than twice in a string
Try using gsub with the pattern (.)\\1{2,}:
x <- "hhhappy birthhhhhhdayyy"
gsub("(.)\\1{2,}", "\\1", x)
[1] "happy birthday"
Explanation of regex:
(.) match and capture any single character
\\1{2,} then match the same character two or more times
We replace the match with just the single captured character; in the replacement string, \\1 refers to the first capture group.
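The same pattern translated to Python, as a sketch with an illustrative function name. Runs of exactly two characters are not touched because \1{2,} needs at least two copies after the captured one.

```python
import re

def collapse_triples(text: str) -> str:
    # (.) then \1{2,}: the same character two or more further times, i.e. a run of 3+
    return re.sub(r"(.)\1{2,}", r"\1", text)

print(collapse_triples("hhhappy birthhhhhhdayyy"))  # happy birthday
```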
How can we remove word with repeated single character?
A simpler approach here is to use a set:
def modify(s):
    # Create a set of the distinct characters in the string
    c = set(s)
    # If there is only one distinct character, convert the set back to a string
    if len(c) == 1:
        return ''.join(c)
    # Otherwise return the original string
    else:
        return s
print(modify('good'))
print(modify('gggggggg'))
If you want to use a regex, anchor it to the start and end of the string with ^ and $ (inspired by @bobblebubble's comment):
import re
def modify(s):
    # The regex only matches if the whole string is a single lowercase letter repeated,
    # thanks to the ^ and $ anchors
    out = re.sub(r'^([a-z])\1+$', r'\1', s)
    return out
print(modify('good'))
print(modify('gggggggg'))
In both cases, the output is:
good
g
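If the words are embedded in a sentence rather than passed one at a time, word boundaries (\b) can take the place of ^ and $. This is a hypothetical extension of the answer, not part of it; note that \w also covers digits and underscores, so a word like 1111 would be collapsed too.

```python
import re

def collapse_single_char_words(sentence: str) -> str:
    # \b(\w)\1+\b matches a whole word made only of one repeated character
    return re.sub(r"\b(\w)\1+\b", r"\1", sentence)

print(collapse_single_char_words("good gggggggg zzz ok"))  # good g z ok
```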