Match sequences of consecutive characters in a string
Using regex in Ruby 1.8.7+:
p s.scan(/((\d)\2*)/).map(&:first)
#=> ["111", "22", "1"]
This works because (\d) captures any digit, and then \2* matches zero or more repetitions of whatever that group (the second opening parenthesis) matched. The outer (…) is needed to capture the entire match as a result in scan. Finally, scan alone returns:
[["111", "1"], ["22", "2"], ["1", "1"]]
…so we need to run through and keep just the first item in each array. In Ruby 1.8.6+ (which doesn't have Symbol#to_proc for convenience):
p s.scan(/((\d)\2*)/).map{ |x| x.first }
#=> ["111", "22", "1"]
With no Regex, here's a fun one (matching any char) that works in Ruby 1.9.2:
p s.chars.chunk{|c|c}.map{ |n,a| a.join }
#=> ["111", "22", "1"]
Here's another version that should work even in Ruby 1.8.6:
p s.scan(/./).inject([]){|a,c| (a.last && a.last[0]==c[0] ? a.last : a)<<c; a }
# => ["111", "22", "1"]
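For readers following along in Python, the same two ideas (a backreference to the captured digit, and grouping equal neighbours without a regex) can be sketched as below. The input string is an assumption chosen to reproduce the output shown in the Ruby examples; the original input is not given.

```python
import re
from itertools import groupby

# Assumed input: chosen so the result matches the Ruby output above.
s = "111221"

# findall returns (whole_run, digit) tuples, like Ruby's scan with two groups;
# keep only the whole run.
runs_regex = [whole for whole, _digit in re.findall(r'((\d)\2*)', s)]

# groupby clusters equal neighbouring characters, similar to chunk{|c|c}.
runs_groupby = ["".join(group) for _key, group in groupby(s)]

print(runs_regex)    # ['111', '22', '1']
print(runs_groupby)  # ['111', '22', '1']
```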
How to match sequences of consecutive money-like characters in a string in Dart?
You can use
final text = '000000000012735';
print(text.replaceFirstMapped(RegExp(r'^0*(\d+)(\d{2})$'), (Match m) =>
"${m[1]}.${m[2]}"));
The output is 127.35.
The regex matches
^ - start of string
0* - zero or more 0 chars
(\d+) - Group 1: one or more digits
(\d{2}) - Group 2: two digits
$ - end of string.
Note that since only one replacement is expected, there is no need to use replaceAllMapped; replaceFirstMapped will do.
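The same pattern can be tried outside Dart. Here is a minimal Python sketch of the idea; the helper name format_amount is made up for illustration, and count=1 plays the role of replaceFirstMapped.

```python
import re

def format_amount(text):
    """Strip leading zeros and insert a dot before the last two digits."""
    # Hypothetical helper. count=1 mirrors replaceFirstMapped; the pattern is
    # anchored at both ends, so it can match at most once anyway.
    return re.sub(r'^0*(\d+)(\d{2})$', r'\1.\2', text, count=1)

print(format_amount('000000000012735'))  # 127.35
print(format_amount('0005'))             # 0.05
print(format_amount('05'))               # 05 (no match, returned unchanged)
```

Because Group 1 needs at least one digit and Group 2 exactly two, inputs with fewer than three digits do not match and are returned as they are.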
How to match sequences of consecutive date-like characters in a string in Dart?
You can use
text.replaceAllMapped(RegExp(r'\b(?:((?:19|20)\d{2})(0?[1-9]|1[0-2])(0?[1-9]|[12][0-9]|3[01])|(0?[1-9]|[12][0-9]|3[01])(0?[1-9]|1[0-2])((?:19|20)\d{2}))\b'), (Match m) => m[4] == null ? "${m[1]}.${m[2]}.${m[3]}" : "${m[4]}.${m[5]}.${m[6]}")
The \b(?:((?:19|20)\d{2})(0?[1-9]|1[0-2])(0?[1-9]|[12][0-9]|3[01])|(0?[1-9]|[12][0-9]|3[01])(0?[1-9]|1[0-2])((?:19|20)\d{2}))\b regex matches
\b - a word boundary
(?: - start of a non-capturing group:
((?:19|20)\d{2}) - year from 20th and 21st centuries
(0?[1-9]|1[0-2]) - month
(0?[1-9]|[12][0-9]|3[01]) - day
| - or
(0?[1-9]|[12][0-9]|3[01]) - day
(0?[1-9]|1[0-2]) - month
((?:19|20)\d{2}) - year
) - end of the group
\b - word boundary.
See a Dart demo:
void main() {
final text = '13022020 and 20200213 20111919';
print(text.replaceAllMapped(RegExp(r'\b(?:((?:19|20)\d{2})(0?[1-9]|1[0-2])(0?[1-9]|[12][0-9]|3[01])|(0?[1-9]|[12][0-9]|3[01])(0?[1-9]|1[0-2])((?:19|20)\d{2}))\b'), (Match m) =>
m[4] == null ? "${m[1]}.${m[2]}.${m[3]}" : "${m[4]}.${m[5]}.${m[6]}"));
}
Returning 13.02.2020 and 2020.02.13 20.11.1919.
If Group 4 is null, the first alternative matched, so we need to use Group 1, 2 and 3. Else, we join Group 4, 5 and 6 with a dot.
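The group-selection logic carries over to other regex engines. Below is a rough Python sketch of the same replacement, where an unmatched group is None rather than null; the names DATE_RE and dot_date are illustrative only.

```python
import re

# Same pattern as above, split over two lines for readability.
DATE_RE = re.compile(
    r'\b(?:((?:19|20)\d{2})(0?[1-9]|1[0-2])(0?[1-9]|[12][0-9]|3[01])'
    r'|(0?[1-9]|[12][0-9]|3[01])(0?[1-9]|1[0-2])((?:19|20)\d{2}))\b'
)

def dot_date(m):
    """Join whichever alternative matched with dots."""
    if m.group(4) is None:
        # Year-first alternative matched: groups 1, 2, 3.
        return ".".join(m.group(1, 2, 3))
    # Day-first alternative matched: groups 4, 5, 6.
    return ".".join(m.group(4, 5, 6))

text = '13022020 and 20200213 20111919'
result = DATE_RE.sub(dot_date, text)
print(result)  # 13.02.2020 and 2020.02.13 20.11.1919
```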
Python regex to match 3 consecutive characters in the alphabet but not necessarily side by side
You ask if it is possible to do that with a regular expression? Certainly! Is it pretty? That's in the eye of the beholder.
You need a regular expression that looks like this (with the case-indifferent flag set):
^(?=.*\d)(?=.*[a-z])(?=.*[<special symbols here>])(?!<no 3 digits that are consecutive>)(?!<no three letters that are consecutive>).{8}
Let's look at the negative lookahead
(?!<no 3 digits that are consecutive>)
We can write that as follows.
(?!(?:(?=.*0)(?=.*1)(?=.*2))|(?:(?=.*1)(?=.*2)(?=.*3))|(?:(?=.*2)(?=.*3)(?=.*4))|(?:(?=.*3)(?=.*4)(?=.*5))|(?:(?=.*4)(?=.*5)(?=.*6))|(?:(?=.*5)(?=.*6)(?=.*7))|(?:(?=.*6)(?=.*7)(?=.*8))|(?:(?=.*7)(?=.*8)(?=.*9)))
The expression can be written in verbose mode (re.X or re.VERBOSE) to make it self-documenting.
(?! # begin negative lookahead
(?: # begin non-capture group
(?=.*0) # match >= 0 characters followed by 0 (positive lookahead)
(?=.*1) # match >= 0 characters followed by 1
(?=.*2) # match >= 0 characters followed by 2
) # end non-capture group
| # or
... similar for (?:(?=.*1)(?=.*2)(?=.*3))
...
) # end negative lookahead
The construction of (?!<no three letters that are consecutive>) is similar (containing an alternation with 24 elements, from a, b and c to x, y and z).
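Writing these alternations by hand is error-prone, so they can be generated instead. The following is a sketch only: the helper name no_three_consecutive is made up, and it builds lookaheads of the same shape as the one described above, covering just the two "no three consecutive" checks and not the rest of the pattern.

```python
import re
import string

def no_three_consecutive(alphabet):
    """Build a negative lookahead that fails when the string contains, anywhere
    and in any order, three characters that are neighbours in `alphabet`."""
    # Hypothetical helper: one alternative per sliding window of three characters.
    triples = [alphabet[i:i + 3] for i in range(len(alphabet) - 2)]
    alternatives = [
        "(?:" + "".join(f"(?=.*{re.escape(ch)})" for ch in triple) + ")"
        for triple in triples
    ]
    return "(?!" + "|".join(alternatives) + ")"

digits_check = no_three_consecutive(string.digits)            # 012 ... 789
letters_check = no_three_consecutive(string.ascii_lowercase)  # abc ... xyz

print(bool(re.match("^" + digits_check, "x1y3z2")))   # False: 1, 2 and 3 all occur
print(bool(re.match("^" + digits_check, "x1y3z5")))   # True
print(bool(re.match("^" + letters_check, "C-a-B", re.IGNORECASE)))  # False: a, b, c
```

The two generated fragments can then be dropped into the placeholders of the overall pattern.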
Match two consecutive sequences with fixed overall length
You can use the following method to avoid having to write out alternations.
\b[a-z]{1,4}\d{1,4}(?<=\b[a-z\d]{5})
\b - Assert position at a word boundary
[a-z]{1,4} - Matches a lowercase letter between 1 and 4 times
\d{1,4} - Matches a digit between 1 and 4 times
(?<=\b[a-z\d]{5}) - Positive lookbehind ensuring a combination of exactly 5 lowercase letters and digits precedes
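A quick way to see the lookbehind trick at work is to run the pattern over a few tokens. This Python sketch uses a made-up sample string; Python's re accepts the lookbehind because it has a fixed width of five characters.

```python
import re

# Letters then digits, with the lookbehind enforcing a total length of 5.
PATTERN = re.compile(r'\b[a-z]{1,4}\d{1,4}(?<=\b[a-z\d]{5})')

# Hypothetical sample: only the first two tokens are 5 characters long
# AND consist of letters followed by digits.
sample = "abc12 a1234 ab12 abcde 12345 x9"

matches = PATTERN.findall(sample)
print(matches)  # ['abc12', 'a1234']
```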
Find consecutive characters in a string + their start and end indices (python)
One way using re.finditer:
[(*m.span(), len(m.group(0))) for m in re.finditer("-+", s)]
Output:
[(1, 4, 3), (5, 6, 1), (10, 15, 5)]
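For a self-contained run, here is the same comprehension with its import and a sample input. The string s is an assumption, chosen so that its runs of "-" line up with the output shown above. Note that the end index from span() is exclusive.

```python
import re

# Assumed input: runs of "-" at indices 1-3, 5 and 10-14.
s = "a---b-cdef-----g"

# (start, exclusive end, length) for every run of one or more "-".
runs = [(*m.span(), len(m.group(0))) for m in re.finditer("-+", s)]
print(runs)  # [(1, 4, 3), (5, 6, 1), (10, 15, 5)]

# If an inclusive end index is wanted, subtract 1 from m.end().
inclusive = [(m.start(), m.end() - 1) for m in re.finditer("-+", s)]
print(inclusive)  # [(1, 3), (5, 5), (10, 14)]
```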
Regex to match two or more consecutive characters
You can use a lookahead and a backreference to solve this. But note that right now you are requiring at least 2 characters: the starting letter and another one (due to the +). You probably want to change that + to a * so that the second character class can be repeated 0 or more times:
^(?!.*(.)\1)[a-zA-Z][a-zA-Z\d._-]*$
How does the lookahead work? Firstly, it's a negative lookahead. If the pattern inside finds a match, the lookahead causes the entire pattern to fail and vice-versa. So we can have a pattern inside that matches if we do have two identical consecutive characters. First, we look for an arbitrary position in the string (.*), then we match a single (arbitrary) character (.) and capture it with the parentheses. Hence, that one character goes into capturing group 1. And then we require this capturing group to be followed by itself (referencing it with \1). So the inner pattern will try at every single position in the string (due to backtracking) whether there is a character that is followed by itself. If these two consecutive characters are found, the pattern will fail. If they cannot be found, the engine jumps back to where the lookahead started (the beginning of the string) and continues with matching the actual pattern.
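To make the behaviour concrete, here is a small Python check of the combined pattern against a few made-up candidates (the name USERNAME_RE and the sample strings are illustrative assumptions, not part of the question).

```python
import re

USERNAME_RE = re.compile(r'^(?!.*(.)\1)[a-zA-Z][a-zA-Z\d._-]*$')

# Hypothetical candidates and whether the pattern accepts them.
candidates = ["a", "abc.d-e", "aab", "a..b", "1abc"]
results = {c: bool(USERNAME_RE.match(c)) for c in candidates}

for name, ok in results.items():
    print(f"{name!r}: {ok}")
# 'a' and 'abc.d-e' pass; 'aab' and 'a..b' fail the lookahead (doubled
# character); '1abc' fails because it does not start with a letter.
```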
Alternatively you can split this up into two separate checks. One for valid characters and the starting letter:
^[a-zA-Z][a-zA-Z\d._-]*$
And one for the consecutive characters (where you can invert the match result):
(.)\1
This would greatly increase the readability of your code (because it's less obscure than that lookahead) and it would also allow you to detect which check actually failed and return an appropriate and helpful error message.
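A minimal sketch of the two-check approach in Python might look like this. The function name validate and the wording of the messages are assumptions for illustration.

```python
import re

ALLOWED_RE = re.compile(r'^[a-zA-Z][a-zA-Z\d._-]*$')  # valid characters, letter first
REPEAT_RE = re.compile(r'(.)\1')                      # any character followed by itself

def validate(name):
    """Return None if the name is acceptable, otherwise an error message."""
    # Hypothetical helper: each check reports its own failure.
    if not ALLOWED_RE.match(name):
        return "must start with a letter and contain only letters, digits, '.', '_' or '-'"
    repeated = REPEAT_RE.search(name)
    if repeated:
        return f"character {repeated.group(1)!r} is repeated at index {repeated.start()}"
    return None

print(validate("abc.d"))  # None
print(validate("a..b"))   # character '.' is repeated at index 1
```

Running the checks separately means the caller can tell an invalid character apart from a doubled one, which the single combined pattern cannot report.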
Using Regex to find longest consecutive match in a string
You should look for all matches of one or more consecutive occurrences of the string 'APPLE', which the following regex will do:
(?:APPLE)+
Then you sort them in descending order by length. Take the longest match (i.e., the first match) and divide its length by 5 (the number of characters in 'APPLE'); that tells you how many consecutive occurrences of 'APPLE' were found in the longest match:
import re
s = "APPLEORANGEORANGEAPPLEAPPLEAPPLEBANANABANANABANANAAPPLEBANANA"
# Sort by length (key=len) so the longest run of 'APPLE' comes first.
matches = sorted(re.findall(r'(?:APPLE)+', s), key=len, reverse=True)
if matches:
    print(len(matches[0]) // 5)
else:
    print(0)
Prints:
3