Matching two overlapping patterns with Perl
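The following uses a zero-width assertion: a positive lookahead, (?=...), with a capturing group inside it. Because the lookahead consumes no characters, the /g search moves forward one character at a time and reports every position where alpha or beta begins, even where the words overlap.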
#!/usr/bin/perl
use strict;
use warnings;
$_ = "betalphabetabeta";
while (/(?=(alpha|beta))/g) {
    print $1, "\n";
}
Prints:
C:\Old_Data\perlp>perl t9.pl
beta
alpha
beta
beta
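For comparison, here is a minimal Ruby sketch of the same lookahead trick. It is an illustration, not the original Perl script. String#scan steps past each zero-length match by one character, so the capture inside the lookahead is collected at every starting position, including overlapping ones.

```ruby
text = 'betalphabetabeta'

# The lookahead (?=...) is zero-width, so scan never consumes the word;
# it only records what group 1 saw at each position.
words = text.scan(/(?=(alpha|beta))/).flatten

puts words.inspect # => ["beta", "alpha", "beta", "beta"]
```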
How do I count regex matches in perl when using multiple possible match targets separated by |?
The problem is that the trailing "," is consumed in the ",9," match, so when it starts looking for the next match it starts at "11,12,". There's no leading "," before the "11", so it can't match that. I'd recommend using a lookahead like this:
,(4|9|11)(?=,)
This way, the trailing "," will not be consumed as part of the match.
For example:
my $string = ",4,8,9,11,12,";
my $test = qr/,(4|9|11)(?=,)/;
my @c = $string =~ m/$test/g;    # list context: returns every captured number
my $count = @c;                  # array in scalar context: number of elements
print "count: $count\n";
print "\@c:", join(" ", @c), "\n";
Outputs:
count: 3
@c:4 9 11
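The difference between consuming the trailing comma and only looking ahead at it can be shown side by side. This is a small Ruby sketch that uses the same input string. The first pattern, with a consuming trailing comma, stands in for the kind of pattern described in the question.

```ruby
string = ',4,8,9,11,12,'

# Consuming the trailing comma: the ",9," match eats the comma that
# ",11," would need, so 11 is never found.
consuming = string.scan(/,(4|9|11),/).flatten

# Lookahead version: the trailing comma is checked but not consumed.
lookahead = string.scan(/,(4|9|11)(?=,)/).flatten

puts "consuming: #{consuming.size} #{consuming.inspect}" # consuming: 2 ["4", "9"]
puts "lookahead: #{lookahead.size} #{lookahead.inspect}" # lookahead: 3 ["4", "9", "11"]
```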
How to count matches for a named capture group in Perl
EDIT after question update
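- -0777 : the whole file is read at once (the input record separator is undef)
- -i : edits the file in place (like sed -i); it must be removed to avoid modifying the file
- -p : prints every line; the command below uses -n instead, which loops over the input without printing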
The following command should print just the number of matches:
perl -0777 -ne '$cnt=@a=m{('$PASSTHROUGH'(*SKIP)(?!)|'$REPLACE')}pg;print "$cnt\n"'
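It is done differently:
- the principle of the alternation is to match first what should be skipped, so that only what we want is kept
- (*SKIP) : a backtracking control verb; when the rest of the pattern fails, the engine does not backtrack and retry from the positions in between, as it normally would, but resumes the search at the position where (*SKIP) was reached
- (?!) : an empty negative lookahead, which always fails; it is the same as (*FAIL)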
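The same "match what you don't want first" principle can also be written without backtracking control verbs, which not every engine offers. Capture only the wanted alternative and count the matches where that group is set. Below is a minimal Ruby sketch of this swapped-in technique. PASSTHROUGH and REPLACE here are hypothetical stand-ins for the shell variables, with quoted text as the part to skip and bare foo as the part to count.

```ruby
# Hypothetical stand-ins for $PASSTHROUGH and $REPLACE:
# text inside double quotes is skipped, bare "foo" is counted.
PASSTHROUGH = /"[^"]*"/
REPLACE     = /foo/

# Interpolated Regexp objects become non-capturing groups, so group 1
# exists only on the wanted branch. The unwanted branch is listed first:
# it is matched (consumed) but leaves group 1 as nil.
def count_outside(text, passthrough, replace)
  text.scan(/#{passthrough}|(#{replace})/).count { |groups| !groups.first.nil? }
end

puts count_outside('foo "foo" foo', PASSTHROUGH, REPLACE) # => 2
```

This assumes the two sub-patterns contain no capturing groups of their own. Otherwise the group numbering shifts.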
Perl - Extract all regular expression match
This is almost the same question as Count overlapping regex matches in Perl OR Ruby.
This code is nearly unchanged from perldoc perlre, under the section titled "Special Backtracking Control Verbs":
use strict;
use warnings;
my $regex = qr/M?[VI]?A?R?G?D?[LM]?G?[IVMAL]?E?/;
my $text = 'VMVARGDLGVE';
my $count = 0;
$text =~ /$regex(?{print "$&\n"; $count++})(*FAIL)/g;
print "Got $count matches\n";
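The script counts empty-string matches too. Because (*FAIL) forces the engine to backtrack through every way the pattern can match at every starting position, the same substring can also be printed more than once. Together, that comes to a count of 97 matches.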
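To see where the 97 comes from, here is a hypothetical Ruby sketch, not taken from perlre, that counts the paths without running the regex. It assumes, as holds for this particular pattern, that the regex is just a sequence of optional single-character classes. At every start position, including the end of the string, each element is either skipped or, if it matches the current character, consumed. Every complete path is one reported match.

```ruby
# Each element mirrors one optional piece of M?[VI]?A?R?G?D?[LM]?G?[IVMAL]?E?
ELEMENTS = [/M/, /[VI]/, /A/, /R/, /G/, /D/, /[LM]/, /G/, /[IVMAL]/, /E/].freeze

# Number of ways elements i.. can match starting at text position pos.
# The result does not depend on where the match started, so it is memoized.
def paths(text, i, pos, memo)
  return 1 if i == ELEMENTS.size
  memo[[i, pos]] ||= begin
    count = paths(text, i + 1, pos, memo) # skip element i
    if pos < text.size && ELEMENTS[i].match?(text[pos])
      count += paths(text, i + 1, pos + 1, memo) # consume one character
    end
    count
  end
end

# Sum over every start position, including the empty match at the end.
def count_all_paths(text)
  memo = {}
  (0..text.size).sum { |start| paths(text, 0, start, memo) }
end

puts count_all_paths('VMVARGDLGVE') # => 97
```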
Why is Perl lazy when regex matching with * against a group?
This isn't a matter of greedy or lazy repetition. (?:fj)* is greedily matching as many repetitions of "fj" as it can, but it will successfully match zero repetitions. When you try to match it against the string "f fjfj ff", it will first attempt to match at position zero (before the first "f"). The maximum number of times you can successfully match "fj" at position zero is zero, so the pattern successfully matches the empty string. Since the pattern successfully matched at position zero, we're done, and the engine has no reason to try a match at a later position.
The moral of the story is: don't write a pattern that can match nothing, unless you want it to match nothing.
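This short Ruby sketch makes the point visible by printing where each match starts. The * version succeeds immediately with an empty match at position 0. Requiring at least one repetition with + forces the engine to keep scanning until it reaches a real "fj" run.

```ruby
s = 'f fjfj ff'

# (?:fj)* may match zero repetitions, so the leftmost match is empty.
star = s.match(/(?:fj)*/)
p [star.begin(0), star[0]] # [0, ""]

# (?:fj)+ cannot match nothing, so the first match is the real run.
plus = s.match(/(?:fj)+/)
p [plus.begin(0), plus[0]] # [2, "fjfj"]
```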
Overlapping matches in R
The standard regmatches() does not work well with captured matches (specifically, multiple captured matches in the same string). And in this case, since you're "matching" a lookahead (ignoring the capture), the match itself is zero-length. There is also a regmatches()<- replacement function that may illustrate this. Observe:
x <- 'ACCACCACCAC'
m <- gregexpr('(?=([AC]C))', x, perl=T)
regmatches(x, m) <- "~"
x
# [1] "~A~CC~A~CC~A~CC~AC"
Notice that all the letters are preserved; we've just replaced the locations of the zero-length matches with something we can see.
I've created a regcapturedmatches() function that I often use for such tasks. For example:
x <- 'ACCACCACCAC'
regcapturedmatches(x, gregexpr('(?=([AC]C))', x, perl=T))[[1]]
# [,1] [,2] [,3] [,4] [,5] [,6] [,7]
# [1,] "AC" "CC" "AC" "CC" "AC" "CC" "AC"
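gregexpr() grabs all the data just fine (with perl=TRUE, the capture positions and lengths are stored in its "capture.start" and "capture.length" attributes), so you can extract it from that object any way you like if you prefer not to use this helper function.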
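The same idea can be expressed outside R. This is a minimal Ruby sketch that collects each lookahead capture together with its offset, roughly what gregexpr records in capture.start. Ruby offsets are 0-based, whereas R's are 1-based. The helper name overlapping_captures is hypothetical, and it steps forward by hand so the handling of zero-length matches is explicit.

```ruby
# Collect [offset, captured text] for every match of re in str,
# advancing one character past each zero-length (lookahead) match.
def overlapping_captures(str, re)
  results = []
  pos = 0
  while pos <= str.size && (m = str.match(re, pos))
    results << [m.begin(1), m[1]]
    # A lookahead match is zero-length; restarting at m.end(0) would find
    # the same match again forever, so move one character forward instead.
    pos = m.end(0) > m.begin(0) ? m.end(0) : m.begin(0) + 1
  end
  results
end

pairs = overlapping_captures('ACCACCACCAC', /(?=([AC]C))/)
pairs.each { |offset, text| puts "#{offset}: #{text}" }
```

The captured texts are the same seven values shown by regcapturedmatches() (AC, CC, AC, CC, AC, CC, AC), at offsets 0, 1, 3, 4, 6, 7 and 9.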
ruby regexp Skipping Zero Length Matches and nil matches
Here's a regex that captures in group #1 everything after postcode/ if it's present, or else everything after .co.uk/:
\.co\.uk\/(?:postcode\/)?([^\/\n]+(?:\/[^\/\n]+)?)
Note that this will give unexpected results if there are unwanted path elements at the end of a postcode link, such as:
http://www.adresses.co.uk/postcode/rm107jj/oops
UPDATE: Based on the comments, it looks like you want to match just the last path element. But we can't simply capture the second element, because there might be only one:
http://www.adresses.co.uk/west-midlands
We can, however, make the first element optional:
\.co\.uk\/(?:[^\/\n]+\/)?([^\/\n]+)
Notice how I used a non-capturing group for the optional portion, so the part you want is still captured in group #1.
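Here is a minimal Ruby sketch that applies both patterns to the sample URLs. The patterns are written with %r{} so the slashes need no escaping, but they are otherwise the same. The first URL, a postcode link without extra elements, is a hypothetical input added for illustration.

```ruby
# First pattern: optional "postcode/" prefix, then up to two path elements.
FIRST = %r{\.co\.uk/(?:postcode/)?([^/\n]+(?:/[^/\n]+)?)}
# Updated pattern: an optional leading element, then capture one element.
LAST = %r{\.co\.uk/(?:[^/\n]+/)?([^/\n]+)}

urls = [
  'http://www.adresses.co.uk/postcode/rm107jj', # hypothetical clean postcode link
  'http://www.adresses.co.uk/postcode/rm107jj/oops',
  'http://www.adresses.co.uk/west-midlands'
]

urls.each do |url|
  # String#[] with a regexp and a group number returns that capture (or nil).
  puts "#{url}: first=#{url[FIRST, 1].inspect} updated=#{url[LAST, 1].inspect}"
end
```

For the oops link, the first pattern captures "rm107jj/oops", as warned. The updated pattern captures "rm107jj", so a third path element still isn't treated as the last one. For west-midlands, both patterns capture "west-midlands".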
...