Match Until Full Stop

Match until full stop

Your Pattern is not going to fit your requirements.

Here's what your Pattern parses for right now:

|  literal dot
|  |  followed by any 1+ character sequence, reluctantly quantified
|  |  |  followed by a non-capturing lookahead for 0 or 1
|  |  |
\\..+?(?=0|1)

By definition, non-capturing constructs cannot be back-referenced (i.e. you can never fetch their values by invoking Matcher#group).
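To see this concretely, here is a minimal sketch in Python (chosen only for illustration; the question is about Java). The helper name is hypothetical, and the pattern is the one above written without Java's string escaping. It shows that the pattern does match, but exposes no numbered group to fetch:

```python
import re

# The question's pattern without Java string escaping: a literal dot,
# 1+ characters (reluctant), then a lookahead for 0 or 1.
PATTERN = re.compile(r"\..+?(?=0|1)")

def match_and_groups(text):
    """Return (whole match, tuple of captured groups), or None if nothing matches."""
    m = PATTERN.search(text)
    if m is None:
        return None
    return m.group(0), m.groups()

print(match_and_groups("this is a test 123. 1"))
```

The whole match is just the dot and the space after it, and the tuple of groups is empty: the lookahead checks for the digit but never captures it.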

And here's a simple example of what you may want instead:

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String test = "this is a test 123. 1";
//                           | group 1: any 1+ char sequence, reluctantly quantified
//                           |    | followed by a dot (lookahead, so not captured or consumed)
//                           |    |      | any characters, reluctantly quantified
//                           |    |      | (here, the dot and your whitespace)
//                           |    |      |  | group 2: 0 or 1
Pattern p = Pattern.compile("(.+?)(?=\\.).*?([01])");
Matcher m = p.matcher(test);
if (m.find()) {
    System.out.printf("Group 1: %s%nGroup 2: %s%n", m.group(1), m.group(2));
}

Output

Group 1: this is a test 123
Group 2: 1

Notes

  • Group 0 always represents the whole match.
  • In other words, the user-defined numbered groups (defined by content enclosed in parentheses in the pattern) start at index 1.
  • See the "Groups and capturing" section of the java.util.regex.Pattern documentation.

  • It seems your requirements for parsing the final 0 / 1 digit are a bit lax. You may want to ask yourself whether this digit is going to be "isolated" (e.g. surrounded by non-alphanumeric characters), or possibly part of a larger digit sequence, etc.
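As one possible answer to that last question, here is a sketch of a stricter variant in Python (for illustration only, the question is about Java). It assumes the digit must stand alone, and it deliberately swaps the lookahead plus .*? for a consumed dot followed by optional whitespace. The function name is hypothetical:

```python
import re

# Assumption: the flag digit is "isolated", i.e. delimited by word boundaries,
# and only whitespace may sit between the full stop and the digit.
PATTERN = re.compile(r"(.+?)\.\s*\b([01])\b")

def split_sentence_and_flag(text):
    """Return (text before the full stop, flag digit), or None if nothing matches."""
    m = PATTERN.search(text)
    return (m.group(1), m.group(2)) if m else None

print(split_sentence_and_flag("this is a test 123. 1"))
print(split_sentence_and_flag("abc. 10"))
```

With the word boundaries in place, a digit that is part of a longer number (as in "abc. 10") no longer counts as the flag.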

Match words and full stops, but not the trailing full stop

You may "decompose" the last [a.]+ pattern into [a.]*[a]:

(?:^|\s)\$(?!\d)[\w.[\]()]*[\w[\]()]
                ^^^^^^^^^^^^^^^^^^^^

See the regex demo.

Details

  • (?:^|\s) - a non-capturing group matching either start of string (^) or (|) a whitespace (\s)
  • \$ - a $ char
  • (?!\d) - a negative lookahead that fails the match if there is a digit right after the $ char
  • [\w.[\]()]* - zero or more word, ., [, ], ( or ) chars
  • [\w[\]()] - a word, [, ], ( or ) char (no . here, so the match cannot end with a full stop).
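A quick way to check the decomposed pattern is to run it over a sample string. The sketch below uses Python, which is an assumption since the question does not name a language; the [ inside the character classes is escaped here purely as a precaution, and the function name and sample text are made up for illustration:

```python
import re

# Same pattern as above, with "[" escaped inside the character classes.
PATTERN = re.compile(r"(?:^|\s)\$(?!\d)[\w.\[\]()]*[\w\[\]()]")

def find_dollar_tokens(text):
    """Return every $token, without the leading whitespace the pattern consumes."""
    return [m.group(0).strip() for m in PATTERN.finditer(text)]

print(find_dollar_tokens("Use $foo.bar. and $x[0]. but not $5.00"))
```

The inner dot of $foo.bar is kept, the trailing full stops are left out, and $5.00 is rejected by the negative lookahead.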

regex expression to get all digits before full stop

You need to escape the .:

text_temp = re.findall(r"\d+\.", string)

since . is a special character in regex that matches any character. I also added + so that one or more digits are matched.

Or, if you are actually using 'FULLWIDTH FULL STOP' (U+FF0E, the character ．), you can put that character in the regex directly without escaping it, because it has no special meaning:

text_temp = re.findall(r"\d+．", string)
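To make the difference visible, here is a small self-contained comparison. The function names and the sample text are made up for illustration; the lookahead variant is an extra option in case you want the digits without the full stop itself:

```python
import re

FULLWIDTH_FULL_STOP = "\uFF0E"  # the character '．'

def digits_with_dot(text):
    """Digits followed by a literal ASCII full stop (dot escaped)."""
    return re.findall(r"\d+\.", text)

def digits_before_dot(text):
    """Only the digits; the full stop is asserted by a lookahead, not returned."""
    return re.findall(r"\d+(?=\.)", text)

def digits_with_unescaped_dot(text):
    """Unescaped dot: digits followed by ANY character (usually not what you want)."""
    return re.findall(r"\d+.", text)

def digits_with_fullwidth_dot(text):
    """Digits followed by U+FF0E, which needs no escaping."""
    return re.findall(r"\d+" + FULLWIDTH_FULL_STOP, text)

sample = "Item 12. costs 30x and 7."
print(digits_with_dot(sample))
print(digits_before_dot(sample))
print(digits_with_unescaped_dot(sample))
```

Note how the unescaped version also picks up "30x", because the dot happily matches the letter.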

Simple Regex: match everything until the last dot

Use lookahead to assert the last dot character:

.*(?=\.)

Live demo.
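For a quick check, here is a minimal sketch in Python (the question does not name a language, so this is an assumption; the function name is hypothetical). The greedy .* first runs to the end of the line and then backtracks until the lookahead sees a dot, which is why the last dot wins:

```python
import re

def up_to_last_dot(text):
    """Return everything before the last dot, or None if there is no dot."""
    m = re.search(r".*(?=\.)", text)
    return m.group(0) if m else None

print(up_to_last_dot("archive.tar.gz"))
print(up_to_last_dot("no dots here"))
```

If the only dot is the very first character, the result is an empty string rather than None.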

R Regular expression for string containing full stops

Since you are only checking whether the string ends with ..t., you can eliminate ^.+ from your pattern.

The dot . in regular expression syntax is a special character that matches any character except a newline sequence. To match a literal dot, or any other special character, you need to escape it; in an R string the escape is written with a double backslash (\\.).

> x <- c('foo..t.', 'w...gate', 'bar..t.foo', 'bar..t.')
> grep('\\.{2}t\\.$', x)
# [1] 1 4

Or place that character inside of a character class.

> x <- c('foo..t.', 'w...gate', 'bar..t.foo', 'bar..t.')
> grep('[.]{2}t[.]$', x)
# [1] 1 4

Note: I used the quantifier {2} (as in \\.{2}) to match two dots instead of writing the escaped dot twice (\\.\\.).
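If you want to double-check that the spellings are equivalent outside R, here is a rough Python sketch (Python is only used for illustration; the helper name is made up, and enumerate starts at 1 to mirror R's 1-based grep output):

```python
import re

x = ["foo..t.", "w...gate", "bar..t.foo", "bar..t."]

def matching_indices(pattern, strings):
    """Return the 1-based positions of the strings that match, like R's grep()."""
    return [i for i, s in enumerate(strings, start=1) if re.search(pattern, s)]

print(matching_indices(r"\.{2}t\.$", x))   # escaped dot with a quantifier
print(matching_indices(r"[.]{2}t[.]$", x)) # dot inside a character class
print(matching_indices(r"\.\.t\.$", x))    # escaped dot written twice
```

All three patterns select the same elements. In a Python raw string a single backslash is enough, whereas the R string literal needs the backslash doubled.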

Regex match everything up to first period

/^([^.]+)/

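Let's break it down:

  • ^ anchors the match at the start of the string (or of a line, in multiline mode)

  • [^.] matches any character that is not a period

  • + repeats that one or more times, so the match runs up to (but not including) the first period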

And the expression is encapsulated with () to capture it.
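Here is the same idea as a small Python sketch (an assumption for illustration, since the /.../ delimiters above suggest another language; the function name is hypothetical). re.match already anchors at the start of the string, so it plays the role of ^:

```python
import re

def before_first_period(text):
    """Return the text before the first period, or None if the text starts with one."""
    m = re.match(r"([^.]+)", text)
    return m.group(1) if m else None

print(before_first_period("Hello world. Second sentence."))
print(before_first_period("no period at all"))
```

Two things worth knowing: a string without any period is returned whole, and [^.] also matches newlines, so the match can span several lines.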

Preg match full stop

An alternative that is also faster than preg_match:

if (strpos($furl, '.') !== false)
{
    echo "Contains full stop";
}
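The same "plain substring check instead of a regex" idea carries over to other languages. As a rough sketch in Python (used here only for comparison; the function name is made up):

```python
def contains_full_stop(furl):
    """Plain substring test; no regex engine involved."""
    return "." in furl

if contains_full_stop("example.com/page"):
    print("Contains full stop")
```

As with strpos, this only tells you whether a full stop is present, not where or how many.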

Match full stop in a sentence using perl

Use lookarounds:

$sen =~ s/(?<!\d)\.(?!\d)//g;

This removes every dot that is neither preceded nor followed by a digit.

Updated according to the OP's comment: this removes dots that are followed by a capital letter (optionally after whitespace) or that end the string:

#!/usr/bin/perl
use Modern::Perl;
use utf8;

while(<DATA>) {
    chomp;
    s/\.(?=(?:\s*[A-Z])|$)//g;
    # Or, if you want to be unicode compatible
    s/\pP(?=(?:\s*\p{Lu})|$)//g;
    say;
}

__DATA__
I'm going to match full.stop in sentence 3.142
I'm going to match full.Stop in sentence 3.142
I'm going to match full. Stop in sentence 3.142
I'm going to match full.stop in sentence 3.142. End of string.

output:

I'm going to match full.stop in sentence 3.142
I'm going to match fullStop in sentence 3.142
I'm going to match full Stop in sentence 3.142
I'm going to match full.stop in sentence 3.142 End of string
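
For readers who want to try both substitutions without Perl, here is a rough Python port (an illustration only; the constant and function names are made up, and only the ASCII [A-Z] variant is ported):

```python
import re

# Dot followed by optional whitespace and a capital letter, or by end of string.
DOT_BEFORE_CAPITAL_OR_END = re.compile(r"\.(?=\s*[A-Z]|$)")
# Dot that is neither preceded nor followed by a digit (the first version above).
DOT_NOT_NEXT_TO_DIGIT = re.compile(r"(?<!\d)\.(?!\d)")

def strip_sentence_dots(line):
    """Port of the updated substitution."""
    return DOT_BEFORE_CAPITAL_OR_END.sub("", line)

def strip_dots_not_next_to_digits(line):
    """Port of the first substitution."""
    return DOT_NOT_NEXT_TO_DIGIT.sub("", line)

print(strip_sentence_dots("I'm going to match full.stop in sentence 3.142. End of string."))
print(strip_dots_not_next_to_digits("I'm going to match full.stop in sentence 3.142."))
```

Note the difference between the two: the first version leaves a sentence-ending dot in place when it directly follows a number (as in "3.142."), because the lookbehind sees a digit. The updated version removes that dot and keeps the one in full.stop.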

