Regex Match Everything Up to First Period

Regex match everything up to first period


/^([^.]+)/

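Let's break it down:

  • ^ anchors the match at the start of the string (or of each line in multiline mode)

  • [^.] matches any single character that is not a period

  • + repeats that class one or more times, so the match runs up to (but not including) the first period

The expression is wrapped in () to capture the match.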
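As a minimal sketch, this is how the pattern behaves with Python's re module. The helper name before_first_period and the sample strings are made up for illustration.

```python
import re

# Same pattern as above, without the /.../ delimiters.
pattern = re.compile(r"^([^.]+)")

def before_first_period(text):
    """Return the text before the first period, or None if nothing matches."""
    m = pattern.search(text)
    return m.group(1) if m else None

print(before_first_period("example.com/path.html"))  # example
print(before_first_period("no period here"))         # no period here
print(before_first_period(".hidden"))                # None
```

Because + requires at least one character, a string that starts with a period gives no match at all; use * instead of + if an empty match is acceptable there.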

Regex: matching up to the first occurrence of a character

You need

/^[^;]*/

[^;] is a negated character class: it matches any character except a semicolon.

^ (start of line anchor) is added to the beginning of the regex so only the first match on each line is captured. This may or may not be required, depending on whether possible subsequent matches are desired.

To cite the perlre manpage:

You can specify a character class, by enclosing a list of characters in [] , which will match any character from the list. If the first character after the "[" is "^", the class matches any character not in the list.

This should work in most regex dialects.
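A short Python sketch of the anchored and unanchored behaviour described above. The sample strings are invented, and the unanchored variant uses + instead of * only to skip empty matches.

```python
import re

first_field = re.compile(r"^[^;]*")

# Anchored: only the text before the first semicolon is returned.
print(first_field.search("alpha;beta;gamma").group(0))  # alpha

# With re.MULTILINE, ^ also matches after each newline: one match per line.
per_line = re.findall(r"^[^;]*", "a=1;b=2\nc=3;d=4", flags=re.MULTILINE)
print(per_line)  # ['a=1', 'c=3']

# Without the anchor, every semicolon-free run is returned.
all_fields = re.findall(r"[^;]+", "alpha;beta;gamma")
print(all_fields)  # ['alpha', 'beta', 'gamma']
```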

Regex match back to a period or start of string

You don't need to use lookarounds to do that. The negated character class is your best friend:

(?:[^\s.][^.]*)?regex[^.]*\.?

or

[^.]*regex[^.]*\.?

This way you take any characters before the word "regex" while forbidding any of them from being a dot.

The first pattern strips whitespace on the left; the second one is more basic.
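To see the difference between the two patterns, here is a small Python sketch; the sample text is made up.

```python
import re

text = "First sentence. A regex here. Last one."

# Basic pattern: keeps the whitespace that follows the previous period.
basic = re.search(r"[^.]*regex[^.]*\.?", text)

# First pattern: the optional prefix must start with a non-space, non-dot char.
trimmed = re.search(r"(?:[^\s.][^.]*)?regex[^.]*\.?", text)

print(repr(basic.group(0)))    # ' A regex here.'
print(repr(trimmed.group(0)))  # 'A regex here.'
```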

About your pattern:

Don't forget that a regex engine tries to succeed at each position from the left to the right of the string. That's why something like (?:(?<=\.)|(?<=^)).*?regex doesn't always return the shortest substring between a dot or the start of the string and the word "regex", even if you use a non-greedy quantifier. The leftmost position always wins and a non-greedy quantifier takes characters until the next subpattern succeeds.

As an aside, one more time, the negated character class can be useful:
to shorten (?:(?<=\.)|(?<=^)) you can write (?<![^.])
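The "leftmost position wins" behaviour can be sketched in Python as follows. The sample text is invented, and the shortened lookbehind (?<![^.]) stands in for the longer alternation.

```python
import re

text = "abc. def regex"

# Lazy dot: the match starts at position 0 (start of string) and simply
# extends until "regex" is found, crossing the period on the way.
lazy = re.search(r"(?<![^.]).*?regex", text)
print(repr(lazy.group(0)))     # 'abc. def regex'

# Negated class: the match cannot cross a period, so the attempt at
# position 0 fails and the one right after the period succeeds.
bounded = re.search(r"(?<![^.])[^.]*regex", text)
print(repr(bounded.group(0)))  # ' def regex'
```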

Regex to match and group up to first period and last slash

You need to take the literal chars out of the capturing groups:

^([^.]*)\.(.*)\/(.*)$

Details:

  • ^ - string start
  • ([^.]*) - Group 1: any zero or more chars other than a . char, as many as possible
  • \. - a dot
  • (.*) - Group 2: any zero or more chars other than line break chars, as many as possible
  • \/ - a / char
  • (.*) - Group 3: any zero or more chars other than line break chars, as many as possible
  • $ - string end.
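A quick Python check of the three groups, using an invented sample string:

```python
import re

pattern = re.compile(r"^([^.]*)\.(.*)\/(.*)$")

m = pattern.match("a.b.c/d/e.txt")
# Group 1 stops at the first period; the greedy Group 2 runs to the last slash.
print(m.groups())  # ('a', 'b.c/d', 'e.txt')

# Without a slash after the first period there is no match at all.
no_slash = pattern.match("a.b")
print(no_slash)  # None
```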

Python regex to get everything until the first dot in a string

By default, all quantifiers are greedy: they try to consume as much of the string as they can. You can make them reluctant (lazy) by appending a ? to them:

find = re.compile(r"^(.*?)\..*")

As noted in a comment, this approach fails if there is no period in the string, so it depends on how you want it to behave. If you want to get the complete string in that case, use a negated character class:

find = re.compile(r"^([^.]*).*")

It stops automatically at the first period, or at the end of the string if there is none.
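The difference between the two patterns only shows up when the input has no period. A small sketch, with made-up sample strings:

```python
import re

lazy = re.compile(r"^(.*?)\..*")
negated = re.compile(r"^([^.]*).*")

with_dot = "foo.bar.baz"
without_dot = "foobar"

# Both agree when a period is present.
print(lazy.match(with_dot).group(1))        # foo
print(negated.match(with_dot).group(1))     # foo

# Without a period the lazy pattern fails; the negated class returns everything.
print(lazy.match(without_dot))              # None
print(negated.match(without_dot).group(1))  # foobar
```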


Also you don't want to use re.match() there. re.search() should be just fine. You can modify your code to:

import re

find = re.compile(r"^[^.]*")

for l in lines:  # lines: the iterable of strings from the question
    print(find.search(l).group(0))

Regexp between first period and whitespace until next period

Thanks to the comments that enabled me to solve this problem. The final solution ended up being the following:

\A[^.]*\.\s\K[^.]*(?=\.)

Translation:

\A anchors the match at the start of the string

[^.]* matches any number of characters (whitespace included) before the first period

\.\s matches the first period and the whitespace character that follows it

\K resets the start of the reported match, so everything matched up to this point is dropped from the result

[^.]* matches every character (whitespace included) up to the next period

(?=\.) is a lookahead: it requires a period to come next but leaves it out of the match
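Python's built-in re module does not support \K, so the sketch below swaps it for a capturing group, which should give the same text under that assumption. The helper name second_sentence and the sample strings are hypothetical.

```python
import re

# \K replaced by a capturing group around the part we want to keep.
pattern = re.compile(r"\A[^.]*\.\s([^.]*)(?=\.)")

def second_sentence(text):
    """Return the text between the first '. ' and the next period, or None."""
    m = pattern.search(text)
    return m.group(1) if m else None

print(second_sentence("Intro text. Second sentence here. Third."))  # Second sentence here
print(second_sentence("Only one sentence."))                        # None
```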

Thanks again for the help!

Regex match until first instance of certain character

You added the " into the consuming part of the pattern. Remove it:

^.+?(?=\")

Or, if you need to match any chars including line breaks, use either

(?s)^.+?(?=\")
^[\w\W]+?(?=\")

Here, ^ matches the start of the string and .+? matches one or more chars, as few as possible, up to the first ", which is excluded from the match because it is part of the lookahead (a zero-width assertion).

In the other two regexps, (?s) makes the dot match across lines, and [\w\W] is a workaround that matches any char where (?s) (or its /s flag form) is not supported.

Best is to use a negated character class:

^[^"]+

Here, ^[^"]+ matches one or more chars other than " (the [^"]+ part) from the start of the string (^).
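The variants can be compared side by side in Python. This is only a sketch with invented sample strings; note that the lookahead versions need a " to be present, while the negated class does not.

```python
import re

one_line = 'say "hi"'
two_lines = 'line one\nline "two"'

# Lookahead and negated class agree on a single line.
print(repr(re.search(r'^.+?(?=")', one_line).group(0)))  # 'say '
print(repr(re.search(r'^[^"]+', one_line).group(0)))     # 'say '

# Across a line break, the plain dot fails unless (?s) or [\w\W] is used.
plain = re.search(r'^.+?(?=")', two_lines)
dotall = re.search(r'(?s)^.+?(?=")', two_lines)
workaround = re.search(r'^[\w\W]+?(?=")', two_lines)
negated = re.search(r'^[^"]+', two_lines)
print(plain)                     # None
print(repr(dotall.group(0)))     # 'line one\nline '
print(repr(workaround.group(0))) # 'line one\nline '
print(repr(negated.group(0)))    # 'line one\nline '

# With no quote at all, only the negated class still matches.
print(re.search(r'^.+?(?=")', "no quote"))       # None
print(re.search(r'^[^"]+', "no quote").group(0)) # no quote
```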


