Return String with First Match for a Regex, Handling Case Where There Is No Match

You can build the '' default into the regex itself by appending |$, which matches the empty string at the end of the input when nothing else matches:

>>> re.findall(r'\d+|$', 'aa33bbb44')[0]
'33'
>>> re.findall(r'\d+|$', 'aazzzbbb')[0]
''
>>> re.findall(r'\d+|$', '')[0]
''

This also works with re.search, as others pointed out:

>>> re.search(r'\d+|$', 'aa33bbb44').group()
'33'
>>> re.search(r'\d+|$', 'aazzzbbb').group()
''
>>> re.search(r'\d+|$', '').group()
''
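
If you would rather leave the pattern untouched, the default can be handled in code instead. This is a minimal sketch; first_match is a hypothetical helper name, not something the re module provides:

```python
import re

def first_match(pattern, text, default=''):
    """Return the first match of pattern in text, or default if there is none."""
    # first_match is a hypothetical helper, not part of the re module.
    m = re.search(pattern, text)
    # re.search returns None when nothing matches, so fall back to the default.
    return m.group() if m else default

print(first_match(r'\d+', 'aa33bbb44'))       # 33
print(repr(first_match(r'\d+', 'aazzzbbb')))  # ''
```

Unlike the |$ trick, this lets the caller pick any default, for example None.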

Extract only first match using python regular expression

Pretty simple: use re.sub to strip the abbreviation, or re.search with a capturing group to extract it:

In [8]: course_name
Out[8]: 'Post Graduate Certificate Programme in Retail Management (PGCPRM) (Online)'

In [9]: print(re.sub(r'\([A-Z]+\)\s*', '', course_name))
Post Graduate Certificate Programme in Retail Management (Online)

In [17]: print(re.search(r'\(([A-Z]+)\)\s*', course_name).group(1))
PGCPRM

Return Error if no match found by regex

As mentioned in the comments above, the regex itself cannot return an error for you. What you can do is check whether the result of re.findall, after the extra formatting, is empty. An empty result means that no match was found, so return Error:

import re
link = "http://www.this_is_my_perfect_url.com/blah_blah/blah_blah?=trololo"

def get_domain(url):
    # '|$' lets the pattern match the empty string at the end when no domain is found
    domain_regex = re.compile(r"://(.*?)/|$")

    # Take the first match (the capture group's contents) and strip 'www.'
    matches = re.findall(domain_regex, str(url))[0].replace('www.', '')

    # Return the match, or 'Error' if the output is empty
    return matches or 'Error'

print(get_domain(link))
print(get_domain('there_is_no_domain_in_here'))

The output will be:

this_is_my_perfect_url.com
Error
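
The same check can also be written with re.search and an explicit None test, which avoids the |$ fallback in the pattern. This is only a sketch; get_domain_or_error is a hypothetical name for a variant of get_domain above:

```python
import re

def get_domain_or_error(url):
    # Hypothetical variant of get_domain: no '|$' fallback in the pattern.
    m = re.search(r'://(.*?)/', str(url))
    if m is None:
        # re.search returns None when the pattern is not found at all.
        return 'Error'
    # An empty capture (e.g. 'http:///') is treated as an error as well.
    return m.group(1).replace('www.', '') or 'Error'

print(get_domain_or_error("http://www.this_is_my_perfect_url.com/blah_blah/blah_blah?=trololo"))
print(get_domain_or_error('there_is_no_domain_in_here'))
```

As with the original pattern, a URL with no slash after the host does not match and yields Error.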

Take first word after a regex match

You need to match the : after THttpServer, then any non-word chars up to the next word, and capture that word with (\w+).

E.g. you may use

THttpServer:\W*(\w+)

Details

  • THttpServer: - a literal substring
  • \W* - any 0+ non-word chars
  • (\w+) - Capturing group 1 (later accessible via m.group(1)): 1 or more word chars.

In Python:

import re
strs = ['23:25:04.805: INFO: THttpServer: transportTCPChanged(state: DISCONNECTED 2)',
        '23:25:13.120: INFO: THttpServer: transportUDPOpened(state: Port 54)']

rx = re.compile(r'THttpServer:\W*(\w+)')
for s in strs:
    m = rx.search(s)
    if m:
        print("Found '{}' in '{}'.".format(m.group(1), s))

Output:

Found 'transportTCPChanged' in '23:25:04.805: INFO: THttpServer: transportTCPChanged(state: DISCONNECTED 2)'.
Found 'transportUDPOpened' in '23:25:13.120: INFO: THttpServer: transportUDPOpened(state: Port 54)'.

Regex - lazy match first pattern occurrence, but no subsequent matching patterns

Mine is similar to yours, except that it also allows numbers like 30% (without a decimal point):

\d+(\.\d+)?%

I don't know what language you are using, but in Python you can get the first occurrence with re.search().

Here is an example:

import re

pattern = r'\d+(\.\d+)?%'
string = 'Profits in California were down 10.00% to $100.00, a decrease from 22.6% the prior year.'

print(re.search(pattern, string).group())
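
To make the difference between the first occurrence and all occurrences concrete, here is a small sketch using the same sentence. The variable names are illustrative only. Note that re.findall returns the group contents when the pattern has a capturing group, so a non-capturing group is used for the full matches:

```python
import re

text = 'Profits in California were down 10.00% to $100.00, a decrease from 22.6% the prior year.'

# re.search stops at the first match.
first = re.search(r'\d+(\.\d+)?%', text)
print(first.group())  # 10.00%

# With a capturing group, re.findall returns the group contents, not the whole match.
groups_only = re.findall(r'\d+(\.\d+)?%', text)
print(groups_only)  # ['.00', '.6']

# A non-capturing group (?:...) makes re.findall return the full matches.
all_matches = re.findall(r'\d+(?:\.\d+)?%', text)
print(all_matches)  # ['10.00%', '22.6%']
```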

How do I return a string from a regex match in python?

You should call group(0) on the match object (re.Match in Python 3), like this:

imgtag = re.match(r'<img.*?>', line).group(0)

Edit:

You also might be better off doing something like

imgtag = re.match(r'<img.*?>', line)
if imgtag:
    print("yo it's a {}".format(imgtag.group(0)))

to skip the lines where re.match returns None.
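
Keep in mind that re.match only matches at the beginning of the string, so a tag in the middle of a line is missed. A minimal sketch using re.search instead; first_img_tag is a hypothetical helper name:

```python
import re

def first_img_tag(line):
    """Return the first <img ...> tag found anywhere in line, or None."""
    # first_img_tag is a hypothetical helper for illustration.
    # re.search scans the whole line; re.match would only look at its start.
    m = re.search(r'<img.*?>', line)
    return m.group(0) if m else None

print(first_img_tag('<p><img src="a.png"> some text</p>'))  # <img src="a.png">
print(first_img_tag('no tag here'))                         # None
```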

Return first occurrence of Regex not matched

A regex cannot tell you what it failed to match, so you need a second regex to find the text that made your validation regex fail. Use something like this:

var text = "abc(de192/+£,€.&";
var pattern = /^[0-9a-zA-Z,+&.\/-]+$/;
var res = pattern.test(text);
if (!res) {
    var m = text.match(/[^0-9a-zA-Z,+&.\/-]+/) || [""];
    console.log(m[0]);
}
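
For readers following the Python answers above, a rough Python equivalent of the same two-step check might look like this. It is a sketch only; ALLOWED and first_invalid_run are hypothetical names:

```python
import re

# Characters accepted by the validation pattern (same set as in the JavaScript above).
ALLOWED = r'0-9a-zA-Z,+&./-'

def first_invalid_run(text):
    """Return the first run of disallowed characters, or '' if there is none."""
    # Step 1: validate the whole string.
    if re.fullmatch('[' + ALLOWED + ']+', text):
        return ''
    # Step 2: use the negated character class to find what broke validation.
    m = re.search('[^' + ALLOWED + ']+', text)
    return m.group() if m else ''

print(first_invalid_run("abc(de192/+£,€.&"))  # (
```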

