How to Extract a Substring Using Regex

Assuming you want the part between single quotes, use this regular expression with a Matcher:

"'(.*?)'"

Example:

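import java.util.regex.Matcher;
import java.util.regex.Pattern;

String mydata = "some string with 'the data i want' inside";
// The lazy group (.*?) captures the text between the first pair of single quotes
Pattern pattern = Pattern.compile("'(.*?)'");
Matcher matcher = pattern.matcher(mydata);
if (matcher.find()) {
    System.out.println(matcher.group(1));
}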

Result:


the data i want

How to extract a substring using regex in Java

You can use the regex \{\$(.*?)\} with a Pattern like this:

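import java.util.regex.Matcher;
import java.util.regex.Pattern;

String xmlnode = "<firstname id=\"{$person.id}\"> {$person.firstname} </firstname>";

Pattern pattern = Pattern.compile("\\{\\$(.*?)\\}");
Matcher matcher = pattern.matcher(xmlnode);

// find() advances to the next match on each call, so the loop visits every placeholder
while (matcher.find()) {
    System.out.println(matcher.group(1));
}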

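Note: escape each of the characters {, $ and } with \ because they have a special meaning in regex ({ and } delimit quantifiers, $ anchors the end of the input). Inside a Java string literal each backslash must itself be doubled, hence \\{ and \\$.

Output: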

person.id
person.firstname
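
For comparison, the same escaping rule applies outside Java. Here is a minimal Python sketch of the same extraction, assuming the same input string; the variable name placeholders is only illustrative and not part of the original answer.

```python
import re

xmlnode = '<firstname id="{$person.id}"> {$person.firstname} </firstname>'

# A raw string lets each regex backslash be written once: \{ \$ \}
# With a single capture group, findall returns the captured text of every match.
placeholders = re.findall(r"\{\$(.*?)\}", xmlnode)

for name in placeholders:
    print(name)
```

With the input above this prints person.id and person.firstname, one per line.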

How to extract a substring using a regular expression

To make sure that we match one of the answers, we need to check which symbol comes before it. A positive lookbehind, written (?<=Y)X, does this: it means "find X only if Y comes right before it". The answer itself is then all the text that contains none of the special characters \, #, [ or ]. A negated character class such as [^XYZ] expresses this, since it matches any character except X, Y and Z.

To fix the issue from the comment we also need to check what comes right after the answer. There are 2 options: \ or ]. For this we use a positive lookahead, which works like a lookbehind but checks the text after X. For example, X(?=Y) means "find X only if Y comes right after it".

Final patterns are:

  • Wrong answers: (?<=\\|\[)[^\\#\[\]]+(?=\\|\])
  • Correct answers: (?<=#)[^\\#\[\]]+(?=\\|\])
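
As a rough illustration, the two patterns can be tried in Python. The input string below is an assumption, since the original question's data is not shown here: answers are separated by \, the correct one is prefixed with #, and the whole list sits inside [ ].

```python
import re

# Hypothetical input: answers separated by "\", the correct one marked with "#"
question = r"[wrong1\wrong2\#right\wrong3]"

# Lookbehind: preceded by "\" or "["; lookahead: followed by "\" or "]"
WRONG = re.compile(r"(?<=\\|\[)[^\\#\[\]]+(?=\\|\])")
# Lookbehind: preceded by "#"; same lookahead
CORRECT = re.compile(r"(?<=#)[^\\#\[\]]+(?=\\|\])")

wrong_answers = WRONG.findall(question)
correct_answers = CORRECT.findall(question)

print(wrong_answers)    # ['wrong1', 'wrong2', 'wrong3']
print(correct_answers)  # ['right']
```

The lookarounds only inspect the neighbouring characters and do not consume them, so the separators never end up in the results.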

How to extract a substring using a regular expression?

REGEX is a filter test, REPLACE is the extraction operation.

SELECT *
WHERE {
  ?s ?p ?o .
  FILTER REGEX(?o, "\\{.*:(.*)\\}")
}

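This tests ?o but does not extract the part in parentheses.

Note the doubled backslash (\\): the SPARQL string literal needs it so that the regex engine receives a single \.

To extract, use BIND with REPLACE: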

SELECT * {
  ?s ?p ?o .
  BIND(REPLACE(?o, "^.*\\{.*:(.*)\\}.*$", "$1") AS ?substring)
}

In the general case, you may need str(?o) instead of ?o in functions.
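
The difference between testing and extracting can be sketched in Python as an analogy (this is not SPARQL, and the sample value is hypothetical). re.search plays the role of FILTER REGEX, and re.sub with a whole-string pattern plays the role of REPLACE.

```python
import re

value = "prefix {ns:local} suffix"  # hypothetical literal, not from the original answer

# Test only: tells us whether the pattern occurs, like FILTER REGEX
is_match = re.search(r"\{.*:(.*)\}", value) is not None

# Extract: match the whole string and replace it with capture group 1, like REPLACE
substring = re.sub(r"^.*\{.*:(.*)\}.*$", r"\1", value)

# When nothing matches, the replace call returns its input unchanged
unchanged = re.sub(r"^.*\{.*:(.*)\}.*$", r"\1", "no braces here")

print(is_match, substring, unchanged)
```

Because a replace that matches nothing returns the input as-is, combining the BIND with the FILTER is one way to avoid keeping values that never contained the pattern.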

How to extract a substring using this regex pattern? It gives a ValueError: too many values to unpack (expected 1)

First, rewrite the pattern string so that its alternatives are grouped logically and the capture groups follow the main structure of the phrase:

regex_what_who = r"(que sabes|que sabias|que sabrias|que te referis|que te refieres|que te referias|que te habias referido|que habias referido|a que|que|quienes|quien|con que|con lo que|con la que|con|acerca de que|acerca de quienes|acerca de quien|sobre de que|sobre que|sobre de quienes|sobre quienes|sobre de quien|sobre quien|son|sean|es|serian|seria|iguales|igual|similares|similar|parecidos|parecido|comparables|comparable|asociables|asociable|distinguibles|distinguible|distintos|distinto|diferentes|diferente|diferenciables|diferenciable).*(a|del|de)\s*((?:\w+\s*)+)?"  

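Then fix the unpacking error itself. groups() returns one value per capturing group in the pattern, so the names on the left must account for all of them. A starred name collects whatever is left over, whether the pattern has one group or many:

association, *_temp = l.groups()

With this change the ValueError no longer occurs.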
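
To see where the error comes from, here is a small self-contained sketch. The pattern and sentence are shortened, hypothetical stand-ins for the ones above, chosen so that the three groups are easy to follow.

```python
import re

# Hypothetical, simplified pattern with three capturing groups
pattern = re.compile(r"(\w+)\s+(de|del)\s+(\w+)")
m = pattern.search("capital de Francia")

groups = m.groups()  # one entry per capturing group

try:
    (only,) = groups  # one target, three values
except ValueError as exc:
    error = str(exc)  # message begins with "too many values to unpack"

# A starred target absorbs the extra groups
first, *rest = groups

# Or pick a single group by its number
last = m.group(3)

print(groups, error, first, rest, last)
```

Picking a group by number (or by name, with (?P<name>...) groups) avoids depending on how many groups the pattern has.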

How to extract part of string in Bash using regex

This GNU sed command should work, using the I (ignore case) flag:

sed -E 's~^(.*/){0,1}((def|foo|bar)-[0-9]{2,6})-.*~\2~I' file

def-1234
foo-12
bar-12345

How this sed command works:

  • (.*/){0,1}: Optionally match a leading path ending in / at the start of the line
  • (: Start capture group #2
    • (def|foo|bar): Match def or foo or bar
    • -: Match a -
    • [0-9]{2,6}: Match 2 to 6 digits
  • ): End capture group #2
  • -.*: Match a - followed by anything until the end of the line
  • The replacement \2 is the text captured in group #2

Or you may use this awk:

awk -v IGNORECASE=1 -F / 'match($NF, /^(def|foo|bar)-[0-9]{2,6}-/) {print substr($NF, 1, RLENGTH-1)}' file

def-1234
foo-12
bar-12345

Awk explanation:

  • -v IGNORECASE=1: Enable case-insensitive matching (a gawk feature)
  • -F /: Use / as the field separator
  • match($NF, /^(def|foo|bar)-[0-9]{2,6}-/): Match the regex ^(def|foo|bar)-[0-9]{2,6}- against $NF, the last field when splitting on / (this ignores any text before the last /)
  • If the match succeeds, use substr to print the text from position 1 to RLENGTH-1 (the match includes the - after the digits, so the last character is dropped)
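
The same extraction can be sketched in Python for readers who want to check the pattern outside the shell. The input lines are hypothetical, since the original answer does not show its input file. Like the awk version, this sketch skips lines that do not match.

```python
import re

# Hypothetical input lines, not taken from the original answer
lines = [
    "abc/def-1234-extra.txt",
    "FOO-12-something",
    "x/y/bar-12345-z",
    "unrelated-line",
]

# Optional leading path, then prefix-digits, then the "-" that must follow
pattern = re.compile(r"^(?:.*/)?((?:def|foo|bar)-[0-9]{2,6})-", re.IGNORECASE)

extracted = []
for line in lines:
    m = pattern.search(line)
    if m:
        extracted.append(m.group(1))

print(extracted)  # ['def-1234', 'FOO-12', 'bar-12345']
```

Note that the sed command behaves differently on non-matching lines: it prints them unchanged unless combined with -n and the p flag.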

How to extract the substring between two markers?

Using regular expressions (see the documentation of Python's re module for further reference):

import re

text = 'gfgfdAAA1234ZZZuijjk'

# The lazy group (.+?) captures the shortest text between the two markers
m = re.search('AAA(.+?)ZZZ', text)
if m:
    found = m.group(1)

# found: 1234

or:

import re

text = 'gfgfdAAA1234ZZZuijjk'

try:
    found = re.search('AAA(.+?)ZZZ', text).group(1)
except AttributeError:
    # AAA, ZZZ not found in the original string (re.search returned None)
    found = ''  # apply your error handling

# found: 1234
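
If the markers are not fixed strings, or may contain regex metacharacters, one option is to wrap the search in a small helper. This is a sketch only: the function name between and its default parameter are illustrative choices, not part of the original answer.

```python
import re

def between(text, start, end, default=""):
    """Return the shortest text between start and end, or default if absent."""
    # re.escape makes markers such as "[" or "." match literally
    pattern = re.escape(start) + r"(.+?)" + re.escape(end)
    m = re.search(pattern, text)
    return m.group(1) if m else default

print(between("gfgfdAAA1234ZZZuijjk", "AAA", "ZZZ"))  # 1234
print(between("price: [42.50] USD", "[", "]"))        # 42.50
print(repr(between("no markers here", "AAA", "ZZZ"))) # ''
```

Returning a default instead of raising keeps the calling code free of try/except, at the cost of not distinguishing "markers missing" from "default value found".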

