Sed Extracting Group of Digits

$ echo "This is an example: 65 apples" | sed -r 's/^[^0-9]*([0-9]+).*/\1/'
65
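A line that contains no digits is printed unchanged by that command, because the pattern never matches it. When you feed it many lines, adding -n and the p flag prints only the extracted numbers. The sketch below wraps this in a hypothetical helper called first_number; the sample lines are made up, and -E is used because both GNU and BSD sed accept it.

```shell
# Hypothetical helper: print the first run of digits on each input line,
# skipping lines that contain no digits at all.
first_number() {
  # -n: no automatic printing; p: print only lines where s/// succeeded
  sed -nE 's/^[^0-9]*([0-9]+).*/\1/p'
}

# Made-up sample input: the middle line has no digits and is skipped
printf '%s\n' 'This is an example: 65 apples' 'no digits here' 'order 7 of 12' | first_number
```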

Sed command to extract group of digits

You can do this with a substitution. The regex matches from the start of the line through the number in parentheses, captures the digits, and puts only the digits back.

Assuming that your lines are in a file called file.txt:

sed 's/.*(\([0-9]\+\))/\1/' file.txt

  • s - Substitute
  • / - Delimiter
  • .* - Match zero or more of any character
  • ( - Match a literal opening parenthesis
  • \( - Start a capture group
  • [0-9]\+ - Match (and capture) one or more digits from 0 to 9 (\+ is a GNU extension in basic regex mode)
  • \) - Close the capture group
  • ) - Match a literal closing parenthesis
  • / - Delimiter
  • \1 - Replace the matched text with the first capture group
  • / - Delimiter
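Because \+ is a GNU extension, the command above may not work with BSD or macOS sed. A minimal sketch of an equivalent that stays within POSIX basic regular expressions spells "one or more digits" as [0-9][0-9]*. The helper name paren_number and the sample line are hypothetical.

```shell
# Hypothetical helper: print the digits from the last "(digits)" group on a line.
# The greedy .* skips any earlier parenthesized numbers.
# [0-9][0-9]* is the POSIX BRE spelling of "one or more digits" (instead of \+).
paren_number() {
  sed 's/.*(\([0-9][0-9]*\))/\1/'
}

echo 'Build finished (4521)' | paren_number
```

As with the original, any text after the closing parenthesis is kept; append .* to the pattern if it should be dropped as well.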

Extracting numbers with sed

You can use this sed command:

sed -E 's/^[^0-9]*([0-9]{1,6}).*/\1/' file

201909
202012
202012
201903
201903
201903
201903
202003

RegEx Explained:

  • -E: Enable extended regex mode (ERE)
  • ^: Start of the line
  • [^0-9]*: Match 0 or more non-digits
  • ([0-9]{1,6}): Match 1 to 6 digits in the 1st capture group
  • .*: Match 0 or more of any character
  • \1: Replace the whole line with the 1st capture group
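With {1,6}, a longer run of digits is cut after its first six; the remaining digits are swallowed by the trailing .*. The original input file isn't shown, so this quick check uses a made-up, date-stamped file name:

```shell
# Capture at most six digits; any further digits are consumed by the trailing .*
echo 'report_20190915.csv' | sed -E 's/^[^0-9]*([0-9]{1,6}).*/\1/'
```

Here the result is the YYYYMM part of the date, 201909.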

SED command to extract 4-digit numbers from string

echo 'abis02 - GBS API 8085 is running abis02 - GBS API 8180 is running abis - GBS API 8181 is running' | grep -Eo '\b[0-9]{4}\b'
8085
8180
8181
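\b inside grep -E is a GNU extension. A hedged, POSIX-only alternative (a different technique from the answer above) splits the text on non-digits with tr and keeps only the pieces that are exactly four digits long with grep -x. It is not identical to the \b version: digits glued to letters, as in abc1234, would be reported here but skipped by \b.

```shell
text='abis02 - GBS API 8085 is running abis02 - GBS API 8180 is running abis - GBS API 8181 is running'

# tr -c -s: replace every run of non-digit characters with a single newline
# grep -E -x: keep only lines that consist of exactly four digits
printf '%s\n' "$text" | tr -cs '0-9' '\n' | grep -Ex '[0-9]{4}'
```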

sed: match a group of digits before a word

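You can use:

sed -r 's/.*\b([0-9]+).*/\1/'

\b matches a word boundary (the beginning or end of a word). Because the leading .* is greedy, the boundary is what stops the capture from shrinking to just the last digit of the number.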
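To see what the \b contributes, this sketch runs the pattern with and without it on made-up input lines. GNU sed is assumed, since \b is a GNU extension.

```shell
line='Room 42 is free'

# Without \b, the greedy .* eats "Room 4" and only the last digit is captured.
echo "$line" | sed -E 's/.*([0-9]+).*/\1/'     # prints 2

# With \b, the capture must start at a word boundary, i.e. at the 4.
echo "$line" | sed -E 's/.*\b([0-9]+).*/\1/'   # prints 42

# With several numbers, the greedy .* still selects the last one.
echo 'Order 17 contains 3 items' | sed -E 's/.*\b([0-9]+).*/\1/'   # prints 3
```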

How do I output only a capture group with sed

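You are missing the regex after the #, so the comment text is never matched and stays in the output. Adding .* fixes that, and -n together with the p flag prints only the lines where the substitution succeeded: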

$ sed -nE "s/(^pytest.+)#.*/\1/p" ./requirements/local.txt
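A small sketch of the command on made-up lines that stand in for ./requirements/local.txt (sed reads standard input when no file is given). Note that the capture keeps the spaces in front of the #; the second, optional pattern is one way to end the capture at the last non-space character instead.

```shell
# Made-up lines standing in for ./requirements/local.txt
input='django==4.2  # web framework
pytest==7.4.0  # test runner
requests==2.31.0'

# -n: no automatic printing; p: print only lines where s/// matched.
# Output keeps the trailing spaces: "pytest==7.4.0  "
printf '%s\n' "$input" | sed -nE 's/(^pytest.+)#.*/\1/p'

# Variant: capture up to the last character that is neither a space nor #.
printf '%s\n' "$input" | sed -nE 's/^(pytest[^#]*[^[:space:]#])[[:space:]]*#.*/\1/p'
```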

Extract numbers from a string using sed and regular expressions

Is this OK?

sed -r 's/.*_([0-9]*)\..*/\1/g'

With your example:

$ echo "./pentaray_run2/Trace_220560.dat" | sed -r 's/.*_([0-9]*)\..*/\1/g'
220560
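The g flag isn't needed there, since the pattern already consumes the whole line, and [0-9]* also accepts an empty number (Trace_.dat would print an empty line). A hedged variant for several paths at once uses [0-9]+ together with -n and p so names without digits between the last underscore and the dot are skipped. Only the first path comes from the example; the others are made up.

```shell
# Print the digits between the last underscore and the following dot.
# The greedy .*_ skips earlier underscores, such as the one in pentaray_run2.
printf '%s\n' \
  './pentaray_run2/Trace_220560.dat' \
  './pentaray_run2/Trace_.dat' \
  './pentaray_run3/Trace_31337.dat' |
  sed -nE 's/.*_([0-9]+)\..*/\1/p'
```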

How can I output only captured groups with sed?

The key to getting this to work is to tell sed to exclude what you don't want output as well as to specify what you do want. This technique depends on knowing how many matches you're looking for; the grep commands below work for any number of matches.

string='This is a sample 123 text and some 987 numbers'
echo "$string" | sed -rn 's/[^[:digit:]]*([[:digit:]]+)[^[:digit:]]+([[:digit:]]+)[^[:digit:]]*/\1 \2/p'

This says:

  • don't default to printing each line (-n)
  • exclude zero or more non-digits
  • include one or more digits
  • exclude one or more non-digits
  • include one or more digits
  • exclude zero or more non-digits
  • print the result of the substitution (p), with both numbers on one line
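If the number of matches isn't known in advance, one sed-only alternative (a different technique from the one above, offered as a sketch) collapses every run of non-digits into a single space and then trims the ends, so all numbers come out on one line:

```shell
string='This is a sample 123 text and some 987 numbers'

# 1) replace each run of non-digits with one space
# 2) remove the leading and trailing space left over
echo "$string" | sed -E 's/[^[:digit:]]+/ /g; s/^ //; s/ $//'
```

With this approach a line that contains no digits comes out empty.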

In general, in sed you capture groups using parentheses and output what you capture using a back reference:

echo "foobarbaz" | sed 's/^foo\(.*\)baz$/\1/'

will output "bar". If you use -r for extended regex (-E on OS X; GNU sed also accepts -E), you don't need to escape the parentheses:

echo "foobarbaz" | sed -r 's/^foo(.*)baz$/\1/'

sed supports back references \1 through \9, so up to nine capture groups can be referenced. The back references are numbered in the order the opening parentheses appear, but they can be used in any order and can be repeated:

echo "foobarbaz" | sed -r 's/^foo(.*)b(.)z$/\2 \1 \2/'

outputs "a bar a".
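A typical use of back references in a different order is reshuffling fields. This sketch uses a made-up date and # as the delimiter of the s command, so the slashes in the replacement need no escaping:

```shell
# Capture year, month and day, then emit them as day/month/year.
echo '2023-10-05' | sed -E 's#^([0-9]{4})-([0-9]{2})-([0-9]{2})$#\3/\2/\1#'
```

This prints 05/10/2023.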

If you have GNU grep:

echo "$string" | grep -Po '\d+'

For BSD grep, including OS X, use a bracket expression instead, since \d is not part of POSIX extended regular expressions:

echo "$string" | grep -Eo '[0-9]+'

These commands print every run of digits in the string, however many there are, one per line.

Or use variations such as this lookbehind, which only matches digits preceded by a non-digit and a space:

echo "$string" | grep -Po '(?<=\D )(\d+)'

The -P option enables Perl Compatible Regular Expressions. See man 3 pcrepattern or man 3 pcresyntax.


