Grab a Number After a String in a File

How can I get the number that follows a specific string somewhere on one of the lines of a text file?

Here is example code that processes the lines read from a text file to get the argument string that follows a well-known argument string like -param. It is not really fail-safe, but hopefully good enough for your purpose.

@echo off
setlocal EnableExtensions DisableDelayedExpansion
set "DataFile=%~dp0my_file.txt"

rem Does the input data file exist?
if exist "%DataFile%" goto ProcessData
rem Input data file not found in directory of the batch file.
echo ERROR: Could not find file: "%DataFile%"
goto :EOF

:ProcessData
set "ParamValue="
for /F usebackq^ delims^=^ eol^= %%I in ("%DataFile%") do for %%J in (%%I) do (
    if not defined ParamValue (
        if /I "%%~J" == "-param" set "ParamValue=1"
    ) else (set "ParamValue=%%~J" & goto HaveValue)
)
rem The parameter of interest was not found at all or there is no value.
echo ERROR: Could not find the parameter with name: "-param"
goto :EOF

:HaveValue
rem Output the parameter value as an example for further command lines.
set ParamValue

endlocal

The outer FOR loop reads the non-empty lines one after the other from the text file and assigns each complete line to the specified loop variable I.

The inner FOR loop processes the current line much like cmd.exe processes the argument strings passed to a batch file. All strings delimited by a space, tab, comma, semicolon, equal sign or non-breaking space (in OEM encoding) are ignored until one is found that is case-insensitively equal to the string -param. The next string in the current line is assigned to the environment variable ParamValue, and the command GOTO exits both loops to continue batch file processing on the line below the label :HaveValue, where ParamValue can be used for whatever purpose.
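For readers more at home in Python, the scan described above can be sketched roughly as follows. This is only an approximation of what the two FOR loops do: the function name value_after and the sample lines are made up for illustration, and the wildcard and quoting rules of cmd.exe are not reproduced.

```python
import re

def value_after(lines, name="-param"):
    """Return the token that follows `name` (case-insensitive), or None."""
    found_name = False
    for line in lines:
        # Split on space, tab, comma, semicolon and equal sign.
        for token in re.split(r"[ \t,;=]+", line.strip()):
            token = token.strip('"')  # roughly what %%~J does with quotes
            if not token:
                continue
            if found_name:
                return token
            if token.lower() == name.lower():
                found_name = True
    return None

# Hypothetical file content; the real my_file.txt is not shown here.
sample = ["tool.exe -input data.bin", "run -Param 3 -param3 2"]
print(value_after(sample))  # 3
```

As in the batch file, the flag stays set across lines, so a name at the very end of one line picks up the first token of the next line.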

This extended version of the above first gets the string after -param, which is 3 in the example. Then the entire text file is searched again for an argument string consisting of -param with the string read first appended, which is -param3 in the example. If this string is found, the next string is assigned to the environment variable ParamValue, which is 2 in the example.

@echo off
setlocal EnableExtensions DisableDelayedExpansion
set "DataFile=%~dp0my_file.txt"

rem Does the input data file exist?
if exist "%DataFile%" goto ProcessData
rem Input data file not found in directory of the batch file.
echo ERROR: Could not find file: "%DataFile%"
goto :EOF

:ProcessData
set "ParamName="
for /F usebackq^ delims^=^ eol^= %%I in ("%DataFile%") do for %%J in (%%I) do (
    if not defined ParamName (
        if /I "%%~J" == "-param" set "ParamName=1"
    ) else (set "ParamName=-param%%~J" & goto HaveName)
)
rem The parameter of interest was not found at all or there is no value.
echo ERROR: Could not find the parameter with name: "-param"
goto :EOF

:HaveName
set "ParamValue="
for /F usebackq^ delims^=^ eol^= %%I in ("%DataFile%") do for %%J in (%%I) do (
    if not defined ParamValue (
        if /I "%%~J" == "%ParamName%" set "ParamValue=1"
    ) else (set "ParamValue=%%~J" & goto HaveValue)
)
rem The parameter of interest was not found at all or there is no value.
echo ERROR: Could not find the parameter with name: "%ParamName%"
goto :EOF

:HaveValue
rem Output the parameter value as an example for further command lines.
set ParamValue

endlocal
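The two-pass lookup can also be sketched in Python with a regular expression instead of a token loop. This is a different technique from the batch file, shown only for comparison; token_after and the sample text are hypothetical, and the delimiter set is limited to whitespace, comma, semicolon and equal sign.

```python
import re

def token_after(text, name):
    """Return the token following the whole token `name`, ignoring case."""
    pattern = re.compile(
        # The name must start at a delimiter (or the start of the text)
        # and must be followed by at least one delimiter.
        r"(?<![^\s,;=])" + re.escape(name) + r"[\s,;=]+([^\s,;=]+)",
        re.IGNORECASE,
    )
    m = pattern.search(text)
    return m.group(1) if m else None

# Hypothetical file content with -param 3 and -param3 2 as in the example.
text = "tool.exe -input data.bin\nrun -param 3 -param3 2\n"
suffix = token_after(text, "-param")                               # first pass
value = token_after(text, "-param" + suffix) if suffix else None  # second pass
print(suffix, value)  # 3 2
```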

To understand the commands used and how they work, open a command prompt window, run the following commands, and carefully read all the help pages displayed for each command.

  • call /? ... explains %~dp0 ... batch file path ending always with a backslash.
  • echo /?
  • endlocal /?
  • for /?
  • goto /?
  • if /?
  • rem /?
  • set /?
  • setlocal /?

Find number after a substring in string

Regex: char(\d+)

Details:

  • \d Matches a digit (equal to [0-9])
  • + Matches between one and unlimited times
  • () Capturing group

Python code:

The %s token in the string formatting syntax "%s" % var inserts (and optionally formats) a string, here the search string c.

import re

def find_number(text, c):
    return re.findall(r'%s(\d+)' % c, text)

find_number('abce123de34', 'e')   # ['123', '34']
find_number('abce123de34', 'de')  # ['34']
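One caveat: the search string is inserted into the pattern as-is, so characters such as . or + in it are interpreted as regex syntax. If the search string should be taken literally, a variant using re.escape might look like this (find_number_literal is a name chosen here for illustration):

```python
import re

def find_number_literal(text, prefix):
    """Like find_number, but treats `prefix` as literal text."""
    return re.findall(re.escape(prefix) + r"(\d+)", text)

literal = find_number_literal("a.b12 axb34", "a.b")
unescaped = re.findall(r"%s(\d+)" % "a.b", "a.b12 axb34")
print(literal)    # ['12']
print(unescaped)  # ['12', '34'] because the unescaped . matches any character
```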

Regex to Find a number after a string

You can use a positive lookbehind (?<=) to find a value that follows something else. Here you want the digit(s) that follow the string ms:. That would look like this: (?<=ms:)\d+

import re

s = '?({mvp:375760,ms:6})'

re.search(r'(?<=ms:)\d+', s).group()
# '6'

or if there could be more than one:

re.findall(r'(?<=ms:)\d+', s)
# ['6']
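Python's re module only accepts fixed-width lookbehind patterns, so the lookbehind form does not work when the text before the number can vary, for example with optional spaces around the colon. In that case a capture group is a possible alternative. The sample string below is a made-up variation of the one above.

```python
import re

# Made-up input with inconsistent spacing and two ms values.
s = '?({mvp:375760, ms : 6, ms:42})'

# The group captures only the digits; \b keeps "ms" from matching inside a longer word.
values = re.findall(r'\bms\s*:\s*(\d+)', s)
print(values)  # ['6', '42']
```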

Regex to get any numbers after the occurrence of a string in a line

/(?<!\p{L})([Mm]ilk)(?!\p{L})\D*(\d+)/

This matches the following strings, with the match and the contents of the two capture groups noted.

"The Milk99"            # "Milk99"     1:"Milk" 2:"99"
"The milk99 is white"   # "milk99"     1:"milk" 2:"99"
"The 8 milk is 99"      # "milk is 99" 1:"milk" 2:"99"
"The 8milk is 45 or 73" # "milk is 45" 1:"milk" 2:"45"

The following strings are not matched.

"The Milk is white"
"The OJ is 99"
"The milkman is 37"
"Buttermilk is 99"
"MILK is 99"

This regular expression could be made self-documenting by writing it in free-spacing mode:

/
(?<!\p{L}) # the following match is not preceded by a Unicode letter
([Mm]ilk)  # match 'M' or 'm' followed by 'ilk' in capture group 1
(?!\p{L})  # the preceding match is not followed by a Unicode letter
\D*        # match zero or more characters other than digits
(\d+)      # match one or more digits in capture group 2
/x # free-spacing regex definition mode

\D* could be replaced with .*?, where the ? makes the match non-greedy. If the greedy variant (.*) were used, the second capture group for "The 8milk is 45 or 73" would contain "3".

To match "MILK is 99", change ([Mm]ilk) to (?i)(milk).
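The regex above is written with slash delimiters and \p{L}, which Python's standard re module does not support. A rough Python port, assuming [^\W\d_] is an acceptable stand-in for "a letter" (it is an approximation, not an exact equivalent of \p{L}), could look like this; milk_number is a hypothetical helper name.

```python
import re

LETTER = r"[^\W\d_]"  # a word character that is neither a digit nor an underscore
MILK = re.compile(rf"(?<!{LETTER})([Mm]ilk)(?!{LETTER})\D*(\d+)")

def milk_number(text):
    """Return (word, number) for the first match, or None."""
    m = MILK.search(text)
    return m.groups() if m else None

print(milk_number("The 8milk is 45 or 73"))  # ('milk', '45')
print(milk_number("The milkman is 37"))      # None
print(milk_number("Buttermilk is 99"))       # None
```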

Extract number between two strings in log file with awk

You can do this with sed, using -n to disable printing by default:

sed -n 's/.*callee_num:<<"\([+0-9]*\)">.*/\1/p' file

When the pattern matches, the part between the double quotes is captured and used in the replacement, discarding the rest of the line.

Of course, it is possible with awk too:

awk 'sub(/.*callee_num:<<"/, "") && sub(/">.*/, "")' file

This prints any lines where the two substitutions are successful. Unlike the version using sed, it doesn't check whether the part in between the quotes is numeric. If you wanted, you could add in a further check like this:

awk 'sub(/.*callee_num:<<"/, "") && sub(/">.*/, "") && /^[+0-9]+$/' file

This ensures that after the two substitutions are made, all that you are left with is a mixture of + and digits from 0 to 9.
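If the log is being processed from a script anyway, the same extraction can be sketched in Python. The log lines below are invented for illustration, since the real log format is not shown here, and callee_numbers is a hypothetical helper. Note that re.search finds the first callee_num on a line, whereas the leading .* in the sed command makes it pick the last one.

```python
import re

# Like the awk check, require at least one + or digit between the quotes.
CALLEE = re.compile(r'callee_num:<<"([+0-9]+)">')

def callee_numbers(lines):
    """Yield the quoted callee_num value from each line that has one."""
    for line in lines:
        m = CALLEE.search(line)
        if m:
            yield m.group(1)

# Invented sample lines, not taken from a real log.
log = [
    'caller:<<"conxa3">>, callee_num:<<"+4912345">>',
    'callee_num:<<"unknown">>',
    'no match here',
]
print(list(callee_numbers(log)))  # ['+4912345']
```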

The problem with your attempt using awk is that your field separator can be ", which would make the second field conxa3.

Python extract numbers after a specific string has appeared

Yes, there's a way. I'd recommend reading the file backwards, finding the first occurrence of tea, then breaking out of the loop and moving on to the next file. My solution assumes the file fits into memory, and reading large files this way will most probably take a while.

You can read a file from end by doing:

for line in reversed(list(open("filename"))):
    print(line.rstrip())

Now, to get only the desired tea cups you can do:

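cups = []
for line in reversed(list(open("filename"))):
    if "Tea cups" in line.rstrip():
        # The number is the third whitespace-separated field on the line.
        cups.append(line.rstrip().split()[2])
        break  # stop at the last occurrence in the file
print(cups)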
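If the number is not always the third field, a regex-based variant may be more forgiving. The sketch below takes any iterable of lines (an open file object works too) and returns the first number after the marker on the last matching line; last_number_after and the sample lines are assumptions made for illustration.

```python
import re

def last_number_after(lines, marker):
    """Return the first number after `marker` on the last line containing it."""
    pattern = re.compile(re.escape(marker) + r"\D*(\d+)")
    # Like the code above, this loads all lines into memory before reversing.
    for line in reversed(list(lines)):
        m = pattern.search(line)
        if m:
            return int(m.group(1))
    return None

# Made-up lines standing in for the file content.
sample = ["Tea cups: 4", "Coffee mugs: 7", "Tea cups: 9", "Plates: 2"]
print(last_number_after(sample, "Tea cups"))  # 9
```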

