How to get a number from a text file that follows a specific string somewhere in one of the lines?
Here is example code that processes the lines read from a text file to get the argument string following a well-known argument string like -param. It is not really fail-safe, but hopefully good enough for your purpose.
@echo off
setlocal EnableExtensions DisableDelayedExpansion
set "DataFile=%~dp0my_file.txt"
rem Does the input data file exist?
if exist "%DataFile%" goto ProcessData
rem Input data file not found in directory of the batch file.
echo ERROR: Could not find file: "%DataFile%"
goto :EOF
:ProcessData
set "ParamValue="
for /F usebackq^ delims^=^ eol^= %%I in ("%DataFile%") do for %%J in (%%I) do (
    if not defined ParamValue (
        if /I "%%~J" == "-param" set "ParamValue=1"
    ) else (set "ParamValue=%%~J" & goto HaveValue)
)
rem The parameter of interest was not found at all or there is no value.
echo ERROR: Could not find the parameter with name: "-param"
goto :EOF
:HaveValue
rem Output the parameter value as an example for further command lines.
set ParamValue
endlocal
The outer FOR loop reads the non-empty lines one after the other from the text file and assigns each line completely to the specified loop variable I.
The inner FOR loop processes the current line similarly to how cmd.exe processes the argument strings passed to a batch file. All strings delimited by space/tab/comma/semicolon/equal sign/non-breaking space (in OEM encoding) are ignored until a string is found which is case-insensitively equal to the string -param. The next string (the first string of the next non-empty line if -param is the last string of a line) is assigned to the environment variable ParamValue, and both loops are exited with the command GOTO to continue batch file processing on the line below the label :HaveValue, where the environment variable ParamValue can be used for whatever purpose.
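For readers who want the same logic outside of cmd.exe, here is a rough Python sketch of what the two nested loops do. It is an approximation under stated assumptions: the delimiter set mirrors the list above (plus line breaks, so lines read from a file work), one pair of surrounding double quotes is removed like %%~J does, and quoted strings containing delimiters are not kept together as cmd.exe would do. The function name value_after is hypothetical.

```python
import re

# Approximation of the delimiters of a simple FOR loop in cmd.exe:
# space, tab, comma, semicolon, equal sign and non-breaking space,
# plus CR/LF so that lines read from a file can be passed in directly.
# Assumption: quoted strings containing delimiters are NOT kept together.
DELIMITERS = re.compile(r'[ \t\r\n,;=\u00a0]+')


def value_after(lines, name="-param"):
    """Return the string following `name` (compared case-insensitively), or None."""
    found = False  # plays the role of ParamValue being defined as "1"
    for line in lines:
        for token in DELIMITERS.split(line):
            if not token:
                continue  # empty pieces from leading/trailing delimiters
            if len(token) >= 2 and token[0] == token[-1] == '"':
                token = token[1:-1]  # like %%~J: remove one pair of quotes
            if not found:
                found = token.lower() == name.lower()
            else:
                return token  # corresponds to "goto HaveValue"
    return None  # corresponds to the error message branch
```

As in the batch file, the flag survives from one line to the next, so a -param at the end of a line takes its value from the following line.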
The extended version below first gets the string after -param, which is 3 in the example. Then the entire text file is searched again for an argument string equal to -param with that first string appended, which is -param3 in the example. If this string is found, the next string is assigned to the environment variable ParamValue, which is 2 in the example.
@echo off
setlocal EnableExtensions DisableDelayedExpansion
set "DataFile=%~dp0my_file.txt"
rem Does the input data file exist?
if exist "%DataFile%" goto ProcessData
rem Input data file not found in directory of the batch file.
echo ERROR: Could not find file: "%DataFile%"
goto :EOF
:ProcessData
set "ParamName="
for /F usebackq^ delims^=^ eol^= %%I in ("%DataFile%") do for %%J in (%%I) do (
    if not defined ParamName (
        if /I "%%~J" == "-param" set "ParamName=1"
    ) else (set "ParamName=-param%%~J" & goto HaveName)
)
rem The parameter of interest was not found at all or there is no value.
echo ERROR: Could not find the parameter with name: "-param"
goto :EOF
:HaveName
set "ParamValue="
for /F usebackq^ delims^=^ eol^= %%I in ("%DataFile%") do for %%J in (%%I) do (
    if not defined ParamValue (
        if /I "%%~J" == "%ParamName%" set "ParamValue=1"
    ) else (set "ParamValue=%%~J" & goto HaveValue)
)
rem The parameter of interest was not found at all or there is no value.
echo ERROR: Could not find the parameter with name: "%ParamName%"
goto :EOF
:HaveValue
rem Output the parameter value as an example for further command lines.
set ParamValue
endlocal
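The two-pass lookup performed by this extended batch file can be sketched in Python as follows. This is a simplified, hypothetical port (the helper names value_after and indirect_value are made up for illustration) that uses an approximation of the cmd.exe delimiter set and does not handle quoted strings.

```python
import re

# Simplified delimiter set of cmd.exe's simple FOR loop, plus CR/LF (approximation).
DELIMITERS = re.compile(r'[ \t\r\n,;=\u00a0]+')


def value_after(lines, name):
    """Return the token after the first token equal (case-insensitively) to name."""
    tokens = (t for line in lines for t in DELIMITERS.split(line) if t)
    for token in tokens:
        if token.lower() == name.lower():
            return next(tokens, None)  # None if name is the very last token
    return None


def indirect_value(lines, prefix="-param"):
    """First pass: value after prefix (e.g. "3").
    Second pass: value after prefix + that value (e.g. "-param3")."""
    lines = list(lines)  # the content is scanned twice, like the file in the batch code
    suffix = value_after(lines, prefix)
    if suffix is None:
        return None
    return value_after(lines, prefix + suffix)
```

With the lines "app.exe -param 3" and "-param1 9 -param3 2", the first pass yields 3 and the second pass returns 2.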
To understand the commands used and how they work, open a command prompt window, execute the following commands there, and read all help pages displayed for each command entirely and very carefully.
call /? ... explains %~dp0 ... batch file path always ending with a backslash.
echo /?
endlocal /?
for /?
goto /?
if /?
rem /?
set /?
setlocal /?
Find number after a substring in string
Regex: char(\d+), where char stands for the substring to search for
Details:
\d matches a digit (equal to [0-9])
+ matches between one and unlimited times
() capturing group
Python code:
String formatting syntax "%s" % var: the %s token allows inserting (and potentially formatting) a string.
import re
def find_number(text, c):
    return re.findall(r'%s(\d+)' % c, text)
find_number('abce123de34', 'e')   # ['123', '34']
find_number('abce123de34', 'de')  # ['34']
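One caveat with the code above: c is inserted into the pattern unescaped, so a substring containing regex metacharacters such as $, + or . changes the meaning of the pattern. A possible variant (the name find_numbers is hypothetical) escapes the substring with re.escape and converts the results to integers:

```python
import re


def find_numbers(text, prefix):
    """Return all integers that directly follow `prefix` in `text`."""
    # re.escape makes characters like '$' or '.' match literally;
    # unescaped, '$' would be the end-of-string anchor and nothing would match.
    pattern = re.escape(prefix) + r'(\d+)'
    return [int(n) for n in re.findall(pattern, text)]


print(find_numbers('price: $5 and $12', '$'))  # [5, 12]
```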
Regex to Find a number after a string
You can use a positive lookbehind (?<=) to find a value that follows something else. Here you want the digit(s) that follow the string ms:, so the pattern would be (?<=ms:)\d+ as in:
import re
s = '?({mvp:375760,ms:6})'
re.search(r'(?<=ms:)\d+', s).group()
# '6'
Or, if there could be more than one:
re.findall(r'(?<=ms:)\d+', s)
# ['6']
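Note that re.search() returns None when the label is missing, so calling .group() on its result raises an AttributeError. A small defensive helper (number_after is a hypothetical name) could look like the following; the label is escaped, and because it is a fixed string the lookbehind stays fixed-width, as Python's re module requires.

```python
import re


def number_after(label, s):
    """Return the integer directly after `label` in `s`, or None if absent."""
    match = re.search(r'(?<=%s)\d+' % re.escape(label), s)
    return int(match.group()) if match else None


print(number_after('ms:', '?({mvp:375760,ms:6})'))  # 6
print(number_after('ms:', '?({mvp:375760})'))       # None
```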
Regex to get any numbers after the occurrence of a string in a line
/(?<!\p{L})([Mm]ilk)(?!\p{L})\D*(\d+)/
This matches the following strings, with the match and the contents of the two capture groups noted.
"The Milk99" # "Milk99" 1:"Milk" 2:"99"
"The milk99 is white" # "milk99" 1:"milk" 2:"99"
"The 8 milk is 99" # "milk is 99" 1:"milk" 2:"99"
"The 8milk is 45 or 73" # "milk is 45" 1:"milk" 2:"45"
The following strings are not matched.
"The Milk is white"
"The OJ is 99"
"The milkman is 37"
"Buttermilk is 99"
"MILK is 99"
This regular expression could be made self-documenting by writing it in free-spacing mode:
/
(?<!\p{L}) # the following match is not preceded by a Unicode letter
([Mm]ilk) # match 'M' or 'm' followed by 'ilk' in capture group 1
(?!\p{L}) # the preceding match is not followed by a Unicode letter
\D* # match zero or more characters other than digits
(\d+) # match one or more digits in capture group 2
/x # free-spacing regex definition mode
\D* could be replaced with .*?, the ? making the match non-greedy. If the greedy variant .* were used, the second capture group for "The 8milk is 45 or 73" would contain "3", because .* would consume as much as possible and leave only the final digit for (\d+).
To match "MILK is 99", change ([Mm]ilk) to (?i)(milk).
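Python's built-in re module does not support \p{L}. A common approximation is the class [^\W\d_] (word characters minus digits and underscore), which matches letters in str patterns, though it also matches a few non-decimal numeric characters such as superscript digits. The sketch below makes that assumption; the names LETTER, MILK, GREEDY and MILK_ANY_CASE are illustrative only. It reproduces the examples above, the greedy/non-greedy difference and the case-insensitive variant.

```python
import re

# [^\W\d_] approximates \p{L} (a Unicode letter) in Python's re module.
LETTER = r'[^\W\d_]'
MILK = re.compile(rf'(?<!{LETTER})([Mm]ilk)(?!{LETTER})\D*(\d+)')
# Greedy variant: .* swallows as much as possible before (\d+).
GREEDY = re.compile(rf'(?<!{LETTER})([Mm]ilk)(?!{LETTER}).*(\d+)')
# Case-insensitive variant, which also accepts "MILK".
MILK_ANY_CASE = re.compile(MILK.pattern, re.IGNORECASE)

for s in ["The Milk99", "The 8 milk is 99", "The 8milk is 45 or 73",
          "The milkman is 37", "Buttermilk is 99", "MILK is 99"]:
    m = MILK.search(s)
    print(repr(s), (m.group(0), m.group(1), m.group(2)) if m else None)

print(GREEDY.search("The 8milk is 45 or 73").group(2))  # 3
```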
Extract number between two strings in log file with awk
You can do this with sed, using -n to disable printing by default:
sed -n 's/.*callee_num:<<"\([+0-9]*\)">.*/\1/p' file
When the pattern matches, the part between the double quotes is captured and used in the replacement, discarding the rest of the line.
Of course, it is possible with awk too:
awk 'sub(/.*callee_num:<<"/, "") && sub(/">.*/, "")' file
This prints any lines where the two substitutions are successful. Unlike the version using sed, it doesn't check whether the part in between the quotes is numeric. If you wanted, you could add in a further check like this:
awk 'sub(/.*callee_num:<<"/, "") && sub(/">.*/, "") && /^[+0-9]+$/' file
This ensures that after the two substitutions are made, all that you are left with is a mixture of + and digits from 0 to 9.
The problem with your attempt using awk is that your field separator can be ", which would make the second field conxa3.
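If the log is processed from Python instead, the extraction done by the sed command above can be sketched with the re module. As in the sed expression, the leading .* is greedy, so when a line contains several callee_num entries the last one wins. The function name callee_numbers and the sample log line are hypothetical.

```python
import re

# Same idea as the sed expression: greedy .* then capture [+0-9]* between <<" and ">
CALLEE = re.compile(r'.*callee_num:<<"([+0-9]*)">')


def callee_numbers(lines):
    """Yield the captured number from every line that matches."""
    for line in lines:
        match = CALLEE.search(line)
        if match:
            yield match.group(1)


sample = ['INFO call callee_num:<<"+4930123">> done']
print(list(callee_numbers(sample)))  # ['+4930123']
```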
Python extract numbers after a specific string has appeared
Yes, there's a way. I'd recommend reading the file backwards, finding the first occurrence of tea, then breaking out of the loop and parsing the next file. My solution assumes your file fits into memory; reading large files this way will most probably take a while.
You can read a file from the end like this:
with open("filename") as f:
    for line in reversed(f.readlines()):
        print(line.rstrip())
Now, to get only the desired tea cups you can do:
cups = []
with open("filename") as f:
    for line in reversed(f.readlines()):
        if "Tea cups" in line:
            cups.append(line.split()[2])
            break
print(cups)
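Reversing the list keeps the whole file in memory. Since the first hit when reading backwards is simply the last hit when reading forwards, an alternative sketch scans forwards once and remembers the last match, so only one line is held at a time (the whole file is still read). Like the code above, it assumes the wanted value is the third whitespace-separated field of a line containing "Tea cups"; last_tea_cups is a hypothetical name.

```python
def last_tea_cups(lines):
    """Return the third field of the last line containing "Tea cups", or None."""
    last = None
    for line in lines:
        if "Tea cups" in line:
            last = line.split()[2]  # assumption: lines look like "Tea cups 4"
    return last


# Usage with a file (the file name is a placeholder):
# with open("filename") as f:
#     print(last_tea_cups(f))
print(last_tea_cups(["Tea cups 3", "Coffee 1", "Tea cups 5", "Milk 2"]))  # 5
```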