How to Grep for Value in a Key-Value Store from Plain Text

Use a look behind:

$ grep -Po '(?<=^FOO=)\w*$' file
foo

I also like awk for it:

$ awk -v FS="FOO=" 'NF>1{print $2}' file
foo

Or even better:

$ awk -F= -v key="FOO" '$1==key {print $2}' file
foo

With sed:

$ sed -n 's/^FOO=//p' file
foo

Or even with Bash, but ONLY if you are confident that the file contains nothing except harmless assignments, because sourcing executes the file as shell code. In that case you can source the file and echo the required value:

$ (source file; echo "$FOO")
foo
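
One caveat with the -F= variant above: $2 ends at the next =, so a value such as a=b would be cut short. Below is a minimal sketch of a lookup helper that keeps the whole value. The function name get_value and the sample data are made up for illustration, and the input is assumed to hold one KEY=VALUE pair per line.

```shell
# Hypothetical helper: print the value stored under key $1.
# Reads the files given after the key, or standard input if there are none.
get_value() {
    key=$1
    shift
    awk -v key="$key" '
        {
            # index() finds the first "=", so later "=" signs stay in the value.
            pos = index($0, "=")
            if (pos > 0 && substr($0, 1, pos - 1) == key)
                print substr($0, pos + 1)
        }
    ' "$@"
}

# Sample data, invented for this example.
sample='FOO=foo
BAR=a=b'

printf '%s\n' "$sample" | get_value FOO
printf '%s\n' "$sample" | get_value BAR
```

With this sample the two calls print foo and a=b. The key is compared as a plain string, so no regex escaping is needed.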

grep a file to read a key:value

awk is the right tool for this, since your data is delimited by a common character and structured in rows and columns. You may use this awk command:

awk -F: '$1 == "1234-A0"{print $2}' file

1234_12345678_987

Extract value from a list of key-value pairs using grep

You may use

grep -oP "(?:^|,)$KEY:\K[^,]+"

The -o option outputs only the matched text. -P enables the PCRE engine. The double quotes are necessary so that the shell expands $KEY inside the pattern.

The pattern matches:

  • (?:^|,) - start of string or comma
  • $KEY - the KEY variable
  • : - colon
  • \K - match reset operator that discards the whole text matched so far
  • [^,]+ - 1+ chars other than ,
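
Because $KEY is expanded straight into the regex, any regex metacharacter in the key (a dot, for instance) is treated as pattern syntax rather than literal text. Below is a sketch of an alternative that compares the key as a plain string with tr and awk. The function name lookup and the sample list are assumptions made for this example.

```shell
# Hypothetical helper: print the value for key $1 from a comma-separated
# list of key:value pairs read on standard input.
lookup() {
    # Put each pair on its own line, then keep lines that start with "key:".
    tr ',' '\n' | awk -v key="$1" '
        index($0, key ":") == 1 { print substr($0, length(key) + 2) }
    '
}

# Sample input, invented for this example.
pairs='a.b:1,axb:2,c:3'

printf '%s\n' "$pairs" | lookup 'a.b'
printf '%s\n' "$pairs" | lookup 'c'
```

Here the first call prints only 1, whereas a regex built from a.b would also match the axb entry.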

How to extract specific key value pairs from a grep output

I would do it using GNU AWK in the following way. Let the content of file.txt be

./Data1/TEST_Data1.xml:<def-query collection="FT_R1Event" count="-1" desc="" durationEnd="1" durationStart="0" durationType="CAL" fromWS="Data1" id="_q1" timeUnit="D">

./Data2/TEST_Data2.xml:<def-query collection="FT_R2Event" count="-1" desc="" durationEnd="2" durationStart="0" durationType="ABS" fromWS="Data2" id="_q1" timeUnit="M">

then

awk 'BEGIN{OFS=", ";FPAT="(^[^ ]+xml)|((durationEnd|timeUnit)=\"[^\"]+\")"}{gsub(/\.([/]|xml)/, "", $1);print}' file.txt

output

Data1/TEST_Data1, durationEnd="1", timeUnit="D"

Data2/TEST_Data2, durationEnd="2", timeUnit="M"

Explanation: I used FPAT to describe the fields of interest: either a run of non-space characters that starts at the beginning of the line and ends in xml, or durationEnd or timeUnit followed by = and a double-quoted, non-empty value. Then I remove from the first field every occurrence of . followed by / or xml, that is, the ./ prefix and the .xml suffix (note that the . is escaped so that it is matched literally). Finally I print the record, whose fields are joined by ", " because I set that as the output field separator (OFS).

Disclaimer: I tested it only with shown samples.

(tested in gawk 4.2.1)
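
FPAT is a GNU awk feature. Where gawk is not available, roughly the same result can be sketched with sed -E, which is a different technique from the one above. The helper name extract is made up, and the sketch assumes that durationEnd always appears before timeUnit on a line.

```shell
# Sample line taken from the input shown above.
line='./Data1/TEST_Data1.xml:<def-query collection="FT_R1Event" count="-1" desc="" durationEnd="1" durationStart="0" durationType="CAL" fromWS="Data1" id="_q1" timeUnit="D">'

# Hypothetical helper: capture the path without "./" and ".xml", then the
# two attributes, and print them joined by ", ".
# Lines that do not match the pattern are not printed (-n plus the p flag).
extract() {
    sed -n -E 's|^\./([^ ]+)\.xml:.*(durationEnd="[^"]*").*(timeUnit="[^"]*").*|\1, \2, \3|p'
}

printf '%s\n' "$line" | extract
```

For the sample line this prints the same first line as the gawk output above. Unlike FPAT, the attribute order is hard-coded in the pattern.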

Find the value of key from JSON

If you have a grep that can do Perl compatible regular expressions (PCRE):

$ grep -Po '"id": *\K"[^"]*"' infile.json
"4dCYd4W9i6gHQHvd"
  • -P enables PCRE
  • -o retains nothing but the match
  • "id": * matches "id" and an arbitrary amount of spaces
  • \K throws away everything to its left ("variable size positive look-behind")
  • "[^"]*" matches two quotes and all the non-quotes between them

If your grep can't do that, you can use

$ grep -o '"id": *"[^"]*"' infile.json | grep -o '"[^"]*"$'
"4dCYd4W9i6gHQHvd"

This uses grep twice. The result of the first command is "id": "4dCYd4W9i6gHQHvd"; the second command removes everything but a pair of quotes and the non-quotes between them, anchored at the end of the string ($).
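
If the surrounding quotes are unwanted as well, a single sed call with a capture group is one possible variation. This is only a sketch: the helper name extract_id is made up, the sample is a shortened form of the JSON in this answer, and the approach is just as fragile as the grep ones because it does not parse JSON.

```shell
# Shortened sample, based on the JSON shown in this answer.
json='{"data": {"name": "test", "id": "4dCYd4W9i6gHQHvd", "serverid": "bbBdbbHF8PajW221"}}'

# Hypothetical helper: print the text between the quotes after "id":.
# The leading quote in the pattern keeps "serverid" from matching.
extract_id() {
    sed -n 's/.*"id": *"\([^"]*\)".*/\1/p'
}

printf '%s\n' "$json" | extract_id
```

For the sample this prints 4dCYd4W9i6gHQHvd without quotes. It prints at most one value per input line.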

But, as pointed out, you shouldn't use grep for this, but a tool that can parse JSON – for example jq:

$ jq '.data.id' infile.json
"4dCYd4W9i6gHQHvd"

This is just a simple filter for the id key in the data object. To get rid of the double quotes, you can use the -r ("raw output") option:

$ jq -r '.data.id' infile.json
4dCYd4W9i6gHQHvd

jq can also neatly pretty print your JSON:

$ jq . infile.json
{
"data": {
"name": "test",
"id": "4dCYd4W9i6gHQHvd",
"domains": [
"www.test.domain.com",
"test.domain.com"
],
"serverid": "bbBdbbHF8PajW221",
"ssl": null,
"runtime": "php5.6",
"sysuserid": "4gm4K3lUerbSPfxz",
"datecreated": 1474597357
},
"actionid": "WXVAAHQDCSILMYTV"
}

Grep the entire text after a certain word using grep/awk/sed

Based on your attempts, please try the following code.

your_API_command | 
awk -v RS= 'match($0,/-+BEGIN.*END RSA PRIVATE KEY-+/){print substr($0,RSTART,RLENGTH)}'

Explanation: Run your API command and pipe its output to awk as standard input. Setting RS to the empty string puts awk in paragraph mode, so records are separated by blank lines rather than by single newlines. The match function then looks for one or more - followed by BEGIN, everything up to END RSA PRIVATE KEY, and one or more - again; substr prints just that matched portion using RSTART and RLENGTH.

2nd solution: A slightly tweaked form of the 1st solution, written and tested in GNU awk (RT is a GNU awk extension).

your_API_command | awk -v RS='-+BEGIN.*END RSA PRIVATE KEY-+' 'RT{print RT}' 
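
Since the question also mentions sed, here is a sketch of a line-based alternative using a sed address range. It assumes the BEGIN and END markers each sit on their own line. The variable api_output stands in for the real command output, and its key body is a placeholder rather than a real key.

```shell
# Stand-in for the API output; invented for this example.
api_output='status: ok
-----BEGIN RSA PRIVATE KEY-----
PLACEHOLDERLINE1
PLACEHOLDERLINE2
-----END RSA PRIVATE KEY-----
trailing text'

# Hypothetical helper: print only the lines from the BEGIN marker
# through the END marker, both included.
extract_key() {
    sed -n '/-----BEGIN RSA PRIVATE KEY-----/,/-----END RSA PRIVATE KEY-----/p'
}

printf '%s\n' "$api_output" | extract_key
```

Unlike the awk solutions, this works on whole lines, so any other text on the same line as a marker would be kept.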

awk command to read a key value pair from a file

Since there are multiple : in your input, getting $2 will not work in awk because it gives you only the 2nd field. You actually need an equivalent of cut -d: -f2-, but you also need to check the key name that comes before the first :.

This awk should work for you:

awk -F: '$1 == "GOOGLE_URL" {sub(/^[^:]+:/, "");  print}' input.txt

https://www.google.com/

Or this non-regex awk approach that allows you to pass the key name from the command line:

awk -F: -v k='GOOGLE_URL' '$1==k{print substr($0, length(k FS)+1)}' input.txt

Or using gnu-grep:

grep -oP '^GOOGLE_URL:\K.+' input.txt

https://www.google.com/
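
To see why $2 alone falls short, the following sketch runs both variants on one sample line. The line is reconstructed from the key name and URL used in this answer; the real input.txt may contain more entries.

```shell
# Sample line, reconstructed from the example above.
line='GOOGLE_URL:https://www.google.com/'

# $2 stops at the second colon, so the URL is truncated.
printf '%s\n' "$line" | awk -F: '{ print $2 }'

# cut -f2- keeps every field from the second one on, colons included.
# It does not check the key, so grep filters the line first.
printf '%s\n' "$line" | grep '^GOOGLE_URL:' | cut -d: -f2-
```

The first command prints https and the second prints the full URL.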

