Awk: Find and Replace in Certain Field Only

awk: find and replace in certain field only

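Here's one way using awk. sub() takes an optional third argument naming the target, so the substitution is applied to field 3 only, and the $ anchor limits it to a 100 at the end of that field: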

awk '{ sub(/100$/, "2000", $3) }1' file

Results:

12 13 22000 s
12 13 32000 s
100 13 2000 s
12 13 300 s
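To reuse this for other fields or patterns, the field number, regex, and replacement can be passed in with -v. The sketch below wraps it in a shell function; the name replace_in_field and the sample input line are hypothetical, chosen so that 100 also appears in field 1 to show that only field 3 is touched:

```shell
# Hypothetical helper: replace_in_field FIELD REGEX REPLACEMENT [FILE...]
# Runs sub() on one field only; with no FILE it reads standard input.
replace_in_field() {
  field=$1 re=$2 rep=$3
  shift 3
  # -v copies the shell values into awk variables; $f is the field whose number is f.
  awk -v f="$field" -v re="$re" -v rep="$rep" '{ sub(re, rep, $f) } 1' "$@"
}

# Made-up input: "100" appears in fields 1 and 3, but only field 3 is changed.
printf '100 13 2100 s\n' | replace_in_field 3 '100$' 2000
# prints: 100 13 22000 s
```

Because -v processes backslash escapes, a regex passed this way needs its backslashes doubled, and an & in the replacement still stands for the matched text, just as it does in a literal sub() call.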

AWK - replace specific column on matching line, then print other lines

You've almost got it. You're just overthinking things with your getline.

In awk, the following should work:

$ awk '/^>/ {$1=$1"A"} 1' file.txt

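This works by running the action in curly braces on every line that matches the regular expression ^>; the action appends A to the first field. The 1 at the end is awk shorthand for "print the current line": it is a pattern that is always true, and a pattern with no action prints the line by default.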
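One side effect worth knowing: assigning to $1 makes awk rebuild the whole line using OFS, so runs of spaces between fields collapse to a single space. A minimal sketch, using a hypothetical helper name tag_headers and made-up FASTA-style input:

```shell
# Hypothetical helper: append "A" to the first field of lines starting with ">".
tag_headers() {
  # Assigning to $1 makes awk rebuild $0 with OFS (one space by default).
  awk '/^>/ { $1 = $1 "A" } 1' "$@"
}

# Made-up input; the third line has three spaces after ">seq2".
printf '>seq1 first\nACGT\n>seq2   second\n' | tag_headers
# prints:
# >seq1A first
# ACGT
# >seq2A second
```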

Another option for a substitution this simple would be to use sed:

$ sed '/^>/s/ /A /' file.txt

This works by finding lines that match the same regex and replacing the first space on each with "A " (an A followed by a space). sed prints every line by default, so no explicit print is required.
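The sed version leaves the rest of the line alone, including extra spaces, but a header line that contains no space is not changed at all. A small demonstration with the hypothetical helper tag_headers_sed and made-up input:

```shell
# Hypothetical helper wrapping the sed command above.
tag_headers_sed() {
  # On ">" lines, replace the first space with "A "; other spacing is untouched.
  sed '/^>/s/ /A /' "$@"
}

# Made-up input: a header with extra spaces, and one with no space at all.
printf '>seq2   second\n>seq3\n' | tag_headers_sed
# prints:
# >seq2A   second
# >seq3        (unchanged: there is no space to replace)
```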

Or, if you prefer something that modifies the first "field" rather than the first "field separator", this works:

$ sed 's/^\(>[^ ]*\)/\1A/' file.txt

By default, sed uses basic regular expressions (BRE), so the grouping parentheses must be escaped as \( and \). The \1 is a back-reference to the first (in this case, the only) parenthesized group in the search regex.
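Because [^ ]* can match all the way to the end of the line, this variant also handles a header with no space, while keeping the original spacing elsewhere. A sketch, again with a hypothetical helper name and made-up input:

```shell
# Hypothetical helper wrapping the grouped sed command above.
tag_first_field_sed() {
  # \(>[^ ]*\) captures ">" plus everything up to the first space (or end of line);
  # \1A writes that capture back with "A" appended.
  sed 's/^\(>[^ ]*\)/\1A/' "$@"
}

printf '>seq2   second\n>seq3\nACGT\n' | tag_first_field_sed
# prints:
# >seq2A   second
# >seq3A
# ACGT
```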

How to replace character in certain column with awk

awk solution:

$ cat tst.awk
BEGIN{FS=OFS=";"}
NR>1 && sub(/m/,"x",$3){print $3, $4}

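Because sub() returns the number of substitutions it made (0 or 1 here), it doubles as a condition: only lines after the first line whose third field contains an m are printed. This will work on your real 250,000-line file: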

$ awk -f tst.awk file
"xagna";"aliqua"
"xinim";"veniam"
"ullaxco";"laboris

Or, as a one-liner:

awk 'BEGIN{FS=OFS=";"} NR>1 && sub(/m/,"x",$3){print $3, $4}' file
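To see both parts of the condition NR>1 && sub(/m/,"x",$3) at work, here is a sketch with made-up input (the helper name swap_m_in_field3 is hypothetical). The header's third field deliberately contains an m to show that NR>1 keeps it out, and the last line is dropped because its third field has no m:

```shell
# Hypothetical helper mirroring tst.awk above.
swap_m_in_field3() {
  # NR > 1 skips the first line; sub() returns 1 if it replaced an "m" in $3, else 0,
  # so only lines that were actually changed reach the print action.
  awk 'BEGIN { FS = OFS = ";" } NR > 1 && sub(/m/, "x", $3) { print $3, $4 }' "$@"
}

# Made-up input: the header's third field contains an "m", the last line's does not.
printf '%s\n' '"id";"col";"name";"value"' '1;"a";"magna";"aliqua"' '2;"b";"dolor";"sit"' |
  swap_m_in_field3
# prints: "xagna";"aliqua"
```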

Regex replace on specific column with SED/AWK

Using awk

awk is a good tool for this:

$ awk -F'\t' -v OFS='\t' 'NR>=2{sub(/^C/, "", $3)} 1' file
Organ K ClustNo Analysis
LN K200 12 Gene Ontology
LN K200 116 Gene Ontology
CN K200 2 Gene Ontology

How it works

  • -F'\t'

    Use tab as the field delimiter on input.

  • -v OFS='\t'

    Use tab as the field delimiter on output.

  • NR>=2 {sub(/^C/, "", $3)}

    Remove a leading C from field 3, but only on lines after the first (header) line.

  • 1

    This is awk's cryptic shorthand for print-the-line.
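The NR>=2 guard matters with this data, since the header ClustNo also starts with C. A minimal sketch with made-up tab-separated input (the helper name strip_c_col3 is hypothetical); CN in column 1 is left alone because only $3 is passed to sub():

```shell
# Hypothetical helper: strip a leading "C" from column 3 of tab-separated input,
# leaving the header line (line 1) untouched.
strip_c_col3() {
  awk -F'\t' -v OFS='\t' 'NR >= 2 { sub(/^C/, "", $3) } 1' "$@"
}

# Made-up input: "ClustNo" in the header and "CN" in column 1 both start with C.
printf 'Organ\tK\tClustNo\tAnalysis\nCN\tK200\tC2\tGene Ontology\n' | strip_c_col3
# prints (fields separated by tabs):
# Organ K ClustNo Analysis
# CN K200 2 Gene Ontology
```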

Using sed

$ sed -r '2,$ s/^(([^\t]+\t+){2})C/\1/' file
Organ K ClustNo Analysis
LN K200 12 Gene Ontology
LN K200 116 Gene Ontology
CN K200 2 Gene Ontology

How it works

  • -r

    Use extended regular expressions. (On Mac OSX or other BSD platforms, use -E instead.)

  • 2,$ s/^(([^\t]+\t+){2})C/\1/

    This substitution is applied only to lines 2 through the end of the file.

    ^(([^\t]+\t+){2}) matches the first two tab-separated columns, starting at the beginning of the line; \t+ allows one or more tabs between columns. The ^ anchor matters: without it, the regex could start matching later in the line and remove a C from a different column. Because the regex is enclosed in parentheses, what it matches is available later as \1.

    C then matches the C at the start of the third column.

    \1 replaces the matched text with just the first two columns, dropping the C.
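A sketch of the sed variant with the pattern anchored at the start of the line (^), assuming GNU sed, which accepts -r and understands \t, including inside brackets. The helper name strip_c_col3_sed and the input are made up; the last line has no C in column 3 but one at the start of column 4, which the anchored regex leaves alone:

```shell
# Hypothetical helper; assumes GNU sed, which accepts -r and \t inside brackets.
strip_c_col3_sed() {
  # ^ pins the two counted columns to the start of the line; \1 puts them back without the C.
  sed -r '2,$ s/^(([^\t]+\t+){2})C/\1/' "$@"
}

# Made-up input: the last line has no C in column 3 but one at the start of column 4.
printf 'Organ\tK\tClustNo\tAnalysis\nCN\tK200\tC2\tGene Ontology\nLN\tK200\t12\tCustom\n' |
  strip_c_col3_sed
# prints (fields separated by tabs):
# Organ K ClustNo Analysis
# CN K200 2 Gene Ontology
# LN K200 12 Custom
```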


