Save Modifications in Place with Non Gnu Awk


The main aim of this thread is how to do an in-place SAVE in NON GNU awk, so I am posting its template first, which should help with any kind of requirement. Add the FNR==1 and END blocks to your code, keep your main BLOCK as per your requirement, and it should then do the in-place edit:

NOTE: The following writes all of its output to the temporary file named by the out variable, so in case you want to print anything to standard output, add a print... statement without > (out).

Generic Template:

awk -v out_file="out" '
FNR==1{
close(out)
out=out_file count++
rename=(rename?rename ORS:"") "mv \047" out "\047 \047" FILENAME "\047"
}
{
.....your main block code.....
}
END{
if(rename){
system(rename)
}
}
' *.txt


Specific provided sample's solution:

I have come up with the following approach within awk itself (for the added samples, this is my approach to solve this and save the output into Input_file itself):

awk -v out_file="out" '
FNR==1{
close(out)
out=out_file count++
rename=(rename?rename ORS:"") "mv \047" out "\047 \047" FILENAME "\047"
}
{
print FNR > (out)
}
END{
if(rename){
system(rename)
}
}
' *.txt

NOTE: This is only a test for saving edited output into the Input_file(s) themselves. One could use its FNR==1 block along with its END block in their own program; the main block should be as per the requirement of the specific question.

Fair warning: Since this approach creates new temporary out files in the current path, make sure there is enough space on the system. The final outcome keeps only the main Input_file(s), but during the operation it needs extra space in the directory.



Following is a test for the above code.

Execution of the program with an example: Let's assume the following are the .txt Input_file(s):

cat << EOF > test1.txt
onetwo three
tets testtest
EOF

cat << EOF > test2.txt
onetwo three
tets testtest
EOF

cat << EOF > test3.txt
onetwo three
tets testtest
EOF

Now when we run the following code:

awk -v out_file="out" '
FNR==1{
close(out)
out=out_file count++
rename=(rename?rename ORS:"") "mv \047" out "\047 \047" FILENAME "\047"
}
{
print "new_lines_here...." > (out)
}
END{
if(rename){
system("ls -lhtr;" rename)
}
}
' *.txt

NOTE: I have placed ls -lhtr in the system section intentionally to see which output files it creates (on a temporary basis), because later it will rename them to their actual names.

-rw-r--r-- 1 runner runner 27 Dec 9 05:33 test2.txt
-rw-r--r-- 1 runner runner 27 Dec 9 05:33 test1.txt
-rw-r--r-- 1 runner runner 27 Dec 9 05:33 test3.txt
-rw-r--r-- 1 runner runner 38 Dec 9 05:33 out2
-rw-r--r-- 1 runner runner 38 Dec 9 05:33 out1
-rw-r--r-- 1 runner runner 38 Dec 9 05:33 out0

When we run ls -lhtr after the awk script has finished, we see only the .txt files there.

-rw-r--r-- 1 runner runner 27 Dec 9 05:33 test2.txt
-rw-r--r-- 1 runner runner 27 Dec 9 05:33 test1.txt
-rw-r--r-- 1 runner runner 27 Dec 9 05:33 test3.txt


Explanation: Adding a detailed explanation of the above command here:

awk -v out_file="out" '  ##Starting the awk program here and creating a variable named out_file, whose value SHOULD BE a name prefix of files which are NOT present in the current directory. Temporary files are created with this name and later renamed to the actual files.
FNR==1{  ##Checking if this is the very first line of the current Input_file; if so, do the following.
close(out)  ##Closing the previous temp file, because output goes to temp files which are renamed later, so that we do not get a "too many open files" error.
out=out_file count++  ##Creating the out variable, whose value is the value of out_file (defined in the awk -v section) followed by the variable count, which is incremented by 1 each time control comes here.
rename=(rename?rename ORS:"") "mv \047" out "\047 \047" FILENAME "\047"  ##Creating a variable named rename, which collects the mv (rename) commands; they are executed in the END section once all the Input_file(s) have been processed.
}  ##Closing BLOCK for the FNR==1 condition here.
{  ##Starting main BLOCK here.
print "new_lines_here...." > (out)  ##Printing to the out file in this example.
}  ##Closing main BLOCK here.
END{  ##Starting END block of this program here.
if(rename){  ##Checking if the rename variable is NOT NULL; if so, do the following.
system(rename)  ##Using the system command with the rename variable, which actually executes the mv commands to rename files from out0, out1 etc. to the Input_file(s).
}
}  ##Closing END block of this program here.
' *.txt  ##Mentioning the Input_file(s) with their extension here.
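
One limitation of the template is that a file name containing a single quote breaks the generated mv command. Below is a minimal sketch of one way around this, not a definitive fix. It assumes a helper I am calling shq (a hypothetical name, not part of awk) that quotes a string for the shell. The scratch directory, the sample file and the close(out) in END are also my additions:

```shell
# Work in a scratch directory so the demo does not touch real files.
dir=$(mktemp -d) || exit 1
cd "$dir" || exit 1
printf 'a\nb\n' > "it's.txt"        # sample Input_file with a quote in its name

awk -v out_file="out" '
# shq (hypothetical helper): wrap s in single quotes for the shell and
# replace each embedded single quote (\047) with the 4 characters \047 \\ \047 \047.
function shq(s,    n, parts, i, r) {
    n = split(s, parts, "\047")
    r = parts[1]
    for (i = 2; i <= n; i++)
        r = r "\047\\\047\047" parts[i]
    return "\047" r "\047"
}
FNR==1{
    close(out)
    out=out_file count++
    rename=(rename?rename ORS:"") "mv " shq(out) " " shq(FILENAME)
}
{
    print FNR ": " $0 > (out)
}
END{
    close(out)                      # flush the last temp file before renaming
    if(rename){
        system(rename)
    }
}
' ./*.txt

cat "it's.txt"
```

The final cat should show the two input lines prefixed with their line numbers, and no out0 file should remain in the directory.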

Save modifications in place with awk

GNU Awk 4.1.0 (released 2013) and later has the option of "inplace" file editing:

[...] The "inplace" extension, built using the new facility, can be used to simulate the GNU "sed -i" feature. [...]

Example usage:

$ gawk -i inplace '{ gsub(/foo/, "bar") }; { print }' file1 file2 file3

To keep the backup:

$ gawk -i inplace -v INPLACE_SUFFIX=.bak '{ gsub(/foo/, "bar") }
> { print }' file1 file2 file3
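
If a script has to run on machines that may or may not have a recent gawk, the two approaches can be combined. This is a rough sketch, assuming a wrapper function I am calling inplace_awk (a hypothetical name): it probes for a working gawk -i inplace and otherwise falls back to a temporary file plus mv. The scratch files are only for illustration:

```shell
# inplace_awk (hypothetical helper): inplace_awk 'program' file...
inplace_awk() {
    prog=$1
    shift
    # Probe: does a gawk with a loadable inplace extension exist here?
    if gawk -i inplace 'BEGIN { exit 0 }' </dev/null >/dev/null 2>&1; then
        gawk -i inplace "$prog" "$@"
    else
        # Fallback: one temporary file per input, renamed over the original.
        for f in "$@"; do
            tmp=$(mktemp "$f.XXXXXX") || return 1
            if awk "$prog" "$f" > "$tmp"; then
                mv "$tmp" "$f" || return 1
            else
                rm -f "$tmp"
                return 1
            fi
        done
    fi
}

# Usage: replace foo with bar in two scratch files.
demo=$(mktemp -d) || exit 1
printf 'foo\n' > "$demo/file1"
printf 'x foo\n' > "$demo/file2"
inplace_awk '{ gsub(/foo/, "bar") } { print }' "$demo/file1" "$demo/file2"
cat "$demo/file1" "$demo/file2"
```

Note that the fallback runs awk once per file, so NR restarts for each file, and the rewritten file gets the permissions of the mktemp file rather than those of the original.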

Problem with the save changes in the same file with awk

tmp is a variable; you have to actually set it to the name of a file before trying to access that file with $tmp:

tmp=$(mktemp) || exit 1
name='eq6'
for index in {1..10}
do
awk 'f;/hbonds_Other-SOL/{f=1}' "${name}_$index.ndx" > "$tmp" && mv "$tmp" "${name}_$index.ndx"
done

You also had {$name} instead of ${name} but I assume that was a typo.

How to search/replace a single inline with sed/awk?

Just move the final print outside of the filtered pattern, e.g.:

gawk '/private/{gsub(/\//, "_"); gsub(/-/, "_")} {print}' 

Usually, that is simplified to:

gawk '/private/{gsub(/\//, "_"); gsub(/-/, "_")}1' 
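
A quick way to convince yourself that the two forms behave the same is to run both on the same input and compare. This is a small sketch with made-up sample lines (the input text and the variable names long and short are my own), using plain awk since nothing here is gawk-specific:

```shell
input='private/a-b
public/c-d'

# Long form: explicit {print} rule for every line.
long=$(printf '%s\n' "$input" | awk '/private/{gsub(/\//, "_"); gsub(/-/, "_")} {print}')
# Short form: the always-true pattern 1 triggers the default print action.
short=$(printf '%s\n' "$input" | awk '/private/{gsub(/\//, "_"); gsub(/-/, "_")}1')

printf '%s\n' "$short"
[ "$long" = "$short" ] && echo "identical output"
```

Only the line matching /private/ is rewritten; the other line passes through unchanged in both forms.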

You really, really, really (emphasis on "really") do not want to use something like sed -i to edit the files "in-place". (I put "in-place" in quotes because GNU sed does not edit the files in place, but creates new files with the same name.) Doing so is a recipe for data corruption, and if you have a lot of files you don't want to take that risk. Just write the files into a new directory tree. It will make recovery much simpler.

e.g.:

d=backup/$(dirname "$filename")
mkdir -p "$d"
awk '...' "$filename" > "backup/$filename"

Consider if you used something like -i which puts backup files in the same directory structure. If you're modifying files in bulk and the process is stopped half-way through, how do you recover? If you are putting output into a separate tree, recovery is trivial. Your original files are untouched and pristine, and there are no concerns if your filtering process is terminated prematurely or inadvertently run multiple times. sed -i is a plague on humanity and should never be used. Don't spread the plague.
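
For bulk processing, the separate-tree idea can be sketched roughly as follows. The directory variables src and dst, the sample files and the gsub program are all stand-ins of mine, and the loop assumes file names without newlines:

```shell
src=$(mktemp -d) || exit 1     # stand-in for the tree of original files
dst=$(mktemp -d) || exit 1     # separate tree that receives the results
mkdir -p "$src/sub"
printf 'a-b\n' > "$src/top.txt"
printf 'c-d\n' > "$src/sub/nested.txt"

# List files relative to $src so the same relative path can be reused under $dst.
( cd "$src" && find . -type f -name '*.txt' ) | while IFS= read -r filename; do
    mkdir -p "$dst/$(dirname "$filename")"
    awk '{ gsub(/-/, "_") } 1' "$src/$filename" > "$dst/$filename"
done

cat "$dst/sub/nested.txt"
```

If the run is interrupted or repeated, the files under src are still exactly as they were, so it is enough to wipe dst and start over.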

Update tab-delimited file in-place with gawk

awk 'BEGIN {print  "Chr\tStart\tEnd\tGene"}1' file > newFile && mv newFile file

Output

Chr     Start   End     Gene
chr7 121738788 121738930 AASS
chr7 121738788 121738930 AASS
chr7 121738788 121738930 AASS

As it seems you're mostly interested in adding a header line, just print that before anything happens (via the BEGIN block). The 1 is an always-true pattern with no action, so the default action applies and every line of input is printed. You could replace it with the longhand {print $0} if you want code that non-awk-gurus will understand.

Even using a -i inplace option, the program is doing the same as awk 'code' file > newFile && mv newFile file behind the scenes, so there is no "savings" in processing to adding a header to a file. The file has to be rewritten in either case.
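
One way to see that the file really is replaced rather than edited is to compare its inode number before and after. This is a small sketch using the temporary-file form on a scratch copy; the path and the variable names before and after are mine:

```shell
dir=$(mktemp -d) || exit 1
file=$dir/file
printf 'chr7\t121738788\t121738930\tAASS\n' > "$file"

before=$(ls -i "$file" | awk '{print $1}')   # inode of the original file
awk 'BEGIN {print "Chr\tStart\tEnd\tGene"}1' "$file" > "$file.new" && mv "$file.new" "$file"
after=$(ls -i "$file" | awk '{print $1}')    # inode of the file now at that path

[ "$before" != "$after" ] && echo "inode changed: the file was replaced, not edited"
head -n 1 "$file"
```

The new file exists alongside the old one until mv runs, so the two must have different inode numbers.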

IHTH

GAWK Print string to STDOUT with -i inplace option

Here is a workaround: drop -i inplace from the command line (not obligatory though, see -e/-f) and place the following at the very beginning of your script. Before starting to process a file's content, this disables inplace temporarily and prints FILENAME. Then inplace's BEGINFILE rule enables itself again.

BEGINFILE {
if (inplace::filename != "") {
inplace::end(inplace::filename, inplace::suffix)
inplace::filename = ""
}
print FILENAME
}

@include "inplace"

See how inplace is implemented for a better understanding.

combine 2 awk or sed statements into one and save the existing file

You can inline a multi-line awk script, or you can put the two statements in a file. Use distinct variable names for each 'pass':

awk '
$1 ~ /^data/ {
$0 = sprintf("%s%*s%s",substr($0,1,m-1),n-m,"'$fifteen_min_ago'",substr($0,n))
$0 = sprintf("%s%*s%s",substr($0,1,m2-1),n2-m2,"'$ctime'",substr($0,n2))
}
{ print }
' m=51 n=80 m2=97 n2=126 file > file.new &&
mv file.new file

Note that there are other (simpler) ways to achieve the replacement that is implemented in the question. This is the most similar to the approach described in the question.

With sed the replacements are easier:

ctime=$(date +%Y-%m-%dT%H:%M:%S.%3N-00:00)
fifteen_min_ago=$(date -d "15 mins ago" +%Y-%m-%dT%H:%M:%S.%3N-00:00)

sed -e 's/"start_date": "[^"]*"/"start_date": "'$ctime'"/' \
-e 's/"end_date": "[^"]*"/"end_date": "'$fifteen_min_ago'"/' < file > file.new && mv file.new file

awk change once per file

awk -i inplace '$1=="namespace" && !seen[ARGIND]++ {$0=$0 ORS "foo"} 1' *

FS=" " is the default, no need to specify it explicitly.
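
ARGIND is a gawk extension, as is -i inplace. Here is a sketch of the same once-per-file logic for other awks, assuming a flag variable I am calling hit (a hypothetical name) that is reset on the first line of each file. Output goes to standard output here, so saving it back would still need one of the temporary-file approaches. The sample files are made up:

```shell
dir=$(mktemp -d) || exit 1
printf '%s\n' 'namespace a' 'namespace b' > "$dir/one"
printf '%s\n' 'x' 'namespace c' > "$dir/two"

# FNR==1 marks the start of each input file, so the flag is cleared per file;
# only the first matching line in each file gets "foo" appended after it.
result=$(awk 'FNR==1 {hit=0} $1=="namespace" && !hit++ {$0=$0 ORS "foo"} 1' "$dir/one" "$dir/two")
printf '%s\n' "$result"
```

In the first file only "namespace a" is followed by foo; "namespace b" is left alone because the flag is already set.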


