Save Modifications in Place With Awk

Save modifications in place with awk

GNU Awk 4.1.0 (released in 2013) and later offer "inplace" file editing:

[...] The "inplace" extension, built using the new facility, can be used to simulate the GNU "sed -i" feature. [...]

Example usage:

$ gawk -i inplace '{ gsub(/foo/, "bar") }; { print }' file1 file2 file3

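To keep a backup of each original file (newer gawk releases with namespace support also accept the variable name inplace::suffix):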

$ gawk -i inplace -v INPLACE_SUFFIX=.bak '{ gsub(/foo/, "bar") }
> { print }' file1 file2 file3

Save modifications in place with NON GNU awk

The main aim of this thread is an in-place save with non-GNU awk, so here is a generic template first. Keep your own main block as your task requires and add the FNR==1 and END blocks around it; the script then edits the input files in place.

NOTE: Only print statements that carry the > (out) redirection write to the temporary output file. To print something to standard output instead, use a plain print ... statement without > (out).

Generic Template:

awk -v out_file="out" '
FNR==1{
  close(out)
  out=out_file count++
  rename=(rename?rename ORS:"") "mv \047" out "\047 \047" FILENAME "\047"
}
{
  .....your main block code.....
}
END{
  if(rename){
    system(rename)
  }
}
' *.txt


Solution for the provided sample:

For the sample in the question, the following approach stays within awk itself and saves the output back into the input files:

awk -v out_file="out" '
FNR==1{
  close(out)
  out=out_file count++
  rename=(rename?rename ORS:"") "mv \047" out "\047 \047" FILENAME "\047"
}
{
  print FNR > (out)
}
END{
  if(rename){
    system(rename)
  }
}
' *.txt

NOTE: This is only a test of saving edited output back into the input file(s). To reuse it, keep the FNR==1 block and the END block in your program and write the main block to suit your own task.

Fair warning: this approach creates temporary out files in the current directory, so make sure the system has enough free space. Only the input file(s) remain at the end, but the temporary copies need space in the directory while the script runs.



The following is a test of the code above.

Example run: assume the following .txt input files:

cat << EOF > test1.txt
onetwo three
tets testtest
EOF

cat << EOF > test2.txt
onetwo three
tets testtest
EOF

cat << EOF > test3.txt
onetwo three
tets testtest
EOF

Now run the following code:

awk -v out_file="out" '
FNR==1{
  close(out)
  out=out_file count++
  rename=(rename?rename ORS:"") "mv \047" out "\047 \047" FILENAME "\047"
}
{
  print "new_lines_here...." > (out)
}
END{
  if(rename){
    system("ls -lhtr;" rename)
  }
}
' *.txt

NOTE: The ls -lhtr in the system call is there intentionally, to show the temporary output files the script creates before it renames them to the original file names.

-rw-r--r-- 1 runner runner 27 Dec 9 05:33 test2.txt
-rw-r--r-- 1 runner runner 27 Dec 9 05:33 test1.txt
-rw-r--r-- 1 runner runner 27 Dec 9 05:33 test3.txt
-rw-r--r-- 1 runner runner 38 Dec 9 05:33 out2
-rw-r--r-- 1 runner runner 38 Dec 9 05:33 out1
-rw-r--r-- 1 runner runner 38 Dec 9 05:33 out0

Running ls -lhtr after the awk script has finished shows only the .txt files:

-rw-r--r-- 1 runner runner 27 Dec 9 05:33 test2.txt
-rw-r--r-- 1 runner runner 27 Dec 9 05:33 test1.txt
-rw-r--r-- 1 runner runner 27 Dec 9 05:33 test3.txt


Explanation: a detailed explanation of the command above:

awk -v out_file="out" '              ## Start the awk program and set out_file to a prefix for the temporary files. Pick a prefix that matches NO existing file in the current directory; the temporary files are renamed to the real file names at the end.
FNR==1{                              ## On the first line of each input file, do the following.
close(out)                           ## Close the previous temporary file so that awk does not fail with a "too many open files" error.
out=out_file count++                 ## Set out to the value of out_file followed by count, which increases by 1 each time this block runs (out0, out1, ...).
rename=(rename?rename ORS:"") "mv \047" out "\047 \047" FILENAME "\047" ## Append one mv command to the rename variable; the collected commands run in the END block once all input files are processed.
}                                    ## End of the FNR==1 block.
{                                    ## Start of the main block.
print "new_lines_here...." > (out)   ## In this example, print a fixed line to the current temporary file.
}                                    ## End of the main block.
END{                                 ## Start of the END block.
if(rename){                          ## If the rename variable is not empty, do the following.
system(rename)                       ## Run the collected mv commands, which rename out0, out1, ... to the input file names.
}
}                                    ## End of the END block.
' *.txt                              ## The input file(s).
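
As a filled-in instance of the template, here is a minimal sketch that replaces foo with bar in two sample files. The file names a.txt and b.txt, the scratch directory, and the gsub edit are assumptions made for illustration. The extra close(out) in the END block is a cautious addition rather than part of the original template: it makes sure the last temporary file is fully written before it is renamed.

```shell
# Work in a scratch directory so the temporary names cannot clash with real files.
workdir=$(mktemp -d) && cd "$workdir" || exit 1
printf 'foo one\nfoo two\n' > a.txt
printf 'three foo\n' > b.txt

awk -v out_file="out" '
FNR==1 {
    close(out)                 # finish the previous temporary file
    out = out_file count++     # out0, out1, ...
    rename = (rename ? rename ORS : "") "mv \047" out "\047 \047" FILENAME "\047"
}
{
    gsub(/foo/, "bar")         # the per-line edit (illustrative only)
    print > (out)
}
END {
    close(out)                 # finish the last temporary file before renaming
    if (rename) system(rename)
}
' a.txt b.txt

result_a=$(cat a.txt)
result_b=$(cat b.txt)
printf '%s\n' "$result_a" "$result_b"
```

Because the mv commands wrap each name in single quotes, this sketch would still break on file names that themselves contain a single quote.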

awk to update existing file

You are probably looking for an in-place edit that modifies the same file, as mentioned in the duplicate. Your attempt, awk .. file > file, could never work: the shell processes redirections before it runs the actual command, so > file truncates the file to zero length before awk starts. awk therefore never sees the value of $9 in the file.
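
The truncation is easy to reproduce on a throwaway file. The sketch below uses a scratch file from mktemp and made-up contents; it only illustrates the effect of redirecting onto the input file.

```shell
demo=$(mktemp) || exit 1
printf 'line 1\nline 2\n' > "$demo"
before=$(( $(wc -c < "$demo") ))

# The shell opens "$demo" for writing (truncating it) before awk starts,
# so awk reads an already empty file and prints nothing.
awk '{ print }' "$demo" > "$demo"

after=$(( $(wc -c < "$demo") ))
echo "bytes before: $before, bytes after: $after"
rm -f "$demo"
```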

You probably need mktemp, which creates a uniquely named temporary file (by default under the system's temporary directory) and prints its name. The basic idea is to redirect the command output to a temporary file and then move that file back over the original:

awk '/^-/ {print $9}' outfile.log >tmpfile && mv tmpfile outfile.log

Using mktemp avoids overwriting or deleting an existing file that happens to be named tmpfile in your current directory:

tmpfile="$(mktemp)"
awk '/^-/ {print $9}' outfile.log > "$tmpfile" && mv "$tmpfile" outfile.log
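
The same pattern can be wrapped in a small function. This is a sketch under assumptions: awk_inplace is a hypothetical helper name, not a standard command, and the sample log lines are invented. It creates the temporary file next to the target so that mv stays on one filesystem, and it leaves the original untouched if awk fails.

```shell
# awk_inplace PROGRAM FILE  (hypothetical helper, not a standard command)
awk_inplace() {
    prog=$1
    file=$2
    # Create the temporary file in the same directory as the target.
    tmp=$(mktemp "$(dirname "$file")/tmp.XXXXXX") || return 1
    if awk "$prog" "$file" > "$tmp"; then
        mv "$tmp" "$file"
    else
        rm -f "$tmp"    # keep the original file if awk fails
        return 1
    fi
}

workdir=$(mktemp -d) || exit 1
printf '%s\n' \
    'total 8' \
    '-rw-r--r-- 1 runner runner 27 Dec 9 05:33 test1.txt' \
    'drwxr-xr-x 2 runner runner 64 Dec 9 05:33 dir' > "$workdir/outfile.log"

awk_inplace '/^-/ { print $9 }' "$workdir/outfile.log"
result=$(cat "$workdir/outfile.log")
echo "$result"
```

Note that the replaced file takes on the permissions of the mktemp file, which are typically more restrictive than those of the original.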

If you use GNU awk, you can write

gawk -i inplace '...' file

This is documented in the gawk manual.

How to search/replace a single inline with sed/awk?

Just move the final print outside of the filtered pattern, so that every line is printed and only matching lines are modified, e.g.:

gawk '/private/{gsub(/\//, "_"); gsub(/-/, "_")} {print}' 

Usually, that is simplified to:

gawk '/private/{gsub(/\//, "_"); gsub(/-/, "_")}1' 
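
A quick way to see the effect is to feed two made-up lines through the command: only the line matching private is rewritten, while the trailing 1 prints every line. The sample input is an assumption, and plain awk is used here since nothing in the program is specific to gawk.

```shell
result=$(printf '%s\n' 'private/var-log' 'public/var-log' |
    awk '/private/ { gsub(/\//, "_"); gsub(/-/, "_") } 1')
printf '%s\n' "$result"
```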

You really, really, really (emphasis on "really") do not want to use something like sed -i to edit the files "in-place". (I put "in-place" in quotes because GNU sed does not edit the files in place, but creates new files with the same name.) Doing so is a recipe for data corruption, and if you have a lot of files you don't want to take that risk. Just write the files into a new directory tree. It will make recovery much simpler.

e.g.:

d=backup/$(dirname "$filename")
mkdir -p "$d"
awk '...' "$filename" > "$d/$(basename "$filename")"

Consider if you used something like -i which puts backup files in the same directory structure. If you're modifying files in bulk and the process is stopped half-way through, how do you recover? If you are putting output into a separate tree, recovery is trivial. Your original files are untouched and pristine, and there are no concerns if your filtering process is terminated prematurely or inadvertently run multiple times. sed -i is a plague on humanity and should never be used. Don't spread the plague.
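
For a whole tree, the separate-output idea might look like the sketch below. The directory variables src and dest, the sample files, and the foo-to-bar filter are assumptions for illustration; the loop also assumes file names without embedded newlines.

```shell
src=$(mktemp -d) && dest=$(mktemp -d) || exit 1
mkdir -p "$src/a/b"
printf 'foo\n' > "$src/top.txt"
printf 'foo foo\n' > "$src/a/b/deep.txt"

# Mirror every .txt file from $src into $dest, applying the filter on the way.
( cd "$src" && find . -type f -name '*.txt' ) |
while IFS= read -r filename; do
    mkdir -p "$dest/$(dirname "$filename")"
    awk '{ gsub(/foo/, "bar") } 1' "$src/$filename" > "$dest/$filename"
done

original_top=$(cat "$src/top.txt")
filtered_top=$(cat "$dest/top.txt")
filtered_deep=$(cat "$dest/a/b/deep.txt")
printf '%s\n' "$original_top" "$filtered_top" "$filtered_deep"
```

The originals under src are never opened for writing, so an interrupted or repeated run only affects the copy under dest.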

AWK to replace HTML tag with another and keep text

I would use GNU sed for this task in the following way. Let the content of file.txt be

<span class="desc e-font-family-cond">fork</span>

then

sed -e 's/<span[^>]*>/<strong>/g' -e 's/<\/span>/<\/strong>/g' file.txt

output

<strong>fork</strong>

Explanation: first replace each opening span tag (with any attributes) with <strong>, then replace each closing </span> with </strong>.
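
Since the question asks about AWK, the same two substitutions can also be sketched with gsub. This is an assumed equivalent of the sed command, not part of the original answer; like the sed version, it is a plain regex replacement and does not parse HTML.

```shell
result=$(printf '%s\n' '<span class="desc e-font-family-cond">fork</span>' |
    awk '{ gsub(/<span[^>]*>/, "<strong>"); gsub(/<\/span>/, "</strong>") } 1')
printf '%s\n' "$result"
```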

How to sum two columns and save the values to third column using Linux shell command

You can use any one of the following equivalent commands:

awk -F, '{print $0 OFS $1+$2}' OFS=, file > newfile
awk 'BEGIN{FS=OFS=","} {print $0 OFS $1+$2}' file > newfile
awk -F, '$0=$0FS$1+$2' file > newfile

With -F, and OFS=, (or BEGIN{FS=OFS=","}) you set the input and output field separators to a comma, and with print $0 OFS $1+$2 you output the line, then a comma, then the sum of the first two field values.
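
A small worked run on made-up input shows the appended third column. The two sample rows are assumptions, and the parentheses around the sum are added only for readability.

```shell
result=$(printf '1,2\n10,20\n' |
    awk 'BEGIN { FS = OFS = "," } { print $0 OFS ($1 + $2) }')
printf '%s\n' "$result"
```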

how to write finding output to same file using awk command

Not possible per se. You need a second temporary file because you can't read and overwrite the same file. Something like:

awk '(PROGRAM)' testfile.txt > testfile.tmp && mv testfile.tmp testfile.txt

The mktemp program is useful for generating unique temporary file names.

There are some hacks for avoiding a temporary file, but they rely mostly on caching and read buffers and quickly get unstable for larger files.


