Is there a way to delete duplicate headers in a file in Unix?
If you know that the first line contains the header, just delete all other instances of that.
awk 'FNR==1 { header = $0; print }
$0 != header' file
If that won't work, please tell us how we can identify a header line. If it's just a static string, use grep -vF 'that string'; if it matches a particular regex, use grep -v 'that regex'.
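As a minimal sketch of the approaches above, assuming a hypothetical file whose header line `id name` reappears in the middle (the file name and contents are made up for illustration). It also shows that grep -v drops every matching line, the first header included, so the header has to be printed back separately:

```shell
# Scratch directory so nothing real is touched.
workdir=$(mktemp -d)
f="$workdir/sample.txt"

# Hypothetical input: two files with the same header, concatenated.
printf '%s\n' 'id name' '1 foo' 'id name' '2 bar' > "$f"

# awk: print the first line once, then only lines that differ from it.
result_awk=$(awk 'FNR==1 { header = $0; print }
$0 != header' "$f")

# grep: -v removes all header lines, so emit the first line explicitly.
# -x restricts the match to whole lines, -F treats the pattern as a fixed string.
result_grep=$( { head -n 1 "$f"; grep -vxF 'id name' "$f"; } )

printf '%s\n' "$result_awk"
# id name
# 1 foo
# 2 bar
```

Both variants give the same three lines for this input.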
Unix - removing duplicate headers from file
Try this with GNU sed:
sed '3,${/^Metric/d;/^---/d}' file
Output:
Metric date_sk date_sk -7
---------------- ---------- ----------
Test1 2015-10-19 2015-10-12
Test2 2015-10-19 2015-10-12
Test3 2015-10-19 2015-10-12
If you want to edit the file in place, add sed's -i option.
How to remove duplicate headers from a file except first occurrence in linux
This is quite simple in Awk: just include all the fields in the row as the unique key,
awk '!unique[$1$2$3$4]++' file > new-file
which produces the output
No name city country
1 xyz yyyy zzz
2 test dddd xxxx
3 xyz yyyy zzz
A more general version in Awk, which builds the key in a loop over all the fields in the row (up to NF), would be
awk '{key=""; for(i=1;i<=NF;i++) key=key$i;}!unique[key]++' file > new-file
(or) a much more readable version, from Sundeep's comment below, using $0 (the whole line contents):
awk '!unique[$0]++' file
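One detail worth checking with field-based keys: concatenating fields with no separator can make different rows produce the same key. The sketch below uses two made-up rows to illustrate this, and shows two ways around it: comma-separated subscripts (which awk joins with SUBSEP) or the whole line as the key.

```shell
workdir=$(mktemp -d)
f="$workdir/rows.txt"

# Hypothetical rows: different fields, but both concatenate to "abc".
printf '%s\n' 'ab c' 'a bc' > "$f"

# Concatenated key: the second row is wrongly treated as a duplicate.
concat=$(awk '!unique[$1$2]++' "$f")

# Comma-separated subscripts are joined with SUBSEP, so the rows stay distinct.
subsep=$(awk '!unique[$1,$2]++' "$f")

# Whole-line key keeps both rows as well.
whole=$(awk '!unique[$0]++' "$f")

printf '%s\n' "$concat"
# ab c
```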
In answer to the OP's follow-up question about saving the file in place: recent versions of GNU Awk (4.1.0 and later) have the option of "inplace" file editing:
[...] The "inplace" extension, built using the new facility, can be used to simulate the GNU "sed -i" feature. [...]
Example usage:
gawk -i inplace '{key=""; for(i=1;i<=NF;i++) key=key$i;}!unique[key]++' file
To keep the backup:
gawk -i inplace -v INPLACE_SUFFIX=.bak '{key=""; for(i=1;i<=NF;i++) key=key$i;}!unique[key]++' file
(or) if your Awk does not support that, write to a temporary file and move it over the original
tmp=$(mktemp)
awk '{key=""; for(i=1;i<=NF;i++) key=key$i;}!unique[key]++' file > "$tmp" && mv "$tmp" file
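A small end-to-end run of the temporary-file approach, on a made-up file, may help show what ends up on disk. The `&&` means the original is only replaced if awk succeeded. One caveat, stated as an assumption about typical mktemp behavior: the temporary file is created with restrictive permissions, and mv carries those over, so the original file's mode may need restoring afterwards.

```shell
workdir=$(mktemp -d)
f="$workdir/data.txt"

# Hypothetical input with a repeated header.
printf '%s\n' 'No name' '1 xyz' 'No name' '2 abc' > "$f"

# Write the de-duplicated output to a temp file, then replace the original
# only if awk exited successfully.
tmp=$(mktemp)
awk '!unique[$0]++' "$f" > "$tmp" && mv "$tmp" "$f"

result=$(cat "$f")
printf '%s\n' "$result"
# No name
# 1 xyz
# 2 abc
```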
Delete duplicate headers in awk
I'd do it this way:
sed '1h;2,$G;s/^\(.*\)\n\1$//;/./P;d' filename
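The one-liner is dense, so here is the same script spread over several lines with a comment per command, run against a made-up sample file. This is only an annotated restatement of the command above, not a different method:

```shell
workdir=$(mktemp -d)
f="$workdir/sample.txt"
printf '%s\n' 'id name' '1 foo' 'id name' '2 bar' > "$f"

result=$(sed '
  # Line 1: copy the header into the hold space.
  1h
  # Lines 2..end: append a newline plus the held header to the pattern space.
  2,$G
  # If the pattern space is now "X<newline>X" (line equals header), empty it.
  s/^\(.*\)\n\1$//
  # If anything is left, print up to the first newline (the original line).
  /./P
  # Delete the pattern space so nothing is auto-printed.
  d
' "$f")

printf '%s\n' "$result"
# id name
# 1 foo
# 2 bar
```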
How do I delete all lines in a concatenated text file that match the header WITHOUT deleting the header? [bash]
The following AWK script removes all lines that are exactly the same as the first one.
awk '{ if($0 != header) { print; } if(header == "") { header=$0; } }' inputfile > outputfile
It will print the first line because the initial value of header is an empty string. Then it will store the first line in header because it is empty.
After this it will print only lines that are not equal to the first one already stored in header. The second if will always be false once a non-empty header has been saved.
Note: If the file starts with empty lines, these empty lines will be removed.
To remove the first number column you can use
sed 's/^[0-9][0-9]*[ \t]*//' inputfile > outputfile
You can combine both commands into a pipeline
awk '{ if($0 != header) { print; } if(header == "") { header=$0; } }' inputfile | sed 's/^[0-9][0-9]*[ \t]*//' > outputfile
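To make the note about leading empty lines concrete, here is a short run on a made-up file that starts with an empty line and has a leading number column. The file name and contents are assumptions for illustration only:

```shell
workdir=$(mktemp -d)
f="$workdir/inputfile"

# Hypothetical input: a leading empty line, then a header repeated once.
printf '%s\n' '' 'id name' '1 foo' 'id name' '2 bar' > "$f"

# Header removal only: the leading empty line is dropped, and the first
# non-empty line becomes the stored header.
dedup=$(awk '{ if($0 != header) { print; } if(header == "") { header=$0; } }' "$f")

# Header removal followed by stripping the leading number column.
piped=$(awk '{ if($0 != header) { print; } if(header == "") { header=$0; } }' "$f" |
        sed 's/^[0-9][0-9]*[ \t]*//')

printf '%s\n' "$piped"
# id name
# foo
# bar
```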
How to delete the rest of the records after a pattern which occurred for the second time in a .CSV file
Try this:
awk 'a~$0{exit}NR==1{a=$0}1' file
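Note that `a~$0` treats each input line as a regular expression and matches it against the stored header, so a line that merely occurs inside the header (or contains regex metacharacters) can also trigger the exit. The sketch below, on a made-up CSV, compares it with a plain string comparison; the `NR>1 && $0==a` variant is a suggested alternative, not part of the original answer.

```shell
workdir=$(mktemp -d)
f="$workdir/data.csv"

# Hypothetical CSV: the third row has two empty fields, i.e. just ",".
printf '%s\n' 'name,city' 'ann,oslo' ',' 'bob,rome' 'name,city' 'eve,bern' > "$f"

# Regex form: "," matches inside "name,city", so awk exits at row 3.
regex_out=$(awk 'a~$0{exit}NR==1{a=$0}1' "$f")

# String comparison: exits only at the exact repeat of the first line.
exact_out=$(awk 'NR>1 && $0==a{exit} NR==1{a=$0} 1' "$f")

printf '%s\n' "$exact_out"
# name,city
# ann,oslo
# ,
# bob,rome
```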
How to delete duplicate lines in a file without sorting it in Unix
awk '!seen[$0]++' file.txt
seen is an associative array indexed by the text of each line. If a line isn't in the array yet, seen[$0] evaluates to false. The ! is the logical NOT operator and inverts the false to true. AWK prints the lines where the expression evaluates to true.
The ++ increments seen[$0], so that seen[$0] == 1 after the first time a line is found, then seen[$0] == 2, and so on.
AWK evaluates everything but 0 and "" (empty string) to true. If a duplicate line is looked up in seen, then !seen[$0] evaluates to false and the line is not written to the output.
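The counting described above can be traced with a small sketch that prints the value of seen[$0] before the increment and the result of the test for each line. The input and the variable names `before` and `keep` are made up for illustration:

```shell
workdir=$(mktemp -d)
f="$workdir/file.txt"
printf '%s\n' 'a' 'b' 'a' 'a' > "$f"

# For each line, record the count before the increment and whether
# !seen[$0]++ would have selected the line for printing.
trace=$(awk '{ before = seen[$0]; keep = !seen[$0]++
               printf "%s before=%d keep=%d\n", $0, before, keep }' "$f")

# The actual de-duplication command.
dedup=$(awk '!seen[$0]++' "$f")

printf '%s\n' "$trace"
# a before=0 keep=1
# b before=0 keep=1
# a before=1 keep=0
# a before=2 keep=0
```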