Is There a Way to Delete a Duplicate Header in a File in Unix

Is there a way to delete a duplicate header in a file in Unix?

If you know that the first line contains the header, just delete all other instances of it.

awk 'FNR==1 { header = $0; print }
$0 != header' file
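For example, on a small concatenated file whose header line repeats (the sample data here is made up for illustration):

```shell
# Sample file: the header line "id name" appears twice (illustrative data).
printf 'id name\n1 foo\nid name\n2 bar\n' > sample.txt

# Keep line 1 as the header and print it; print every later line
# only if it differs from the stored header.
awk 'FNR==1 { header = $0; print }
$0 != header' sample.txt
```

This prints the header once, followed by `1 foo` and `2 bar`.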

If that won't work, please tell us how we can identify a header line. If it's just a static string, use grep -vF 'that string'; if it matches a particular regex, use grep -v 'that regex'.

Unix - removing duplicate headers from file

Try this with GNU sed:

sed '3,${/^Metric/d;/^---/d}' file

Output:

Metric date_sk date_sk -7
---------------- ---------- ----------
Test1 2015-10-19 2015-10-12
Test2 2015-10-19 2015-10-12
Test3 2015-10-19 2015-10-12

If you want to edit the file "in place", add sed's -i option.
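As a quick check, one can rebuild a file like the one above and run the command (the sample lines are abbreviated stand-ins for the real report):

```shell
# Reconstruct a report whose header pair ("Metric ...", "----...") repeats.
printf 'Metric a b\n---- - -\nTest1 1 2\nMetric a b\n---- - -\nTest2 3 4\n' > report.txt

# From line 3 on, delete any line starting with "Metric" or "---",
# leaving only the first header pair and the data lines.
sed '3,${/^Metric/d;/^---/d}' report.txt
```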

How to remove duplicate headers from a file except the first occurrence in Linux

Quite simple in Awk: just use all the fields in the row as a unique key,

awk '!unique[$1$2$3$4]++' file > new-file

which produces this output:

No name city country
1 xyz yyyy zzz
2 test dddd xxxx
3 xyz yyyy zzz

A more readable version in Awk, looping over all the fields in the row (up to NF) to build the key, would be

awk '{key=""; for(i=1;i<=NF;i++) key=key$i;}!unique[key]++' file > new-file

(or) an even more readable version, from Sundeep's comment below, using $0, the whole line, as the key:

awk '!unique[$0]++' file
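One difference worth noting: concatenating fields with no separator can merge distinct lines into the same key, which the $0 form avoids (a small illustration, not from the original answer):

```shell
# "a bc" and "ab c" both concatenate to the key "abc", so the
# field-based key wrongly treats them as duplicates...
printf 'a bc\nab c\n' | awk '!unique[$1$2]++'

# ...while keying on the whole line keeps both.
printf 'a bc\nab c\n' | awk '!unique[$0]++'
```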

Follow-up question from the OP: how to save the file in place.

Recent versions of GNU Awk (since release 4.1.0) have an "inplace" extension for in-place file editing:

[...] The "inplace" extension, built using the new facility, can be used to simulate the GNU "sed -i" feature. [...]

Example usage:

gawk -i inplace '{key=""; for(i=1;i<=NF;i++) key=key$i;}!unique[key]++' file

To keep the backup:

gawk -i inplace -v INPLACE_SUFFIX=.bak '{key=""; for(i=1;i<=NF;i++) key=key$i;}!unique[key]++' file

(or) if your Awk does not support that, write to a temporary file and move it into place:

tmp=$(mktemp)
awk '{key=""; for(i=1;i<=NF;i++) key=key$i;}!unique[key]++' file > "$tmp" && mv "$tmp" file

Delete duplicate headers in awk

I'd do it this way:

sed '1h;2,$G;s/^\(.*\)\n\1$//;/./P;d' filename

It copies the first line into the hold space (1h), appends that saved header to every later line (2,$G), empties the pattern space whenever the line is identical to the header (the s command), and prints only the non-empty results (/./P;d).

How do I delete all lines in a concatenated text file that match the header WITHOUT deleting the header? [bash]

The following AWK script removes all lines that are exactly the same as the first one.

awk '{ if($0 != header) { print; } if(header == "") { header=$0; } }' inputfile > outputfile

It will print the first line because the initial value of header is an empty string. Then it stores the first line in header, because header is still empty at that point.

After this it will print only lines that are not equal to the first one already stored in header. The second if will always be false once the header has been saved.

Note: If the file starts with empty lines these empty lines will be removed.

To remove the leading number column, you can use

sed 's/^[0-9][0-9]*[ \t]*//' inputfile > outputfile

You can combine both commands into a pipe

awk '{ if($0 != header) { print; } if(header == "") { header=$0; } }' inputfile | sed 's/^[0-9][0-9]*[ \t]*//' > outputfile
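Putting it together on a small concatenated file (the data is illustrative):

```shell
# The header "No Name" repeats; data lines carry a leading number column.
printf 'No Name\n1 foo\nNo Name\n2 bar\n' |
  awk '{ if($0 != header) { print; } if(header == "") { header=$0; } }' |
  sed 's/^[0-9][0-9]*[ \t]*//'
```

The awk stage drops the second `No Name`, and the sed stage strips the leading numbers, leaving `No Name`, `foo`, `bar`.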

How to delete the rest of the records after the second occurrence of a pattern in a .CSV file

Try this:

awk 'a~$0{exit}NR==1{a=$0}1' file

It saves the first line in a (NR==1{a=$0}) and prints every line (the trailing 1); a~$0 exits, before printing, as soon as the saved first line matches a later line again, i.e. at the second occurrence of the pattern.
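Note that a~$0 treats each line as a regular expression, which can misfire if the pattern contains metacharacters such as . or *. A string-equality variant (a sketch, not from the original answer) avoids that:

```shell
# Stop (without printing) at the second occurrence of the first line,
# comparing strings instead of matching a regex.
printf 'HDR\n1\n2\nHDR\n3\n' | awk 'NR>1 && $0==hdr{exit} NR==1{hdr=$0} 1'
```

This prints `HDR`, `1`, `2` and stops at the repeated `HDR`.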

How to delete duplicate lines in a file without sorting it in Unix

awk '!seen[$0]++' file.txt

seen is an associative array that AWK indexes by every line of the file. If a line isn't in the array then seen[$0] will evaluate to false. The ! is the logical NOT operator and will invert the false to true. AWK will print the lines where the expression evaluates to true.

The ++ increments seen so that seen[$0] == 1 after the first time a line is found and then seen[$0] == 2, and so on.
AWK evaluates everything but 0 and "" (empty string) to true. If a duplicate line is placed in seen then !seen[$0] will evaluate to false and the line will not be written to the output.
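A quick demonstration (made-up input):

```shell
# Duplicate lines "a" and "b" are dropped on their second appearance;
# the original order of first appearances is preserved.
printf 'a\nb\na\nc\nb\n' | awk '!seen[$0]++'
```

The output is `a`, `b`, `c`.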


