How to Merge Similar Lines in Linux

How to merge every two lines into one from the command line?

awk:

awk 'NR%2{printf "%s ",$0;next;}1' yourFile

Note: if the input has an odd number of lines, the last line is printed with a trailing space and without a final newline.

sed:

sed 'N;s/\n/ /' yourFile
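
For example, assuming a six-line input generated with seq (just an illustrative sketch), both commands join every two lines:

$ seq 1 6 | awk 'NR%2{printf "%s ",$0;next;}1'
1 2
3 4
5 6

$ seq 1 6 | sed 'N;s/\n/ /'
1 2
3 4
5 6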

How to merge similar lines in Linux

Here is one way to do it:

awk '{
  last = $NF; $NF = ""                            # remember the last field; the rest of the line is the key
  if ($0 == previous) {
    tail = tail " " last                          # same key as the previous line: collect its value
  } else {
    if (previous != "") {
      if (split(tail, foo) == 1) tail = tail " 0" # pad a lone value with a 0
      print previous tail                         # emit the finished group
    }
    previous = $0
    tail = last
  }
}
END {
  if (previous != "") print previous tail         # flush the last group
}
'
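
As a rough illustration with hypothetical data (the key is everything except the last field), this input:

host1 10
host1 20
host2 30
host3 40

would be merged into:

host1 10 20
host2 30 0
host3 40

Lone values are padded with a 0, except for the very last group, which the END block prints as-is.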

How to merge two files line by line in Bash

You can use paste:

paste file1.txt file2.txt > fileresults.txt
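
For example, with two throw-away files (by default paste joins corresponding lines with a tab):

$ printf '%s\n' a b c > file1.txt
$ printf '%s\n' 1 2 3 > file2.txt
$ paste file1.txt file2.txt
a       1
b       2
c       3

Use paste -d, file1.txt file2.txt if you want commas instead of tabs.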

Merge 16 lines into one line

Benchmarking different methods of merging a specific number of lines into one.

Basically, there are several standard commands that can do this:

pr - convert text files for printing

pr -at16 <file

Try:

pr -a -t -16 < <(seq 1 42)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
33 34 35 36 37 38 39 40 41 42

xargs - build and execute command lines from standard input

... and executes the command (default is /bin/echo) ...

xargs -n 16 <file

Try:

xargs -n 16 < <(seq 1 42)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
33 34 35 36 37 38 39 40 41 42

paste - merge lines of files

printf -v pasteargs %*s 16
paste -d\ ${pasteargs// /- } < <(seq 1 42)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
33 34 35 36 37 38 39 40 41 42

sed - stream editor for filtering and transforming text

printf -v sedstr 'N;s/\\n/ /;%.0s' {2..16};
sed -e "$sedstr" < <(seq 1 42)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
33 34 35 36 37 38 39 40 41 42

awk - pattern scanning and processing language

awk 'NR%16{printf "%s ",$0;next;}1'  < <(seq 1 42)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
33 34 35 36 37 38 39 40 41 42

But you could also use pure bash:

group=()
while read -r line; do
    group+=("$line")
    (( ${#group[@]} > 15 )) && {
        echo "${group[*]}"
        group=()
    }
done < <(seq 1 42) ; echo "${group[*]}"
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
33 34 35 36 37 38 39 40 41 42

or as a function:

lgrp () {
    local group=() line
    while read -r line; do
        group+=("$line")
        (( ${#group[@]} >= $1 )) && {
            echo "${group[*]}"
            group=()
        }
    done
    [ "$group" ] && echo "${group[*]}"
}

or

lgrp () { local g=() l;while read -r l;do g+=("$l");((${#g[@]}>=$1))&&{
echo "${g[*]}";g=();};done;[ "$g" ] && echo "${g[*]}";}

then

lgrp 16 < <(seq 1 42)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
33 34 35 36 37 38 39 40 41 42

(Note: all these tests were arbitrarily done over 42 values; don't ask me why! ;-)

Other languages

Of course, you could do the same in almost any language:

perl -ne 'chomp;$r.=$_." ";( 15 < ++$cnt) && do {
printf "%s\n", $1 if $r =~ /^(.*) $/;$r="";$cnt=0;
};END{print $r."\n"}' < <(seq 1 42)

The same goes for Python, Ruby, Lisp, C, ...

Comparison of execution time.

OK, since there are more than three simple ways, let's do a little benchmark.
Here is how I do it:

lgrp () { local g=() l;while read -r l;do g+=("$l");((${#g[@]}>=
$1))&&{ echo "${g[*]}";g=();};done;[ "$g" ] && echo "${g[*]}";}
export -f lgrp
printf -v sedcmd '%*s' 15
sedcmd=${sedcmd// /N;s/\\n/ /;}
export sedcmd
{
printf "%-12s\n" Method
printf %7s\\n count real user system count real user system

for cmd in 'paste -d " " -{,,,}{,,,}' 'pr -at16' \
'sed -e "$sedcmd"' \
$'awk \47NR%16{printf "%s ",$0;next;}1;END{print ""}\47'\
$'perl -ne \47chomp;$r.=$_." ";( 15 < ++$cnt) && do {
printf "%s\n", $1 if $r =~ /^(.*) $/;$r="";$cnt=0;
};END{print $r."\n"}\47' 'lgrp 16' 'xargs -n 16'
do
printf %-12s\\n ${cmd%% *}
for length in 5042 50042; do
printf %7s\\n $(bash -c "TIMEFORMAT=$'%R %U %S';
time $cmd < <(seq 1 $length) | wc -l" 2>&1)
done
done
} | paste -d $'\t' -{,,,,,,,,}

(This can be copied and pasted into a bash terminal.) On my computer it produces the following, where the first four value columns are for an input of 5042 lines and the last four for 50042 lines:

Method        count    real    user  system   count    real    user  system
paste           316   0.002   0.002   0.000    3128   0.003   0.003   0.000
pr              316   0.003   0.000   0.003    3128   0.008   0.005   0.002
sed             316   0.005   0.001   0.003    3128   0.018   0.019   0.000
awk             316   0.003   0.001   0.003    3128   0.017   0.017   0.002
perl            316   0.008   0.002   0.004    3128   0.017   0.014   0.004
lgrp            316   0.058   0.042   0.021    3128   0.733   0.568   0.307
xargs           316   0.232   0.178   0.058    3128   2.249   1.730   0.555

Here is the same benchmark on my Raspberry Pi:

Method        count    real    user  system   count    real    user  system
paste           316   0.149   0.032   0.012    3128   0.204   0.014   0.054
pr              316   0.163   0.017   0.038    3128   0.418   0.069   0.096
sed             316   0.275   0.088   0.031    3128   1.586   0.697   0.045
awk             316   0.440   0.146   0.049    3128   2.809   1.305   0.050
perl            316   0.421   0.122   0.040    3128   2.042   0.902   0.067
lgrp            316   7.261   3.159   0.446    3128  71.733  31.223   3.558
xargs           316   9.464   3.038   1.066    3128  93.607  32.035   9.177

Happily, all the line counts are the same. paste is clearly the quickest, followed by pr. The pure bash function is not slower than xargs (I'm surprised by xargs' poor performance!).

How do I merge two files so that equal lines do not repeat?

This will concatenate, then sort, then remove duplicate lines:

LC_ALL=C sort -u input1.txt input2.txt > output.txt
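
For instance, with two hypothetical input files:

$ cat input1.txt
apple
banana
$ cat input2.txt
banana
cherry
$ LC_ALL=C sort -u input1.txt input2.txt
apple
banana
cherry

If you need to keep the original order rather than sorted order, the usual awk idiom awk '!seen[$0]++' input1.txt input2.txt is an alternative.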

Bash to merge 2 consecutive lines in a file

Using sed:

sed '$!N;s/\n/,/' filename

Using paste:

paste -d, - - < filename

Note that paste leaves a trailing , when the input has an odd number of lines.
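
For example, on a five-line input (note paste's trailing comma on the unpaired last line):

$ seq 1 5 | sed '$!N;s/\n/,/'
1,2
3,4
5

$ seq 1 5 | paste -d, - -
1,2
3,4
5,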

How to merge duplicate lines into same row with primary key and more than one column of information

Try the following; it was written and tested with GNU awk on the question's samples.

awk '
BEGIN{
  FS=","
  OFS="|"
}
FNR==NR{
  first=$1
  $1=""
  sub(/^,/,"")
  arr[first]=(first in arr?arr[first] OFS:"")$0
  next
}
($1 in arr){
  print $1 arr[$1]
  delete arr[$1]
}
' Input_file Input_file

Explanation: here is a detailed, line-by-line explanation of the above.

awk '                                               ##Starting the awk program from here.
BEGIN{                                              ##Starting the BEGIN section of this program from here.
  FS=","                                            ##Setting FS as comma here.
  OFS="|"                                           ##Setting OFS as | here.
}
FNR==NR{                                            ##This condition is TRUE during the first pass over Input_file.
  first=$1                                          ##Setting first as the 1st field here.
  $1=""                                             ##Nullifying the first field here.
  sub(/^,/,"")                                      ##Substituting a starting comma with NULL in the current line.
  arr[first]=(first in arr?arr[first] OFS:"")$0     ##Creating arr indexed by first and appending each record with the same index to it.
  next                                              ##next skips all further statements from here.
}
($1 in arr){                                        ##Checking whether the 1st field is present in arr; if so, do the following.
  print $1 arr[$1]                                  ##Printing the 1st field followed by its arr value here.
  delete arr[$1]                                    ##Deleting the arr item here.
}
' Input_file Input_file                             ##Mentioning the Input_file names here (the same file is read twice).

How to concatenate multiple lines of output to one line?

Use tr '\n' ' ' to translate all newline characters to spaces:

$ grep pattern file | tr '\n' ' '

Note: grep reads files, cat concatenates files. Don't cat file | grep!
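
For example (using printf to stand in for the grep output; note that tr also removes the final newline and leaves a trailing space):

$ printf '%s\n' one two three | tr '\n' ' '
one two three 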

Edit:

tr can only handle single-character translations. If you need a multi-character separator, you could use awk to change the output record separator (ORS) instead:

$ grep pattern file | awk '{print}' ORS='" '

This would transform:

one
two
three

to:

one" two" three" 

