How to Merge Similar Lines in Linux

How to merge every two lines into one from the command line?

awk:

awk 'NR%2{printf "%s ",$0;next;}1' yourFile

Note: if the input has an odd number of lines, the last line is printed with a trailing space and without a final newline.

sed:

sed 'N;s/\n/ /' yourFile
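
For example, assuming a six-line input generated with seq (just an illustrative sketch), both commands join every two lines:

$ seq 1 6 | awk 'NR%2{printf "%s ",$0;next;}1'
1 2
3 4
5 6

$ seq 1 6 | sed 'N;s/\n/ /'
1 2
3 4
5 6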

How to merge similar lines in Linux

Here is one way to do it:

awk '{
  last = $NF; $NF = ""                            # remember the last field; the rest of the line is the key
  if ($0 == previous) {
    tail = tail " " last                          # same key as the previous line: collect its value
  } else {
    if (previous != "") {
      if (split(tail, foo) == 1) tail = tail " 0" # pad a lone value with a 0
      print previous tail                         # emit the finished group
    }
    previous = $0
    tail = last
  }
}
END {
  if (previous != "") print previous tail         # flush the last group
}
'
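
As a rough illustration with hypothetical data (the key is everything except the last field), this input:

host1 10
host1 20
host2 30
host3 40

would be merged into:

host1 10 20
host2 30 0
host3 40

Lone values are padded with a 0, except for the very last group, which the END block prints as-is.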

How to merge two files line by line in Bash

You can use paste:

paste file1.txt file2.txt > fileresults.txt
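
For example, with two throw-away files (by default paste joins corresponding lines with a tab):

$ printf '%s\n' a b c > file1.txt
$ printf '%s\n' 1 2 3 > file2.txt
$ paste file1.txt file2.txt
a       1
b       2
c       3

Use paste -d, file1.txt file2.txt if you want commas instead of tabs.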

Merge 16 lines into one line

Benchmarking different methods of merging a specific number of lines into one.

Basically, there are several standard commands that can do this:

pr - convert text files for printing

pr -at16 <file

Try:

pr -a -t -16 < <(seq 1 42)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
33 34 35 36 37 38 39 40 41 42

xargs - build and execute command lines from standard input

... and executes the command (default is /bin/echo) ...

xargs -n 16 <file

Try:

xargs -n 16 < <(seq 1 42)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
33 34 35 36 37 38 39 40 41 42

paste - merge lines of files

printf -v pasteargs %*s 16
paste -d\ ${pasteargs// /- } < <(seq 1 42)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
33 34 35 36 37 38 39 40 41 42

sed - stream editor for filtering and transforming text

printf -v sedstr 'N;s/\\n/ /;%.0s' {2..16};
sed -e "$sedstr" < <(seq 1 42)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
33 34 35 36 37 38 39 40 41 42

awk - pattern scanning and processing language

awk 'NR%16{printf "%s ",$0;next;}1'  < <(seq 1 42)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
33 34 35 36 37 38 39 40 41 42

But you could also use pure bash:

group=()
while read -r line; do
    group+=("$line")
    (( ${#group[@]} > 15 )) && {
        echo "${group[*]}"
        group=()
    }
done < <(seq 1 42) ; echo "${group[*]}"
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
33 34 35 36 37 38 39 40 41 42

or as a function:

lgrp () {
    local group=() line
    while read -r line; do
        group+=("$line")
        (( ${#group[@]} >= $1 )) && {
            echo "${group[*]}"
            group=()
        }
    done
    [ "$group" ] && echo "${group[*]}"
}

or

lgrp () { local g=() l;while read -r l;do g+=("$l");((${#g[@]}>=$1))&&{
echo "${g[*]}";g=();};done;[ "$g" ] && echo "${g[*]}";}

then

lgrp 16 < <(seq 1 42)
1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16
17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32
33 34 35 36 37 38 39 40 41 42

(Note: all these tests were arbitrarily done over 42 values; don't ask me why! ;-)

Other languages

Of course, you could do the same in almost any language:

perl -ne 'chomp;$r.=$_." ";( 15 < ++$cnt) && do {
printf "%s\n", $1 if $r =~ /^(.*) $/;$r="";$cnt=0;
};END{print $r."\n"}' < <(seq 1 42)

The same goes for Python, Ruby, Lisp, C, ...

Comparison of execution time.

OK, since there are more than three simple ways, let's do a little benchmark.
Here is how I do it:

lgrp () { local g=() l;while read -r l;do g+=("$l");((${#g[@]}>=
$1))&&{ echo "${g[*]}";g=();};done;[ "$g" ] && echo "${g[*]}";}
export -f lgrp
printf -v sedcmd '%*s' 15
sedcmd=${sedcmd// /N;s/\\n/ /;}
export sedcmd
{
printf "%-12s\n" Method
printf %7s\\n count real user system count real user system

for cmd in 'paste -d " " -{,,,}{,,,}' 'pr -at16' \
'sed -e "$sedcmd"' \
$'awk \47NR%16{printf "%s ",$0;next;}1;END{print ""}\47'\
$'perl -ne \47chomp;$r.=$_." ";( 15 < ++$cnt) && do {
printf "%s\n", $1 if $r =~ /^(.*) $/;$r="";$cnt=0;
};END{print $r."\n"}\47' 'lgrp 16' 'xargs -n 16'
do
printf %-12s\\n ${cmd%% *}
for length in 5042 50042; do
printf %7s\\n $(bash -c "TIMEFORMAT=$'%R %U %S';
time $cmd < <(seq 1 $length) | wc -l" 2>&1)
done
done
} | paste -d $'\t' -{,,,,,,,,}

(This can be copied and pasted into a bash terminal.) On my computer it produces the following, where the first four value columns are for an input of 5042 lines and the last four for 50042 lines:

Method        count    real    user  system   count    real    user  system
paste           316   0.002   0.002   0.000    3128   0.003   0.003   0.000
pr              316   0.003   0.000   0.003    3128   0.008   0.005   0.002
sed             316   0.005   0.001   0.003    3128   0.018   0.019   0.000
awk             316   0.003   0.001   0.003    3128   0.017   0.017   0.002
perl            316   0.008   0.002   0.004    3128   0.017   0.014   0.004
lgrp            316   0.058   0.042   0.021    3128   0.733   0.568   0.307
xargs           316   0.232   0.178   0.058    3128   2.249   1.730   0.555

Here is the same benchmark on my Raspberry Pi:

Method        count    real    user  system   count    real    user  system
paste           316   0.149   0.032   0.012    3128   0.204   0.014   0.054
pr              316   0.163   0.017   0.038    3128   0.418   0.069   0.096
sed             316   0.275   0.088   0.031    3128   1.586   0.697   0.045
awk             316   0.440   0.146   0.049    3128   2.809   1.305   0.050
perl            316   0.421   0.122   0.040    3128   2.042   0.902   0.067
lgrp            316   7.261   3.159   0.446    3128  71.733  31.223   3.558
xargs           316   9.464   3.038   1.066    3128  93.607  32.035   9.177

Happily, all the line counts are the same. paste is clearly the quickest, followed by pr. The pure bash function is not slower than xargs (I'm surprised by xargs' poor performance!).

How do I merge two files so that equal lines do not repeat?

This will concatenate, then sort, then remove duplicate lines:

LC_ALL=C sort -u input1.txt input2.txt > output.txt
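
For instance, with two hypothetical input files:

$ cat input1.txt
apple
banana
$ cat input2.txt
banana
cherry
$ LC_ALL=C sort -u input1.txt input2.txt
apple
banana
cherry

If you need to keep the original order rather than sorted order, the usual awk idiom awk '!seen[$0]++' input1.txt input2.txt is an alternative.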

Bash to merge 2 consecutive lines in a file

Using sed:

sed '$!N;s/\n/,/' filename

Using paste:

paste -d, - - < filename

Note that paste leaves a trailing , when the input has an odd number of lines.
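
For example, on a five-line input (note paste's trailing comma on the unpaired last line):

$ seq 1 5 | sed '$!N;s/\n/,/'
1,2
3,4
5

$ seq 1 5 | paste -d, - -
1,2
3,4
5,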

How to merge duplicate lines into same row with primary key and more than one column of information

Try the following; it was written and tested with GNU awk on the question's samples.

awk '
BEGIN{
  FS=","
  OFS="|"
}
FNR==NR{
  first=$1
  $1=""
  sub(/^,/,"")
  arr[first]=(first in arr?arr[first] OFS:"")$0
  next
}
($1 in arr){
  print $1 arr[$1]
  delete arr[$1]
}
' Input_file Input_file

Explanation: here is a detailed, line-by-line explanation of the above.

awk '                                               ##Starting the awk program from here.
BEGIN{                                              ##Starting the BEGIN section of this program from here.
  FS=","                                            ##Setting FS as comma here.
  OFS="|"                                           ##Setting OFS as | here.
}
FNR==NR{                                            ##This condition is TRUE during the first pass over Input_file.
  first=$1                                          ##Setting first as the 1st field here.
  $1=""                                             ##Nullifying the first field here.
  sub(/^,/,"")                                      ##Substituting a starting comma with NULL in the current line.
  arr[first]=(first in arr?arr[first] OFS:"")$0     ##Creating arr indexed by first and appending each record with the same index to it.
  next                                              ##next skips all further statements from here.
}
($1 in arr){                                        ##Checking whether the 1st field is present in arr; if so, do the following.
  print $1 arr[$1]                                  ##Printing the 1st field followed by its arr value here.
  delete arr[$1]                                    ##Deleting the arr item here.
}
' Input_file Input_file                             ##Mentioning the Input_file names here (the same file is read twice).

How to concatenate multiple lines of output to one line?

Use tr '\n' ' ' to translate all newline characters to spaces:

$ grep pattern file | tr '\n' ' '

Note: grep reads files, cat concatenates files. Don't cat file | grep!
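
For example (using printf to stand in for the grep output; note that tr also removes the final newline and leaves a trailing space):

$ printf '%s\n' one two three | tr '\n' ' '
one two three 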

Edit:

tr can only handle single-character translations. If you need a multi-character separator, you could use awk to change the output record separator (ORS) instead:

$ grep pattern file | awk '{print}' ORS='" '

This would transform:

one
two
three

to:

one" two" three" 

