How to Sort with Multiple Lines in Bash

How to sort with multiple lines in bash?

Probably far from optimal, but

sed -r ':r;/(^|\n)$/!{$!{N;br}};s/\n/\v/g' names | sort | sed 's/\v/\n/g'

seems to do the job (names is the file containing the records). This handles records of arbitrary length, not just two lines.
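
For a concrete (made-up) example, suppose names holds blank-line-separated two-line records. The first sed folds each record onto a single line, replacing the inner newlines with vertical-tab characters, sort then orders whole records at once, and the final sed turns the vertical tabs back into newlines:

$ cat names
John
Smith

Alice
Brown

Zoe
Young
$ sed -r ':r;/(^|\n)$/!{$!{N;br}};s/\n/\v/g' names | sort | sed 's/\v/\n/g'
Alice
Brown

John
Smith

Zoe
Young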

Sorting multiple-line records in alphabetical order using shell

A quick way to get something working is:

$ cat file | awk 'BEGIN{RS=""; FS="\n"; OFS="|"}{$1=$1}1' \
| sort | awk 'BEGIN{FS="|";OFS="\n";ORS="\n\n"}{$1=$1}1'
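
To see the intermediate step, assume a hypothetical file of blank-line-separated records (the contents below are made up). The first awk reads in paragraph mode (RS="") and collapses each record onto one |-delimited line, which is what sort orders; the second awk splits the | back into newlines and restores the blank separator:

$ cat file
John
Smith

Alice
Brown

Zoe
Young
$ awk 'BEGIN{RS=""; FS="\n"; OFS="|"}{$1=$1}1' file
John|Smith
Alice|Brown
Zoe|Young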

Or you can write it as a single GNU awk command:

$ awk 'BEGIN{RS=""; ORS="\n\n"; FS=OFS="\n"; PROCINFO["sorted_in"]="@val_str_asc"}
{a[NR]=$0}END{for(i in a) print a[i]}' file

If you don't want the output to end with an empty line, you can use either of the following:

$ cat file | awk 'BEGIN{RS=""; FS="\n"; OFS="|"}{$1=$1}1' \
| sort | awk 'BEGIN{FS="|";OFS="\n"}{$1=$1}1' | sed '$d'

Or, equivalently, in a single GNU awk:

$ awk 'BEGIN{RS=""; FS=OFS="\n"; PROCINFO["sorted_in"]="@val_str_asc"}
{a[NR]=$0}END{for(i in a) print a[i] (--NR?"\n":"")}' file

Sort array with multiple lines using another ordered array pattern in bash with awk

Could you please try the following. I have changed the solution a bit, because it was not clear at first that you want to print ALL values for a key such as NC from array a. The script now keeps concatenating every record that shares a key (NC, NV, and so on), and when that key is found in array b it prints all of the values accumulated for it from array a.

awk -v OFS='\t' '
FNR==NR{                                          # first input: the records from array a
  split($5,a,"_")                                 # key is the part of field 5 before "_"
  array[a[1]]=(array[a[1]]?array[a[1]] ORS $0:$0) # append the whole record under that key
  next
}
($1 in array){                                    # second input: the ordered keys from array b
  print array[$1]                                 # print every record stored under this key
  delete array[$1]                                # only once per key
}
END{
  for(j in array){                                # print whatever array b never asked for
    if(array[j]){ print array[j] }
  }
}' <(printf '%s\n' "${a[@]}") <(printf '%s\n' "${b[@]}")
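
For example, a hypothetical invocation (the record contents and the NC/NV keys below are made up; the 5th field of each record carries the key before the underscore):

a=(
  'gene1 12 34 + NC_0001'
  'gene3 90 12 + NC_0003'
  'gene2 56 78 - NV_0002'
)
b=( NC NV )

# Running the awk command above with these arrays prints the records grouped
# by key, in the order the keys appear in b:
#   gene1 12 34 + NC_0001
#   gene3 90 12 + NC_0003
#   gene2 56 78 - NV_0002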

How to sort data based on the value of a column for part (multiple lines) of a file?

Apply the DSU (Decorate/Sort/Undecorate) idiom using any awk, sort, and cut, regardless of how many lines are in each block:

$ awk -v OFS='\t' '
NF<pNF || NR==1 { blockNr++ }
{ print blockNr, NF, NR, (NF>1 ? $1 : NR), $0; pNF=NF }
' file |
sort -n -k1,1 -k2,2 -k4,4 -k3,3 |
cut -f5-
3
0
1 0.8
2 0.5
3 0.2
3
1
1 0.4
2 0.1
3 0.8
3
2
1 0.8
2 0.4
3 0.3

To understand what that's doing, just look at the first 2 steps:

$ awk -v OFS='\t' 'NF<pNF || NR==1{ blockNr++ } { print blockNr, NF, NR, (NF>1 ? $1 : NR), $0; pNF=NF }' file
1 1 1 1 3
1 1 2 2 0
1 2 3 2 2 0.5
1 2 4 1 1 0.8
1 2 5 3 3 0.2
2 1 6 6 3
2 1 7 7 1
2 2 8 2 2 0.1
2 2 9 3 3 0.8
2 2 10 1 1 0.4
3 1 11 11 3
3 1 12 12 2
3 2 13 1 1 0.8
3 2 14 2 2 0.4
3 2 15 3 3 0.3


$ awk -v OFS='\t' 'NF<pNF || NR==1{ blockNr++ } { print blockNr, NF, NR, (NF>1 ? $1 : NR), $0; pNF=NF }' file |
sort -n -k1,1 -k2,2 -k4,4 -k3,3
1 1 1 1 3
1 1 2 2 0
1 2 4 1 1 0.8
1 2 3 2 2 0.5
1 2 5 3 3 0.2
2 1 6 6 3
2 1 7 7 1
2 2 10 1 1 0.4
2 2 8 2 2 0.1
2 2 9 3 3 0.8
3 1 11 11 3
3 1 12 12 2
3 2 13 1 1 0.8
3 2 14 2 2 0.4
3 2 15 3 3 0.3

Notice that the awk command is just creating the key values that sort needs in order to sort by block number, line number, $1, etc. So awk Decorates the input, sort Sorts it, and cut Undecorates it by removing the decoration fields that the awk script added.
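
The same DSU idiom applies to other problems. As a minimal sketch on a different task (sorting the lines of a file by length, tie-broken by original position; the file name is just a placeholder):

# Decorate:   prefix each line with its length and original line number (tab-separated),
# Sort:       numerically on those two key fields,
# Undecorate: strip them again with cut.
awk -v OFS='\t' '{ print length($0), NR, $0 }' file |
    sort -n -k1,1 -k2,2 |
    cut -f3-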

Bash - sort range of lines in file

With head, GNU sed and tail:

(head -n 1 test.sh; sed -n '2,${/\\/p}' test.sh | sort; tail -n 1 test.sh) > test_new.sh
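
The test.sh being sorted is not shown, but presumably it is the same compile command with the backslash-continued source files in an arbitrary order, for example:

g++ -o test.out \
Main.cpp \
Framework.cpp \
Sample.cpp \
Blub.cpp \
-std=c++14 -lboost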

Output:


g++ -o test.out \
Blub.cpp \
Framework.cpp \
Main.cpp \
Sample.cpp \
-std=c++14 -lboost

How to sort groups of lines?

Maybe not the fastest :) [1] but it will do what you want, I believe:

for line in $(grep -n '^\[.*\]$' sections.txt |
              sort -k2 -t: |
              cut -f1 -d:); do
    tail -n +$line sections.txt | head -n 5
done
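
As a concrete illustration (the sections.txt contents are made up, assuming each section is a [header] line plus four data lines), the loop above would reorder

$ cat sections.txt
[zebra]
a=1
b=2
c=3
d=4
[apple]
a=5
b=6
c=7
d=8

into:

[apple]
a=5
b=6
c=7
d=8
[zebra]
a=1
b=2
c=3
d=4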

Here's a better one:

for pos in $(grep -b '^\[.*\]$' sections.txt |
             sort -k2 -t: |
             cut -f1 -d:); do
    tail -c +$((pos+1)) sections.txt | head -n 5
done

[1] The first one is something like O(N^2) in the number of lines in the file, since it has to read all the way to the section for each section. The second one, which can seek immediately to the right character position, should be closer to O(N log N).

[2] This takes you at your word that there are always exactly five lines in each section (header plus four following), hence head -n 5. However, it would be really easy to replace that with something which read up to but not including the next line starting with a '[', in case that ever turns out to be necessary.
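
For instance, a hypothetical replacement for the loop body that reads each section up to (but not including) the next line starting with '[', rather than assuming exactly five lines:

tail -c +$((pos+1)) sections.txt | awk 'NR>1 && /^\[/{exit} 1'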


Preserving start and end requires a bit more work:

# Find all the sections
mapfile indices < <(grep -b '^\[.*\]$' sections.txt)
# Output the prefix
head -c+${indices[0]%%:*} sections.txt
# Output sections, as above
for pos in $(printf %s "${indices[@]}" |
             sort -k2 -t: |
             cut -f1 -d:); do
    tail -c +$((pos+1)) sections.txt | head -n 5
done
# Output the suffix
tail -c+$((1+${indices[-1]%%:*})) sections.txt | tail -n+6

You might want to make a function out of that, or a script file, changing sections.txt to $1 throughout.
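
A minimal sketch of such a script (the name sort_sections.sh is made up; it keeps the same assumptions as above, including bash 4.3+ for mapfile and negative array subscripts, and at least one section in the input):

#!/usr/bin/env bash
# sort_sections.sh -- print the sections of "$1" in header order, preserving
# any prefix before the first section and any suffix after the last one.
file=$1

# Find all the sections (byte offset of each [header] line)
mapfile indices < <(grep -b '^\[.*\]$' "$file")

# Output the prefix
head -c "${indices[0]%%:*}" "$file"

# Output the sections, five lines each, ordered by header text
for pos in $(printf %s "${indices[@]}" |
             sort -k2 -t: |
             cut -f1 -d:); do
    tail -c +$((pos + 1)) "$file" | head -n 5
done

# Output the suffix
tail -c +$((1 + ${indices[-1]%%:*})) "$file" | tail -n +6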


