Merge Text Files Ordered by Numerical Filenames in Bash

Merge text files ordered by numerical filenames in Bash

I am adding this answer only because the currently accepted answer suggests a bad practice, and in future Hellmar may run into the exact same problem I once faced. (An accepted answer cannot be deleted.)

Anyway, this should be the safe answer:

printf "%s\0" *txt | sort -zn | xargs -0 cat > all.txt

Here, the file names are delimited by a NUL character throughout the entire pipeline. NUL is the only character that cannot appear in a path name, so names containing spaces, quotes, or even newlines pass through intact.
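
As a minimal sketch of that pipeline at work, the following creates three hypothetical files in a scratch directory (the names, contents, and the merged.out output name are assumptions for illustration) and merges them. It assumes GNU sort and xargs, which provide -z and -0.

```bash
# Hypothetical demo files in a scratch directory; one name contains a space.
demo_dir=$(mktemp -d)
printf 'one\n' > "$demo_dir/1.txt"
printf 'two\n' > "$demo_dir/2 copy.txt"
printf 'ten\n' > "$demo_dir/10.txt"

# Same pipeline as in the answer, run inside the scratch directory.
# The output name does not end in "txt", so the glob cannot match it.
( cd "$demo_dir" && printf '%s\0' *txt | sort -zn | xargs -0 cat > merged.out )

# Prints the three lines in numeric filename order: one, two, ten.
cat "$demo_dir/merged.out"
```

Writing the result to a name that the glob does not match avoids any chance of the output file being fed back into cat.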

Also, if all the filenames have the same structure (say file0001.txt, file0002.txt, etc.), then this code should work just as well, because zero-padded names sort the same way alphabetically and numerically:

cat file[0-9][0-9][0-9][0-9].txt > all.txt

BASH: loop files numbered by #{number} in sorted order

sort -k 1.2 -n should do the trick.

-k F.C specifies that the sort key starts at character C of field F. Both F and C are counted from 1.
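
A small sketch of what that key does, using hypothetical names of the form #<number> (an assumption based on the question title): the key skips the leading # and compares the rest numerically.

```bash
# Hypothetical names of the form "#<number>", deliberately out of order.
sorted=$(printf '%s\n' '#10' '#2' '#1' | sort -k 1.2 -n)

# Prints #1, #2, #10 on separate lines.
printf '%s\n' "$sorted"
```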

Edit: I just realized that my answer is pretty much the same as the answer to the question you linked, so this is definitely a duplicate.

How to merge multiple files in order and append filename at the end in bash

Do not use cat $(....). A plain loop is enough:

for ((i=1;i<38;i++)); do
    f="BOB_${i}.brother_bob12.txt"
    sed "s/$/ $f/" "$f"
done

You may also do:

printf "%s\n" bob.txt BOB_{1..38}.brother_bob12.txt |
xargs -d'\n' -I{} sed 's/$/ {}/' '{}'

Concatenating 1.txt, 2.txt ... 10.txt into a single file

This can be done easily with brace expansion:

cat {1..10}.txt > output
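
To see why this works, here is a short sketch that prints what the braces expand to. The zero-padded range is an extra illustration and assumes Bash 4 or later.

```bash
# Brace expansion is done by the shell before any file lookup, so the names
# come out in numeric order whether or not the files exist.
names=$(echo {1..10}.txt)
echo "$names"

# Assumption: Bash 4 or later, which keeps leading zeros in a range.
padded=$(echo {01..03}.txt)
echo "$padded"
```

Because the names are generated rather than matched, cat reports an error for any file in the range that does not exist.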

Concatenating text files in bash

As much as I recommend against parsing the output of ls, here we go.

ls has a "version sort" option that will sort numbered files like you want. See below for a demo.

To concatenate, you want:

ls -v file*.txt | xargs cat > output

$ touch file{1..20}.txt
$ ls
file1.txt file12.txt file15.txt file18.txt file20.txt file5.txt file8.txt
file10.txt file13.txt file16.txt file19.txt file3.txt file6.txt file9.txt
file11.txt file14.txt file17.txt file2.txt file4.txt file7.txt
$ ls -1
file1.txt
file10.txt
file11.txt
file12.txt
file13.txt
file14.txt
file15.txt
file16.txt
file17.txt
file18.txt
file19.txt
file2.txt
file20.txt
file3.txt
file4.txt
file5.txt
file6.txt
file7.txt
file8.txt
file9.txt
$ ls -1v
file1.txt
file2.txt
file3.txt
file4.txt
file5.txt
file6.txt
file7.txt
file8.txt
file9.txt
file10.txt
file11.txt
file12.txt
file13.txt
file14.txt
file15.txt
file16.txt
file17.txt
file18.txt
file19.txt
file20.txt
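
Since parsing ls is discouraged, one possible alternative is to version-sort the glob results directly. This is only a sketch, not part of the original answer: it assumes GNU sort (for -V and -z), and the scratch directory and file contents are made up for the demo.

```bash
# Hypothetical scratch directory with three numbered files.
vdir=$(mktemp -d)
for n in 1 2 10; do
    printf 'contents of file%s\n' "$n" > "$vdir/file$n.txt"
done

# NUL-delimited names, version-sorted by GNU sort -V instead of ls -v.
( cd "$vdir" && printf '%s\0' file*.txt | sort -zV | xargs -0 cat > output )

# Prints the contents in the order file1, file2, file10.
cat "$vdir/output"
```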

Loop through filenames with numbers in alphanumeric order for OSX terminal

You are getting an alphabetical sort. From the Bash Reference Manual, section 3.5.8 (Filename Expansion):

3.5.8 Filename Expansion

After word splitting, unless the -f option has been set (see The Set
Builtin), Bash scans each word for the characters ‘*’, ‘?’, and ‘[’.
If one of these characters appears, then the word is regarded as a
pattern, and replaced with an alphabetically sorted list of filenames
matching the pattern.

To get a numerical sort, feed a while loop with the output of find, sorted numerically. Note that -printf is a GNU find extension and is not available in the find that ships with macOS:

while IFS= read -r file
do
    echo "$file --"
done < <(find /your/path -maxdepth 1 -mindepth 1 -type f -printf "%f\n" | sort -n)

That is:

  • find /your/path -maxdepth 1 -mindepth 1 -type f

    Gets the entries in /your/path that are regular files, without descending into subdirectories.

  • -printf "%f\n"

    Prints just the name of each file, one per line.

  • sort -n

    Sorts numerically.

  • while ...; do ...; done < <(command)

    Uses process substitution to feed the output of command into the while loop.
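
The steps above can be sketched in a form that avoids -printf, which may help with a find that lacks it. This is an assumption-laden variant rather than the original answer: the scratch directory stands in for /your/path, the .log names are made up, and sed strips the directory part instead of -printf "%f\n".

```bash
# Hypothetical scratch directory standing in for /your/path.
scan_dir=$(mktemp -d)
touch "$scan_dir/1.log" "$scan_dir/2.log" "$scan_dir/10.log"

listing=""
while IFS= read -r file; do
    listing+="$file --"$'\n'
done < <(find "$scan_dir" -maxdepth 1 -mindepth 1 -type f | sed 's|.*/||' | sort -n)

# The loop ran in the current shell, so the variable it built is still set.
# Prints "1.log --", "2.log --", "10.log --" on separate lines.
printf '%s' "$listing"
```

Feeding the loop through process substitution rather than a pipe keeps the loop in the current shell, which is why listing is still available after done.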

Concatenate files based on numeric sort of name substring in awk w/o header

You can also use tail to concatenate all the files without their headers:

tail -q -n+2 chr*_smallfiles > bigfile

In case you want to concatenate the files in a natural sort order as described in your question, you can pipe the result of ls -v1 to xargs using

ls -v1 chr*_smallfiles | xargs -d $'\n' tail -q -n+2 > bigfile

(Thanks to Charles Duffy) xargs -d $'\n' sets the delimiter to a newline, so that filenames containing whitespace or quote characters are passed through unchanged.
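
A quick way to check the behaviour is to run the same pipeline on throwaway data. In this sketch the scratch directory, the header text, and the row contents are all made up. It assumes GNU ls, xargs, and tail.

```bash
# Hypothetical input: three small files, each with a one-line header.
work=$(mktemp -d)
for n in 1 2 10; do
    printf 'header\nrow from chr%s\n' "$n" > "$work/chr${n}_smallfiles"
done

# Natural sort of the names, then print each file from line 2 onwards.
( cd "$work" && ls -v1 chr*_smallfiles | xargs -d $'\n' tail -q -n +2 > bigfile )

# Prints the rows for chr1, chr2, chr10 in that order, with no header lines.
cat "$work/bigfile"
```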

Merge sorted files without knowing file names

I have found an answer!

find . -type f | awk '{print "<(gzip -cd "$0")"}' | tr "\n" " " | (echo -n sort -m " "; cat -; echo) | bash

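This finds all the files in the directory and passes them, each wrapped in a gzip -cd process substitution, as arguments to a sort -m command, replacing newlines with spaces along the way. Because the generated command line is executed by bash, it assumes file names without spaces or shell metacharacters. Thanks for everyone's help in getting here!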
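
If file names may contain spaces, one possible variation is to quote each name with printf %q before handing the command to eval. This is a sketch under stated assumptions, not the original answer: the scratch directory, the sample data, and the -name '*.gz' filter are all added for illustration.

```bash
# Hypothetical input: two gzip files with sorted contents, one name with a space.
gz_dir=$(mktemp -d)
printf 'a\nc\n' | gzip > "$gz_dir/first part.gz"
printf 'b\nd\n' | gzip > "$gz_dir/second.gz"

cmd='sort -m'
while IFS= read -r -d '' f; do
    # printf %q quotes the name so that eval sees it as a single word.
    cmd+=" <(gzip -cd $(printf '%q' "$f"))"
done < <(find "$gz_dir" -type f -name '*.gz' -print0)

# Runs: sort -m <(gzip -cd FILE1) <(gzip -cd FILE2)
merged=$(eval "$cmd")

# Prints a, b, c, d on separate lines.
printf '%s\n' "$merged"
```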


