Argument List Too Long When Concatenating Lots of Files in a Folder

Your full code is:

rm -f /tmp/temp.files
# collect the basenames of every *.log file (this is the line that fails)
ls -1 /var/log/processing/*.log | xargs -n1 basename > /tmp/temp.files
# turn each dated name into a "cat prefix* >> prefix-<timestamp>.log" command and run it
cat /tmp/temp.files | sed -r "s~(.*)-[0-9]{4}(-[0-9]{2})+\.log~cat /var/log/processing/\1* >> /var/log/processing/\1$(date +"-%Y-%m-%d-%H-%M").log~" | uniq | sh
cd /var/log/processing
# remove the original per-date files that were just merged
xargs rm -rf < /tmp/temp.files
rm -f /tmp/temp.files

But the problem lies in the ls -1 /var/log/processing/*.log part, so I am skipping the rest.

The expansion of /var/log/processing/*.log produces so many results that the argument list exceeds the kernel's ARG_MAX limit for a single execve(2) call; the shell cannot even start ls, and prints the "Argument list too long" message.

You can use a find statement like this:

find /var/log/processing -name "*.log" -exec basename {} \; > /tmp/temp.files

Note that I am not parsing the output of ls (read the interesting Why you shouldn't parse the output of ls).
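
If your find is GNU find, its -printf action can produce the basename directly, saving one basename process per file (a sketch, GNU-only):

# %f prints just the last component of each matched pathname
find /var/log/processing -name "*.log" -printf '%f\n' > /tmp/temp.files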

(Argument list too long) while opening a large list of files using cat

The Argument list too long error is documented in errno(3) (as E2BIG) and is raised by an execve(2) system call made by your GNU bash shell. You can query the limit with sysconf(3) using _SC_ARG_MAX.
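
From the shell, you can inspect the limit directly (getconf is standard POSIX; --show-limits is a GNU xargs extension):

getconf ARG_MAX                   # total bytes allowed for argv + environment in one execve(2)
xargs --show-limits < /dev/null   # GNU xargs: print the limits it will actually use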

You have several approaches:

  • recompile your Linux kernel to raise that limit;
  • write a small C program that uses the appropriate syscalls(2) more carefully, or write a Python script, a GNU Guile script, etc., that does the same;
  • raise the relevant limits with setrlimit(2) (perhaps via the shell's ulimit builtin), as shown in the sketch after this list.
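
On Linux, the argument-space limit is derived from the stack size limit (roughly a quarter of RLIMIT_STACK), so raising the stack limit also raises it. A minimal sketch, assuming a reasonably recent Linux kernel and glibc:

ulimit -s 65536    # raise this shell's soft stack limit to 64 MiB
getconf ARG_MAX    # on Linux, the reported limit grows accordingly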

See also the documentation and the source code of GNU bash.

Awk argument too long when merging csv files

With the printf and xargs parts, you are sending the contents of the CSV files into awk, but you are also giving awk the filenames. Pick one or the other; I'd suggest:

{ printf '%s\n' *.csv | xargs awk 'FNR==1 && NR!=1{next;}{print}'; } > master.csv
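
Two caveats worth adding (my notes, not part of the original answer): filenames containing spaces are mangled by the newline-delimited pipeline, and if the list is so long that xargs must start several awk processes, NR resets per process and an extra header line leaks through for each batch. A NUL-delimited sketch that avoids both:

rm -f master.csv   # make sure a previous run's output doesn't match the glob below
set -- *.csv       # capture the file list before the output file is created
{ head -n 1 "$1"; printf '%s\0' "$@" | xargs -0 awk 'FNR>1'; } > master.csv

This prints the header once, from the first file, and then only the data rows (FNR>1) of every file, which stays correct no matter how many awk invocations xargs makes.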

Merging large number of files into one

Here's a safe way to do it, without the need for find:

 printf '%s\0' *.n3 | xargs -0 cat > merged.txt

(I've also chosen merged.txt as the output file, as @MichaelDautermann soundly advises: a merged.n3 output file would itself match the *.n3 glob and end up as part of its own input. Rename it to merged.n3 afterward.)

Note: The reason this works is:

  • printf is a bash shell builtin, whose command line is not subject to the length limitation of command lines passed to external executables.
  • xargs is smart about partitioning the input arguments (passed via a pipe and thus also not subject to the command-line length limit) into multiple invocations so as to avoid the length limit; in other words: xargs makes as few calls as possible without running into the limit.
  • Using \0 as the delimiter paired with xargs' -0 option ensures that all filenames - even those with, e.g., embedded spaces or even newlines - are passed through as-is.
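
If you want to watch the batching happen, the -t option of xargs (standard POSIX) echoes each constructed cat command line to stderr before running it; a quick diagnostic sketch:

printf '%s\0' *.n3 | xargs -0 -t cat > merged.txt 2> xargs-trace.log
wc -l < xargs-trace.log   # one line per cat invocation that xargs made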

How to concatenate huge number of files

If your directory structure is shallow (there are no subdirectories) then you can simply do:

find . -type f -exec cat {} \; > newFile

If you have subdirectories, you can limit the find to the top level, as shown below, or you might consider moving some of the files into subdirectories so you don't have this problem!
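
For example, to stay at the top level (a sketch; -maxdepth is a GNU/BSD extension, not in POSIX):

# "! -name newFile" keeps find from picking up the output file,
# which the shell creates before find starts running
find . -maxdepth 1 -type f ! -name newFile -exec cat {} \; > newFile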

Running cat once per file like this is not particularly efficient, and some versions of find allow you to do:

find . -type f -exec cat {} \+ > newFile

for greater efficiency. (Note that the backslash before the + is not necessary, but I find it nice for symmetry with the previous example.)

UNIX: cat large number of files - output being doubled

Here's how I would do it:

find . -name File.txt -exec cat {} >> output.txt \;

This searches for all occurrences of the file File.txt and appends the cat'ed contents of each one to the file output.txt.

However, I have tried your find command and it works too.

find . -name File.txt -exec cat {} \; > newFile.txt
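
As in the earlier questions, the + terminator is the more efficient form here too, passing a large batch of files to each cat instead of one at a time (same idea, my sketch):

find . -name File.txt -exec cat {} + > newFile.txt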

I would suggest that you empty the output file newFile.txt before you try any of these find commands, as follows:

 >newFile.txt 

This is a handy way to empty a file's contents. (It should not matter to you right now, but emptying a file by redirecting nothing to it works even if another process is writing to the file.)

Hope this helps.

Combine list of text files (too long), adding newline separator in between

To avoid the long command line, you can use a shell construct such as a for loop:

for f in dir/*; do cat "$f"; printf '\n'; done > combined.txt

If the order of files in the combined file doesn't matter, you can use find instead:

find dir -type f -exec sed -s '$s/$/\n/' {} + > combined.txt

This uses find -exec to minimize the number of times the command in -exec is called, while avoiding command lines that are too long.

sed -s '$s/$/\n/' appends a newline to the last line of each file, producing the blank-line separator; -s (a GNU sed extension) makes sure that the change is applied to every file when multiple files are supplied as arguments.
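
If you have GNU awk 4.0 or newer, its ENDFILE block offers an alternative way to emit the blank separator after each file (my sketch, not from the original answer):

find dir -type f -exec gawk '{ print } ENDFILE { print "" }' {} + > combined.txt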


