Linux: Merging Multiple Files, Each on a New Line

Just use awk:

awk 'FNR==1{print ""}1' *.txt
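Note that this prints a blank line before the first file as well. If you want separators only *between* files, a small variant checks NR too (the sample file names below are hypothetical):

```shell
# Two small demo files
printf 'a\nb\n' > /tmp/one.txt
printf 'c\n' > /tmp/two.txt

# FNR==1 is true at the first line of each file; NR>1 excludes
# the very first line of the very first file, so no leading blank line.
awk 'FNR==1 && NR>1 {print ""} 1' /tmp/one.txt /tmp/two.txt > /tmp/merged.txt
```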

Concatenating Files and Inserting a New Line Between Them

You can do:

for f in *.txt; do (cat "${f}"; echo) >> finalfile.txt; done

Make sure the file finalfile.txt does not exist before you run the above command.
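To make the loop safe to re-run, you can truncate the output first and name the inputs explicitly, so the glob never picks up finalfile.txt itself (a sketch with hypothetical file names):

```shell
# Demo setup
mkdir -p /tmp/demo && cd /tmp/demo
printf 'x\n' > a.txt
printf 'y\n' > b.txt

: > finalfile.txt                 # truncate (or create) the output up front
for f in a.txt b.txt; do          # list inputs explicitly: *.txt would now match finalfile.txt
  (cat "$f"; echo) >> finalfile.txt
done
```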

If you are allowed to use awk you can do:

awk 'FNR==1{print ""}1' *.txt > finalfile.txt

Bash: concatenate multiple files and add \newline between each?

If you want the literal string "\newline", try this:

for f in *.md; do cat "$f"; echo "\newline"; done > output.md

This assumes that output.md doesn't already exist. If it does (and you want to include its contents in the final output) you could do:

for f in *.md; do cat "$f"; echo "\newline"; done > out && mv out output.md

This prevents the error "cat: output.md: input file is output file".

If you want to overwrite it, you should just rm it before you start.
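Be aware that echo's handling of backslashes varies between shells (dash's built-in echo expands \n, bash's by default does not); printf is the portable way to emit the literal string (demo files are hypothetical):

```shell
mkdir -p /tmp/md && cd /tmp/md
printf 'alpha\n' > a.md
printf 'beta\n' > b.md

# printf '%s\n' prints its argument verbatim, so '\newline' stays literal
for f in a.md b.md; do cat "$f"; printf '%s\n' '\newline'; done > output.md
```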

How to merge two files line by line in Bash

You can use paste:

paste file1.txt file2.txt > fileresults.txt
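By default paste joins corresponding lines with a tab; with -d '\n' it interleaves the lines on alternating rows instead (demo files are hypothetical):

```shell
printf '1\n2\n' > /tmp/file1.txt
printf 'a\nb\n' > /tmp/file2.txt

# Tab-separated, line by line:
paste /tmp/file1.txt /tmp/file2.txt > /tmp/fileresults.txt

# Alternating lines from each file:
paste -d '\n' /tmp/file1.txt /tmp/file2.txt > /tmp/interleaved.txt
```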

How to append contents of multiple files into one file

You need the cat (short for concatenate) command, with shell redirection (>) into your output file:

cat 1.txt 2.txt 3.txt > 0.txt
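If 0.txt already contains data you want to keep, use the appending redirection >> instead, since > truncates the output first (hypothetical demo):

```shell
printf 'one\n' > /tmp/1.txt
printf 'two\n' > /tmp/2.txt
printf 'header\n' > /tmp/0.txt

# >> appends to 0.txt rather than overwriting it
cat /tmp/1.txt /tmp/2.txt >> /tmp/0.txt
```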

Using the cat command, start each file on a new line

Quick and dirty: add a newline before every METADATA:

cat F_Worker_TEMP_VO.dat ....dat | sed 's/METADATA/\nMETADATA/g' > Worker.dat

Dirty: add a newline before every METADATA except the first:

cat F_Worker_TEMP_VO.dat ....dat | sed 's/\(.\)METADATA/\1\nMETADATA/g' > Worker.dat

(Note that \n in the replacement text is a GNU sed extension.)

Loop:

for file in F_Worker_TEMP_VO.dat ... F_PERSON_NATIONALIDE_SSN_TEMP_VO.dat; do
    cat "${file}"
    echo
done > Worker.dat

Concatenate text files, separating them with a new line

A simple

sort -u *.db > uniquified # adjust glob as needed

should do it; unlike plain cat, sort processes each input line by line, so a missing trailing newline in one file cannot run into the first line of the next.

cat *.db | sort -u

is a classic UUoC (Useless Use of Cat), and the glitch with files lacking trailing newlines is not its only problem.

Having said that, 25GB probably won't fit in your RAM, so sort will end up creating temporary files anyway. It might turn out to be faster to sort the files in four or five groups, and then merge the results. That could take better advantage of the large number of duplicates. But I'd only experiment if the simple command really takes an exorbitant amount of time.

Even so, sorting the files individually is probably even slower; usually the best bet is to max out your memory resources for each invocation of sort. You could, for example, use xargs with the -n option to split the filelist into groups of a couple of dozen files each. Once you have each group sorted, you could use sort -m to merge the sorted temporaries.
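The grouping-and-merging idea can be sketched like this (the group size of 2 and the .db demo files are illustrative; on real data you would use a couple of dozen files per group):

```shell
mkdir -p /tmp/sortdemo && cd /tmp/sortdemo
rm -f sorted.*.tmp uniquified
printf 'b\na\n' > 1.db
printf 'c\na\n' > 2.db
printf 'd\nb\n' > 3.db

# Sort the files in groups of 2, one sorted temporary per group
# ($$ is the PID of each sh that xargs spawns, so names are unique),
# then merge the already-sorted temporaries with sort -m.
printf '%s\0' *.db | xargs -0 -n 2 sh -c 'sort -u "$@" > "sorted.$$.tmp"' sh
sort -m -u sorted.*.tmp > uniquified
rm -f sorted.*.tmp
```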

A couple of notes on how to improve sorting speed:

  1. Use LC_COLLATE=C sort if you don't need locale-aware sorting of alphabetic data. That typically speeds sort up by a factor of three or four.

  2. Avoid using RAM disks for temporary space. (On many Linux distros, /tmp is a RAM disk.) Since sort uses temporary disks when it runs out of RAM, putting the temporary in a RAMdisk is counterproductive. For the same reason, don't put your own temporary output files in /tmp. /var/tmp should be real disk; even better, if possible, use a second disk drive (not a slow USB drive, of course).

  3. Avoid bogging your machine down with excessive swapping while you're sorting, by turning swap off:

    sudo swapoff -a

    You can turn it back on afterwards, although I personally run my machine like this all the time because it avoids diving into complete unresponsiveness under memory pressure.

  4. The ideal is to adjust -S so that sort uses as much memory as you can spare, and avoid the use of internal temporaries by sorting in chunks which fit into that amount of memory. (Merging the sorted chunks is a lot faster than sorting, and it reads and writes sequentially without needing extra disk space.) You'll probably need to do some experimentation to find a good chunk size.
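Putting these points together on a small demo (the buffer size and file names are illustrative; on real data you would pass something like -S 8G):

```shell
mkdir -p /tmp/tuned && cd /tmp/tuned
printf 'b\na\nb\n' > x.db
printf 'c\na\n' > y.db

# C collation (plain byte order), a 1 MiB buffer for this tiny demo,
# and temporary files on real disk under /var/tmp
LC_COLLATE=C sort -u -S 1M -T /var/tmp x.db y.db > uniquified
```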

Merging two text files into a new one (alternating lines) using Linux system calls in C

You have to keep track of whether each file has been fully read; otherwise the read in the first while loop will keep reading from a finished file, which is not what you want.

Code, edited after the comment:

#include <stdlib.h>
#include <stdio.h>
#include <unistd.h>
#include <stdbool.h>
#include <fcntl.h>
#include <string.h>
#include <errno.h>

bool WriteLineFromFile(int dst, int src, bool *srcTerminated)
{
    int lastChar = EOF;
    char currentChar;
    ssize_t nbCharRead;
    ssize_t nbCharWrite;

    do {
        if ((nbCharRead = read(src, &currentChar, 1)) < 0) {
            fprintf(stderr, "%s : read(src, &buf, 1) : src=%d, errno='%s'.\n", __func__, src, strerror(errno));
            return (false);
        }
        // End of file
        if (nbCharRead == 0) {
            (*srcTerminated) = true;
            // Add a terminating '\n' if the file's last line lacked one
            if (lastChar != '\n' && lastChar != EOF) {
                currentChar = '\n';
                while ((nbCharWrite = write(dst, &currentChar, 1)) != 1) {
                    if (nbCharWrite < 0) {
                        fprintf(stderr, "%s : write(dst, &buf, 1) : dst=%d, errno='%s'.\n", __func__, dst, strerror(errno));
                        return (false);
                    }
                    sleep(1);
                }
            }
            return (true);
        }
        // Write one char into the dst file, retrying on short writes
        while ((nbCharWrite = write(dst, &currentChar, 1)) != 1) {
            if (nbCharWrite < 0) {
                fprintf(stderr, "%s : write(dst, &buf, 1) : dst=%d, errno='%s'.\n", __func__, dst, strerror(errno));
                return (false);
            }
            sleep(1);
        }
        lastChar = currentChar;
    } while (currentChar != '\n');

    return (true);
}

bool FileMerging(char *inputPathFile1, char *inputPathFile2, char *outputPathFile)
{
    int inputFile1 = -1;
    bool file1Terminated = false;
    int inputFile2 = -1;
    bool file2Terminated = false;
    int outputFile = -1;
    bool returnFunction = false;

    // Opening all the file descriptors
    if ((inputFile1 = open(inputPathFile1, O_RDONLY)) == -1) {
        fprintf(stderr, "%s : open(inputPathFile1, O_RDONLY) : inputPathFile1='%s', errno='%s'.\n", __func__, inputPathFile1, strerror(errno));
        goto END_FUNCTION;
    }
    if ((inputFile2 = open(inputPathFile2, O_RDONLY)) == -1) {
        fprintf(stderr, "%s : open(inputPathFile2, O_RDONLY) : inputPathFile2='%s', errno='%s'.\n", __func__, inputPathFile2, strerror(errno));
        goto END_FUNCTION;
    }
    // O_TRUNC discards stale data if the output file already exists and is longer
    if ((outputFile = open(outputPathFile, O_WRONLY | O_CREAT | O_TRUNC, 0644)) == -1) {
        fprintf(stderr, "%s : open(outputPathFile, O_WRONLY | O_CREAT | O_TRUNC, 0644) : outputPathFile='%s', errno='%s'.\n", __func__, outputPathFile, strerror(errno));
        goto END_FUNCTION;
    }

    // Alternately write one line from inputFile1 and one from inputFile2 to outputFile
    do {
        if (!file1Terminated) {
            if (!WriteLineFromFile(outputFile, inputFile1, &file1Terminated)) {
                goto END_FUNCTION;
            }
        }
        if (!file2Terminated) {
            if (!WriteLineFromFile(outputFile, inputFile2, &file2Terminated)) {
                goto END_FUNCTION;
            }
        }
    } while (!file1Terminated || !file2Terminated);

    returnFunction = true;
/* GOTO */END_FUNCTION:
    if (inputFile1 != -1) {
        close(inputFile1);
    }
    if (inputFile2 != -1) {
        close(inputFile2);
    }
    if (outputFile != -1) {
        close(outputFile);
    }
    return (returnFunction);
}

int main(int argc, char *argv[])
{
    if (argc != 4) {
        fprintf(stderr, "This program expects 3 command-line arguments: inputPathFile1 inputPathFile2 outputPathFile.\n");
        return (EXIT_FAILURE);
    }
    if (!FileMerging(argv[1], argv[2], argv[3])) {
        return (EXIT_FAILURE);
    }
    return (EXIT_SUCCESS);
}

