How to Sort Files in Paste Command

How to sort the files given to the paste command, with 500 CSV files

You can do it with gawk like this...

Simply read all the files in, one after the other, and save them into an array. The array is indexed by two numbers: first the line number in the current file (FNR), and second the column number, which I increment each time a new file is encountered in the BEGINFILE block.

Then, at the end, print out the entire array:

gawk 'BEGINFILE { ++col }                         # new file: move to the next column
      { X[FNR,col] = $0; if (FNR > rows) rows = FNR }  # save record; track the longest file
      END {
          for (r = 1; r <= rows; r++) {
              for (c = 1; c <= col; c++) {
                  printf("%s%s", X[r,c], (c == col) ? "\n" : ",")
              }
          }
      }' chirps*

The comma in X[FNR,col] joins the two indices with awk's built-in SUBSEP character, so an index such as row 1, column 12 cannot collide with row 11, column 2. I am using gawk because BEGINFILE is a convenient place to increment the column number.
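If gawk is not available, a portable POSIX-awk sketch can use FNR==1 to detect each new file instead of BEGINFILE. The chirp_a and chirp_b sample files below are made up for the demo; with the real data the file list would be chirps*:

```shell
# Portable variant: FNR resets to 1 at the start of each input file,
# so FNR==1 serves the same purpose as gawk's BEGINFILE block.
cd "$(mktemp -d)"
printf '1\n2\n3\n' > chirp_a      # hypothetical sample data
printf '4\n5\n6\n' > chirp_b

awk 'FNR==1 { ++col }
     { X[FNR,col] = $0; if (FNR > rows) rows = FNR }
     END {
         for (r = 1; r <= rows; r++)
             for (c = 1; c <= col; c++)
                 printf("%s%s", X[r,c], (c == col) ? "\n" : ",")
     }' chirp_a chirp_b
```

This prints the two sample files merged row by row, comma-separated.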


Save the above in your HOME directory as merge, with #!/bin/bash as its first line so it runs as a shell script. Then start a Terminal and, just once, make it executable with the command:

chmod +x merge

Now change directory to where your chirps are with a command like:

cd subdirectory/where/chirps/are

Now you can run the script with:

$HOME/merge

The output will rush past on the screen. If you want it in a file, use:

$HOME/merge > merged.csv

Paste command with filenames in ascending order

As per Jonathan Leffler's comment, ls -v would be the simplest, if your ls supports that option:

ls -v atom* | xargs paste > data

If not, sort could be used.

find . -name 'atom*' | sort -n -k1.7 | xargs paste > data

The 7 arises because the names look like ./atomNNNN: sort skips the leading six characters (./atom) and sorts numerically from the seventh onwards. If you have a different prefix (instead of "atom"), update the -k1.7 to reflect its length.
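If your sort is GNU sort, the -V (version sort) option avoids computing the character offset entirely. A sketch, using made-up atom2/atom10 sample files in place of the real data:

```shell
# GNU sort -V compares embedded numbers numerically, so atom2
# sorts before atom10 without needing a -k character offset.
cd "$(mktemp -d)"
printf '1\n' > atom2              # hypothetical sample files
printf '2\n' > atom10
find . -maxdepth 1 -name 'atom*' | sort -V | xargs paste
```

Unlike -k1.7, this keeps working if the prefix length changes.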

Without sort

$ find . -name 'atom*'
./atom
./atom0
./atom1
./atom10
./atom11
./atom12
./atom3
./atom9

With sort

$ find . -name 'atom*' | sort -n -k1.7
./atom0
./atom1
./atom3
./atom9
./atom10
./atom11
./atom12

Pasting many files together in some sort of loop, paste command

Instead of running the command in the loop, you could concatenate the strings in the loop and then run the command at the end.

STR=""
for i in {1..101}
do
    STR="$STR file$i"
done
paste $STR    # intentionally unquoted so it splits into separate filenames
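A bash-array variant of the same idea avoids relying on word splitting of an unquoted string. This demo uses three made-up files; with the real data the range would be {1..101}:

```shell
# Collect the names in an array; "${files[@]}" expands each
# element as its own argument, even if a name contained spaces.
cd "$(mktemp -d)"
for i in {1..3}; do printf '%s\n' "$i" > "file$i"; done   # sample files

files=()
for i in {1..3}; do
    files+=("file$i")
done
paste "${files[@]}"
```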

CMD: Sort files in folder structure

Based on Stephan's comment, with some additional code cleanup and best-practice usage, your code should look like this:

:sort
SET "input=C:\Daten\Input"
SET "target=C:\Daten\Target"

for /r "%input%" %%a in (*.zip) do (
    for /f "tokens=1 delims=_ " %%t in ("%%~na") do (
        copy /Y "%%~a" "%target%\%%t\"
    )
)

Shell command - tr,sort,paste,nl

You can use pr for arranging the columns. No intermediate files are needed:

tr '[:upper:]' '[:lower:]' < "${FILE}" \
| tr -d '[:digit:]' \
| sort \
| pr -3 -t \
| nl

or in one line:

tr '[:upper:]' '[:lower:]' < "${FILE}" | tr -d '[:digit:]' | sort | pr -3 -t | nl

(The character classes are quoted so the shell cannot expand them as globs, and the column count is given as a separate -3 option.)

See: https://linux.die.net/man/1/pr
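A self-contained run of the pipeline, on a made-up words.txt standing in for "${FILE}":

```shell
# Demo: lowercase, strip digits, sort, arrange in three columns,
# then number the resulting lines with nl.
cd "$(mktemp -d)"
printf 'Cherry7\nApple1\nBanana42\n' > words.txt   # hypothetical input
FILE=words.txt
tr '[:upper:]' '[:lower:]' < "${FILE}" | tr -d '[:digit:]' | sort | pr -3 -t | nl
```

The three cleaned, sorted words come out side by side on one numbered line.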

Paste side by side multiple files by numerical order

If your current shell is bash: paste -d " " file{1..1000}
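Brace expansion generates the names in the written numeric order before any globbing or sorting enters the picture, as a quick check shows:

```shell
# {9..12} expands left to right in numeric order, so file9 precedes
# file10 -- unlike a lexical glob such as file*, which sorts file10 first.
echo file{9..12}
```

This is why `paste -d " " file{1..1000}` needs no extra sort step.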

Script Getting Stuck While Sorting Files Using Grep

xargs is probably the culprit; you should add the --no-run-if-empty (aka -r) option and specify the delimiter to be \0 (in combination with pdfgrep -lZ):

#!/bin/bash

keywords=(
    "Keyword1"
    "Keyword2"
    "Keyword3"
)

for kw in "${keywords[@]}"
do
    printf 'Matching keyword: %q\n' "$kw"
    folder="$HOME"/Sorted/"$kw"
    mkdir -p "$folder" || exit 1
    pdfgrep -irlZ "$kw" "$HOME"/PDFs/ | xargs -0 -r cp -t "$folder/"
done

echo "Unmatched keywords:"
find "$HOME"/Sorted/ -mindepth 1 -maxdepth 1 -type d -empty -delete -printf "\t%P\n"

Aside: You could create symbolic or even hard links to the PDFs instead of copying them (with ... | xargs -0 -r ln -s -t "$folder/" for symbolic links, or the same without -s for hard links); that will be faster and save disk space.

Pasting files side by side

Would you please try the following:

paste $(ls data* | sort -t_ -k3n) | awk -F'\t' -v OFS='\t' '
{for (i=1; i<=NF; i++) if ($i == "") $i = "0.0"} 1'

Output:

1.5     2.3     1.2
2.0     1.8     2.3
0.0     0.0     4.5
  • sort -t_ -k3n sets the field separator to _ and numerically sorts
    the filenames on the 3rd field values.
  • The options -F'\t' -v OFS='\t' to the awk command set the input
    and output field separators to a tab character.
  • The awk statement for (i=1; i<=NF; i++) if ($i == "") $i = "0.0"
    scans the input fields and sets 0.0 for the empty fields.
  • The final 1 is equivalent to print $0 to print the fields.
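The empty-field handling can be seen in isolation on a small made-up tab-separated sample:

```shell
# Two tab-separated records; the second has an empty middle field,
# which the awk loop rewrites as 0.0 before printing.
printf '1.5\t2.3\t1.2\n2.0\t\t4.5\n' \
| awk -F'\t' -v OFS='\t' '{for (i=1; i<=NF; i++) if ($i == "") $i = "0.0"} 1'
```

Note that assigning to any field makes awk rebuild $0 with OFS, which is why OFS must also be set to a tab.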

[Edit]

If you have a huge number of files, the command line may exceed the system's argument-length limit (ARG_MAX). Here is an alternative in Python using a pandas dataframe.

#!/usr/bin/python

import glob
import re

import pandas as pd

files = glob.glob('data*')
files.sort(key=lambda x: int(re.sub(r'.*_', '', x)))  # sort filenames numerically by their number

dfs = []                                           # list of dataframes
for f in files:
    df = pd.read_csv(f, header=None, names=[f])    # read file and name its column after it
    df = df.apply(pd.to_numeric, errors='coerce')  # force the cell values to floats
    dfs.append(df)                                 # add as a new column
df = pd.concat(dfs, axis=1, join='outer')          # combine the columns side by side
df = df.fillna(0)                                  # fill empty cells
print(df.to_string(index=False, header=False))     # print without index and header

which will produce the same results.


