How to sort files in paste command with 500 files csv
You can do it with gawk, like this. Read all the files in, one after the other, and save them into an array. The array is indexed by two numbers: the line number in the current file (FNR), and the column, which is incremented each time a new file is encountered in the BEGINFILE block. Then, at the end, print out the entire array:
gawk 'BEGINFILE{ ++col }                # New file, increment column number
{ X[FNR SUBSEP col]=$0                  # Save datum into array X, indexed by current record number and col
  if(FNR>rows) rows=FNR }               # Track the longest file seen so far
END { for(r=1;r<=rows;r++){
        comma=","
        for(c=1;c<=col;c++){
            if(c==col)comma=""
            printf("%s%s",X[r SUBSEP c],comma)
        }
        printf("\n")
      }
}' chirps*
SUBSEP is awk's built-in subscript separator, an unlikely character that keeps the two indices from running together. (A plain unset variable would expand to an empty string, so, say, row 1 column 12 and row 11 column 2 would collide.) I am using gawk because BEGINFILE is useful for incrementing the column number.
Save the above in your HOME directory as merge
. Then start a Terminal and, just once, make it executable with the command:
chmod +x merge
Now change directory to where your chirps are with a command like:
cd subdirectory/where/chirps/are
Now you can run the script with:
$HOME/merge
The output will rush past on the screen. If you want it in a file, use:
$HOME/merge > merged.csv
paste command with ascending order of name of the files
As per Jonathan Leffler's comment, ls -v would be the simplest, if your ls supports the -v option:
ls -v atom* | xargs paste > data
If not, sort could be used.
find . -name 'atom*' | sort -n -k1.7 | xargs paste > data
The 7 arises from ./atomNNNN: the sort key skips the leading 6 characters of ./atom. If you have a different prefix (instead of "atom"), update the -k1.7 to reflect it.
Without sort
$ find . -name 'atom*'
./atom
./atom0
./atom1
./atom10
./atom11
./atom12
./atom3
./atom9
With sort
$ find . -name 'atom*' | sort -n -k1.7
./atom
./atom0
./atom1
./atom3
./atom9
./atom10
./atom11
./atom12
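If your sort is GNU sort, the -V (version sort) option avoids hard-coding the prefix length altogether (a sketch over the same invented names):

```shell
# Version sort (-V) orders embedded numbers naturally, whatever the prefix length.
printf '%s\n' ./atom12 ./atom3 ./atom10 ./atom1 | sort -V
# -> ./atom1 ./atom3 ./atom10 ./atom12
```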
Pasting many files together in some sort of loop, paste command
Instead of running the command inside the loop, you can build up the file list in the loop and run the command once at the end:
STR=""
for i in {1..101}
do
    STR="$STR file$i"
done
paste $STR
Leave $STR unquoted on the last line so that it word-splits into the individual file names.
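A bash array is a more robust way to collect the names, since it survives file names containing spaces (a sketch; file1..file3 and their contents are invented for illustration):

```shell
# Collect the filenames in an array instead of one whitespace-joined string.
files=()
for i in {1..3}
do
    printf 'col%s\n' "$i" > "file$i"    # invented sample data
    files+=("file$i")
done
paste "${files[@]}"    # col1<TAB>col2<TAB>col3
```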
CMD: Sort files in folder structure
Based on Stephan's comment, plus some code cleanup and best-practice usage, your code should look like this:
:sort
SET "input=C:\Daten\Input"
SET "target=C:\Daten\Target"
for /r "%input%" %%a in (*.zip) do (
    for /f "tokens=1 delims=_ " %%t in ("%%~na") do (
        copy /Y "%%~a" "%target%\%%t\"
    )
)
Shell command - tr,sort,paste,nl
You can use pr to arrange the columns; no intermediate files are needed:
tr '[:upper:]' '[:lower:]' < "${FILE}" \
| tr -d '[:digit:]' \
| sort \
| pr -t3 \
| nl
or in one line:
tr '[:upper:]' '[:lower:]' < "${FILE}" | tr -d '[:digit:]' | sort | pr -t3 | nl
(Quote the character classes so the shell cannot glob-expand them if a matching file happens to exist.)
See: https://linux.die.net/man/1/pr
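To see what the pr step does on its own, here is a minimal sketch with numbers standing in for the real words:

```shell
# -3 asks pr for three columns, -t suppresses the page header/trailer.
# pr fills column by column: 1,2 go in the first column, 3,4 in the second, 5,6 in the third.
seq 6 | pr -3 -t
```

So the first output row reads 1, 3, 5 and the second reads 2, 4, 6.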
Paste side by side multiple files by numerical order
If your current shell is bash: paste -d " " file{1..1000}
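This works because brace expansion generates the names in numeric order before paste ever sees them, unlike a glob; a tiny sketch:

```shell
# Brace expansion happens before the command runs and counts numerically,
# so file9 precedes file10 (a glob like file* would sort file10 before file9).
echo file{8..11}
# -> file8 file9 file10 file11
```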
Script Getting Stuck While Sorting Files Using Grep
xargs is probably the culprit; you should add the --no-run-if-empty (aka -r) option and specify the delimiter to be \0 (in combination with pdfgrep -lZ):
#!/bin/bash
keywords=(
"Keyword1"
"Keyword2"
"Keyword3"
)
for kw in "${keywords[@]}"
do
printf 'Matching keyword: %q\n' "$kw"
folder="$HOME"/Sorted/"$kw"
mkdir -p "$folder" || exit 1
pdfgrep -irlZ "$kw" "$HOME"/PDFs/ | xargs -0 -r cp -t "$folder/"
done
echo "Unmatched keywords:"
find "$HOME"/Sorted/ -mindepth 1 -maxdepth 1 -type d -empty -delete -printf "\t%P\n"
Aside: You could create symbolic or even hard links to the PDF (with ... | xargs -0 -r ln -s -t "$folder/"
) instead of copying them; that'll be faster and save disk space.
pasting file side by side
Would you please try the following:
paste $(ls data* | sort -t_ -k3n) | awk -F'\t' -v OFS='\t' '
{for (i=1; i<=NF; i++) if ($i == "") $i = "0.0"} 1'
Output:
1.5 2.3 1.2
2.0 1.8 2.3
0.0 0.0 4.5
- sort -t_ -k3n sets the field separator to _ and numerically sorts the filenames on the 3rd field values.
- The options -F'\t' -v OFS='\t' to the awk command assign the input/output field separators to a tab character.
- The awk statement for (i=1; i<=NF; i++) if ($i == "") $i = "0.0" scans the input fields and sets 0.0 for the empty fields.
- The final 1 is equivalent to print $0, printing the fields.
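The pipeline can be checked end to end with invented files that reproduce the sample output (the data_a_* names and values are made up; only data_a_3 has a third line, so the other columns get 0.0):

```shell
# Invented data_* files: two columns of two lines, one column of three.
printf '1.5\n2.0\n'      > data_a_1
printf '2.3\n1.8\n'      > data_a_2
printf '1.2\n2.3\n4.5\n' > data_a_3

paste $(ls data_* | sort -t_ -k3n) | awk -F'\t' -v OFS='\t' '
{ for (i = 1; i <= NF; i++) if ($i == "") $i = "0.0" } 1'
```

The third output line is 0.0, 0.0, 4.5, matching the sample above.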
[Edit]
If you have a huge number of files, the argument list may exceed what bash can pass on the command line. Here is an alternative in Python using a pandas dataframe.
#!/usr/bin/python
import glob
import re

import pandas as pd

files = glob.glob('data*')
files.sort(key=lambda x: int(re.sub(r'.*_', '', x)))   # sort filenames numerically by their trailing number
dfs = []                                               # list of dataframes
for f in files:
    df = pd.read_csv(f, header=None, names=[f])        # read file, using the filename as the column name
    df = df.apply(pd.to_numeric, errors='coerce')      # force the cell values to floats
    dfs.append(df)                                     # collect as a new column
df = pd.concat(dfs, axis=1, join='outer')              # combine the columns into one dataframe
df = df.fillna(0)                                      # fill empty cells
print(df.to_string(index=False, header=False))         # print the dataframe without index and header
which will produce the same results.