Terminal: extract same columns data and output into multiple files
EDIT: Since the OP needs the output files named in the Output_00 format, the following may help. If you only have around 30 files to read you could drop the close(out) call, but keeping it protects against a "too many open files" error when this code reads a large number of input files.
awk 'FNR==1{close(out); out=sprintf("Output_%02d",++i)} {print $1, $2, $3, $4 > out}' File_*.txt
Since you haven't provided samples I couldn't test it; please try the following and let me know if it helps:
awk 'FNR==1{i++} {print $1,$2,$3,$4 > ("Output_file"i)}' File_*.txt
The above should create output files named Output_file1, Output_file2, and so on.
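As a quick sanity check, the Output_%02d variant can be exercised on two tiny sample files (the file names and contents below are made up for illustration):

```shell
# Hypothetical five-column sample inputs matching the File_*.txt pattern
printf 'a b c d e\nf g h i j\n' > File_1.txt
printf 'k l m n o\np q r s t\n' > File_2.txt

# One zero-padded output file per input, keeping the first four columns
awk 'FNR==1{close(out); out=sprintf("Output_%02d",++i)} {print $1, $2, $3, $4 > out}' File_*.txt
```

This produces Output_01 with the first four columns of File_1.txt, and Output_02 for File_2.txt.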
Extracting same column from each file to one file
Here is a very quick and dirty way of doing it:
Assuming your files are in the same order:
$ awk '(FNR==NR){a[FNR]=$0;next}
{a[FNR]=a[FNR] FS $NF}
END{for(i=1;i<=FNR;++i) print a[i]}' file1 file2 file3 ... filen
If you want the header a bit cleaner:
$ awk '(FNR==NR){a[FNR]=$0 (FNR==1?++c:"");next}
{a[FNR]=a[FNR] FS $NF (FNR==1?++c:"")}
END{for(i=1;i<=FNR;++i) print a[i]}' file1 file2 file3 ... filen
Assuming your files are not in the same order:
$ awk '{key=$1 FS $2}
(FNR==NR){a[key]=$0 (FNR==1?++c:"");next}
{a[key]=a[key] FS $NF (FNR==1?++c:"")}
END{for(i in a) print a[i]}' file1 file2 file3 ... filen
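A minimal run of the same-order variant, using two hypothetical three-column files whose last columns differ:

```shell
# Hypothetical inputs: file2 shares file1's layout but carries a different last column
printf 'id x v1\n1 a 10\n2 b 20\n' > file1
printf 'id x v2\n1 a 30\n2 b 40\n' > file2

# Keep file1's lines whole, then append the last field of each line of file2
awk '(FNR==NR){a[FNR]=$0;next}
{a[FNR]=a[FNR] FS $NF}
END{for(i=1;i<=FNR;++i) print a[i]}' file1 file2
```

This prints three lines: "id x v1 v2", "1 a 10 30", "2 b 20 40".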
Copy set of every 3 columns to a new file in Unix
$ awk '{for (i=1;i<=NF;i+=3) {print $i,$(i+1),$(i+2) > ("output" ((i+2)/3) ".txt")}}' file.txt
# output1.txt
ID Name Place
19 john NY
20 Jen NY
# output2.txt
ID Name Place
23 Key NY
22 Jill NY
# output3.txt
ID Name Place
22 Tom Ny
22 Ki LA
# output4.txt
ID Name place
24 Jeff NY
34 Jack Roh
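The outputs above imply a 12-column input; reconstructing such a file from them makes the command testable end to end:

```shell
# Input reconstructed from the outputs shown above (four groups of 3 columns)
cat > file.txt <<'EOF'
ID Name Place ID Name Place ID Name Place ID Name place
19 john NY 23 Key NY 22 Tom Ny 24 Jeff NY
20 Jen NY 22 Jill NY 22 Ki LA 34 Jack Roh
EOF

# Every 3 columns go to output1.txt, output2.txt, ... ((i+2)/3 numbers the group)
awk '{for (i=1;i<=NF;i+=3) {print $i,$(i+1),$(i+2) > ("output" ((i+2)/3) ".txt")}}' file.txt
```

Running this reproduces the four output files listed above.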
Skipping every nth row and copy/transform data into another matrix
pd.read_csv is a function that returns a DataFrame.
To select specific rows from a DataFrame you can use slicing:
df.loc[start:stop:step]
so it would look something like this:
df = pd.read_csv(your_file)
new_df = df.loc[11:140]
# to skip every nth row, add a step, e.g. df.loc[11:140:3] keeps every 3rd row
# transform it as you please
# convert it to excel or csv
new_df.to_excel("new_file.xlsx")
# or
new_df.to_csv("new_file.csv")
Split huge file into n files keeping first 7 columns + next 3 columns until column n
Here are a couple of solutions with bash tools.
1. bash
Using cut inside a bash loop. This will spawn n processes and parse the file n times.
Update for the case where we don't have just a sequence of letters as _ids in the column names, but many string ids, repeating every 3 columns after the first 7 columns. We first have to read the header of the file and extract them; a quick solution is to use awk and print the id of every 8th, 11th, etc. column into a bash array.
#!/bin/bash
first=7
#ids=( {a..n} )
ids=( $( head -1 "$1" | awk -F"_" -v RS="," -v f="$first" 'NR>f && (NR+1)%3==0{print $2}' ) )
for i in "${!ids[@]}"; do
    cols="1-$first,$((first+1+3*i)),$((first+2+3*i)),$((first+3+3*i))"
    cut -d, -f"$cols" "$1" > "${ids[i]}.txt"
done
Usage: bash test.sh file
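To see the header extraction and the cut loop in action, here is the same logic run inline on a small hypothetical CSV (7 fixed columns plus two 3-column groups with ids a and b):

```shell
# Hypothetical wide CSV: 7 leading columns, then v1_a,v2_a,v3_a,v1_b,v2_b,v3_b
printf 'f1,f2,f3,f4,f5,f6,f7,v1_a,v2_a,v3_a,v1_b,v2_b,v3_b\n' > wide.csv
printf '1,2,3,4,5,6,7,8,9,10,11,12,13\n' >> wide.csv

first=7
# Pull the id after "_" from every 8th, 11th, ... header field
ids=( $( head -1 wide.csv | awk -F"_" -v RS="," -v f="$first" 'NR>f && (NR+1)%3==0{print $2}' ) )
for i in "${!ids[@]}"; do
    cols="1-$first,$((first+1+3*i)),$((first+2+3*i)),$((first+3+3*i))"
    cut -d, -f"$cols" wide.csv > "${ids[i]}.txt"
done
```

This creates a.txt and b.txt, each holding the 7 leading columns plus one group.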
2. awk
Or you can use awk. Here I customize just the number of outputs, but the other parameters can also be handled like in the first solution.
BEGIN { FS=OFS=","; times=14 }
{
for (i=1;i<=times;i++) {
print $1,$2,$3,$4,$5,$6,$7,$(5+3*i),$(6+3*i),$(7+3*i) > sprintf("%c.txt",i+96)
}
}
Usage: awk -f test.awk file
This solution should be fast, as it parses the file once. But it shouldn't be used like this for a large number of output files, as it could throw a "too many open files" error. For the range of the letters it should be ok.
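The awk variant can be checked the same way on a small hypothetical file, with times reduced to 2 so only a.txt and b.txt are produced:

```shell
# Same hypothetical layout: 7 leading columns plus two 3-column groups
printf 'f1,f2,f3,f4,f5,f6,f7,v1_a,v2_a,v3_a,v1_b,v2_b,v3_b\n' > wide.csv
printf '1,2,3,4,5,6,7,8,9,10,11,12,13\n' >> wide.csv

# times=2 for the demo; %c maps i+96 to the letters a, b, ...
awk 'BEGIN { FS=OFS=","; times=2 }
{
  for (i=1;i<=times;i++) {
    print $1,$2,$3,$4,$5,$6,$7,$(5+3*i),$(6+3*i),$(7+3*i) > sprintf("%c.txt",i+96)
  }
}' wide.csv
```

a.txt gets columns 1-7 plus 8-10, b.txt gets columns 1-7 plus 11-13, from a single pass over the file.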