How to extract one column from multiple files, and paste those columns into one file?
Here's one way using awk
and a sorted glob of files:
awk '{ a[FNR] = (a[FNR] ? a[FNR] FS : "") $5 } END { for(i=1;i<=FNR;i++) print a[i] }' $(ls -1v *)
Results:
1 8 a
2 9 b
3 10 c
4 11 d
5 12 e
6 13 f
7 14 g
Explanation:
For each line of input of each input file:
Add the files line number to an array with a value of column 5.
(a[FNR] ? a[FNR] FS : "")
is a ternary operation, which is set up to build up the arrays value as a record. It simply asks if the files line number is already in the array. If so, add the arrays value followed by the default file separator before adding the fifth column. Else, if the line number is not in the array, don't prepend anything, just let it equal the fifth column.
At the end of the script:
- Use a C-style loop to iterate through the array, printing each of the arrays values.
Extracting same column from each file to one file
Here is a very quick and dirty way of doing it:
Assuming your files are in the same order:
$ awk '(FNR==NR){a[FNR]=$0;next}
{a[FNR]=a[FNR] FS $NF}
END{for(i=1;i<=FNR;++i) print a[i]}' file1 file2 file3 ... filen
if you want the header a bit cleaner:
$ awk '(FNR==NR){a[FNR]=$0 (FNR==1?++c:"");next}
{a[FNR]=a[FNR] FS $NF (FNR==1?++c:"")}
END{for(i=1;i<=FNR;++i) print a[i]}' file1 file2 file3 ... filen
Assuming your files are not in the same order:
$ awk '{key=$1 FS $2}
(FNR==NR){a[key]=$0 (FNR==1?++c:"");next}
{a[key]=a[key] FS $NF (FNR==1?++c:"")}
END{for(i in a) print a[i]}' file1 file2 file3 ... filen
extract a column from many files and paste together in a single file
awk '{print $2}' dihed_*.dat > file.new
OR just use cut
command
cut -d" " -f2 dihed_*.dat > file.new
EDIT
not know is it elegant solution or not but it fulfill your requirement.
tail -q -n +2 dihed_*.dat | awk '{print $2}' > file.new
And
tail -q -n +2 dihed_*.dat | cut -d" " -f2 > file.new
paste same column from multiple files into one
If I understand what you're trying to do correctly, then with awk you could use
awk -F '\t' 'FNR == 1 { ++file } { col[FNR, file] = $7 } END { for(i = 1; i <= FNR; ++i) { line = col[i, 1]; for(j = 2; j <= file; ++j) { line = line "\t" col[i, j] }; print line } }' file1 file2 file3 file4
The code is
FNR == 1 { ++file } # in the first line of a file, increase
# the file counter, so file is the number
# of the file we're processing
{
col[FNR, file] = $7 # remember the 7th column from all lines
} # by line and file number
END { # at the end:
for(i = 1; i <= FNR; ++i) { # walk through the lines,
line = col[i, 1] # paste together the columns in that line
for(j = 2; j <= file; ++j) { # from each file
line = line "\t" col[i, j]
}
print line # and print the result.
}
}
EDIT: Tweaked to assemble the lines on the fly rather than at the end, this could be shortened to
awk -F '\t' 'FNR == 1 && FNR != NR { sep = "\t" } { line[FNR] = line[FNR] sep $7 } END { for(i = 1; i <= FNR; ++i) { print line[i] } }'
That is
FNR == 1 && FNR != NR { # in the first line, but not in the first file
sep = "\t" # set the separator to a tab (in the first it's empty)
}
{ # assemble the line on the fly
line[FNR] = line[FNR] sep $7
}
END { # and in the end, print the lines.
for(i = 1; i <= FNR; ++i) {
print line[i]
}
}
Tethering yourself to gawk, this could be further shortened to
awk -F '\t' '{ line[FNR] = line[FNR] sep $7 } ENDFILE { sep = "\t" } END { for(i = 1; i <= FNR; ++i) { print line[i] } }'
...but ENDFILE
is not known to other awk implementations such as mawk, so you may prefer to avoid it.
extract a column from multiple excel files and concatenate them into a single file with R
library(tidyverse)
filenames <- list.files(pattern = '\\.xlsx', full.names = TRUE)
map_df(filenames, ~readxl::read_excel(.x) %>%
select(4) %>%
mutate(col = names(.)[1]) %>%
rename(value = 1)) -> result
writexl::write_xlsx(result, 'new_data.xlsx')
How to copy unique columns from multiple txt files to form one new txt file in unix
output without order or any relationship:
1, $4"}'
there is a extra "
, should remove it in your command.
2, the last FileC.txt should change FileB.txt.
3, the paste following but its not working
's code maybe has some error format? could your upload a screenshot for that ?
4, just only merge the column without something relationship judgement ?
If yes, try this( only some format fixed by your code to one-line ):
paste <(awk '{print $1"\t"$2}' FileC.txt) <(awk '{print $2}' FileA.txt) <(awk '{print $2"\t"$3"\t"$4}' FileB.txt) > output.txt
get what you want before me edit your question:
$ cat output.txt
Subject_ID Data_ID sample_ID age sex smok
1869793 2253798 20481 11 2 1
1869585 2253793 20483 7 1 3
1870238 2253791 20488 9 2 1
output with order and the relationship:
join -1 2 -2 1 <(join -1 2 -2 1 <(sort -k2 FileC.txt) <(sort -k1 FileA.txt) | sort -k2) <(sort -k1 FileB.txt) | tac | awk 'NR==1 {line=$0; next} 1; END{print line}' | tac
get the output:
Subject_ID Data_ID sample_ID age sex smok
1869585 2253793 20483 7 1 3
1869793 2253798 20488 11 2 1
1870238 2253791 20481 9 2 1
1, use join
to merge two file's line by the same column which had been sort
.
2, use tac file | awk 'NR==1 {line=$0; next} 1; END{print line}' | tac
move the header to the top.
Related Topics
Shell Script for Process Monitoring
Read File Without Disk Caching in Linux
Sed Error:Bad Option in Substitution Expression
Too Many Open Files Error While Running Awk Command
What Do the Suffixes "+" and "-" After the Job Id of Background Jobs Mean
Get a Spectrum of Frequencies from Wav/Riff Using Linux Command Line
How to Produce Stand Alone Haskell Executable
What Length Can a Network Interface Name Have
Limit Useable Host Resources in Docker Compose Without Swarm
How to Avoid "No Such File or Directory" Error for 'Make Clean' Makefile Target
How to Make the Watch Command Interpret Vt100 Sequences
Possibly Undefined MACro: Ac_Prog_Libtool
What Does the "Mov Rax, Qword Ptr Fs:0X28" Assembly Instruction Do
What Does the Line '!/Bin/Sh -E' Do
Changing Color of Eclipse Links in Quick Fix or Eclipse Links in Preferences on Linux
Visual Studio Code Debugger Error:"Could Not Find the Task 'Gcc Build Active File'