How to Extract One Column from Multiple Files, and Paste Those Columns into One File

How to extract one column from multiple files, and paste those columns into one file?

Here's one way using awk and a sorted glob of files:

awk '{ a[FNR] = (a[FNR] ? a[FNR] FS : "") $5 } END { for(i=1;i<=FNR;i++) print a[i] }' $(ls -1v *)

Results:

1 8 a
2 9 b
3 10 c
4 11 d
5 12 e
6 13 f
7 14 g

Explanation:

  • For each line of each input file:

    • Add an entry to an array keyed by the file's line number (FNR), with column 5 as the value.

    • (a[FNR] ? a[FNR] FS : "") is a ternary expression, set up to build the array's value into a record. It asks whether the file's line number is already in the array. If so, the array's existing value, followed by the default field separator, is prepended before the fifth column is appended. Otherwise, if the line number is not yet in the array, nothing is prepended and the value is simply the fifth column.

  • At the end of the script:

    • Use a C-style loop to iterate through the array, printing each of the array's values.
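As a runnable sketch (the file names f1.txt, f2.txt, f3.txt and their contents are made up for illustration), the following reproduces the Results shown above:

```shell
# Hypothetical sample data: three files, seven lines each, five columns,
# where only the fifth column differs between files.
printf 'x x x x %s\n' 1 2 3 4 5 6 7    > f1.txt
printf 'x x x x %s\n' 8 9 10 11 12 13 14 > f2.txt
printf 'x x x x %s\n' a b c d e f g    > f3.txt

# Collect column 5 of each file side by side, one array slot per line number.
awk '{ a[FNR] = (a[FNR] ? a[FNR] FS : "") $5 }
     END { for (i = 1; i <= FNR; i++) print a[i] }' f1.txt f2.txt f3.txt
```

This prints exactly the seven lines shown under Results, from "1 8 a" through "7 14 g".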

Extracting same column from each file to one file

Here is a very quick and dirty way of doing it:

Assuming your files are in the same order:

$ awk '(FNR==NR){a[FNR]=$0;next}
{a[FNR]=a[FNR] FS $NF}
END{for(i=1;i<=FNR;++i) print a[i]}' file1 file2 file3 ... filen

If you want the header numbered a bit more cleanly:

$ awk '(FNR==NR){a[FNR]=$0 (FNR==1?++c:"");next}
{a[FNR]=a[FNR] FS $NF (FNR==1?++c:"")}
END{for(i=1;i<=FNR;++i) print a[i]}' file1 file2 file3 ... filen
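To see what the ++c header counter does, here is a small sketch (the file names f1 and f2 and their contents are hypothetical): each file's header line gets a running number appended, so repeated headers stay distinguishable.

```shell
# Hypothetical input files with identical headers and a trailing data column.
printf 'name val\na 1\nb 2\n' > f1
printf 'name val\na 3\nb 4\n' > f2

# The first file is kept whole; later files contribute their last field.
# On header lines (FNR==1), ++c appends 1, 2, ... to the column name.
awk '(FNR==NR){a[FNR]=$0 (FNR==1?++c:"");next}
     {a[FNR]=a[FNR] FS $NF (FNR==1?++c:"")}
     END{for(i=1;i<=FNR;++i) print a[i]}' f1 f2
```

This prints "name val1 val2", then "a 1 3" and "b 2 4".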

Assuming your files are not in the same order:

$ awk '{key=$1 FS $2}
(FNR==NR){a[key]=$0 (FNR==1?++c:"");next}
{a[key]=a[key] FS $NF (FNR==1?++c:"")}
END{for(i in a) print a[i]}' file1 file2 file3 ... filen
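A sketch of the key-based variant (the file names f1 and f2 and their rows are invented; the ++c header-numbering part is omitted here for brevity). Rows are matched on columns 1 and 2 regardless of file order; note that for (i in a) yields an unspecified order, hence the trailing sort.

```shell
# Hypothetical files whose rows share a key in columns 1-2
# but appear in different orders.
printf 'k1 a 10\nk2 b 20\n' > f1
printf 'k2 b 200\nk1 a 100\n' > f2

awk '{key=$1 FS $2}
     (FNR==NR){a[key]=$0;next}
     {a[key]=a[key] FS $NF}
     END{for(i in a) print a[i]}' f1 f2 | sort
```

This prints "k1 a 10 100" and "k2 b 20 200": each row from f2 is appended to the f1 row with the matching key.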

extract a column from many files and paste together in a single file

awk '{print $2}' dihed_*.dat > file.new

Or just use the cut command:

cut -d" " -f2 dihed_*.dat > file.new

EDIT

I don't know whether it is an elegant solution, but it fulfills the requirement by skipping the header line of each file:

tail -q -n +2  dihed_*.dat | awk '{print $2}' > file.new

And

tail -q -n +2  dihed_*.dat | cut -d" " -f2 > file.new
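A quick sketch of the tail pipeline (the dihed_*.dat files and their contents are hypothetical). The -q flag suppresses the "==> file <==" banners tail would otherwise print between multiple files, and -n +2 starts each file at line 2, so only data rows reach awk:

```shell
# Hypothetical dihed_*.dat files, each with a one-line header.
printf 'step angle\n1 0.5\n2 0.7\n' > dihed_1.dat
printf 'step angle\n1 0.6\n2 0.8\n' > dihed_2.dat

tail -q -n +2 dihed_*.dat | awk '{print $2}'
```

This prints 0.5, 0.7, 0.6, 0.8, one per line. Note that this concatenates the columns vertically into one long column rather than pasting them side by side.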

paste same column from multiple files into one

If I understand what you're trying to do correctly, then with awk you could use

awk -F '\t' 'FNR == 1 { ++file } { col[FNR, file] = $7 } END { for(i = 1; i <= FNR; ++i) { line = col[i, 1]; for(j = 2; j <= file; ++j) { line = line "\t" col[i, j] }; print line } }' file1 file2 file3 file4

The code is

FNR == 1 { ++file }              # in the first line of a file, increase
                                 # the file counter, so file is the number
                                 # of the file we're processing
{
    col[FNR, file] = $7          # remember the 7th column from all lines
}                                # by line and file number

END {                            # at the end:
    for(i = 1; i <= FNR; ++i) {      # walk through the lines,
        line = col[i, 1]             # paste together the columns in that line
        for(j = 2; j <= file; ++j) { # from each file
            line = line "\t" col[i, j]
        }
        print line                   # and print the result.
    }
}
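For instance, with two hypothetical seven-column tab-separated input files (generated here with awk; the names t1.tsv and t2.tsv are made up), the command pastes the seventh columns side by side:

```shell
# Generate hypothetical 7-column TSV files; only column 7 carries data.
awk 'BEGIN{OFS="\t"; for(i=1;i<=3;i++) print i,0,0,0,0,0,"A"i}' > t1.tsv
awk 'BEGIN{OFS="\t"; for(i=1;i<=3;i++) print i,0,0,0,0,0,"B"i}' > t2.tsv

# Remember column 7 per (line, file), then join per line with tabs.
awk -F '\t' 'FNR == 1 { ++file }
             { col[FNR, file] = $7 }
             END { for(i = 1; i <= FNR; ++i) {
                       line = col[i, 1]
                       for(j = 2; j <= file; ++j) line = line "\t" col[i, j]
                       print line
                   } }' t1.tsv t2.tsv
```

This prints three tab-separated lines: A1/B1, A2/B2, A3/B3.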

EDIT: Tweaked to assemble the lines on the fly rather than at the end, this could be shortened to

awk -F '\t' 'FNR == 1 && FNR != NR { sep = "\t" } { line[FNR] = line[FNR] sep $7 } END { for(i = 1; i <= FNR; ++i) { print line[i] } }'

That is

FNR == 1 && FNR != NR {          # in the first line, but not in the first file,
    sep = "\t"                   # set the separator to a tab (in the first it's empty)
}
{                                # assemble the line on the fly
    line[FNR] = line[FNR] sep $7
}
END {                            # and in the end, print the lines.
    for(i = 1; i <= FNR; ++i) {
        print line[i]
    }
}
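The shortened version can be exercised the same way (the seven-column files t1.tsv and t2.tsv generated below are hypothetical):

```shell
# Generate hypothetical 7-column TSV files; only column 7 carries data.
awk 'BEGIN{OFS="\t"; for(i=1;i<=3;i++) print i,0,0,0,0,0,"A"i}' > t1.tsv
awk 'BEGIN{OFS="\t"; for(i=1;i<=3;i++) print i,0,0,0,0,0,"B"i}' > t2.tsv

# sep is empty for the first file and a tab afterwards, so each file's
# column 7 is appended on the fly with the right separator.
awk -F '\t' 'FNR == 1 && FNR != NR { sep = "\t" }
             { line[FNR] = line[FNR] sep $7 }
             END { for(i = 1; i <= FNR; ++i) print line[i] }' t1.tsv t2.tsv
```

This prints the same three tab-separated lines (A1/B1, A2/B2, A3/B3) as the longer version.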

Tethering yourself to gawk, this could be further shortened to

awk -F '\t' '{ line[FNR] = line[FNR] sep $7 } ENDFILE { sep = "\t" } END { for(i = 1; i <= FNR; ++i) { print line[i] } }'

...but ENDFILE is not known to other awk implementations such as mawk, so you may prefer to avoid it.

extract a column from multiple excel files and concatenate them into a single file with R

library(tidyverse)

filenames <- list.files(pattern = '\\.xlsx', full.names = TRUE)

map_df(filenames, ~readxl::read_excel(.x) %>%
select(4) %>%
mutate(col = names(.)[1]) %>%
rename(value = 1)) -> result

writexl::write_xlsx(result, 'new_data.xlsx')

How to copy unique columns from multiple txt files to form one new txt file in unix

Output without ordering or any key relationship:

1. In $4"}' there is an extra ", which should be removed from your command.

2. The last FileC.txt should be changed to FileB.txt.

3. The paste code you said is not working may have a formatting error; could you upload a screenshot of it?

4. Do you only want to merge the columns, without any key-based matching between rows?

If so, try this (your code with some formatting fixes, collapsed to one line):

paste <(awk '{print $1"\t"$2}' FileC.txt) <(awk '{print $2}' FileA.txt)  <(awk '{print $2"\t"$3"\t"$4}' FileB.txt) > output.txt

This gets what you wanted, as the question stood before my edit:

$ cat output.txt 
Subject_ID Data_ID sample_ID age sex smok
1869793 2253798 20481 11 2 1
1869585 2253793 20483 7 1 3
1870238 2253791 20488 9 2 1

Output with ordering and the key relationship:

join -1 2 -2 1 <(join -1 2 -2 1 <(sort -k2 FileC.txt) <(sort -k1 FileA.txt) | sort -k2) <(sort -k1 FileB.txt) | tac | awk 'NR==1 {line=$0; next} 1; END{print line}' | tac

get the output:

Subject_ID Data_ID sample_ID age sex smok
1869585 2253793 20483 7 1 3
1869793 2253798 20488 11 2 1
1870238 2253791 20481 9 2 1

1. Use join to merge lines from two files on a shared column, which must be sorted beforehand.

2. Use tac file | awk 'NR==1 {line=$0; next} 1; END{print line}' | tac to move the header line back to the top.
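The header trick works because sorting pushes the header line to the bottom ("Subject_ID" sorts after the numeric IDs in the C locale), and the tac/awk/tac pipeline then moves the last line to the top. A minimal sketch, with a hypothetical ids.txt:

```shell
# Hypothetical file: a header followed by numeric IDs.
printf 'Subject_ID\n1869793\n1869585\n' > ids.txt

# Sort (header lands last), reverse, move the new first line to the end,
# reverse again: net effect, the last line becomes the first.
LC_ALL=C sort ids.txt | tac | awk 'NR==1 {line=$0; next} 1; END{print line}' | tac
```

This prints Subject_ID first, then 1869585 and 1869793 in sorted order.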


