How to Join Multiple Txt Files into Based on Column

How to merge multiple text files by column?

Using awk

$ awk 'NR==FNR{array[$1]=$3FS$4FS$5FS$6FS$7FS$8; next} { print $0,array[$1]}' file2 file1
#rname startpos endpos numreads covbases coverage meandepth meanbaseq meanmapq endpos numreads covbases coverage meandepth meanbaseq
CFLAU10s2|kraken:taxid|33189 1 37904 6 375 0.989341 0.0157503 35.6 31 26432 8 318 1.20309 0.0361683 35.9
CFLAU10s4|kraken:taxid|33189 1 37174 2 222 0.597192 0.00812396 35.7 45 26105 2 91 0.348592 0.00697184 35
CFLAU10s5|kraken:taxid|33189 1 37080 2 70 0.188781 0.00377562 35.7 28 25631 4 122 0.475986 0.00951972 35.7
CFLAU10s6|kraken:taxid|33189 1 36175 4 145 0.400829 0.00801659 36.9 25.5 25014 2 64 0.255857 0.00511713 34.7
CFLAU10s8|kraken:taxid|33189 1 35457 4 155 0.437149 0.00874298 37 25.5 24803 522 10847 43.7326 2.22175 36.3
CFLAU10s9|kraken:taxid|33189 1 34752 6 111 0.319406 0.0191644 36.2 26 24139 2 59 0.244418 0.00488835 37
CFLAU10s13|kraken:taxid|33189 1 32255 2 156 0.483646 0.00917687 35.2 20 23391 24 1215 5.19431 0.109273 36.1
CFLAU10s14|kraken:taxid|33189 1 31915 4 63 0.197399 0.00695598 36.7 26.5 22963 12 318 1.38484 0.0531289 36.8

combine multiple text files with different number of columns separated by tab, into one merged text file in python

The simplest way is probably using numpy:

import numpy as np

filenames = ["file1.txt", "file2.txt", "file3.txt"]
fmt = '%.2f' # assuming format is known in advance

all_columns = []
for filename in filenames:
all_columns.append(np.genfromtxt(filename))

arr_out = np.column_stack(tuple(all_columns)) # Stack columns

# Fill NaN-elements with last numeric value
arr_1d = np.ravel(arr_out) # "flat reference" to arr_out
replaced_all_nan = False
nan_indices = np.where(np.isnan(arr_1d))
while len(nan_indices[0]):
new_indices = tuple([i-1 for i in nan_indices])
arr_1d[nan_indices] = arr_1d[new_indices]
nan_indices = np.where(np.isnan(arr_1d))

np.savetxt("merged.txt", arr_out, fmt=fmt)

One problem (if it is one for you) that might occur is that the very first, i.e. the upper-left element, is non-numeric. In that case, the last (lower-right) value or the last numeric value before that would be used.

How to create a combined dataframe from multiple text files and rename columns based on the file name in R

You can try -

library(tidyverse)
#List all the files
all_files <- dir("/Target_Data/TARGET_FHCRC/")

#include the files with the specific pattern
file_names <- grep(all_files,
pattern = "^T.*gene.quantification.txt$",value = TRUE)

#Read the file in a list with the name of the file as list name
Data_file <- sapply(file_names,read.delim, stringsAsFactors = FALSE,
check.names = FALSE, row.names = NULL, simplify = FALSE)
#Drop .txt from name of the list
names(Data_file) <- sub('\\.txt$', '', names(Data_file))

#For each dataframe in list of dataframe append name of the file
Merge_All_Samples <- Data_file %>%
imap(function(x, y) x %>%
#Drop the columns that you don't need
select(-median_length_normalized, -RPKM) %>%
rename_with(~paste0(., y), -gene)) %>%
reduce(inner_join, by = "gene")

Merge_All_Samples

Concatenate multiple txt files with a name of file as a first column

suggesting gawk command:

To print the filename followed by , in first line of each file.

gawk 'BEGINFILE{printf("%s,",FILENAME)}1' *.txt

To print the filename followed by , in every line of each file.

awk '{print FILENAME "," $0}' *.txt

How to combine multiple .txt files with different # of rows in R and keep file names?

It will be easier to write a function and then rbind() the data from each file. The resulting data frame will have a file column with the filename from the listfile_aINS vector.

read_file <- function(filename) {
dat <- read.table(filename,header = FALSE, sep = ",")
dat$file <- filename
return(dat)
}

all_dat <- do.call(rbind, lapply(listfile_aINS, read_file))

If they don't all have the same number of rows it might not make sense to have each column be a file, but if you really want that you could make it into a wide dataset with NA filling out the empty rows:

library(dplyr)
library(tidyr)
all_dat %>%
group_by(file) %>%
mutate(n = 1:n()) %>%
pivot_wider(names_from = file, values_from = V1)

merging multiple .txt files into one .xlsx having different column names

IIUC:

data = {}
for i, filename in enumerate(['data1.txt', 'data2.txt', 'data3.txt'], 1):
data[f"col{i}"] = pd.read_csv(filename, squeeze=True, header=None)
pd.concat(data, axis=1).to_excel('output.xlsx')

Update

I realized that the col2 data contains the percentage details and in the output.xlsx

data = {}
for i, filename in enumerate(['data1.txt', 'data2.txt', 'data3.txt'], 1):
data[f"col{i}"] = pd.read_csv(filename, squeeze=True, header=None)
df = pd.concat(data, axis=1)
df['col2'] = df['col2'].str.strip('%').astype(float)
df.to_excel('output.xlsx', index=False)

Merging a specific column from multiple text files into one file

There are some things that you can do to make your program correct and "more pythonic."

with open('f_names.txt') as read_f:
# Collect the third columns in a list
data = []
for line in read_f:
# No need to chdir()
with open('{}/input.txt'.format(line.strip())) as f:
# This is one column
data.append([float(line.split()[2]) for line in f])

# Do not remove the old file, overwrite it
with open('combine.txt', 'w') as outfile:
# "Transpose" the list of columns into a list of rows
# Write each row
for row in zip(*data):
# Use new-style formatting
outfile.write(" ".join("{:.2f}".format(cell) for cell in row) + "\n")

Trying to join two text files based on the first column in both files and want to keep all the columns of the matches from the second file

I'm sure there are ways to do this is awk, but join is also relatively simple.

join -1 1 -2 1 List1.txt <(sort -k 1,1 List2.txt) > List3.txt

You are joining List1 based on the first column, and joining List2 also based on the first column. You then need to make sure the files are sorted in alphabetical order so join can work.

This produces the columns you want, separated by a whitespace.

List3.txt
action e KK SS @ n
adan a d @ n
adap a d a p
adapka a d a p k a
adat a d a t
yen j e n


Related Topics



Leave a reply



Submit