How to merge multiple text files by column?
Using awk
$ awk 'NR==FNR{array[$1]=$3FS$4FS$5FS$6FS$7FS$8; next} { print $0,array[$1]}' file2 file1
#rname startpos endpos numreads covbases coverage meandepth meanbaseq meanmapq endpos numreads covbases coverage meandepth meanbaseq
CFLAU10s2|kraken:taxid|33189 1 37904 6 375 0.989341 0.0157503 35.6 31 26432 8 318 1.20309 0.0361683 35.9
CFLAU10s4|kraken:taxid|33189 1 37174 2 222 0.597192 0.00812396 35.7 45 26105 2 91 0.348592 0.00697184 35
CFLAU10s5|kraken:taxid|33189 1 37080 2 70 0.188781 0.00377562 35.7 28 25631 4 122 0.475986 0.00951972 35.7
CFLAU10s6|kraken:taxid|33189 1 36175 4 145 0.400829 0.00801659 36.9 25.5 25014 2 64 0.255857 0.00511713 34.7
CFLAU10s8|kraken:taxid|33189 1 35457 4 155 0.437149 0.00874298 37 25.5 24803 522 10847 43.7326 2.22175 36.3
CFLAU10s9|kraken:taxid|33189 1 34752 6 111 0.319406 0.0191644 36.2 26 24139 2 59 0.244418 0.00488835 37
CFLAU10s13|kraken:taxid|33189 1 32255 2 156 0.483646 0.00917687 35.2 20 23391 24 1215 5.19431 0.109273 36.1
CFLAU10s14|kraken:taxid|33189 1 31915 4 63 0.197399 0.00695598 36.7 26.5 22963 12 318 1.38484 0.0531289 36.8
combine multiple text files with different number of columns separated by tab, into one merged text file in python
The simplest way is probably using numpy:
import numpy as np
filenames = ["file1.txt", "file2.txt", "file3.txt"]
fmt = '%.2f' # assuming format is known in advance
all_columns = []
for filename in filenames:
all_columns.append(np.genfromtxt(filename))
arr_out = np.column_stack(tuple(all_columns)) # Stack columns
# Fill NaN-elements with last numeric value
arr_1d = np.ravel(arr_out) # "flat reference" to arr_out
replaced_all_nan = False
nan_indices = np.where(np.isnan(arr_1d))
while len(nan_indices[0]):
new_indices = tuple([i-1 for i in nan_indices])
arr_1d[nan_indices] = arr_1d[new_indices]
nan_indices = np.where(np.isnan(arr_1d))
np.savetxt("merged.txt", arr_out, fmt=fmt)
One problem (if it is one for you) that might occur is that the very first, i.e. the upper-left element, is non-numeric. In that case, the last (lower-right) value or the last numeric value before that would be used.
How to create a combined dataframe from multiple text files and rename columns based on the file name in R
You can try -
library(tidyverse)
#List all the files
all_files <- dir("/Target_Data/TARGET_FHCRC/")
#include the files with the specific pattern
file_names <- grep(all_files,
pattern = "^T.*gene.quantification.txt$",value = TRUE)
#Read the file in a list with the name of the file as list name
Data_file <- sapply(file_names,read.delim, stringsAsFactors = FALSE,
check.names = FALSE, row.names = NULL, simplify = FALSE)
#Drop .txt from name of the list
names(Data_file) <- sub('\\.txt$', '', names(Data_file))
#For each dataframe in list of dataframe append name of the file
Merge_All_Samples <- Data_file %>%
imap(function(x, y) x %>%
#Drop the columns that you don't need
select(-median_length_normalized, -RPKM) %>%
rename_with(~paste0(., y), -gene)) %>%
reduce(inner_join, by = "gene")
Merge_All_Samples
Concatenate multiple txt files with a name of file as a first column
suggesting gawk
command:
To print the filename followed by ,
in first line of each file.
gawk 'BEGINFILE{printf("%s,",FILENAME)}1' *.txt
To print the filename followed by ,
in every line of each file.
awk '{print FILENAME "," $0}' *.txt
How to combine multiple .txt files with different # of rows in R and keep file names?
It will be easier to write a function and then rbind()
the data from each file. The resulting data frame will have a file
column with the filename from the listfile_aINS
vector.
read_file <- function(filename) {
dat <- read.table(filename,header = FALSE, sep = ",")
dat$file <- filename
return(dat)
}
all_dat <- do.call(rbind, lapply(listfile_aINS, read_file))
If they don't all have the same number of rows it might not make sense to have each column be a file, but if you really want that you could make it into a wide dataset with NA
filling out the empty rows:
library(dplyr)
library(tidyr)
all_dat %>%
group_by(file) %>%
mutate(n = 1:n()) %>%
pivot_wider(names_from = file, values_from = V1)
merging multiple .txt files into one .xlsx having different column names
IIUC:
data = {}
for i, filename in enumerate(['data1.txt', 'data2.txt', 'data3.txt'], 1):
data[f"col{i}"] = pd.read_csv(filename, squeeze=True, header=None)
pd.concat(data, axis=1).to_excel('output.xlsx')
Update
I realized that the col2 data contains the percentage details and in the output.xlsx
data = {}
for i, filename in enumerate(['data1.txt', 'data2.txt', 'data3.txt'], 1):
data[f"col{i}"] = pd.read_csv(filename, squeeze=True, header=None)
df = pd.concat(data, axis=1)
df['col2'] = df['col2'].str.strip('%').astype(float)
df.to_excel('output.xlsx', index=False)
Merging a specific column from multiple text files into one file
There are some things that you can do to make your program correct and "more pythonic."
with open('f_names.txt') as read_f:
# Collect the third columns in a list
data = []
for line in read_f:
# No need to chdir()
with open('{}/input.txt'.format(line.strip())) as f:
# This is one column
data.append([float(line.split()[2]) for line in f])
# Do not remove the old file, overwrite it
with open('combine.txt', 'w') as outfile:
# "Transpose" the list of columns into a list of rows
# Write each row
for row in zip(*data):
# Use new-style formatting
outfile.write(" ".join("{:.2f}".format(cell) for cell in row) + "\n")
Trying to join two text files based on the first column in both files and want to keep all the columns of the matches from the second file
I'm sure there are ways to do this is awk, but join is also relatively simple.
join -1 1 -2 1 List1.txt <(sort -k 1,1 List2.txt) > List3.txt
You are joining List1
based on the first column, and joining List2
also based on the first column. You then need to make sure the files are sorted in alphabetical order so join can work.
This produces the columns you want, separated by a whitespace.
List3.txt
action e KK SS @ n
adan a d @ n
adap a d a p
adapka a d a p k a
adat a d a t
yen j e n
Related Topics
How to Fix Urllib3 Runtimeerror: Requests Dependency 'Urllib3' Must Be Version >= 1.21.1, < 1.22
How to Customize or Remove Extra Linux Kernel Version Details Shown at Boot
Notify Gpio Interrupt to User Space from a Kernel Module
How to Find Out The User of Parent Shell Inside a Child Shell
Why Does High-Memory Not Exist for 64-Bit Cpu
Open Website from Within Eclipse's Internal Browser
How to Include Cutil.H in Linux
Numa-Aware Named Shared Memory for Linux
Store Passwords Required by a Linux Daemon
Cannot Get Qwindow::Fromwinid to Work Properly
Why Sigint Can Stop Bash in Terminal But Not via Kill -Int
Installing PHPsh on Linux, Python Error
Bash Script Counting Instances of Itself Wrongly
Why Does Gdb Prompt "Unexpected Size of Section '.Reg-Xstate/Xxxxx' in Core File."