Arranging Text Files Side by Side Using Python

arranging text files side by side using python

Based on the inputs described, the following shell steps should work:

# Raise the limit on the number of files that can be open at once (paste opens them all together)
$ ulimit -n 4096

# Remove blank lines from all the files
$ sed -i '/^[[:space:]]*$/d' *.txt

# Join all files side by side to form a matrix view
$ paste $(ls -v *.txt) > matrix.txt

# Fill the blank cells in the matrix with 0, editing the file in place (-i inplace requires GNU awk 4.1+)
$ awk -i inplace 'BEGIN { FS = OFS = "\t" } { for(i=1; i<=NF; i++) if($i ~ /^ *$/) $i = 0 }; 1' matrix.txt
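Since the question asks about Python, here is a minimal sketch of the same three steps (drop blank lines, join side by side, fill gaps with 0) without the shell. The function name `build_matrix` and its parameters are made up for illustration. Unlike `sed -i`, it does not modify the input files, and plain `sorted()` orders names lexicographically rather than by version number as `ls -v` does.

```python
from itertools import zip_longest
from pathlib import Path


def build_matrix(folder=".", output="matrix.txt", fill="0"):
    """Join every .txt file in `folder` column-wise and write the result to `output`."""
    out_path = Path(folder) / output
    # Skip the output file itself so that a rerun does not feed it back in.
    files = sorted(p for p in Path(folder).glob("*.txt") if p.name != output)

    columns = []
    for path in files:
        # Keep only non-blank lines (the `sed` step); the file on disk is untouched.
        columns.append([ln for ln in path.read_text().splitlines() if ln.strip()])

    # zip_longest pads the shorter columns with `fill` (the `awk` step).
    rows = ["\t".join(row) for row in zip_longest(*columns, fillvalue=fill)]
    out_path.write_text("\n".join(rows) + "\n")
    return rows
```

Calling `build_matrix()` in the directory that holds the files writes a tab-separated `matrix.txt` there. Because the files are read one at a time, the open-file limit should not matter, although every file is held in memory.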

arranging files in a text file

Example input files:

$ paste *.txt
one     four    six
two     five    seven
thee            eight
                nine

Using pandas is one option.

import pandas as pd
from pathlib import Path

>>> pd.DataFrame(f.read_text().splitlines() for f in Path().glob('*.txt'))
      0      1      2     3
0   one    two   thee  None
1  four   five   None  None
2   six  seven  eight  nine

You can then call .transpose() (or use .T) to turn each file into its own column.

>>> df = pd.DataFrame(f.read_text().splitlines() for f in Path().glob('*.txt'))
>>> df.T
      0     1      2
0   one  four    six
1   two  five  seven
2  thee  None  eight
3  None  None   nine

You can then use .to_csv() and set sep if you want to emulate the tab-separated output from paste:

>>> print(df.T.to_csv(index=None, header=None, sep='\t'), end='')
one     four    six
two     five    seven
thee            eight
                nine

You could also implement it using itertools.zip_longest
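As a rough sketch of that idea, the example below uses in-memory lists as stand-ins for the lines of the three example files (in practice each list would come from something like `Path(name).read_text().splitlines()`). The variable names are illustrative only.

```python
from itertools import zip_longest

# The three example files, as lists of lines (stand-ins for reading from disk).
columns = [
    ["one", "two", "thee"],
    ["four", "five"],
    ["six", "seven", "eight", "nine"],
]

# zip_longest keeps going until the longest list is exhausted,
# filling the gaps with fillvalue instead of stopping early like zip().
rows = ["\t".join(row) for row in zip_longest(*columns, fillvalue="")]
print("\n".join(rows))
```

Each row is tab-separated, with an empty cell wherever a file ran out of lines, which matches what paste produces.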

arranging files based on the names

It's unclear exactly what you want. This does what I think you want:

from glob import glob

# Returns a list of all relevant filenames
filenames = glob("*_10.asc_rsmp_1.0.txt")

# All the values will be stored in a dict where the key is the filename, and
# the value is a list of values
# It will be used later on to arrange the values side by side
values_by_filename = {}

# Read each file
for filename in filenames:
    with open(filename) as f:
        with open(filename + "_processed.txt", "w") as f_new:

            # Skip the first line (header)
            next(f)

            # Add all the values on every line to a single list
            values = []
            for line in f:
                values.extend(line.split())

            # Write each value on a new line in a new file
            f_new.write("\n".join(values))

            # Store the original filename and values to a dict for later
            values_by_filename[filename] = values

# Order the filenames by the number before the first underscore
ordered_filenames = sorted(values_by_filename,
                           key=lambda filename: int(filename.split("_")[0]))

# Arrange the values side by side in a new file
# zip iterates over every list of values at once, yielding the next value
# from every list as a tuple each iteration (it stops at the shortest list)
lines = []
for values in zip(*(values_by_filename[filename] for filename in ordered_filenames)):

    # Separate each column by 3 spaces, as per your expected output
    lines.append("   ".join(values))

# Write the concatenated values to file with a newline between each row, but
# not at the end of the file
with open("output.txt", "w") as f:
    f.write("\n".join(lines))

output.txt:

+2.0337053383e-02   +0.0000000000e+00
+4.7575540537e-02   +5.8895751219e-02
+2.7508078190e-02   +1.9720949872e-02
+3.9923797852e-02   +4.7712552071e-02
+2.1663353231e-02   +1.6255806150e-02
+4.6368790709e-02   +5.0983512543e-02
+2.8194768989e-02   +2.4151940813e-02
+3.8577115641e-02   +4.3959767187e-02
+2.1935380223e-02   +1.9066090517e-02
+4.6024962357e-02   +4.8980189213e-02
+2.9320681307e-02   +2.6237709462e-02
+3.7630711188e-02   +4.1379166269e-02

Be sure to read the documentation, in particular:

  • https://docs.python.org/3/library/glob.html#glob.glob
  • https://docs.python.org/3/library/functions.html#zip
  • https://docs.python.org/3/library/functions.html#sorted
  • https://docs.python.org/3/library/stdtypes.html#str.join
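To see why the script converts the filename prefix to int before sorting, here is a small illustration. The filenames are hypothetical examples that merely follow the pattern used in the glob call.

```python
# Hypothetical filenames following the "<number>_10.asc_rsmp_1.0.txt" pattern.
filenames = [
    "10_10.asc_rsmp_1.0.txt",
    "2_10.asc_rsmp_1.0.txt",
    "1_10.asc_rsmp_1.0.txt",
]

# Plain string sorting compares character by character, so "10_..." sorts before "2_...".
lexicographic = sorted(filenames)

# Converting the prefix to int sorts by numeric value instead.
numeric = sorted(filenames, key=lambda name: int(name.split("_")[0]))

print(lexicographic)
print(numeric)
```

With the numeric key the columns come out in the order 1, 2, 10. The key raises ValueError if a matched filename does not start with a number, so the glob pattern needs to be specific enough to exclude such files.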

How to read and organize text files divided by keywords

You can read the file once and store the contents in a dictionary. Since you have conveniently labeled the "command" lines with a *, you can use all lines beginning with a * as the dictionary key and all following lines as the values for that key. You can do this with a for loop:

with open('geometry.txt') as f:
    x = {}
    key = None  # store the most recent "command" here
    for y in f:
        if y.startswith('*'):
            key = y[1:].strip()  # your "command", without the trailing newline
            x[key] = []
        elif key is not None and y.strip():
            x[key].append(y.split())  # add subsequent lines to the most recent key

Or you can take advantage of Python's list and dictionary comprehensions to do the same thing in one line:

with open('geometry.txt') as f:
    x = {y.split('\n')[0]:[z.split() for z in y.strip().split('\n')[1:]] for y in f.read().split('*')[1:]}

which I'll admit is not very nice looking but it gets the job done by splitting the entire file into chunks between '*' characters and then using new lines and spaces as delimiters to break up the remaining chunks into dictionary keys and lists of lists (as dictionary values).
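If the one-liner is hard to follow, the same chunking idea can be unrolled step by step. This is only a sketch: the sample text and its NODE/ELEMENT keywords are invented for illustration, and it assumes that '*' never appears inside the data lines.

```python
# Invented sample content; a real file would be read with f.read().
text = "*NODE\n1 0.0 0.0\n2 1.0 0.0\n*ELEMENT\n1 1 2\n"

sections = {}
# split('*') yields one chunk per command; the [1:] slice discards
# whatever comes before the first '*'.
for chunk in text.split("*")[1:]:
    lines = chunk.strip().split("\n")
    key = lines[0]                               # the command name
    rows = [line.split() for line in lines[1:]]  # one list of values per line
    sections[key] = rows

print(sections)
```

For the sample above, `sections["NODE"]` holds two rows of three strings each and `sections["ELEMENT"]` holds one. The values stay as strings, so convert them with int() or float() where numbers are needed.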

Details about splitting, stripping, and slicing strings can be found in the Python documentation on string methods.


