Arranging Text Files Side by Side Using Python

arranging text files side by side using python

Based on the details you provided, the following shell steps should do it.

# Raise the limit on the number of files that can be open at once
$ ulimit -n 4096

# Remove blank lines from all the files
$ sed -i '/^[[:space:]]*$/d' *.txt

# Join all files side by side to form a matrix view
$ paste $(ls -v *.txt) > matrix.txt

# Fill the blank cells in the matrix with zeros, editing the file in place (needs GNU awk 4.1+)
$ awk -i inplace 'BEGIN { FS = OFS = "\t" } { for(i=1; i<=NF; i++) if($i ~ /^ *$/) $i = 0 }; 1' matrix.txt
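If you would rather stay in Python, the part that is easy to get wrong is the ordering that `ls -v` provides: a plain `sorted()` puts `file10.txt` before `file2.txt`. Here is a minimal sketch of a natural-sort key that approximates that ordering. The name `natural_key` and the file names are made up for illustration, and the result is not guaranteed to match `ls -v` in every corner case.

```python
import re

def natural_key(name):
    """Split a name into text and number chunks so 'file10' sorts after 'file2'."""
    # re.split with a capturing group alternates text, digits, text, ...
    parts = re.split(r"(\d+)", name)
    # Odd positions hold the digit runs; compare those as integers
    return [int(p) if i % 2 else p.lower() for i, p in enumerate(parts)]

# Hypothetical file names, chosen only for illustration
names = ["file10.txt", "file2.txt", "file1.txt"]
ordered = sorted(names, key=natural_key)
print(ordered)  # ['file1.txt', 'file2.txt', 'file10.txt']
```

The same key can be passed to `sorted()` over the result of a glob before reading the files.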

arranging files in a text file

Example input files:

$ paste *.txt
one    four    six
two    five    seven
thee           eight   
               nine

Using pandas is one option.

>>> import pandas as pd
>>> from pathlib import Path

>>> pd.DataFrame(f.read_text().splitlines() for f in Path().glob('*.txt'))
      0      1      2     3
0   one    two   thee  None
1  four   five   None  None
2   six  seven  eight  nine

You can then use .transpose() (or its shorthand .T) to turn each file into its own column.

>>> df = pd.DataFrame(f.read_text().splitlines() for f in Path().glob('*.txt'))
>>> df.T
      0     1      2
0   one  four    six
1   two  five  seven
2  thee  None  eight
3  None  None   nine

You can then use .to_csv() with sep='\t' if you want to emulate the tab-separated output from paste:

>>> print(df.T.to_csv(index=False, header=False, sep='\t'), end='')
one    four    six
two    five    seven
thee           eight   
               nine

You could also implement it with itertools.zip_longest, which pads the shorter files with a fill value of your choice.
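A minimal sketch of the zip_longest approach, using in-memory lists that mirror the example files above. The helper name `paste_columns` and its parameters are my own, not part of any library.

```python
from itertools import zip_longest

def paste_columns(columns, sep="\t", fill=""):
    """Join lists of strings row by row, padding shorter lists with `fill`."""
    return [sep.join(row) for row in zip_longest(*columns, fillvalue=fill)]

# In practice each column would be one file's lines, for example:
#   columns = [p.read_text().splitlines() for p in sorted(Path().glob("*.txt"))]
columns = [
    ["one", "two", "thee"],
    ["four", "five"],
    ["six", "seven", "eight", "nine"],
]
for line in paste_columns(columns):
    print(line)
```

Passing `fill="0"` instead of the empty string gives the zero-filled matrix from the shell answer in one step.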

arranging files based on the names

It's unclear exactly what you want. This does what I think you want:

from glob import glob

# Returns a list of all relevant filenames
filenames = glob("*_10.asc_rsmp_1.0.txt")

# All the values will be stored in a dict where the key is the filename, and
# the value is a list of values
# It will be used later on to arrange the values side by side
values_by_filename = {}

# Read each file
for filename in filenames:
    with open(filename) as f:
        with open(filename + "_processed.txt", "w") as f_new:
            
            # Skip the first line (header)
            next(f)
            
            # Add all the values on every line to a single list
            values = []
            for line in f:
                values.extend(line.split())
            
            # Write each value on a new line in a new file
            f_new.write("\n".join(values))
            
            # Store the original filename and values to a dict for later
            values_by_filename[filename] = values

# Order the filenames by the number before the first underscore
ordered_filenames = sorted(values_by_filename, 
                           key=lambda filename: int(filename.split("_")[0]))

# Arrange the values side by side in a new file
# zip iterates over every list of values at once, yielding the next value
# from every list as a tuple each iteration; it stops at the shortest list
lines = []
for values in zip(*(values_by_filename[filename] for filename in ordered_filenames)):
    
    # Separate each column by 3 spaces, as per your expected output
    lines.append("   ".join(values))

# Write the concatenated values to file with a newline between each row, but
# not at the end of the file
with open("output.txt", "w") as f:
    f.write("\n".join(lines))

output.txt:

+2.0337053383e-02   +0.0000000000e+00
+4.7575540537e-02   +5.8895751219e-02
+2.7508078190e-02   +1.9720949872e-02
+3.9923797852e-02   +4.7712552071e-02
+2.1663353231e-02   +1.6255806150e-02
+4.6368790709e-02   +5.0983512543e-02
+2.8194768989e-02   +2.4151940813e-02
+3.8577115641e-02   +4.3959767187e-02
+2.1935380223e-02   +1.9066090517e-02
+4.6024962357e-02   +4.8980189213e-02
+2.9320681307e-02   +2.6237709462e-02
+3.7630711188e-02   +4.1379166269e-02
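One caveat with the zip-based script above: if the input files do not all contain the same number of values, zip silently drops the extra values from the longer ones. A small sketch of a guard you could call before zipping follows. The function name `check_equal_lengths` and the sample data are hypothetical stand-ins for `values_by_filename`.

```python
def check_equal_lengths(values_by_name):
    """Return the common column length, or raise ValueError if lengths differ."""
    lengths = {name: len(values) for name, values in values_by_name.items()}
    if len(set(lengths.values())) > 1:
        raise ValueError(f"columns differ in length: {lengths}")
    return next(iter(lengths.values()), 0)

# Hypothetical data standing in for values_by_filename
sample = {"1_a.txt": ["+1.0", "+2.0"], "2_a.txt": ["+3.0"]}

# Without a check, zip keeps only one row and "+2.0" is lost
rows = list(zip(*sample.values()))
print(rows)  # [('+1.0', '+3.0')]
```

If unequal lengths are expected rather than an error, itertools.zip_longest with a fill value is the alternative.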

Be sure to read the documentation, in particular:

https://docs.python.org/3/library/glob.html#glob.glob
https://docs.python.org/3/library/functions.html#zip
https://docs.python.org/3/library/functions.html#sorted
https://docs.python.org/3/library/stdtypes.html#str.join

How to read and organize text files divided by keywords

You can read the file once and store the contents in a dictionary. Since you have conveniently labeled the "command" lines with a *, you can use all lines beginning with a * as the dictionary key and all following lines as the values for that key. You can do this with a for loop:

with open('geometry.txt') as f:
    x = {}
    key = None  # store the most recent "command" here
    for y in f:
        if y.startswith('*'):
            key = y[1:].strip()  # your "command", without the '*' and the newline
            x[key] = []
        elif key is not None:  # skip any lines before the first "command"
            x[key].append(y.split())  # add subsequent lines to the most recent key

Or you can take advantage of Python's dictionary and list comprehensions to do the same thing in a single expression:

with open('geometry.txt') as f:
    x = {y.split('\n')[0]:[z.split() for z in y.strip().split('\n')[1:]] for y in f.read().split('*')[1:]}

I'll admit this is not very nice looking, but it gets the job done. It splits the entire file into chunks at each '*' character, then uses newlines and spaces as delimiters to break each chunk into a dictionary key (the first line of the chunk) and a list of lists (the remaining lines, each split on whitespace).
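To make the chunking easier to follow, here is the same idea unrolled into separate steps and run on a short string instead of a file. The sample content (keywords and numbers) is invented for illustration, since I don't know what your real file contains.

```python
# Hypothetical file content; the keywords and numbers are made up
text = "*NODE\n1 0.0 0.0\n2 1.0 0.0\n*ELEMENT\n1 1 2\n"

# Everything before the first '*' is an empty string, so drop it
chunks = text.split("*")[1:]

blocks = {}
for chunk in chunks:
    # First line of the chunk is the keyword, the rest are data rows
    header, *rows = chunk.strip().split("\n")
    blocks[header] = [row.split() for row in rows]

print(blocks)
# {'NODE': [['1', '0.0', '0.0'], ['2', '1.0', '0.0']], 'ELEMENT': [['1', '1', '2']]}
```

Note that the values stay as strings; convert them with int() or float() if you need numbers.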

Details about splitting, stripping, and slicing strings can be found in the Python documentation on string methods.
