Arranging text files side by side using Python
Based on the inputs you described, follow the steps below.
# Raise the limit on the number of files that can be open at once
$ ulimit -n 4096
# Remove blank lines from all the files
$ sed -i '/^[[:space:]]*$/d' *.txt
# Join all files side by side to form a matrix view
$ paste $(ls -v *.txt) > matrix.txt
# Fill the blank cells in the matrix with 0, editing the file in place (needs GNU awk's inplace extension)
$ awk -i inplace 'BEGIN { FS = OFS = "\t" } { for(i=1; i<=NF; i++) if($i ~ /^ *$/) $i = 0 }; 1' matrix.txt
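Since the question asks for Python, the same three steps can be sketched with the standard library alone. This is a minimal sketch, not a drop-in replacement for the shell commands: `build_matrix` and `natural_key` are hypothetical names, the natural sort only approximates `ls -v`, and unlike `sed -i` the input files are left unmodified.

```python
from itertools import zip_longest
from pathlib import Path
import re


def natural_key(path):
    # Split the name into digit and non-digit runs so "10.txt" sorts after
    # "2.txt" (an approximation of `ls -v`, not an exact match).
    return [int(part) if part.isdigit() else part
            for part in re.split(r"(\d+)", path.name)]


def build_matrix(directory=".", output="matrix.txt", fill="0"):
    directory = Path(directory)
    out_path = directory / output
    # Skip the output file so a rerun does not read its own result.
    files = sorted((p for p in directory.glob("*.txt") if p != out_path),
                   key=natural_key)
    columns = []
    for path in files:
        # Drop blank or whitespace-only lines, like the sed step.
        columns.append([line for line in path.read_text().splitlines()
                        if line.strip()])
    # zip_longest pads the shorter columns with `fill`, like the awk step.
    rows = zip_longest(*columns, fillvalue=fill)
    out_path.write_text("".join("\t".join(row) + "\n" for row in rows))
    return out_path
```

Each file is opened and closed in turn, so the open-file limit that `paste` runs into does not apply here; the trade-off is that every column is held in memory at once.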
Arranging files in a text file
Example input files:
$ paste *.txt
one four six
two five seven
thee eight
nine
Using pandas is one option.
>>> import pandas as pd
>>> from pathlib import Path
>>> pd.DataFrame(f.read_text().splitlines() for f in Path().glob('*.txt'))
      0      1      2     3
0   one    two   thee  None
1  four   five   None  None
2   six  seven  eight  nine
You can then call .transpose() (or its shorthand .T) to turn each file into its own column.
>>> df = pd.DataFrame(f.read_text().splitlines() for f in Path().glob('*.txt'))
>>> df.T
      0     1      2
0   one  four    six
1   two  five  seven
2  thee  None  eight
3  None  None   nine
You can then use .to_csv() and set sep if you want to emulate the tab-separated output from paste:
>>> print(df.T.to_csv(index=None, header=None, sep='\t'), end='')
one four six
two five seven
thee eight
nine
You could also implement it using itertools.zip_longest
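A minimal sketch of that `itertools.zip_longest` approach follows. The function name `paste_files` and its parameters are made up for illustration; the default empty-string fill mimics how `paste` leaves a field empty once a file runs out of lines.

```python
from itertools import zip_longest
from pathlib import Path


def paste_files(paths, sep="\t", fillvalue=""):
    """Return one joined row per line number across all the given files."""
    # Each file becomes one column: a list of its lines without newlines.
    columns = [Path(p).read_text().splitlines() for p in paths]
    # zip_longest keeps going until the longest column is exhausted,
    # substituting `fillvalue` for the columns that ended earlier.
    return [sep.join(row) for row in zip_longest(*columns, fillvalue=fillvalue)]
```

Calling `paste_files(sorted(Path().glob('*.txt')))` on the example files gives a third row of `thee`, an empty field, then `eight`, the same as `paste`. Sorting the glob results is a deliberate choice here, because `glob` does not guarantee any particular order.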
Arranging files based on the names
It's unclear exactly what you want. This does what I think you want:
from glob import glob
# Returns a list of all relevant filenames
filenames = glob("*_10.asc_rsmp_1.0.txt")
# All the values will be stored in a dict where the key is the filename, and
# the value is a list of values
# It will be used later on to arrange the values side by side
values_by_filename = {}
# Read each filename
for filename in filenames:
    with open(filename) as f:
        with open(filename + "_processed.txt", "w") as f_new:
            # Skip the first line (header)
            next(f)
            # Add all the values on every line to a single list
            values = []
            for line in f:
                values.extend(line.split())
            # Write each value on a new line in a new file
            f_new.write("\n".join(values))
            # Store the original filename and values to a dict for later
            values_by_filename[filename] = values
# Order the filenames by the number before the first underscore
ordered_filenames = sorted(values_by_filename,
                           key=lambda filename: int(filename.split("_")[0]))
# Arrange the values side by side in a new file
# zip iterates over every list of values at once, yielding the next value
# from every list as a tuple each iteration
lines = []
for values in zip(*(values_by_filename[filename] for filename in ordered_filenames)):
    # Separate each column by 3 spaces, as per your expected output
    lines.append("   ".join(values))
# Write the concatenated values to file with a newline between each row, but
# not at the end of the file
with open("output.txt", "w") as f:
    f.write("\n".join(lines))
output.txt:
+2.0337053383e-02 +0.0000000000e+00
+4.7575540537e-02 +5.8895751219e-02
+2.7508078190e-02 +1.9720949872e-02
+3.9923797852e-02 +4.7712552071e-02
+2.1663353231e-02 +1.6255806150e-02
+4.6368790709e-02 +5.0983512543e-02
+2.8194768989e-02 +2.4151940813e-02
+3.8577115641e-02 +4.3959767187e-02
+2.1935380223e-02 +1.9066090517e-02
+4.6024962357e-02 +4.8980189213e-02
+2.9320681307e-02 +2.6237709462e-02
+3.7630711188e-02 +4.1379166269e-02
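The `int(...)` in the sort key matters once the numeric prefixes have different lengths. Here is a small sketch of the difference; the filenames are made up to match the glob pattern above, and `numeric_prefix` is a hypothetical helper that does the same thing as the lambda.

```python
# Hypothetical filenames that match the "*_10.asc_rsmp_1.0.txt" pattern.
filenames = [
    "10_10.asc_rsmp_1.0.txt",
    "2_10.asc_rsmp_1.0.txt",
    "1_10.asc_rsmp_1.0.txt",
]


def numeric_prefix(filename):
    # Same idea as the lambda above: the number before the first underscore.
    return int(filename.split("_")[0])


# Plain string sorting compares character by character, so "10_" sorts
# before "1_" (because "0" comes before "_") and before "2_".
as_text = sorted(filenames)

# Sorting on the integer prefix gives the intended 1, 2, 10 order.
as_numbers = sorted(filenames, key=numeric_prefix)

print(as_text)
print(as_numbers)
```

Note that `int()` raises `ValueError` if a matching filename does not start with a number, so this assumes every file follows the naming scheme.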
Be sure to read the documentation, in particular:
- https://docs.python.org/3/library/glob.html#glob.glob
- https://docs.python.org/3/library/functions.html#zip
- https://docs.python.org/3/library/functions.html#sorted
- https://docs.python.org/3/library/stdtypes.html#str.join
How to read and organize text files divided by keywords
You can read the file once and store the contents in a dictionary. Since you have conveniently labeled the "command" lines with a *, you can use all lines beginning with a * as the dictionary key and all following lines as the values for that key. You can do this with a for loop:
with open('geometry.txt') as f:
    x = {}
    key = None  # store the most recent "command" here
    for y in f:
        if y.startswith('*'):
            key = y[1:].strip()  # your "command", without the trailing newline
            x[key] = []
        elif key is not None:  # ignore anything before the first "command"
            x[key].append(y.split())  # add subsequent lines to the most recent key
Or you can take advantage of Python's list and dictionary comprehensions to do the same thing in one line:
with open('geometry.txt') as f:
    x = {y.split('\n')[0]: [z.split() for z in y.strip().split('\n')[1:]] for y in f.read().split('*')[1:]}
I'll admit this is not very nice looking, but it gets the job done: it splits the entire file into chunks at the '*' characters, then uses newlines and spaces as delimiters to break each chunk into a dictionary key and a list of lists (the dictionary value).
Details about splitting, stripping, and slicing strings can be found in the Python documentation on string methods.
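To see what the resulting dictionary looks like, here is a self-contained sketch of the loop approach that runs on an in-memory sample instead of a real file. The name `parse_sections` and the sample contents (`*NODE`, `*ELEMENT` and their rows) are invented for illustration; your actual keywords and values will differ.

```python
import io


def parse_sections(lines):
    """Group whitespace-split rows under the most recent '*' keyword line."""
    sections = {}
    key = None
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith("*"):
            key = line[1:].strip()  # keyword without the leading '*'
            sections[key] = []
        elif key is not None:
            sections[key].append(line.split())
    return sections


# Hypothetical file contents; io.StringIO behaves like an open text file.
sample = io.StringIO("*NODE\n1 0.0 0.0\n2 1.0 0.0\n*ELEMENT\n1 1 2\n")
result = parse_sections(sample)
print(result)
```

Every value is still a string at this point, so convert with `int()` or `float()` where you need numbers.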