Extracting Data from Text Files

Extracting data from text files using Python

If all your lines have the same layout, you can split the line at the commas and take the last field, which holds the number:

string = "Component Sizing Information, AirTerminal:SingleDuct:VAV:Reheat, SPACE2-1 VAV REHEAT, Design Size Maximum Flow per Zone Floor Area during Reheat [m3/s-m2], 1.31927E-003"
fields = string.split(',')  # split the string at commas
number = fields[-1]         # take the last field, which holds the number
number = number.strip()     # remove surrounding whitespace; number is still a string
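
The steps above leave the number as a string. If a numeric value is needed, a minimal sketch could wrap them in a small helper and convert with float(). The function name last_field_as_float and its sep parameter are made up for this illustration, and it assumes the last field is always a plain decimal or scientific-notation number.

```python
def last_field_as_float(line, sep=","):
    """Return the last sep-delimited field of line as a float.

    Hypothetical helper: assumes the number is always the final field.
    """
    last = line.split(sep)[-1].strip()
    return float(last)  # raises ValueError if the field is not numeric


line = ("Component Sizing Information, AirTerminal:SingleDuct:VAV:Reheat, "
        "SPACE2-1 VAV REHEAT, Design Size Maximum Flow per Zone Floor Area "
        "during Reheat [m3/s-m2], 1.31927E-003")
value = last_field_as_float(line)
print(value)
```

A line whose last field is not a number raises ValueError, so a loop over many lines may want to catch that and skip or log the offending line.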

How to extract data from Excel and output in txt file along with modified string using Python?

This is the type of question that's a little difficult to answer, because I'm afraid the answer is more or less just "go and learn how to actually program in Python". It seems like you're somewhat just blindly copying code you see in tutorials - if you were fluent in Python it would be obvious how to do what you're trying to do and why what you're doing right now won't work.

Still, since solving many individual problems is one of the ways you learn something, let me see if I can give you some pointers here.

Your code currently looks like this (note that I shortened the output file path for the sake of readability in this answer):

for i in range(6, ws.max_row + 1):
    name = ws.cell(row=i, column=1).value
    outputFile = open('{}.txt'.format(name), 'w')
    for j in range(1, ws.max_column + 1):
        outputFile.write(ws.cell(row=i, column=j).value + '\n')
    outputFile.close()

I can't run this code because I don't have any Excel files to hand, but I imagine it should produce four files called Name.txt, Salary.txt, Date.txt, and Phone.txt. Each file should contain the values from the corresponding row of the worksheet, separated by newlines.

Your questions are: (1) why is this outputting to four files instead of one, and (2) how can you write the SQL commands you want to that file instead of just the values from the worksheet.

For (1), the script is writing four files because that's exactly what you're telling it to do. You call open() four times, with four different filenames, so it creates four files. If you want to create just one file and write to that, try something like:

outputFile = open('output.txt', 'w')
for i in range(6, ws.max_row + 1):
    name = ws.cell(row=i, column=1).value
    for j in range(1, ws.max_column + 1):
        outputFile.write(ws.cell(row=i, column=j).value + '\n')
outputFile.close()

To write the output that you want, you should... write the output that you want. For example, to write the line "CREATE TABLE TRANSIENT TABLE STG_EMPLOYEE(" to the file, you write

outputFile.write("CREATE TABLE TRANSIENT TABLE STG_EMPLOYEE(" + "\n")

To write "Hello", you run outputFile.write("Hello"), and so on. ws.cell(row=i, column=j).value gets you the contents of the (i, j)-th cell in the worksheet, which is why passing it to write() causes that value to be written to the file. Just call write() with what you want to be written to the file.
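
To make that concrete, here is a rough sketch of writing a whole statement to one output. It is not the asker's actual worksheet: the rows list is a hypothetical stand-in for values you would collect with ws.cell(row=i, column=j).value, the column names and types in it are invented, and write_statement is a name made up for this example. The header string is copied unchanged from the example above. An in-memory io.StringIO stands in for the file so the snippet runs without any Excel file.

```python
import io


def write_statement(out, header, rows):
    """Write header, one indented line per row, then a closing line.

    Hypothetical helper: `out` is any object with a write() method.
    """
    out.write(header + "\n")
    # str() guards against non-string cell values such as numbers or None
    body = ",\n".join("    " + " ".join(str(cell) for cell in row)
                      for row in rows)
    out.write(body + "\n);\n")


# Invented stand-in for values read from the worksheet
rows = [("NAME", "VARCHAR"), ("SALARY", "NUMBER")]

buffer = io.StringIO()
write_statement(buffer, "CREATE TABLE TRANSIENT TABLE STG_EMPLOYEE(", rows)
text = buffer.getvalue()
print(text)
```

To write to disk instead, pass a real file object, for example inside a with open('output.txt', 'w') as outputFile: block, which also closes the file for you.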

Extracting specific data from multiple text files and writing them into columns in csv

I think I see what you're trying to do, but I'm not sure.

I think your BEF file might look something like this:

a line
another line
Estimate file: pog_example.bef
Estimate ID: o1_p1
61078 (100.0%) estimated.
still more lines

If that's true, then once you find a line with 'Estimate file', you need to take control from the for-loop and start manually iterating the lines because you know which lines are coming up.

This is a very simple example script which opens my mock BEF file (above) and automatically iterates the lines till it finds 'Estimate file'. From there it processes each line specifically, using next(bef_file) to iterate to the next line, expecting them to have the correct text:

import csv

all_rows = []

with open('input.bef') as bef_file:
    for line in bef_file:
        if 'Estimate file' in line:
            fname = line.split('pog_')[1].strip()

            line = next(bef_file)
            est_id = line.split('Estimate ID:')[1].strip()

            line = next(bef_file)
            value = line.strip()

            row = [fname, est_id, value]
            all_rows.append(row)
            break  # stop iterating lines in this file

with open('output.csv', 'w', newline='') as csv_out:
    writer = csv.writer(csv_out)
    writer.writerow(['File name', 'Est ID', 'Est Value'])
    writer.writerows(all_rows)

When I run that I get this for output.csv:

File name,Est ID,Est Value
example.bef,o1_p1,61078 (100.0%) estimated.

If there are blank lines in your data between the lines you care about, manually step over them with next(bef_file) statements.
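
If the number of blank lines varies, one possible alternative to hard-coding extra next(bef_file) calls is a small helper that keeps advancing until it finds a non-blank line. This is only a sketch: next_nonblank is a name invented here, and the mock data is my example BEF content from above with blank lines added, held in an io.StringIO instead of a real file.

```python
import io


def next_nonblank(lines):
    """Advance the iterator and return the next line that is not blank."""
    for line in lines:
        if line.strip():
            return line
    raise ValueError("ran out of lines before finding a non-blank one")


# Mock BEF content with blank lines between the lines of interest
mock = io.StringIO(
    "a line\n"
    "Estimate file: pog_example.bef\n"
    "\n"
    "Estimate ID: o1_p1\n"
    "\n"
    "\n"
    "61078 (100.0%) estimated.\n"
)

row = None
for line in mock:
    if 'Estimate file' in line:
        fname = line.split('pog_')[1].strip()
        est_id = next_nonblank(mock).split('Estimate ID:')[1].strip()
        value = next_nonblank(mock).strip()
        row = [fname, est_id, value]
        break  # stop iterating lines in this file

print(row)
```

This works because a file object (like the StringIO here) is its own iterator, so the helper and the outer for-loop share the same position in the file.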

Extract data between two lines from text file

I did it by storing everything in a dictionary; see the code below.

with open("test.txt") as f:
    lines = f.readlines()

dict_text = {"NAME": [], "DATEOFBIRTH": [], "BIO": []}
for line in lines:
    if not ("NAME" in line or "DATE OF BIRTH" in line or "BIO" in line):
        # ordinary line: store it under the most recent header
        text = line.replace("\n", "")
        dict_text[location].append(text)
    else:
        # header line: "DATE OF BIRTH" becomes the key "DATEOFBIRTH"
        location = "".join(line.split())
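
The code above treats any line that merely contains one of the header words as a header, and it fails with a NameError if the file does not start with a header. A slightly more defensive variant is sketched below. It differs from the code above in three ways: a line counts as a header only when it matches one exactly, blank lines are skipped, and text before the first header is ignored. The sample lines and the name parse_sections are invented for illustration, since I don't know what your real file looks like.

```python
HEADERS = ("NAME", "DATE OF BIRTH", "BIO")


def parse_sections(lines):
    """Group lines under the most recent header line.

    Hypothetical helper: assumes each header sits alone on its own line.
    """
    sections = {header.replace(" ", ""): [] for header in HEADERS}
    location = None  # no header seen yet
    for raw in lines:
        text = raw.strip()
        if text in HEADERS:
            location = text.replace(" ", "")
        elif location is not None and text:
            sections[location].append(text)
    return sections


# Invented sample data standing in for the contents of test.txt
sample = [
    "NAME\n",
    "Ada Lovelace\n",
    "DATE OF BIRTH\n",
    "10 December 1815\n",
    "BIO\n",
    "Mathematician.\n",
    "\n",
    "Wrote about the Analytical Engine.\n",
]
result = parse_sections(sample)
print(result)
```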

Extracting delimited data from .txt file into vector

One approach is to use a regular expression in a list comprehension:

import re
with open('mydata.txt', 'r') as file:  # mydata.txt is the name of the data file
    var = [int(re.search(r'PMC(\d+)', line).group(1)) for line in file]

Explanation

r'PMC(\d+)'                  - regular expression that captures the digits after PMC
re.search(r'PMC(\d+)', line) - finds and captures the digits in a line
re.search(...).group(1)      - returns capture group 1, which is the digits
int(...)                     - converts the digits from a string to a number
for line in file             - iterates through the lines of the file
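
Note that re.search returns None when a line has no match, so the one-liner raises AttributeError on such a line. If your file might contain lines without a PMC number, a variant along these lines skips them instead. The sample lines and the function name extract_pmc_ids are made up for the example.

```python
import re

PMC_PATTERN = re.compile(r'PMC(\d+)')


def extract_pmc_ids(lines):
    """Return the integer after 'PMC' for every line that contains one."""
    ids = []
    for line in lines:
        match = PMC_PATTERN.search(line)
        if match:  # skip lines with no PMC number
            ids.append(int(match.group(1)))
    return ids


# Invented sample lines standing in for mydata.txt
sample = ["PMC123456 | first\n", "no id here\n", "ref: PMC42\n"]
ids = extract_pmc_ids(sample)
print(ids)
```

With a real file, pass the open file object in place of sample, as in the with-block above.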

