Extracting data from text files using Python
If all your lines are similar, you can split the original line at the commas and extract the number as follows:
line = "Component Sizing Information, AirTerminal:SingleDuct:VAV:Reheat, SPACE2-1 VAV REHEAT, Design Size Maximum Flow per Zone Floor Area during Reheat [m3/s-m2], 1.31927E-003"
fields = line.split(',')  # split the string at commas
number = fields[-1].strip()  # take the last field and remove extra whitespace
number = float(number)  # convert the text "1.31927E-003" to the float 0.00131927
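If the file has many lines in this format, the same split-and-take-the-last-field idea can be wrapped in a small function and applied line by line. This is a minimal sketch that assumes every line you care about starts with "Component Sizing Information" and ends in a number; the helper names `last_field_as_float` and `extract_numbers` are hypothetical.

```python
def last_field_as_float(line, sep=','):
    """Split a delimited line and convert its last field to a float."""
    return float(line.split(sep)[-1].strip())


def extract_numbers(lines, prefix="Component Sizing Information"):
    """Return the trailing number of every line that starts with `prefix`."""
    # Assumption: only lines with this prefix carry the value we want
    return [last_field_as_float(line) for line in lines if line.startswith(prefix)]


sample = [
    "Component Sizing Information, AirTerminal:SingleDuct:VAV:Reheat, SPACE2-1 VAV REHEAT, "
    "Design Size Maximum Flow per Zone Floor Area during Reheat [m3/s-m2], 1.31927E-003\n",
    "some other line\n",
]
numbers = extract_numbers(sample)
print(numbers)  # [0.00131927]
```

With a real file you would pass the open file object instead of the list, for example `with open('results.csv') as f: numbers = extract_numbers(f)`, where `results.csv` is a placeholder name.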
How to extract data from Excel and output in txt file along with modified string using Python?
This is the type of question that's a little difficult to answer, because I'm afraid the answer is more or less just "go and learn how to actually program in Python". It seems like you're copying code you see in tutorials without fully understanding it; if you were fluent in Python, it would be obvious how to do what you're trying to do and why what you're doing right now won't work.
Still, since solving many individual problems is one of the ways you learn something, let me see if I can give you some pointers here.
Your code currently looks like this (note that I shortened the output file path for the sake of readability in this answer):
for i in range(6, ws.max_row+1):
    name = ws.cell(row=i, column=1).value
    outputFile = open('{}.txt'.format(name), 'w')
    for j in range(1, ws.max_column + 1):
        outputFile.write(ws.cell(row=i, column=j).value + '\n')
    outputFile.close()
I can't run this code because I don't have any Excel files to hand, but I imagine it should produce four files called Name.txt, Salary.txt, Date.txt, and Phone.txt. Each file should contain the values from the corresponding row of the worksheet, separated by newlines.
Your questions are: (1) why is this outputting to four files instead of one, and (2) how can you write the SQL commands you want to that file instead of just the values from the worksheet.
For (1), the script is writing four files because that's exactly what you're telling it to do. You call open() inside the loop, once per row, and each row gives it a different filename, so with four rows it creates four files. If you want to create just one file and write to that, open it once before the loop, with something like:
with open('output.txt', 'w') as outputFile:  # opened once, closed automatically
    for i in range(6, ws.max_row + 1):
        for j in range(1, ws.max_column + 1):
            # str() guards against numeric or date cells, which can't be added to '\n'
            outputFile.write(str(ws.cell(row=i, column=j).value) + '\n')
To write the output that you want, you should... write the output that you want. For example, to write the line "CREATE TABLE TRANSIENT TABLE STG_EMPLOYEE(" to the file, you write
outputFile.write("CREATE TABLE TRANSIENT TABLE STG_EMPLOYEE(" + "\n")
To write "Hello", you run outputFile.write("Hello"), and so on. ws.cell(row=i, column=j).value gets you the contents of the (i, j)-th cell in the worksheet, which is why passing it to write() causes that value to be written to the file. Just call write() with what you want to be written to the file.
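As a sketch of that idea, the function below writes a whole CREATE TABLE statement by mixing fixed text with values that, in your script, would come from ws.cell(...).value. It writes to an in-memory io.StringIO so it runs without an Excel file. The function name write_create_table, the hard-coded column list, and typing every column as VARCHAR are all hypothetical choices for illustration.

```python
import io


def write_create_table(out, table_name, column_names):
    """Write a CREATE TABLE statement to `out`, one column per line."""
    out.write("CREATE TABLE " + table_name + "(\n")
    # Hypothetical: every column is typed VARCHAR just for illustration
    column_lines = ["    {} VARCHAR".format(name) for name in column_names]
    out.write(",\n".join(column_lines) + "\n")
    out.write(");\n")


# Stand-in for the values you would read with ws.cell(row=i, column=1).value
names = ["Name", "Salary", "Date", "Phone"]
buf = io.StringIO()
write_create_table(buf, "STG_EMPLOYEE", names)
print(buf.getvalue())
```

In your own script you would pass the file object returned by open('output.txt', 'w') as `out`, and build `column_names` from the worksheet cells instead of a literal list.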
Extracting specific data from multiple text files and writing them into columns in csv
I think I see what you're trying to do, but I'm not sure.
I think your BEF file might look something like this:
a line
another line
Estimate file: pog_example.bef
Estimate ID: o1_p1
61078 (100.0%) estimated.
still more lines
If that's true, then once you find a line with 'Estimate file', you need to take over from the for-loop and iterate the lines manually, because you know which lines are coming next.
This is a very simple example script which opens my mock BEF file (above) and iterates over the lines until it finds 'Estimate file'. From there it processes each following line specifically, using next(bef_file) to advance to the next line, and expects each of those lines to contain the right text:
import csv

all_rows = []

with open('input.bef') as bef_file:
    for line in bef_file:
        if 'Estimate file' in line:
            fname = line.split('pog_')[1].strip()
            line = next(bef_file)  # the 'Estimate ID:' line
            est_id = line.split('Estimate ID:')[1].strip()
            line = next(bef_file)  # the estimated-value line
            value = line.strip()
            row = [fname, est_id, value]
            all_rows.append(row)
            break  # stop iterating lines in this file

# newline='' stops the csv module from writing blank rows on Windows
with open('output.csv', 'w', newline='') as csv_out:
    writer = csv.writer(csv_out)
    writer.writerow(['File name', 'Est ID', 'Est Value'])
    writer.writerows(all_rows)
When I run that I get this for output.csv:
File name,Est ID,Est Value
example.bef,o1_p1,61078 (100.0%) estimated.
If there are blank lines in your data between the lines you care about, manually step over them with extra next(bef_file) statements.
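Since the question involves several BEF files, one way to extend the script is to move the parsing into a function that returns one row per file, and to skip blank lines automatically instead of adding next() calls by hand. This is a sketch under the same assumptions about the file layout as the mock file above; parse_bef and next_nonblank are hypothetical helper names, and io.StringIO stands in for a real file.

```python
import csv
import io


def next_nonblank(lines):
    """Return the next non-blank line from an iterator of lines."""
    for line in lines:
        if line.strip():
            return line
    raise ValueError("unexpected end of file")


def parse_bef(lines):
    """Return [file name, estimate ID, value] for the first 'Estimate file' block, or None."""
    it = iter(lines)
    for line in it:
        if 'Estimate file' in line:
            fname = line.split('pog_')[1].strip()
            # next_nonblank consumes the same iterator, so blank lines are skipped
            est_id = next_nonblank(it).split('Estimate ID:')[1].strip()
            value = next_nonblank(it).strip()
            return [fname, est_id, value]
    return None  # no estimate block in this file


sample = "a line\nEstimate file: pog_example.bef\n\nEstimate ID: o1_p1\n61078 (100.0%) estimated.\n"
rows = [r for r in [parse_bef(io.StringIO(sample))] if r is not None]

out = io.StringIO()
writer = csv.writer(out)
writer.writerow(['File name', 'Est ID', 'Est Value'])
writer.writerows(rows)
print(out.getvalue())
```

For real files you could loop over something like glob.glob('*.bef'), open each path, and append parse_bef(f) to the rows whenever it is not None, then write all rows once at the end.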
Extract data between two lines from text file
I did it by storing everything in a dictionary; see the code below.
with open("test.txt") as f:
    lines = f.readlines()

dict_text = {"NAME": [], "DATEOFBIRTH": [], "BIO": []}
location = None  # no section header seen yet
for line in lines:
    if not ("NAME" in line or "DATE OF BIRTH" in line or "BIO" in line):
        if location is not None:  # ignore text before the first header
            dict_text[location].append(line.replace("\n", ""))
    else:
        # "DATE OF BIRTH" -> "DATEOFBIRTH", matching the dictionary keys
        location = "".join(line.split())
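If you only need the lines between two specific marker lines rather than every section, a flag that switches on at the start marker and stops at the end marker is enough. A minimal sketch, assuming each marker sits alone on its own line; lines_between and the sample data are hypothetical.

```python
def lines_between(lines, start, end):
    """Collect the lines strictly between the first `start` marker and the next `end` marker."""
    collected = []
    inside = False
    for line in lines:
        text = line.rstrip("\n")
        if not inside and text.strip() == start:
            inside = True  # start collecting from the next line
        elif inside and text.strip() == end:
            break  # stop at the end marker
        elif inside:
            collected.append(text)
    return collected


sample = ["NAME\n", "John Smith\n", "DATE OF BIRTH\n", "01/01/1990\n", "BIO\n", "Likes Python.\n"]
result = lines_between(sample, "NAME", "DATE OF BIRTH")
print(result)  # ['John Smith']
```

If the end marker never appears, the function simply returns everything after the start marker, which is convenient for the last section of a file.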
Extracting delimited data from .txt file into vector
One approach is to use a regular expression in a list comprehension:
import re

with open('mydata.txt', 'r') as file:  # mydata.txt is the name of the data file
    var = [int(re.search(r'PMC(\d+)', line).group(1)) for line in file]
Explanation
r'PMC(\d+)' - regular expression for capturing digits after PMC
re.search(r'PMC(\d+)', line) - finds and captures digits in a line
re.search(...).group(1) - corresponds to capture group 1, which holds the digits
int(...) - converts digits from string to number
for line in file - iterates through the lines of the file
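One caveat: re.search returns None for a line that contains no PMC, so the one-line comprehension raises AttributeError on such a line (a header, for example). A slightly longer sketch that skips non-matching lines and precompiles the pattern; pmc_numbers and the sample data are hypothetical.

```python
import io
import re

PMC_RE = re.compile(r'PMC(\d+)')  # digits immediately after "PMC"


def pmc_numbers(lines):
    """Return the digits after 'PMC' on each line as ints, skipping lines with no match."""
    numbers = []
    for line in lines:
        match = PMC_RE.search(line)
        if match:  # re.search returns None when there is no match
            numbers.append(int(match.group(1)))
    return numbers


sample = io.StringIO("id|PMC12345|x\nheader line\nid|PMC678|y\n")
numbers = pmc_numbers(sample)
print(numbers)  # [12345, 678]
```

With the real file, pass the open file object in place of the StringIO, exactly as in the comprehension above.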