Parsing a Pipe-Delimited File in Python

Parsing a pipe-delimited file in Python

If you're parsing a very simple file that won't contain any | characters in the actual field values, you can use split:

with open('file', 'r') as fileHandle:
    for line in fileHandle:
        # drop the trailing newline so it doesn't end up in the last field
        fields = line.rstrip('\n').split('|')

        print(fields[0])  # prints the first field's value
        print(fields[1])  # prints the second field's value

A more robust way to parse tabular data is to use the csv module with delimiter='|', as mentioned in Spencer Rathbun's answer.
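As a rough illustration of the csv approach, here is a minimal sketch. The sample data is made up for the example and stands in for a real file; the point is that a quoted field may contain a literal | without being split.

```python
import csv
import io

# Hypothetical sample data standing in for a real file; the last field
# of the second row is quoted and contains a literal pipe.
sample = io.StringIO('id|name|note\n1|Ann|"likes a|b tests"\n')

# delimiter='|' tells csv.reader to split on pipes instead of commas
rows = list(csv.reader(sample, delimiter='|'))

for row in rows:
    print(row)
```

With a real file you would pass the file object (opened with newline='') to csv.reader instead of the StringIO object.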

Parsing a double pipe delimited file in python

The csv documentation says

A one-character string used to separate fields. It defaults to ','.

This is a hard constraint: csv.reader only accepts a single-character delimiter. One workaround is to modify each line before csv.reader sees it: call replace('||', '|') on every line of the input file and pass the result to csv.reader. Note that this only works if the field values themselves never contain a single | character.

import csv

# open in text mode; in Python 3 csv.reader expects strings, not bytes
input_file = open('test.csv', newline='')
reader = csv.reader((line.replace('||', '|') for line in input_file), delimiter='|')
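To see the workaround in action without a file on disk, here is a small sketch using in-memory data (the sample rows are invented for the example). It also shows a plain str.split('||') alternative, which may be enough when you don't need csv's quoting rules.

```python
import csv
import io

# Hypothetical double-pipe-delimited content
sample = 'a||b||c\n1||2||3\n'

# Workaround: collapse '||' to '|' before csv.reader sees each line
reader = csv.reader(
    (line.replace('||', '|') for line in io.StringIO(sample)),
    delimiter='|',
)
rows_csv = list(reader)

# Alternative without csv: split each line on the two-character delimiter
rows_split = [line.rstrip('\n').split('||') for line in io.StringIO(sample)]

print(rows_csv)
print(rows_split)
```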

How can I read a pipe-delimited transaction file and generate the sales reports

This is possible via pandas.DataFrame.groupby:

import pandas as pd
from io import StringIO

mystr = """Pedro|groceries|apple|1.42
Nitin|tobacco|cigarettes|15.00
Susie|groceries|cereal|5.50
Susie|groceries|milk|4.75
Susie|tobacco|cigarettes|15.00
Susie|fuel|gasoline|44.90
Pedro|fuel|propane|9.60"""

df = pd.read_csv(StringIO(mystr), header=None, sep='|',
                 names=['Name', 'Category', 'Product', 'Sales'])

# Report 1
rep1 = df.groupby('Name')['Sales'].sum()

# Name
# Nitin    15.00
# Pedro    11.02
# Susie    70.15
# Name: Sales, dtype: float64

# Report 2
rep2 = df.groupby(['Name', 'Category'])['Sales'].sum()

# Name   Category
# Nitin  tobacco      15.00
# Pedro  fuel          9.60
#        groceries     1.42
# Susie  fuel         44.90
#        groceries    10.25
#        tobacco      15.00
# Name: Sales, dtype: float64
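If pandas is not available, the same two reports can be approximated with the standard library. This is only a sketch: the variable names are illustrative, and Decimal is used here (my choice, not part of the answer above) so that the money totals are exact.

```python
import csv
import io
from collections import defaultdict
from decimal import Decimal

data = """Pedro|groceries|apple|1.42
Nitin|tobacco|cigarettes|15.00
Susie|groceries|cereal|5.50
Susie|groceries|milk|4.75
Susie|tobacco|cigarettes|15.00
Susie|fuel|gasoline|44.90
Pedro|fuel|propane|9.60"""

# Decimal() with no arguments is Decimal('0'), so it works as a default value
by_name = defaultdict(Decimal)
by_name_category = defaultdict(Decimal)

for name, category, product, sales in csv.reader(io.StringIO(data), delimiter='|'):
    amount = Decimal(sales)
    by_name[name] += amount                        # Report 1: total per name
    by_name_category[(name, category)] += amount   # Report 2: total per name and category

for name in sorted(by_name):
    print(name, by_name[name])
```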

Parsing a pipe delimited json data in python

Use str methods: split the string on | to get the key=value pairs, then split each pair on =.

Example:

network = {
    'id': 112,
    'name': 'stalin-PC',
    'type': 'IP4Address',
    'properties': 'address=10.0.1.110|ipLong=277893412|state=DHCP Allocated|macAddress=41-1z-y4-23-dd-98'
}

for n in network['properties'].split("|"):
    # split on the first "=" only, in case a value contains "="
    key, value = n.split("=", 1)
    print(key, "-->", value)

Output:

address --> 10.0.1.110
ipLong --> 277893412
state --> DHCP Allocated
macAddress --> 41-1z-y4-23-dd-98
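If you want to look values up by key rather than just print them, the same splitting can feed a dict. A minimal sketch, reusing the properties string from the example (the variable names are illustrative):

```python
properties = 'address=10.0.1.110|ipLong=277893412|state=DHCP Allocated|macAddress=41-1z-y4-23-dd-98'

# Each 'key=value' pair becomes a two-item list, which dict() accepts directly
parsed = dict(pair.split('=', 1) for pair in properties.split('|'))

print(parsed['state'])
print(parsed['address'])
```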

python, loop through a pipe delimited text file and run an sql

It is better to open the file with a context manager (the with statement), so that it is closed automatically. I also suggest using the subprocess.check_call function instead of os.system, like this:

from subprocess import check_call

with open('/tmp/so_insert_report20150804.txt') as fd:
    for line in fd:
        c1, c2, c3, c4 = line.strip().split('|')
        check_call(['prosql', '-n', '/psd_apps/700p6/cus/so_insert.enq', c1, c3])

Check the subprocess module at:

https://docs.python.org/3/library/subprocess.html
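One practical difference from os.system is that check_call raises subprocess.CalledProcessError when the command exits with a non-zero status, so a failure cannot go unnoticed. A small sketch, using a throwaway Python child process as a stand-in for a failing command (the stand-in is an assumption for the demo, not part of the original question):

```python
import subprocess
import sys

# A child process that exits with status 3, standing in for a failing command
failing = [sys.executable, '-c', 'import sys; sys.exit(3)']

try:
    subprocess.check_call(failing)
    returncode = 0
except subprocess.CalledProcessError as exc:
    # check_call turns the non-zero exit status into an exception
    returncode = exc.returncode
    print('command failed with exit status', returncode)
```

Passing the command as a list also avoids going through the shell, so values read from the file are not interpreted as shell syntax.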

By the way, since I suspect you're planning to modify a database, it is better to perform the calls in a more transactional way: validate the tokens of every line first, and only then perform the calls, like this:

from subprocess import check_call

def validatec1(c1):
    # must be an integer; int() raises ValueError otherwise
    return str(int(c1))

def validatec3(c3):
    c3 = c3.strip()
    if not c3:
        raise Exception('Column 3 is empty')
    if not c3.startswith('ZZ'):
        raise Exception('Invalid value for c3 {}'.format(c3))
    return c3

batch = []

# first pass: validate every line before running anything
with open('/tmp/so_insert_report20150804.txt') as fd:
    for lnum, line in enumerate(fd, 1):
        try:
            c1, c2, c3, c4 = line.strip().split('|')
            batch.append((validatec1(c1), validatec3(c3)))
        except Exception:
            print('Error processing input file at line {}:\n{}'.format(lnum, line))
            raise

# second pass: only reached if every line validated
for v1, v2 in batch:
    check_call(['prosql', '-n', '/psd_apps/700p6/cus/so_insert.enq', v1, v2])
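A possible variation, sketched here under my own assumptions, is to collect every bad line instead of stopping at the first one, so the input file can be fixed in a single pass. The function name validate_rows and the sample lines are hypothetical; the checks mirror the ones above (c1 must be an integer, c3 must start with ZZ).

```python
def validate_rows(lines):
    """Return (batch, errors) for an iterable of 'c1|c2|c3|c4' lines."""
    batch, errors = [], []
    for lnum, line in enumerate(lines, 1):
        try:
            # raises ValueError if the line does not have exactly 4 fields
            c1, c2, c3, c4 = line.strip().split('|')
            c1 = str(int(c1))  # raises ValueError if c1 is not an integer
            c3 = c3.strip()
            if not c3.startswith('ZZ'):
                raise ValueError('c3 must start with ZZ, got {!r}'.format(c3))
            batch.append((c1, c3))
        except ValueError as exc:
            errors.append((lnum, str(exc)))
    return batch, errors

# Hypothetical input: one good line followed by three bad ones
sample = ['1|a|ZZ01|x\n', 'oops|b|ZZ02|y\n', '3|c|AB03|z\n', '4|d\n']
batch, errors = validate_rows(sample)

for lnum, message in errors:
    print('line {}: {}'.format(lnum, message))
```

The external command would then be run only when errors is empty.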

How to read all the lines in a pipe delimited file in python?

It's because fields is overwritten on every iteration of the for loop, so once the loop has finished it only holds the fields of the last line.

You can probably change

for line in fileHandle:
    fields = line.split('|')

to

fields = [line.split('|') for line in fileHandle]

Or you can indent the rest of your code so that it runs inside the loop, once per line:

for line in fileHandle:
    fields = line.split('|')

    m = Message("ADT_A01")
    m.msh.msh_3 = 'GHH_ADT'
    m.msh.msh_7 = '20080115153000'
    m.msh.msh_9 = 'ADT^A01^ADT_A01'
    m.msh.msh_10 = "0123456789"
    m.msh.msh_11 = "P"
    m.msh.msh_12 = ""
    m.msh.msh_16 = "AL"
    m.evn.evn_2 = m.msh.msh_7
    m.evn.evn_4 = "AAA"
    m.evn.evn_5 = m.evn.evn_4
    m.pid.pid_5.pid_5_1 = fields[1]
    m.nk1.nk1_1 = '1'
    m.nk1.nk1_2 = 'NUCLEAR^NELDA^W'
    m.nk1.nk1_3 = 'SPO'
    m.nk1.nk1_4 = '2222 HOME STREET^^ANN ARBOR^MI^^USA'

    print(m.value)
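To make the difference concrete, here is a small self-contained sketch with made-up in-memory data in place of the real file. It shows what the original loop leaves behind and what the list comprehension collects.

```python
import io

# Hypothetical three-line pipe-delimited input
sample = 'a|1\nb|2\nc|3\n'

# Overwriting: each iteration replaces fields, so only the last line survives
for line in io.StringIO(sample):
    fields = line.rstrip('\n').split('|')
last_only = fields

# Collecting: one list of fields per line
all_rows = [line.rstrip('\n').split('|') for line in io.StringIO(sample)]

print(last_only)
print(all_rows)
```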

