Parsing a Pipe-Delimited File in Python

If you're parsing a very simple file that won't contain any | characters in the actual field values, you can use split:

with open('file', 'r') as fileHandle:
    for line in fileHandle:
        # Strip the trailing newline so it doesn't end up in the last field
        fields = line.rstrip('\n').split('|')

        print(fields[0])  # prints the first field's value
        print(fields[1])  # prints the second field's value

A more robust way to parse tabular data is to use the csv module, as mentioned in Spencer Rathbun's answer. Unlike split, it handles quoted fields that contain the delimiter.
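As a minimal sketch of the csv approach, the snippet below reads pipe-delimited text with csv.reader and delimiter='|'. The sample data and the read_pipe_rows helper are made up for illustration; with a real file you would pass a file object opened with newline='' instead of the StringIO.

```python
import csv
from io import StringIO

# Hypothetical sample data; the last row has a quoted field containing a pipe.
sample = 'id|name|note\n1|Alice|plain\n2|Bob|"left|right"\n'

def read_pipe_rows(text_stream):
    """Return every row of a pipe-delimited text stream as a list of strings."""
    return list(csv.reader(text_stream, delimiter='|'))

rows = read_pipe_rows(StringIO(sample))
for row in rows:
    print(row)
```

The quoted field in the last row comes back as a single value, which a plain split('|') would have cut in two.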

Parsing a double pipe delimited file in python

The csv documentation says this about the delimiter parameter:

A one-character string used to separate fields. It defaults to ','.

This is a hard constraint, so csv.reader cannot take '||' as its delimiter. One workaround is to modify each line before csv.reader sees it: call replace('||', '|') on every line of the input file and pass the resulting lines to csv.reader. Note that this assumes the field values themselves never contain a | character.

import csv

input_file = open('test.csv', newline='')  # text mode: str.replace needs str lines, not bytes
reader = csv.reader((line.replace('||', '|') for line in input_file), delimiter='|')
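To see the workaround end to end without a file on disk, here is a small sketch that feeds an in-memory sample through the same replace step. The sample text and the read_double_pipe helper are assumptions for illustration only.

```python
import csv
from io import StringIO

# Hypothetical sample using '||' as the delimiter; the second row has an empty middle field.
sample = "a||b||c\n1||||3\n"

def read_double_pipe(text_stream):
    """Collapse each '||' to '|', then parse the lines with csv.reader."""
    collapsed = (line.replace('||', '|') for line in text_stream)
    return list(csv.reader(collapsed, delimiter='|'))

rows = read_double_pipe(StringIO(sample))
print(rows)
```

An empty field survives the replacement because '||||' collapses to '||', which csv.reader reads as two delimiters around an empty value.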

How can I read a pipe-delimited transaction file and generate the sales reports

This is possible via pandas.DataFrame.groupby:

import pandas as pd
from io import StringIO

mystr = """Pedro|groceries|apple|1.42
Nitin|tobacco|cigarettes|15.00
Susie|groceries|cereal|5.50
Susie|groceries|milk|4.75
Susie|tobacco|cigarettes|15.00
Susie|fuel|gasoline|44.90
Pedro|fuel|propane|9.60"""

df = pd.read_csv(StringIO(mystr), header=None, sep='|',
                 names=['Name', 'Category', 'Product', 'Sales'])

# Report 1
rep1 = df.groupby('Name')['Sales'].sum()

# Name
# Nitin    15.00
# Pedro    11.02
# Susie    70.15
# Name: Sales, dtype: float64

# Report 2
rep2 = df.groupby(['Name', 'Category'])['Sales'].sum()

# Name   Category 
# Nitin  tobacco      15.00
# Pedro  fuel          9.60
#        groceries     1.42
# Susie  fuel         44.90
#        groceries    10.25
#        tobacco      15.00
# Name: Sales, dtype: float64
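If pandas is not available, roughly the same two reports can be sketched with the standard library alone. This is an alternative to the groupby approach above, not a replacement for it; the variable names by_name and by_name_category are made up for this example.

```python
import csv
from collections import defaultdict
from io import StringIO

data = """Pedro|groceries|apple|1.42
Nitin|tobacco|cigarettes|15.00
Susie|groceries|cereal|5.50
Susie|groceries|milk|4.75
Susie|tobacco|cigarettes|15.00
Susie|fuel|gasoline|44.90
Pedro|fuel|propane|9.60"""

by_name = defaultdict(float)           # Report 1: total sales per name
by_name_category = defaultdict(float)  # Report 2: total sales per (name, category)

for name, category, product, sales in csv.reader(StringIO(data), delimiter='|'):
    by_name[name] += float(sales)
    by_name_category[(name, category)] += float(sales)

for name in sorted(by_name):
    print('{}: {:.2f}'.format(name, by_name[name]))

for (name, category) in sorted(by_name_category):
    print('{} / {}: {:.2f}'.format(name, category, by_name_category[(name, category)]))
```

Floats are used here for brevity; for real monetary data decimal.Decimal would avoid rounding surprises.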

Parsing a pipe delimited json data in python

You can do this with str methods. For example:

network = {
        'id': 112,
        'name': 'stalin-PC',
        'type': 'IP4Address',
        'properties': 'address=10.0.1.110|ipLong=277893412|state=DHCP Allocated|macAddress=41-1z-y4-23-dd-98'
}

for n in network['properties'].split("|"):
    key, value = n.split("=", 1)  # split on the first "=" only, so a value may itself contain "="
    print(key, "-->", value)

Output:

address --> 10.0.1.110
ipLong --> 277893412
state --> DHCP Allocated
macAddress --> 41-1z-y4-23-dd-98
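If you want to look values up by key rather than just print them, the same splitting can build a dictionary. This is a small sketch; parse_properties is a hypothetical helper name, and it assumes every non-empty item contains an "=" (otherwise dict() raises a ValueError).

```python
properties = 'address=10.0.1.110|ipLong=277893412|state=DHCP Allocated|macAddress=41-1z-y4-23-dd-98'

def parse_properties(raw):
    """Turn 'k1=v1|k2=v2' into a dict, splitting each item on its first '=' only."""
    return dict(item.split('=', 1) for item in raw.split('|') if item)

parsed = parse_properties(properties)
print(parsed['state'])
print(parsed['macAddress'])
```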

python, loop through a pipe delimited text file and run an sql

It is better to open the file with a context manager (the with statement). I also suggest using the subprocess.check_call function instead of os.system, like this:

from subprocess import check_call

with open('/tmp/so_insert_report20150804.txt') as fd:
    for line in fd:
        c1, c2, c3, c4 = line.strip().split('|')
        check_call(['prosql', '-n', '/psd_apps/700p6/cus/so_insert.enq', c1, c3])

Check the subprocess module at:

https://docs.python.org/3/library/subprocess.html
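One practical difference from os.system is that check_call raises subprocess.CalledProcessError when the command exits with a non-zero status, so failures cannot pass silently. The sketch below illustrates this using the Python interpreter itself as a stand-in command, since prosql is not generally available; run_ok is a hypothetical helper.

```python
import subprocess
import sys

def run_ok(args):
    """Run a command; return True on exit status 0, False if check_call raises."""
    try:
        subprocess.check_call(args)
    except subprocess.CalledProcessError as exc:
        print('command failed with exit code', exc.returncode)
        return False
    return True

# Stand-in commands: one exits with status 0, the other with status 3.
ok = run_ok([sys.executable, '-c', 'import sys; sys.exit(0)'])
failed = run_ok([sys.executable, '-c', 'import sys; sys.exit(3)'])
print(ok, failed)
```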

By the way, since I suspect you're planning to modify a database, it is better to perform the calls in a more transactional way, that is, to validate all the tokens before performing any call, like this:

from subprocess import check_call

def validatec1(c1):
    # int() raises ValueError if c1 is not a whole number
    return str(int(c1))

def validatec3(c3):
    c3 = c3.strip()
    if not c3:
        raise ValueError('Column 3 is empty')
    if not c3.startswith('ZZ'):
        raise ValueError('Invalid value for c3 {!r}'.format(c3))
    return c3

batch = []

with open('/tmp/so_insert_report20150804.txt') as fd:
    for lnum, line in enumerate(fd, 1):
        try:
            c1, c2, c3, c4 = line.strip().split('|')
            batch.append((validatec1(c1), validatec3(c3)))
        except Exception:
            print('Error processing input file at line {}:\n{}'.format(lnum, line))
            raise  # re-raise the original exception so nothing in the batch is executed

for v1, v2 in batch:
    check_call(['prosql', '-n', '/psd_apps/700p6/cus/so_insert.enq', v1, v2])

How to read all the lines in a pipe delimited file in python?

That's because fields is overwritten on every iteration of the for loop, so only the last line's values remain once the loop ends.

You can probably change

for line in fileHandle:
    fields = line.split('|')

to

fields = [line.split('|') for line in fileHandle]

Or you can indent the rest of your code so that it runs inside the loop, once per line:

for line in fileHandle:
    fields = line.split('|')

    m = Message("ADT_A01")
    m.msh.msh_3 = 'GHH_ADT'
    m.msh.msh_7 = '20080115153000'
    m.msh.msh_9 = 'ADT^A01^ADT_A01'
    m.msh.msh_10 = "0123456789"
    m.msh.msh_11 = "P"
    m.msh.msh_12 = ""
    m.msh.msh_16 = "AL"
    m.evn.evn_2 = m.msh.msh_7
    m.evn.evn_4 = "AAA"
    m.evn.evn_5 = m.evn.evn_4
    m.pid.pid_5.pid_5_1 =  fields[1]
    m.nk1.nk1_1 = '1'
    m.nk1.nk1_2 = 'NUCLEAR^NELDA^W'
    m.nk1.nk1_3 = 'SPO'
    m.nk1.nk1_4 = '2222 HOME STREET^^ANN ARBOR^MI^^USA'

    print(m.value)
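The difference between overwriting and collecting can be seen in a short sketch that uses in-memory sample data instead of a real file. The sample text and variable names are made up for illustration.

```python
from io import StringIO

# Hypothetical three-line pipe-delimited input.
sample = "1|Ann\n2|Bob\n3|Cy\n"

# Overwriting: fields is reassigned on each pass, so only the last line survives the loop.
for line in StringIO(sample):
    fields = line.rstrip('\n').split('|')
last_only = fields

# Collecting: the list comprehension keeps the fields of every line.
all_rows = [line.rstrip('\n').split('|') for line in StringIO(sample)]

print(last_only)
print(all_rows)
```

Note that with the list comprehension each element is a whole row, so the second field of the first line is all_rows[0][1], not all_rows[1].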
