Convert a Fixed Width File from Text to CSV

convert a fixed width file from text to csv

GNU awk (gawk) supports this directly with FIELDWIDTHS, e.g.:

gawk '$1=$1' FIELDWIDTHS='4 2 5 1 1' OFS=, infile

Output:

aasd,fh,90135,1,2
ajsh,dj, 2445,d,f

Convert fixed width txt file to CSV / set-content or out-file -append?

Set-Content should work fine with some minor adjustments. Here is an example of how it should work (this is everything within your outer foreach loop):

$csvFile = $file.BaseName
if (!(get-childitem ./ |where-object {$_.Name -like $csvFile})) #check whether file has been processed
{
(gc $file | foreach {
$_.Insert($columBreaks[0],",").Insert($columBreaks[1],",").Insert($columBreaks[2],",").`
Insert($columBreaks[3],",").Insert($columBreaks[4],",").Insert($columBreaks[5],",").`
Insert($columBreaks[6],",").Insert($columBreaks[7],",").Insert($columBreaks[8],",").`
Insert($columBreaks[9],",").Insert($columBreaks[10],",")
}) | set-content $csvFile #note parenthesis around everything that gets piped to set-content
}

By the way, instead of splitting the filename on the '.', you can just get the name without the extension by using $file.BaseName:

$csvFile = $file.BaseName + ".csv"

Converting Fixed-Width File to .txt then the .txt to .csv

Was able to change the initial text writing portion to:

for key, value in dict_of_record_lists.items():
write_loc = 'c:/Users/Steve Barnard/Desktop/Git_Projects/E02_ingestion/'+ key +'_'+csv_ext
with open(write_loc, "w", newline='') as parsed_file:
for line in value:
line_pre = "%s" % line
parsed_file.write(line_pre[:-1]+'\n')

All the issues were fixed by avoiding python's built-in CSV writer.
The way my program added a comma following the line slices, left with one additional comma and the '\n'; this led the [:-1] slice in the write function to remove the \n and not the final ','. by adding the '\n' following the comma removal the entire problem was fixed and a functioning CSV that retained the spacing was created.

A text file can be created by swapping out the extension upon writing.

csv to fixed width file conversion using Python

You can find the longest value in the category, and add buffer spacing based on the the length:

data = """
3526 HIGH ST, SACRAMENTO, 95838, CA, 2, 1, 836, Residential, Wed May 21 00:00:00 EDT 2008, 59222, 38.631913, -121.434879
51 OMAHA CT, SACRAMENTO, 95823, CA, 3, 1, 1167, Residential, Wed May 21 00:00:00 EDT 2008, 68212, 38.478902, -121.431028
2796 BRANCH ST, SACRAMENTO, 95815, CA, 2, 1, 796, Residential, Wed May 21 00:00:00 EDT 2008, 68880, 38.618305, -121.443839
2805 JANETTE WAY, SACRAMENTO, 95815, CA, 2, 1, 852, Residential, Wed May 21 00:00:00 EDT 2008, 69307, 38.616835, -121.439146
6001 MCMAHON DR, SACRAMENTO, 95824, CA, 2, 1, 797, Residential, Wed May 21 00:00:00 EDT 2008, 81900, 38.51947, -121.435768
"""
new_data = [i.split(', ') for i in filter(None, data.split('\n'))]

def space(i, d):
max_len = len(max(list(zip(*new_data))[i], key=len))
return d+' '*(max_len-len(d))

final_result = '\n'.join(' '.join(space(*c) for c in enumerate(b)) for b in new_data)

Output:

3526 HIGH ST     SACRAMENTO 95838 CA 2 1 836  Residential Wed May 21 00:00:00 EDT 2008 59222 38.631913 -121.434879
51 OMAHA CT SACRAMENTO 95823 CA 3 1 1167 Residential Wed May 21 00:00:00 EDT 2008 68212 38.478902 -121.431028
2796 BRANCH ST SACRAMENTO 95815 CA 2 1 796 Residential Wed May 21 00:00:00 EDT 2008 68880 38.618305 -121.443839
2805 JANETTE WAY SACRAMENTO 95815 CA 2 1 852 Residential Wed May 21 00:00:00 EDT 2008 69307 38.616835 -121.439146
6001 MCMAHON DR SACRAMENTO 95824 CA 2 1 797 Residential Wed May 21 00:00:00 EDT 2008 81900 38.51947 -121.435768

Converting a table of fixed width in text format into dataframe/excel/csv

You can use pandas.read_fwf (fixed-width format):

>>> df = pd.read_fwf('data.txt')
>>> df
INDEX YEAR MN DT MAX MIN ... T.2 G.2 DUR.2 T.3 G.3 DUR.3
0 14210 1972 9 1 32.0 22.0 ... NaN NaN NaN NaN NaN NaN
1 14210 1972 9 2 32.3 21.5 ... NaN NaN NaN NaN NaN NaN
2 14210 1972 9 3 32.8 22.4 ... NaN NaN NaN NaN NaN NaN
3 14210 1972 9 4 32.0 22.0 ... NaN NaN NaN NaN NaN NaN
4 14210 1972 9 5 33.2 23.6 ... 0.0 7.0 280.0 NaN NaN NaN
5 14210 1972 9 6 31.6 23.2 ... 5.0 8.0 45.0 0.0 8.0 NaN
6 14210 1972 9 7 31.5 21.0 ... 5.0 4.0 45.0 NaN NaN NaN
7 14210 1972 9 8 29.7 21.6 ... NaN NaN NaN NaN NaN NaN
8 14210 1972 9 9 29.7 21.1 ... NaN NaN NaN NaN NaN NaN
9 14210 1972 9 10 27.6 21.5 ... NaN NaN NaN NaN NaN NaN
10 14210 1972 9 11 30.3 21.3 ... 6.0 1.0 80.0 NaN NaN NaN
11 14210 1972 9 12 30.6 22.0 ... 5.0 5.0 30.0 NaN NaN NaN
12 14210 1972 9 13 30.2 21.4 ... 0.0 7.0 195.0 NaN NaN NaN
13 14210 1972 9 14 28.2 21.5 ... NaN NaN NaN NaN NaN NaN
14 14210 1972 9 15 30.3 21.9 ... 0.0 7.0 305.0 NaN NaN NaN
15 14210 1972 9 17 32.0 22.0 ... 6.0 7.0 135.0 NaN NaN NaN
16 14210 1972 9 18 32.0 20.5 ... 6.0 6.0 80.0 5.0 NaN NaN

[17 rows x 38 columns]

Trouble converting a fixed-width file into a csv

Your attempt was good, but requires gawk (gnu awk) for the FIELDWIDTHS built-in variable. With gawk:

$ gawk -v FIELDWIDTHS="5 11 31 9 16" -v OFS=',' '{$1=$1;print}' file

ratno, fdate, ratname , typecode, country
12346, 31/12/2010, HARTZ , 4 , UNITED STATES
12444, 31/12/2010, CHRISTIE , 5 , UNITED STATES
12527, 31/12/2010, HILL AIR , 4 , UNITED STATES

Assuming you don't want the extra spaces, you can do instead:

$ gawk -v FIELDWIDTHS="5 11 31 9 16" -v OFS=',' '{for (i=1; i<=NF; ++i) gsub(/^ *| *$/, "", $i)}1' file
ratno,fdate,ratname,typecode,country
12346,31/12/2010,HARTZ,4,UNITED STATES
12444,31/12/2010,CHRISTIE,5,UNITED STATES
12527,31/12/2010,HILL AIR,4,UNITED STATES

If you don't have gnu awk, you can achieve the same results with:

$ awk -v fieldwidths="5 11 31 9 16" '
BEGIN { OFS=","; split(fieldwidths, widths) }
{
rec = $0
$0 = ""
start = 1;
for (i=1; i<=length(widths); ++i) {
$i = substr(rec, start, widths[i])
gsub(/^ *| *$/, "", $i)
start += widths[i]
}
}1' file

ratno,fdate,ratname,typecode,country
12346,31/12/2010,HARTZ,4,UNITED STATES
12444,31/12/2010,CHRISTIE,5,UNITED STATES
12527,31/12/2010,HILL AIR,4,UNITED STATES


Related Topics



Leave a reply



Submit