Convert a Fixed Width File from Text to CSV

convert a fixed width file from text to csv

GNU awk (gawk) supports this directly with FIELDWIDTHS, e.g.:

gawk '$1=$1' FIELDWIDTHS='4 2 5 1 1' OFS=, infile

Output:

aasd,fh,90135,1,2
ajsh,dj, 2445,d,f

Convert fixed width txt file to CSV / set-content or out-file -append?

Set-Content should work fine with some minor adjustments. Here is an example of how it should work (this is everything within your outer foreach loop):

$csvFile = $file.BaseName
    if (!(get-childitem ./ |where-object {$_.Name -like $csvFile})) #check whether file has been processed
    { 
        (gc $file | foreach {
                $_.Insert($columBreaks[0],",").Insert($columBreaks[1],",").Insert($columBreaks[2],",").`
                Insert($columBreaks[3],",").Insert($columBreaks[4],",").Insert($columBreaks[5],",").`
                Insert($columBreaks[6],",").Insert($columBreaks[7],",").Insert($columBreaks[8],",").`
                Insert($columBreaks[9],",").Insert($columBreaks[10],",")
            }) | set-content $csvFile #note parenthesis around everything that gets piped to set-content
    }

By the way, instead of splitting the filename on the '.', you can just get the name without the extension by using $file.BaseName:

$csvFile = $file.BaseName + ".csv"

Converting Fixed-Width File to .txt then the .txt to .csv

Was able to change the initial text writing portion to:

for key, value in dict_of_record_lists.items():
write_loc = 'c:/Users/Steve Barnard/Desktop/Git_Projects/E02_ingestion/'+ key +'_'+csv_ext 
with open(write_loc, "w", newline='') as parsed_file:
    for line in value:
        line_pre = "%s" % line
        parsed_file.write(line_pre[:-1]+'\n')

All the issues were fixed by avoiding python's built-in CSV writer.
The way my program added a comma following the line slices, left with one additional comma and the '\n'; this led the [:-1] slice in the write function to remove the \n and not the final ','. by adding the '\n' following the comma removal the entire problem was fixed and a functioning CSV that retained the spacing was created.

A text file can be created by swapping out the extension upon writing.

csv to fixed width file conversion using Python

You can find the longest value in the category, and add buffer spacing based on the the length:

data = """
3526 HIGH ST, SACRAMENTO, 95838, CA, 2, 1, 836, Residential, Wed May 21 00:00:00 EDT 2008, 59222, 38.631913, -121.434879
51 OMAHA CT, SACRAMENTO, 95823, CA, 3, 1, 1167, Residential, Wed May 21 00:00:00 EDT 2008, 68212, 38.478902, -121.431028
2796 BRANCH ST, SACRAMENTO, 95815, CA, 2, 1, 796, Residential, Wed May 21 00:00:00 EDT 2008, 68880, 38.618305, -121.443839
2805 JANETTE WAY, SACRAMENTO, 95815, CA, 2, 1, 852, Residential, Wed May 21 00:00:00 EDT 2008, 69307, 38.616835, -121.439146
6001 MCMAHON DR, SACRAMENTO, 95824, CA, 2, 1, 797, Residential, Wed May 21 00:00:00 EDT 2008, 81900, 38.51947, -121.435768
"""
new_data = [i.split(', ') for i in filter(None, data.split('\n'))]

def space(i, d):
  max_len = len(max(list(zip(*new_data))[i], key=len))
  return d+' '*(max_len-len(d))

final_result = '\n'.join(' '.join(space(*c) for c in enumerate(b)) for b in new_data)

Output:

3526 HIGH ST     SACRAMENTO 95838 CA 2 1 836  Residential Wed May 21 00:00:00 EDT 2008 59222 38.631913 -121.434879
51 OMAHA CT      SACRAMENTO 95823 CA 3 1 1167 Residential Wed May 21 00:00:00 EDT 2008 68212 38.478902 -121.431028
2796 BRANCH ST   SACRAMENTO 95815 CA 2 1 796  Residential Wed May 21 00:00:00 EDT 2008 68880 38.618305 -121.443839
2805 JANETTE WAY SACRAMENTO 95815 CA 2 1 852  Residential Wed May 21 00:00:00 EDT 2008 69307 38.616835 -121.439146
6001 MCMAHON DR  SACRAMENTO 95824 CA 2 1 797  Residential Wed May 21 00:00:00 EDT 2008 81900 38.51947  -121.435768

Converting a table of fixed width in text format into dataframe/excel/csv

You can use pandas.read_fwf (fixed-width format):

>>> df = pd.read_fwf('data.txt')
>>> df
    INDEX  YEAR  MN  DT   MAX   MIN  ...  T.2  G.2  DUR.2  T.3  G.3  DUR.3
0   14210  1972   9   1  32.0  22.0  ...  NaN  NaN    NaN  NaN  NaN    NaN
1   14210  1972   9   2  32.3  21.5  ...  NaN  NaN    NaN  NaN  NaN    NaN
2   14210  1972   9   3  32.8  22.4  ...  NaN  NaN    NaN  NaN  NaN    NaN
3   14210  1972   9   4  32.0  22.0  ...  NaN  NaN    NaN  NaN  NaN    NaN
4   14210  1972   9   5  33.2  23.6  ...  0.0  7.0  280.0  NaN  NaN    NaN
5   14210  1972   9   6  31.6  23.2  ...  5.0  8.0   45.0  0.0  8.0    NaN
6   14210  1972   9   7  31.5  21.0  ...  5.0  4.0   45.0  NaN  NaN    NaN
7   14210  1972   9   8  29.7  21.6  ...  NaN  NaN    NaN  NaN  NaN    NaN
8   14210  1972   9   9  29.7  21.1  ...  NaN  NaN    NaN  NaN  NaN    NaN
9   14210  1972   9  10  27.6  21.5  ...  NaN  NaN    NaN  NaN  NaN    NaN
10  14210  1972   9  11  30.3  21.3  ...  6.0  1.0   80.0  NaN  NaN    NaN
11  14210  1972   9  12  30.6  22.0  ...  5.0  5.0   30.0  NaN  NaN    NaN
12  14210  1972   9  13  30.2  21.4  ...  0.0  7.0  195.0  NaN  NaN    NaN
13  14210  1972   9  14  28.2  21.5  ...  NaN  NaN    NaN  NaN  NaN    NaN
14  14210  1972   9  15  30.3  21.9  ...  0.0  7.0  305.0  NaN  NaN    NaN
15  14210  1972   9  17  32.0  22.0  ...  6.0  7.0  135.0  NaN  NaN    NaN
16  14210  1972   9  18  32.0  20.5  ...  6.0  6.0   80.0  5.0  NaN    NaN

[17 rows x 38 columns]

Trouble converting a fixed-width file into a csv

Your attempt was good, but requires gawk (gnu awk) for the FIELDWIDTHS built-in variable. With gawk:

$ gawk -v FIELDWIDTHS="5 11 31 9 16" -v OFS=',' '{$1=$1;print}' file

ratno,      fdate, ratname                       , typecode, country
12346, 31/12/2010, HARTZ                         ,     4   , UNITED STATES
12444, 31/12/2010, CHRISTIE                      ,     5   , UNITED STATES
12527, 31/12/2010, HILL AIR                      ,     4   , UNITED STATES

Assuming you don't want the extra spaces, you can do instead:

$ gawk -v FIELDWIDTHS="5 11 31 9 16" -v OFS=',' '{for (i=1; i<=NF; ++i) gsub(/^ *| *$/, "", $i)}1' file
ratno,fdate,ratname,typecode,country
12346,31/12/2010,HARTZ,4,UNITED STATES
12444,31/12/2010,CHRISTIE,5,UNITED STATES
12527,31/12/2010,HILL AIR,4,UNITED STATES

If you don't have gnu awk, you can achieve the same results with:

$ awk -v fieldwidths="5 11 31 9 16" '
BEGIN { OFS=","; split(fieldwidths, widths) }
{
    rec = $0
    $0 = ""
    start = 1;
    for (i=1; i<=length(widths); ++i) {
        $i = substr(rec, start, widths[i])
        gsub(/^ *| *$/, "", $i)
        start += widths[i]
    }
}1' file

ratno,fdate,ratname,typecode,country
12346,31/12/2010,HARTZ,4,UNITED STATES
12444,31/12/2010,CHRISTIE,5,UNITED STATES
12527,31/12/2010,HILL AIR,4,UNITED STATES

Convert a Fixed Width File from Text to CSV