CSV in Python Adding an Extra Carriage Return, on Windows

Python 3:

The official csv documentation recommends opening the file with newline='' on all platforms to disable universal newlines translation:

import csv

with open('output.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    ...

The CSV writer terminates each line with the lineterminator of the dialect, which is '\r\n' for the default excel dialect on all platforms because that's what RFC 4180 recommends.
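As a minimal sketch of the point above, the following writes two rows with newline='' and reads the raw bytes back to show which line ending lands in the file. The temporary file and the helper names write_rows and written_bytes are illustrative assumptions, not part of the original answer.

```python
import csv
import os
import tempfile


def write_rows(path, rows):
    """Write rows with newline='' so only the csv module decides the line ending."""
    with open(path, 'w', newline='', encoding='utf-8') as f:
        csv.writer(f).writerows(rows)


def written_bytes(rows):
    """Write rows to a throwaway file and return the exact bytes on disk."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'output.csv')
        write_rows(path, rows)
        with open(path, 'rb') as f:
            return f.read()


raw = written_bytes([['a', 'b'], ['1', '2']])
print(raw)  # each record ends in a single \r\n, regardless of platform
```

Reading the file back in binary mode is what makes the check reliable: text mode could translate the line endings again on the way in.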

Python 2:

On Windows, always open your files in binary mode ("rb" or "wb") before passing them to csv.reader or csv.writer.

Although the file is a text file, CSV is regarded as a binary format by the libraries involved, with \r\n separating records. If that separator is written in text mode, the Python runtime on Windows replaces the \n with \r\n, hence the \r\r\n observed in the file.

See this previous answer.
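The translation described above can be reproduced on any platform in Python 3, which may help when no Windows machine is at hand. This is only a sketch: passing newline='\r\n' to open() is used here as a stand-in for Windows text mode, and the helper name bytes_written is hypothetical.

```python
import csv
import os
import tempfile


def bytes_written(newline):
    """Write one CSV row using the given newline mode and return the raw bytes."""
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, 'demo.csv')
        with open(path, 'w', newline=newline, encoding='utf-8') as f:
            csv.writer(f).writerow(['a', 'b'])
        with open(path, 'rb') as f:
            return f.read()


# newline='\r\n' turns every '\n' written into '\r\n', like Windows text mode,
# so the writer's own '\r\n' terminator becomes '\r\r\n'.
translated = bytes_written('\r\n')

# newline='' disables the translation, leaving the terminator untouched.
untranslated = bytes_written('')

print(translated, untranslated)
```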

Python: Reading a Windows generated csv with carriage return in column

You could first strip out the extra carriage return characters using a regular expression and then use a multi-character separator for Pandas.

import pandas as pd
import io
import re
import csv

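with open('e_carriagereturn_20220430.dat', newline='') as f_input:
    # Replace each carriage return that is not followed by a line feed
    # (together with the character after it) with a single space
    data = re.sub('\x0d[^\x0a]', ' ', f_input.read())
    # Raw string so the backslashes reach the regex engine unchanged
    df = pd.read_csv(io.StringIO(data), sep=r'\|-\|', quoting=csv.QUOTE_NONE, engine='python', header=None)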

print(df)

Alternatively, without regular expressions, you could pre-parse the data as follows. First, open the file with newline='' so that the newline characters are kept as they are; the carriage returns can then be removed easily. Second, use quoting=csv.QUOTE_NONE to disable quote processing. Lastly, drop any field that contains just -.

import pandas as pd
import io
import csv

rows = []

with open('e_carriagereturn_20220430.dat', newline='') as f_input:
    # Remove every carriage return, leaving only line feeds as row endings
    data = f_input.read().replace('\x0d', '')
    csv_input = csv.reader(io.StringIO(data), delimiter='|', quoting=csv.QUOTE_NONE)

    for row in csv_input:
        # Splitting '|-|' on '|' leaves a lone '-' between real fields; skip those
        rows.append([value for value in row if value != '-'])

df = pd.DataFrame(rows)
print(df)

Both give output similar to:

            0  1                    2                    3                    4   5        6   7               8  9  10 11        12      13      14       15            16 17 18 19 20 21        22 23                   24             25 26 27      28
0   752296019  "  04/15/2022 00:00:00  04/28/2022 00:00:00  04/15/2022 00:00:00  13        0   0                  A  J  J    0.0000  0.0000  0.0000   1.2300   123456.2700  J     J        -23.4500  N  04/19/2022 12:00:41  AEINSTEIN1           0.0000
1   752296020  "  03/31/2022 00:00:00  04/13/2022 00:00:00  03/31/2022 00:00:00   1   359542  12       318047.01  A  J  J  543.2100  0.0000  0.0000  32.1000   244680.4400  J     J        543.2100  J  04/01/2022 12:44:42       PKDICK1         0.0000
2   752296032  !  04/08/2022 00:00:00  04/22/2022 00:00:00  04/08/2022 00:00:00   2   222856  12           54321  A  J  N   26.8700  0.0000  0.0000   1.2800    38068.8800  J  N  J  N      26.8700  J  04/06/2022 12:00:32   ABC003              0.0000
3   752296044  "  04/19/2022 00:00:00  05/02/2022 00:00:00  04/19/2022 00:00:00   2   222857  12           34877  D  J  N    6.7800  0.0000  0.0000   6.7800   122345.3500  J     J          6.7800  J  04/19/2022 12:00:49        WGIBSON        0.0000
4   752296098  !  04/17/2022 00:00:00  05/01/2022 00:00:00  04/17/2022 00:00:00  13        0   0                  D  N  N    0.0000  0.0000  0.0000   8.7000    79689.4800  N  N  J  N       0.0000  N  04/15/2022 12:24:58   ABC003              0.0000
5   431807560  "  04/12/2022 00:00:00  04/21/2022 00:00:00  04/12/2022 00:00:00   5        0   0                  D  J  N   16.9600  0.0000  0.0000   0.8500    10919.6900  J     J         16.7800  N  04/13/2022 14:49:44     FHERBERT          0.0000
6   431807563  !  04/17/2022 00:00:00  05/01/2022 00:00:00  04/17/2022 00:00:00  11        0   0                  D  N  N    0.0000  0.0000  0.0000   2.6700    31790.1600  N  N  J  N       0.0000  N  04/15/2022 12:44:56   ABC003              0.0000
7   431807594  "  03/28/2022 00:00:00  04/11/2022 00:00:00  03/28/2022 00:00:00   1   580807  12  12345AB12345AB  D  J  J  193.8200  0.0000  0.0000  19.3800   276921.4800  J     J        193.8200  J  03/29/2022 12:00:38        WGIBSON        0.0000
8   431807597  "  04/19/2022 00:00:00  05/02/2022 00:00:00  04/19/2022 00:00:00   1   107348  12     12.45671/AB  D  J  J    6.7800  0.0000  0.0000   6.7800    87133.8200  J     J          6.7800  J  04/15/2022 12:22:35      UKLEGUIN         0.0000
9   679785779  "  03/18/2022 00:00:00  04/01/2022 00:00:00  03/18/2022 00:00:00  13        0   0                  B  N  N    0.0000  0.0000  0.0000   9.3300   142940.7700  N  N  J  N       0.0000  N  04/20/2022 08:04:02     AHUXLEY           0.0000
10  679785789  !  04/15/2022 00:00:00  04/29/2022 00:00:00  04/15/2022 00:00:00   2  4876321  12       488250/CD  D  J  N  876.5800  0.0000  0.0000  16.7800   200604.8900  J  N  J  N     876.5400  J  04/13/2022 12:28:49   ABC003              0.0000
11  665661904  !  04/15/2022 00:00:00  04/29/2022 00:00:00  04/15/2022 00:00:00   2   394132  12        46409 EF  D  J  N  567.9800  0.0000  0.0000   9.1600   513561.4600  J  N  J  N     567.8700  J  04/13/2022 12:24:37   ABC003              0.0000
12  665661909  "  03/25/2022 00:00:00  04/01/2022 00:00:00  03/25/2022 00:00:00  14   216308  12      97745894XY  D  J  J    0.0000  0.0000  0.0000  11.4500   208666.1300  J     J          0.0000  J  03/25/2022 12:25:03     FHERBERT          0.0000
13  665661934  !  04/19/2022 00:00:00  05/02/2022 00:00:00  04/19/2022 00:00:00   2   627911  12     abc/21.4177  D  J  N   54.3200  0.0000  0.0000  23.4500   333689.0000  J  N  J  N      54.3200  J  04/14/2022 23:15:20   ABC003              0.0000
14  665661945  !  03/25/2022 00:00:00  04/07/2022 00:00:00  03/25/2022 00:00:00   1  3074312  12      923088/ABC  D  J  J  199.2600  0.0000  0.0000  14.5600   850785.1500  J  N  J  N     189.0120  J  03/25/2022 11:48:55   ABC003              0.0000
15  665661965  !  04/22/2022 00:00:00  05/06/2022 00:00:00  04/22/2022 00:00:00   1   627921  12           27160  D  J  J  567.3400  0.0000  0.0000  45.6800  2252133.2900  J  N  J  N     567.3400  J  04/20/2022 12:43:09   ABC003              0.0000
16  665661976  !  04/22/2022 00:00:00  05/06/2022 00:00:00  04/22/2022 00:00:00   2   627942  12       1734793zy  D  J  N  223.4800  0.0000  0.0000  23.4500   416715.9100  J     J        234.5600  J  04/21/2022 12:04:19   ABC003              0.0000
17  665661978  !  04/29/2022 00:00:00  05/13/2022 00:00:00  04/29/2022 00:00:00   2   627998  12        44524 fg  D  J  N  226.3000  0.0000  0.0000   5.3700   162912.0800  J  N  J  N     234.2000  J  04/21/2022 12:12:44   ABC003              0.0000
18  665661987  "  04/07/2022 00:00:00  04/19/2022 00:00:00  04/07/2022 00:00:00  14        0   0                  D  J  J   78.6500  0.0000  0.0000   1.3400    56249.8400  N     J         78.6500  N  04/08/2022 12:32:28       PKDICK1         0.0000
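To try the second approach without the original file or Pandas, here is a small standard-library sketch. The sample string is made up for illustration (it only imitates the '|-|' layout, with one stray carriage return inside a field), and parse is a hypothetical helper name.

```python
import csv
import io

# Hypothetical two-row sample in the '|-|' layout; the second field of the
# first row contains a stray carriage return, and rows end in '\r\n'.
sample = '1|-|AEIN\rSTEIN1|-|0.0000\r\n2|-|PKDICK1|-|0.0000\r\n'


def parse(text):
    """Drop all carriage returns, split on '|', and discard the lone '-' fields."""
    cleaned = text.replace('\r', '')
    reader = csv.reader(io.StringIO(cleaned), delimiter='|', quoting=csv.QUOTE_NONE)
    return [[value for value in row if value != '-'] for row in reader]


rows = parse(sample)
print(rows)
```

One caveat of this filter: a genuine field whose whole value is - would be dropped as well, so it only suits data where that cannot occur.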

csv.writerows() puts newline after each row

This problem occurs only with Python on Windows.

In Python v3, you need to add newline='' in the open call per:

Python 3.3 CSV.Writer writes extra blank rows

On Python v2, you need to open the file as binary with "b" in your open() call before passing it to csv.

On Python 2, changing the line

with open('stocks2.csv','w') as f:

to:

with open('stocks2.csv','wb') as f:

will fix the problem.

More info about the issue here:

CSV in Python adding an extra carriage return, on Windows
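To see why the doubled carriage return shows up as extra rows, the sketch below feeds a string with \r\r\n line endings to Python's own csv.reader. The sample string is made up for illustration; how a spreadsheet application renders such a file may differ.

```python
import csv
import io

# Made-up content imitating a file written in Windows text mode without newline=''
damaged = 'a,b\r\r\nc,d\r\r\n'

# newline='' keeps the line endings untranslated, as recommended for csv input
rows = list(csv.reader(io.StringIO(damaged, newline='')))

# The stray '\r' ends each record early, so the remaining '\r\n' reads as an empty row
non_blank = [row for row in rows if row]

print(rows)
print(non_blank)
```

Filtering out the empty lists is only a workaround for files that are already damaged; fixing the open() call when writing avoids the problem at the source.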

Python: write.csv adding extra carriage return

The default line terminator for csv.writer is '\r\n'. Explicitly specify the lineterminator argument if you want only '\n':

wr = csv.writer(csvFile, delimiter=';', lineterminator='\n')
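A quick way to compare the two settings is to write into an in-memory buffer, which performs no newline translation of its own. This is a minimal sketch; the helper name render is an assumption made for the example.

```python
import csv
import io


def render(rows, **writer_kwargs):
    """Write rows to an in-memory buffer and return the resulting text."""
    buffer = io.StringIO()
    csv.writer(buffer, delimiter=';', **writer_kwargs).writerows(rows)
    return buffer.getvalue()


default_output = render([['a', 'b']])                   # dialect default terminator
lf_output = render([['a', 'b']], lineterminator='\n')   # explicit '\n'

print(repr(default_output), repr(lf_output))
```

Note that when writing to a real file on Windows in text mode without newline='', a '\n' terminator is still translated to '\r\n' by the runtime.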

need to read CSV with carriage returns as data using Python

Based on your description, I am assuming that every row should be 4 fields wide. You could replace all the newlines with commas, then use range to generate the index of every 4th field. That index gives you the parameter name, and the next 3 fields go into a list. The code below is just a quick example of how you could do this. To be cleaner and not have to worry about commas nested inside quoted fields, you could still use the csv reader to parse the data and then iterate over it in the same way.

This solution does assume that you can read the entire file into memory. If you are dealing with significantly larger files, let me know, as a different solution would be needed to read the file line by line.

# Read the entire file into memory (hoping these are not large files :D).
# Trailing newlines are stripped so they do not produce an empty final field.
with open("Data.csv") as my_csv_file:
    data = my_csv_file.read().rstrip("\n")

# Take the first line and split it on commas to work out the number of
# fields per record, as all records will have the same number of fields
first_line = data.split("\n", 1)[0]
num_fields = len(first_line.split(','))

# Replace all newlines with commas and start an empty dict
data_fields = data.replace("\n", ",").split(',')
data_dict = {}

# Loop over all the fields, picking num_fields fields at a time
for index in range(0, len(data_fields), num_fields):
    data_dict[data_fields[index]] = data_fields[index + 1:index + num_fields]
    print(data_fields[index:index + num_fields])
print(data_dict)

OUTPUT

['Results Table 1', '1', '2', '3']
['Operator', 'name1', 'name2', 'name3']
['Test Date', '2/26/2020', '2/26/2020', '2/26/2020']
['Test Temperature', '70', '70', '70']
['Relative Humidity (%)', '25.00', '25.00', '25.00']
['Test Pressure', 'Ambient', 'Ambient', 'Ambient']
['Comments', '', '', '']
['Failure Location', 'Advancing', 'Advancing', 'Advancing']
['Tensile stress at Maximum Load (ksi)', '47.86', '46.04', '45.49']
['Force at Maximum Load (kip)', '9.20', '8.81', '8.70']
{'Results Table 1': ['1', '2', '3'], 'Operator': ['name1', 'name2', 'name3'], 'Test Date': ['2/26/2020', '2/26/2020', '2/26/2020'], 'Test Temperature': ['70', '70', '70'], 'Relative Humidity (%)': ['25.00', '25.00', '25.00'], 'Test Pressure': ['Ambient', 'Ambient', 'Ambient'], 'Comments': ['', '', ''], 'Failure Location': ['Advancing', 'Advancing', 'Advancing'], 'Tensile stress at Maximum Load (ksi)': ['47.86', '46.04', '45.49'], 'Force at Maximum Load (kip)': ['9.20', '8.81', '8.70']}
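For larger files, or when fields may contain quoted commas, the same regrouping idea can be sketched with csv.reader and a lazy stream of fields instead of one big string. This is an illustration under the same assumption that every record has as many fields as the first line; iter_records and the sample data are hypothetical.

```python
import csv
import io
from itertools import chain, islice


def iter_records(lines, num_fields=None):
    """Yield records of num_fields fields each, ignoring where the line breaks fall."""
    reader = csv.reader(lines)
    first_row = next(reader, None)
    if first_row is None:
        return  # empty input, nothing to yield
    # Assume the first line is a complete record unless a width is given
    width = num_fields or len(first_row)
    # Flatten all parsed rows into one lazy stream of fields
    fields = chain(first_row, chain.from_iterable(reader))
    while True:
        record = list(islice(fields, width))
        if not record:
            break
        yield record


# Made-up sample: the second record is broken across two lines
sample = io.StringIO(
    'Operator,name1,name2,name3\nTest Date,2/26/2020\n2/26/2020,2/26/2020\n',
    newline='',
)
data_dict = {record[0]: record[1:] for record in iter_records(sample)}
print(data_dict)
```

With a real file, pass the object returned by open("Data.csv", newline='') in place of the StringIO buffer; only one line at a time is held in memory while iterating.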
