CSV in Python adding an extra carriage return, on Windows
Python 3:
The official csv documentation recommends opening the file with newline='' on all platforms to disable universal newlines translation:
import csv

with open('output.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.writer(f)
    ...
The CSV writer terminates each line with the lineterminator of the dialect, which is '\r\n' for the default excel dialect on all platforms because that's what RFC 4180 recommends.
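To see this for yourself, here is a minimal sketch that writes two rows to a temporary file and then reads the raw bytes back. The helper name write_rows, the file name demo.csv and the sample rows are made up for illustration.

```python
import csv
import os
import tempfile


def write_rows(path, rows, **open_kwargs):
    """Write rows with csv.writer and return the raw bytes that ended up on disk."""
    with open(path, "w", encoding="utf-8", **open_kwargs) as f:
        csv.writer(f).writerows(rows)
    with open(path, "rb") as f:
        return f.read()


rows = [["a", "b"], ["1", "2"]]  # made-up sample data
with tempfile.TemporaryDirectory() as tmp:
    # newline='' disables newline translation, so the writer's own '\r\n' is kept as-is
    raw = write_rows(os.path.join(tmp, "demo.csv"), rows, newline="")

print(repr(raw))  # b'a,b\r\n1,2\r\n'
```

With newline='' the bytes on disk should be the same on every platform: each record ends in exactly one '\r\n'.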
Python 2:
On Windows, always open your files in binary mode ("rb" or "wb") before passing them to csv.reader or csv.writer.
Although the file is a text file, CSV is regarded as a binary format by the libraries involved, with \r\n separating records. If that separator is written in text mode, the Python runtime replaces the \n with \r\n, hence the \r\r\n observed in the file.
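The doubling can be reproduced on any platform in Python 3 by imitating the Windows text-mode translation. This is only a sketch: it uses an in-memory io.TextIOWrapper with newline='\r\n' as a stand-in for a file opened in text mode on Windows, and the row contents are made up.

```python
import csv
import io

# Stand-in for a Windows text-mode file: every '\n' written becomes '\r\n'.
buf = io.BytesIO()
text = io.TextIOWrapper(buf, encoding="utf-8", newline="\r\n")

# csv.writer emits 'a,b\r\n'; the wrapper then expands the '\n' once more.
csv.writer(text).writerow(["a", "b"])
text.flush()

doubled = buf.getvalue()
print(repr(doubled))  # b'a,b\r\r\n'
```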
See this previous answer.
Python: Reading a Windows generated csv with carriage return in column
You could first strip out the extra carriage return characters using a regular expression and then use a multi-character separator for Pandas.
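import pandas as pd
import io
import re
import csv

# newline='' keeps the original line endings, so stray carriage returns stay visible
with open('e_carriagereturn_20220430.dat', newline='') as f_input:
    data = re.sub('\x0d[^\x0a]', ' ', f_input.read())

# raw string for the regex separator; a multi-character sep needs the python engine
df = pd.read_csv(io.StringIO(data), sep=r'\|-\|', quoting=csv.QUOTE_NONE, engine='python', header=None)
print(df)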
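One thing to be aware of with the pattern '\x0d[^\x0a]': it matches the carriage return together with the character that follows it, so that character is replaced as well. If you want to keep it, a negative lookahead is one possible alternative. The sketch below compares the two on a made-up sample string.

```python
import re

sample = "abc\rdef|-|ghi\r\n"  # made-up line with a stray carriage return

# Original pattern: consumes the CR *and* the next character ('d').
consuming = re.sub('\x0d[^\x0a]', ' ', sample)

# Lookahead variant: replaces only a CR that is not followed by LF.
lookahead = re.sub('\x0d(?!\x0a)', ' ', sample)

print(repr(consuming))  # 'abc ef|-|ghi\r\n'
print(repr(lookahead))  # 'abc def|-|ghi\r\n'
```

Either way, the '\r\n' that ends the record is left untouched.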
Alternatively, without regular expressions, you could pre-parse the data as follows. First, open the file with newline='' to keep the original line-ending characters, so the carriage returns can then be removed easily. Secondly, use quoting=csv.QUOTE_NONE to disable quote processing. Lastly, remove any columns that contain just -.
import pandas as pd
import io
import csv

rows = []

with open('e_carriagereturn_20220430.dat', newline='') as f_input:
    data = f_input.read().replace('\x0d', '')

csv_input = csv.reader(io.StringIO(data), delimiter='|', quoting=csv.QUOTE_NONE)
for row in csv_input:
    rows.append([value for value in row if value != '-'])

df = pd.DataFrame(rows)
print(df)
Both give output similar to:
0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28
0 752296019 " 04/15/2022 00:00:00 04/28/2022 00:00:00 04/15/2022 00:00:00 13 0 0 A J J 0.0000 0.0000 0.0000 1.2300 123456.2700 J J -23.4500 N 04/19/2022 12:00:41 AEINSTEIN1 0.0000
1 752296020 " 03/31/2022 00:00:00 04/13/2022 00:00:00 03/31/2022 00:00:00 1 359542 12 318047.01 A J J 543.2100 0.0000 0.0000 32.1000 244680.4400 J J 543.2100 J 04/01/2022 12:44:42 PKDICK1 0.0000
2 752296032 ! 04/08/2022 00:00:00 04/22/2022 00:00:00 04/08/2022 00:00:00 2 222856 12 54321 A J N 26.8700 0.0000 0.0000 1.2800 38068.8800 J N J N 26.8700 J 04/06/2022 12:00:32 ABC003 0.0000
3 752296044 " 04/19/2022 00:00:00 05/02/2022 00:00:00 04/19/2022 00:00:00 2 222857 12 34877 D J N 6.7800 0.0000 0.0000 6.7800 122345.3500 J J 6.7800 J 04/19/2022 12:00:49 WGIBSON 0.0000
4 752296098 ! 04/17/2022 00:00:00 05/01/2022 00:00:00 04/17/2022 00:00:00 13 0 0 D N N 0.0000 0.0000 0.0000 8.7000 79689.4800 N N J N 0.0000 N 04/15/2022 12:24:58 ABC003 0.0000
5 431807560 " 04/12/2022 00:00:00 04/21/2022 00:00:00 04/12/2022 00:00:00 5 0 0 D J N 16.9600 0.0000 0.0000 0.8500 10919.6900 J J 16.7800 N 04/13/2022 14:49:44 FHERBERT 0.0000
6 431807563 ! 04/17/2022 00:00:00 05/01/2022 00:00:00 04/17/2022 00:00:00 11 0 0 D N N 0.0000 0.0000 0.0000 2.6700 31790.1600 N N J N 0.0000 N 04/15/2022 12:44:56 ABC003 0.0000
7 431807594 " 03/28/2022 00:00:00 04/11/2022 00:00:00 03/28/2022 00:00:00 1 580807 12 12345AB12345AB D J J 193.8200 0.0000 0.0000 19.3800 276921.4800 J J 193.8200 J 03/29/2022 12:00:38 WGIBSON 0.0000
8 431807597 " 04/19/2022 00:00:00 05/02/2022 00:00:00 04/19/2022 00:00:00 1 107348 12 12.45671/AB D J J 6.7800 0.0000 0.0000 6.7800 87133.8200 J J 6.7800 J 04/15/2022 12:22:35 UKLEGUIN 0.0000
9 679785779 " 03/18/2022 00:00:00 04/01/2022 00:00:00 03/18/2022 00:00:00 13 0 0 B N N 0.0000 0.0000 0.0000 9.3300 142940.7700 N N J N 0.0000 N 04/20/2022 08:04:02 AHUXLEY 0.0000
10 679785789 ! 04/15/2022 00:00:00 04/29/2022 00:00:00 04/15/2022 00:00:00 2 4876321 12 488250/CD D J N 876.5800 0.0000 0.0000 16.7800 200604.8900 J N J N 876.5400 J 04/13/2022 12:28:49 ABC003 0.0000
11 665661904 ! 04/15/2022 00:00:00 04/29/2022 00:00:00 04/15/2022 00:00:00 2 394132 12 46409 EF D J N 567.9800 0.0000 0.0000 9.1600 513561.4600 J N J N 567.8700 J 04/13/2022 12:24:37 ABC003 0.0000
12 665661909 " 03/25/2022 00:00:00 04/01/2022 00:00:00 03/25/2022 00:00:00 14 216308 12 97745894XY D J J 0.0000 0.0000 0.0000 11.4500 208666.1300 J J 0.0000 J 03/25/2022 12:25:03 FHERBERT 0.0000
13 665661934 ! 04/19/2022 00:00:00 05/02/2022 00:00:00 04/19/2022 00:00:00 2 627911 12 abc/21.4177 D J N 54.3200 0.0000 0.0000 23.4500 333689.0000 J N J N 54.3200 J 04/14/2022 23:15:20 ABC003 0.0000
14 665661945 ! 03/25/2022 00:00:00 04/07/2022 00:00:00 03/25/2022 00:00:00 1 3074312 12 923088/ABC D J J 199.2600 0.0000 0.0000 14.5600 850785.1500 J N J N 189.0120 J 03/25/2022 11:48:55 ABC003 0.0000
15 665661965 ! 04/22/2022 00:00:00 05/06/2022 00:00:00 04/22/2022 00:00:00 1 627921 12 27160 D J J 567.3400 0.0000 0.0000 45.6800 2252133.2900 J N J N 567.3400 J 04/20/2022 12:43:09 ABC003 0.0000
16 665661976 ! 04/22/2022 00:00:00 05/06/2022 00:00:00 04/22/2022 00:00:00 2 627942 12 1734793zy D J N 223.4800 0.0000 0.0000 23.4500 416715.9100 J J 234.5600 J 04/21/2022 12:04:19 ABC003 0.0000
17 665661978 ! 04/29/2022 00:00:00 05/13/2022 00:00:00 04/29/2022 00:00:00 2 627998 12 44524 fg D J N 226.3000 0.0000 0.0000 5.3700 162912.0800 J N J N 234.2000 J 04/21/2022 12:12:44 ABC003 0.0000
18 665661987 " 04/07/2022 00:00:00 04/19/2022 00:00:00 04/07/2022 00:00:00 14 0 0 D J J 78.6500 0.0000 0.0000 1.3400 56249.8400 N J 78.6500 N 04/08/2022 12:32:28 PKDICK1 0.0000
csv.writerows() puts newline after each row
This problem occurs only with Python on Windows.
In Python 3, you need to add newline='' to the open call, per:
Python 3.3 CSV.Writer writes extra blank rows
On Python 2, you need to open the file as binary with "b" in your open() call before passing it to csv.
Changing the line
    with open('stocks2.csv','w') as f:
to:
    with open('stocks2.csv','wb') as f:
will fix the problem.
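If the same script has to run under both versions, the two fixes can be wrapped in a small helper. This is a sketch only: open_csv_for_writing is a made-up name, and the rows written to stocks2.csv are placeholder data.

```python
import csv
import os
import sys
import tempfile


def open_csv_for_writing(path):
    """Hypothetical helper: open path so csv.writer controls the line endings."""
    if sys.version_info[0] >= 3:
        # Python 3: text mode with newline translation disabled.
        return open(path, "w", newline="")
    # Python 2: binary mode, so '\r\n' is written through unchanged.
    return open(path, "wb")


path = os.path.join(tempfile.mkdtemp(), "stocks2.csv")
with open_csv_for_writing(path) as f:
    csv.writer(f).writerows([["AAPL", "1"], ["MSFT", "2"]])  # placeholder rows

with open(path, "rb") as f:
    written = f.read()
print(repr(written))  # b'AAPL,1\r\nMSFT,2\r\n'
```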
More info about the issue here:
CSV in Python adding an extra carriage return, on Windows
Python: write.csv adding extra carriage return
The default line terminator for csv.writer is '\r\n'. Explicitly specify the lineterminator argument if you want only '\n':
wr = csv.writer(csvFile, delimiter=';', lineterminator='\n')
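A quick way to compare the two settings is to write to an in-memory buffer instead of a file. This is a minimal sketch: the helper name render and the row contents are made up, and io.StringIO(newline='') stands in for a file opened with newline=''.

```python
import csv
import io


def render(row, **writer_kwargs):
    """Write one row to an in-memory buffer and return the resulting text."""
    buf = io.StringIO(newline="")  # no newline translation, like open(..., newline='')
    csv.writer(buf, delimiter=";", **writer_kwargs).writerow(row)
    return buf.getvalue()


default_output = render(["x", "y"])
lf_output = render(["x", "y"], lineterminator="\n")

print(repr(default_output))  # 'x;y\r\n'
print(repr(lf_output))       # 'x;y\n'
```

Note that this only controls what the writer emits. If the file itself is opened in text mode on Windows without newline='', the '\n' will still be translated to '\r\n' on the way to disk.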
need to read CSV with carriage returns as data using Python
Based on your description, I'm assuming that every row should be 4 fields wide. You could just replace all the newlines with commas, then use range to generate the index of every 4th field. You can then use that index to get the parameter name and put the next 3 fields in a list. The code below is just a quick example of how you could do this. Of course, to be cleaner and not have to worry about embedded commas etc., you could still use csv.reader to parse the data and then iterate over it in the same way.
This solution does assume that you can read the entire file into memory. If you are talking about significantly larger files, let me know, as a different solution would be needed to read the file line by line.
# Read the entire file into memory (hoping these are not large files :D)
with open("Data.csv") as my_csv_file:
    data = my_csv_file.read()

# Get the index of the end of the first line, then split the first line
# so we can work out the number of fields per record, as all records will have the same number of fields
index_of_end_of_first_line = data.find("\n")
num_fields = len(data[:index_of_end_of_first_line].split(','))

# Replace all newlines with commas and start an empty dict
data_fields = data.replace("\n", ",").split(',')
data_dict = {}

# Loop over all the fields, picking num_fields fields at a time
for index in range(0, len(data_fields), num_fields):
    data_dict[data_fields[index]] = data_fields[index + 1:index + num_fields]
    print(data_fields[index:index + num_fields])

print(data_dict)
OUTPUT
['Results Table 1', '1', '2', '3']
['Operator', 'name1', 'name2', 'name3']
['Test Date', '2/26/2020', '2/26/2020', '2/26/2020']
['Test Temperature', '70', '70', '70']
['Relative Humidity (%)', '25.00', '25.00', '25.00']
['Test Pressure', 'Ambient', 'Ambient', 'Ambient']
['Comments', '', '', '']
['Failure Location', 'Advancing', 'Advancing', 'Advancing']
['Tensile stress at Maximum Load (ksi)', '47.86', '46.04', '45.49']
['Force at Maximum Load (kip)', '9.20', '8.81', '8.70']
{'Results Table 1': ['1', '2', '3'], 'Operator': ['name1', 'name2', 'name3'], 'Test Date': ['2/26/2020', '2/26/2020', '2/26/2020'], 'Test Temperature': ['70', '70', '70'], 'Relative Humidity (%)': ['25.00', '25.00', '25.00'], 'Test Pressure': ['Ambient', 'Ambient', 'Ambient'], 'Comments': ['', '', ''], 'Failure Location': ['Advancing', 'Advancing', 'Advancing'], 'Tensile stress at Maximum Load (ksi)': ['47.86', '46.04', '45.49'], 'Force at Maximum Load (kip)': ['9.20', '8.81', '8.70']}
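As mentioned, csv.reader can do the splitting instead, which copes with quoted fields that contain commas. The following is a rough sketch of that variant, not a drop-in replacement: the sample text is invented, and it still assumes that the first line is complete and that every record has the same number of fields.

```python
import csv
import io
from itertools import chain

# Invented sample: two records are broken across lines, one field has a quoted comma.
sample = (
    "Results Table 1,1,2,3\n"
    "Operator,name1\n"
    "name2,name3\n"
    '"Load, max (kip)",9.20,8.81\n'
    "8.70\n"
)

reader = csv.reader(io.StringIO(sample, newline=""))
rows = [row for row in reader if row]  # skip completely blank lines

# The first line is assumed to be complete and sets the record width.
num_fields = len(rows[0])

# Flatten all parsed fields, then regroup them num_fields at a time.
fields = list(chain.from_iterable(rows))
records = {
    fields[i]: fields[i + 1:i + num_fields]
    for i in range(0, len(fields), num_fields)
}

print(records)
```

For a real file you would pass the file object, opened with newline='', to csv.reader instead of the StringIO buffer.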