Line contains NULL byte in CSV reader (Python)
I've solved a similar problem with an easier solution:
import codecs
import csv
# 'r' rather than 'rU': the 'U' mode flag was removed in Python 3.11
csvReader = csv.reader(codecs.open('file.csv', 'r', 'utf-16'))
The key was using the codecs module to open the file with the UTF-16 encoding. Many other encodings are available; check the documentation.
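If the encoding is not known in advance, the byte-order mark at the start of the file often gives it away. Here is a minimal sketch, assuming the file starts with a BOM; `sniff_bom_encoding` is a hypothetical helper (not part of csv or codecs), and the sample bytes are made up to stand in for the contents of 'file.csv':

```python
import codecs
import csv
import io

def sniff_bom_encoding(raw, default='utf-8'):
    """Guess a codec name from a leading byte-order mark (hypothetical helper)."""
    # Check UTF-32 before UTF-16: the UTF-32-LE BOM starts with the UTF-16-LE BOM.
    if raw.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return 'utf-32'
    if raw.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return 'utf-16'
    if raw.startswith(codecs.BOM_UTF8):
        return 'utf-8-sig'
    return default

# Made-up sample data; encoding to 'utf-16' prepends a BOM
sample = 'a,b\r\n1,2\r\n'.encode('utf-16')
encoding = sniff_bom_encoding(sample)

# Decode with the detected codec, then hand the text to csv.reader
rows = list(csv.reader(io.StringIO(sample.decode(encoding), newline='')))
print(encoding, rows)
```

A file without a BOM falls through to the default, so this only helps when the producer wrote one.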
Python 3 - '_csv.Error: line contains NULL' error byte when reading csv file
csv.reader() (and therefore also csv.DictReader()) simply can't deal with files containing null bytes.
A possible solution would be to replace null bytes when reading the input file, e.g. by using a generator expression, since reader() takes any object supporting the iterator protocol as argument:
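import codecs
import csv

with codecs.open(log_path, 'r') as csv_file:
    # Strip null bytes from each line before the csv module sees it
    log_reader = csv.DictReader((l.replace('\0', '') for l in csv_file))
    for line in log_reader:
        # `str` is presumably the question's own variable, not the builtin
        if line['Addition Information'] == str:
            pass  # do something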
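The same idea can be tried without a file on disk. This is a small sketch using made-up in-memory data (the column names and values are invented for illustration), with the generator written as a named function rather than an inline expression:

```python
import csv
import io

# Made-up data with a stray null byte in the second data row
raw_text = 'name,value\nfoo,1\nb\x00ar,2\n'

def strip_nulls(lines):
    """Yield each line with null characters removed."""
    for line in lines:
        yield line.replace('\0', '')

# DictReader accepts any iterable of strings, so the generator can wrap the file object
reader = csv.DictReader(strip_nulls(io.StringIO(raw_text)))
rows = list(reader)
print(rows)
```

Because the lines are cleaned lazily, the whole file never has to be held in memory.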
`_csv.Error: line contains NUL` from a downloaded csv
The file is encoded as UTF-16, so this encoding must be specified when reading the file.
>>> # Check the first 100 characters...
>>> r = requests.get(url)
>>> r.content.decode('utf-16')[:100]
'sep=;\n"Domain Name";"Start Time";"End Time";"Reserve Price";"Domain is IDN";"Domain has hyphen";"Dom'
Depending on your platform, you need to open the file like this:
with open('downloaded_csv.csv', newline='', encoding=encoding) as in_file:
where the value of encoding is one of utf-16, utf-16-le, or utf-16-be.
Note that you may need to remove or skip the initial "sep=;" line.
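Putting these pieces together, here is a sketch that reads a UTF-16 file and skips the "sep=;" hint line. The file contents are made up for the demo, and the semicolon delimiter is an assumption taken from the sample output above:

```python
import csv
import os
import tempfile

# Write a small UTF-16 file resembling the download (contents are invented)
text = 'sep=;\n"Domain Name";"Reserve Price"\n"example.com";"10"\n'
path = os.path.join(tempfile.mkdtemp(), 'downloaded_csv.csv')
with open(path, 'w', newline='', encoding='utf-16') as out_file:
    out_file.write(text)

with open(path, newline='', encoding='utf-16') as in_file:
    first = in_file.readline()
    if not first.startswith('sep='):
        in_file.seek(0)  # no hint line, so rewind and treat it as data
    reader = csv.reader(in_file, delimiter=';')
    rows = list(reader)
print(rows)
```

Reading the first line by hand keeps the hint out of the parsed rows while still handling files that lack it.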
How to deal with _csv.Error: line contains NULL byte?
I think your question definitely needs to show a sample of the stream of bytes you expect from csv_file.stream.
I like pushing myself to learn more about Python's approach to IO, encoding/decoding, and CSV, so I've worked this much out for myself, but you probably shouldn't expect others to do the same.
import csv
from codecs import iterdecode
import io
# Flask's file.stream is probably BytesIO, see https://stackoverflow.com/a/18246385
# and the Gist in the comment, https://gist.github.com/lost-theory/3772472?permalink_comment_id=1983064#gistcomment-1983064
csv_bytes = b'''\xef\xbb\xbf C1, C2
r1c1, r1c2
r2c1, r2c2, r2c3\x00'''
# This is what Flask is probably giving you
csv_stream = io.BytesIO(csv_bytes)
# Fixed lines is another iterator, `(line.repl...)` vs. `[line.repl...]`
fixed_lines = (line.replace(b'\x00', b'') for line in csv_stream)
decoded_lines = iterdecode(fixed_lines, 'utf-8-sig', errors='strict')
reader = csv.DictReader(decoded_lines, skipinitialspace=True, restkey="INVALID")
for row in reader:
    print(row)
and I get:
{'C1': 'r1c1', 'C2': 'r1c2'}
{'C1': 'r2c1', 'C2': 'r2c2', 'INVALID': ['r2c3']}
Python CSV error: line contains NULL byte, but no NULL byte found in the file
I found what the issue was. I was reading the files from an external hard drive formatted as NTFS, while the code was running on a Mac whose drive was formatted as HFS.
After reformatting the external drive to match the file system on my laptop, the null-byte problem disappeared.
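When the error appears but no null byte is visible in an editor, it can help to inspect the raw bytes that Python actually receives from the drive. This is a rough sketch; `count_null_bytes` is a hypothetical helper and the temporary file with invented content stands in for the real CSV:

```python
import os
import tempfile

def count_null_bytes(path, chunk_size=1 << 16):
    """Count null bytes by reading the file in binary chunks."""
    total = 0
    with open(path, 'rb') as f:
        # iter() with a sentinel stops when read() returns empty bytes at EOF
        for chunk in iter(lambda: f.read(chunk_size), b''):
            total += chunk.count(b'\x00')
    return total

# Demo file containing two null bytes (invented content)
path = os.path.join(tempfile.mkdtemp(), 'sample.csv')
with open(path, 'wb') as f:
    f.write(b'a,b\n1,\x002\n3,4\x00\n')

null_count = count_null_bytes(path)
print(null_count)
```

Running it against the same file on both drives would show whether the bytes differ depending on where the file is read from.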