"Line Contains Null Byte" in CSV Reader (Python)

Line contains NULL byte in CSV reader (Python)

I've solved a similar problem with an easier solution:

import codecs
import csv
# Decode the file as UTF-16 so csv.reader never sees raw null bytes
csvReader = csv.reader(codecs.open('file.csv', 'r', 'utf-16'))

The key was using the codecs module to open the file with the UTF-16 encoding. Many other encodings are available; check the documentation.
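If the encoding is not known in advance, one option is to look at the first bytes of the file for a byte order mark (BOM) and pick the codec from that. The sketch below is only an illustration: `sniff_encoding` and `read_rows` are hypothetical helper names, the fallback to UTF-8 is an assumption, and a file without a BOM cannot be identified this way.

```python
import codecs
import csv
import os
import tempfile


def sniff_encoding(path, default="utf-8"):
    """Guess a codec name from the BOM at the start of the file (hypothetical helper)."""
    with open(path, "rb") as f:
        head = f.read(4)
    # The UTF-32 LE BOM begins with the UTF-16 LE BOM, so test UTF-32 first.
    if head.startswith((codecs.BOM_UTF32_LE, codecs.BOM_UTF32_BE)):
        return "utf-32"
    if head.startswith((codecs.BOM_UTF16_LE, codecs.BOM_UTF16_BE)):
        return "utf-16"
    if head.startswith(codecs.BOM_UTF8):
        return "utf-8-sig"
    return default  # assumption: no BOM means the default is good enough


def read_rows(path):
    """Open the file with the sniffed encoding and return all CSV rows."""
    with open(path, newline="", encoding=sniff_encoding(path)) as f:
        return list(csv.reader(f))


# Demo: write a small UTF-16 file (Python adds the BOM) and read it back.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = os.path.join(tmp, "demo.csv")
    with open(demo_path, "w", newline="", encoding="utf-16") as f:
        f.write("a,b\r\n1,2\r\n")
    demo_encoding = sniff_encoding(demo_path)
    demo_rows = read_rows(demo_path)

print(demo_encoding, demo_rows)
```

The built-in `open()` with an `encoding` argument is used here instead of `codecs.open`; both decode the bytes before the CSV parser sees them.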

Python 3 - '_csv.Error: line contains NULL byte' error when reading csv file

csv.reader() (and therefore also csv.DictReader()) simply can't deal with files containing null bytes.

A possible solution would be to replace null bytes when reading the input file, e.g. by using a generator expression, since reader() takes any object supporting the iterator protocol as argument:

import codecs
import csv

with codecs.open(log_path, 'r') as csv_file:
    # Strip null bytes from each line before the CSV parser sees it
    log_reader = csv.DictReader((l.replace('\0', '') for l in csv_file))
    for line in log_reader:
        if line['Addition Information'] == str:
            pass  # do something
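The same idea can be tried without a file on disk. This is a minimal sketch: `strip_nulls` is a hypothetical helper name and the sample data is made up for illustration.

```python
import csv
import io


def strip_nulls(lines):
    """Yield each line with NUL characters removed (hypothetical helper)."""
    for line in lines:
        yield line.replace("\0", "")


# Made-up sample input with stray NUL characters in two places.
sample = io.StringIO("name,info\nal\0ice,ok\nbob,\0\n")

# DictReader accepts any iterable of strings, so the generator can be passed directly.
rows = list(csv.DictReader(strip_nulls(sample)))
print(rows)
```

Stripping is only appropriate when the null bytes are stray debris. If the file is really UTF-16, decoding it with the right codec is the safer fix, because removing the nulls would corrupt any non-ASCII characters.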

`_csv.Error: line contains NUL` from a downloaded csv

The file is encoded as UTF-16, so this encoding must be specified when reading the file.

>>> import requests
>>> # Check the first 100 characters...
>>> r = requests.get(url)
>>> r.content.decode('utf-16')[:100]
'sep=;\n"Domain Name";"Start Time";"End Time";"Reserve Price";"Domain is IDN";"Domain has hyphen";"Dom'

Depending on your platform, you need to open the file like this:

with open('downloaded_csv.csv', newline='', encoding=encoding) as in_file:
    reader = csv.reader(in_file, delimiter=';')

where the value of encoding is one of utf-16, utf-16-le or utf-16-be.

Note that you may need to remove or skip the initial "sep=;" line.
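One way to handle the "sep=;" line is to peek at the first line, take the delimiter from it if it is a separator hint, and otherwise rewind to the start. The sketch below is an assumption about how this could be done, not part of the csv module: `reader_with_sep_hint` is a hypothetical helper, and the sample rows are made up.

```python
import csv
import io


def reader_with_sep_hint(text_file, default_delimiter=","):
    """Return a csv.reader, consuming a leading 'sep=X' line if present (hypothetical helper)."""
    first = text_file.readline()
    if first.lower().startswith("sep=") and len(first.rstrip("\r\n")) == 5:
        # The hint line is consumed; the character after 'sep=' is the delimiter.
        delimiter = first[4]
    else:
        # Not a hint line: rewind so the first row is not lost.
        delimiter = default_delimiter
        text_file.seek(0)
    return csv.reader(text_file, delimiter=delimiter)


# Made-up UTF-16 payload shaped like the download described above.
raw = 'sep=;\n"Domain Name";"Start Time"\n"example.com";"2020-01-01"\n'.encode("utf-16")

# Decoding as UTF-16 first is what prevents the NUL error.
text_file = io.StringIO(raw.decode("utf-16"), newline="")
rows = list(reader_with_sep_hint(text_file))
print(rows)
```

With a file on disk, the object returned by `open(path, newline='', encoding='utf-16')` can be passed in place of the `StringIO` object.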

How to deal with _csv.Error: line contains NULL byte?

I think your question definitely needs to show a sample of the stream of bytes you expect from csv_file.stream.

I like pushing myself to learn more about Python's approach to IO, encoding/decoding, and CSV, so I've worked this much out for myself, but I wouldn't expect others to do the same.

import csv
from codecs import iterdecode
import io

# Flask's file.stream is probably BytesIO, see https://stackoverflow.com/a/18246385
# and the Gist in the comment, https://gist.github.com/lost-theory/3772472?permalink_comment_id=1983064#gistcomment-1983064

csv_bytes = b'''\xef\xbb\xbf C1, C2
r1c1, r1c2
r2c1, r2c2, r2c3\x00'''

# This is what Flask is probably giving you
csv_stream = io.BytesIO(csv_bytes)

# Fixed lines is another iterator, `(line.repl...)` vs. `[line.repl...]`
fixed_lines = (line.replace(b'\x00', b'') for line in csv_stream)

decoded_lines = iterdecode(fixed_lines, 'utf-8-sig', errors='strict')

reader = csv.DictReader(decoded_lines, skipinitialspace=True, restkey="INVALID")

for row in reader:
    print(row)

and I get:

{'C1': 'r1c1', 'C2': 'r1c2'}
{'C1': 'r2c1', 'C2': 'r2c2', 'INVALID': ['r2c3']}

Python CSV error: line contains NULL byte, but no NULL byte found in the file

I found what the issue was. I was reading the files from an external hard drive formatted as NTFS, while the code was running on a Mac whose drive was formatted as HFS.

After reformatting the external drive to match the file system on my laptop, the null-byte problem disappeared.
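To check whether the bytes Python actually reads contain nulls, independent of what an editor shows, the raw data can be inspected directly. This is a minimal sketch: `find_null_offsets` is a hypothetical helper, the sample bytes are made up, and reading a whole file into memory is assumed to be acceptable.

```python
def find_null_offsets(data, limit=10):
    """Return the byte offsets of the first `limit` NUL bytes in `data` (hypothetical helper)."""
    offsets = []
    start = 0
    while len(offsets) < limit:
        idx = data.find(b"\x00", start)
        if idx == -1:
            break
        offsets.append(idx)
        start = idx + 1
    return offsets


# Made-up sample; with a real file use: data = open(path, "rb").read()
sample = b"a,b\n1,\x002\n\x00"
null_offsets = find_null_offsets(sample)
print(null_offsets)
```

An empty list here, together with a NUL error from the csv module, suggests the bytes differ between where the file was inspected and where it is being read.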
