Reading a UTF8 CSV file with Python
The .encode method gets applied to a Unicode string to make a byte-string; but you're calling it on a byte-string instead... the wrong way 'round! Look at the codecs module in the standard library, and codecs.open in particular, for better general solutions for reading UTF-8 encoded text files. However, for the csv module in particular (under Python 2), you need to pass in UTF-8 data, and that's what you're already getting, so your code can be much simpler:
import csv

def unicode_csv_reader(utf8_data, dialect=csv.excel, **kwargs):
    csv_reader = csv.reader(utf8_data, dialect=dialect, **kwargs)
    for row in csv_reader:
        yield [unicode(cell, 'utf-8') for cell in row]

filename = 'da.csv'
reader = unicode_csv_reader(open(filename))
for field1, field2, field3 in reader:
    print field1, field2, field3
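The generator above targets Python 2, where csv works on byte strings. As a rough sketch of the same idea under Python 3 (an assumption, since the answer itself predates it), the file can be opened in text mode with an explicit encoding so that csv receives already-decoded strings. The helper name read_utf8_csv and the temporary sample file are made up for illustration.

```python
import csv
import os
import tempfile


def read_utf8_csv(path, **kwargs):
    """Yield rows of str from a UTF-8 encoded CSV file (hypothetical helper)."""
    # newline="" lets the csv module handle line endings itself.
    with open(path, newline="", encoding="utf-8") as f:
        yield from csv.reader(f, **kwargs)


# Write a small sample file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "da.csv")
    with open(path, "w", newline="", encoding="utf-8") as f:
        csv.writer(f).writerow(["Zoë", "Kraków", "naïve"])
    rows = list(read_utf8_csv(path))

for field1, field2, field3 in rows:
    print(field1, field2, field3)
```

No per-cell decoding is needed here because the text-mode file object does the decoding before csv sees the data.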
PS: if it turns out that your input data is NOT in utf-8, but e.g. in ISO-8859-1, then you do need a "transcoding" (if you're keen on using utf-8 at the csv module level), of the form line.decode('whateverweirdcodec').encode('utf-8'). But probably you can just use the name of your existing encoding in the yield line in my code above, instead of 'utf-8', as csv is actually going to be just fine with ISO-8859-* encoded bytestrings.
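The transcoding step from the PS could look roughly like this. It is only a sketch: transcode_lines is a hypothetical helper, the sample lines are invented, and the final parsing step assumes Python 3, where csv wants text rather than bytes.

```python
import csv


def transcode_lines(byte_lines, source_encoding):
    """Re-encode each byte line from source_encoding to UTF-8."""
    for line in byte_lines:
        yield line.decode(source_encoding).encode("utf-8")


# Invented sample data, encoded as ISO-8859-1 to stand in for the input file.
latin1_lines = [
    "café;1\n".encode("iso-8859-1"),
    "señor;2\n".encode("iso-8859-1"),
]

utf8_lines = list(transcode_lines(latin1_lines, "iso-8859-1"))

# Python 3's csv module needs str, so decode the UTF-8 bytes before parsing.
rows = list(csv.reader((l.decode("utf-8") for l in utf8_lines), delimiter=";"))
print(rows)
```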
read utf-8 CSV file into dataframe
I fixed it thanks to the answers to this question:
'utf-8' codec can't decode byte 0x92 in position 18: invalid start byte
I tried the fix suggested there:
df = pd.read_csv('myfile.csv', encoding='cp1252')
and it worked! The file is encoded in Windows code page 1252 (cp1252), not UTF-8.
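To see why byte 0x92 trips the UTF-8 decoder, here is a small standard-library sketch. The function decode_with_fallback and the sample bytes are hypothetical; trying UTF-8 first and cp1252 second is one plausible strategy, not something the original post prescribes.

```python
def decode_with_fallback(raw, encodings=("utf-8", "cp1252")):
    """Return (text, encoding_used) for the first encoding that decodes raw."""
    for name in encodings:
        try:
            return raw.decode(name), name
        except UnicodeDecodeError:
            continue
    raise ValueError("none of the candidate encodings could decode the data")


# 0x92 is a curly apostrophe in cp1252 but not a valid start byte in UTF-8.
raw = b"name,comment\r\nAnn,it\x92s fine\r\n"
text, used = decode_with_fallback(raw)
print(used)
print(text)
```

The order matters: cp1252 accepts almost any byte sequence, so putting it first would silently misread genuine UTF-8 files.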
Opening a CSV explicitly saved as UTF-8 still shows its encoding as cp1252
Try this:
with open(filename, encoding="utf8") as f:
    print(f)  # the repr of the file object includes encoding='utf8'
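A slightly fuller sketch of the same check, assuming Python 3: the encoding attribute of the file object reports what open() was told to use, and when no encoding is passed open() falls back to a platform-dependent default, which is cp1252 on many Windows setups. The sample file and variable names are made up.

```python
import codecs
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.csv")
    # Save the file explicitly as UTF-8.
    with open(path, "w", encoding="utf-8") as f:
        f.write("naïve;café\n")
    # Reopen it with the same explicit encoding and inspect the file object.
    with open(path, encoding="utf-8") as f:
        reported = f.encoding
        first_line = f.readline()

# codecs.lookup normalizes spellings such as "utf8" and "UTF-8".
normalized = codecs.lookup(reported).name
print(reported, normalized, first_line.strip())
```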
Open csv file in utf-8 with Python
You can try using pandas.
import pandas
# Pass the path so that pandas opens the file and applies the encoding itself.
data = pandas.read_csv('myfile.csv', encoding='utf-8', quotechar='"', delimiter=';')
print(data.values)
Or use unicodecsv:
import unicodecsv
# unicodecsv decodes bytes itself, so open the file in binary mode.
myfile = open('myfile.csv', 'rb')
data = unicodecsv.reader(myfile, encoding='utf-8', delimiter=';')
for row in data:
    print(row)
You may be able to install them using pip:
pip install pandas
pip install unicodecsv
Depending on your needs you may also try simple string operations:
data = [line.strip().split(';') for i, line in enumerate(open('./foo.csv').readlines()) if i != 0]
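One caveat with plain split(';') is that it also cuts inside quoted fields. The following sketch, using invented sample data and assuming Python 3, compares it with the standard csv module configured for the same delimiter.

```python
import csv
import io

# Invented sample: the second column of the first data row contains a ';'.
raw = 'id;label\r\n1;"a;b"\r\n2;plain\r\n'

# Plain string operations, skipping the header as in the one-liner above.
naive = [line.strip().split(";")
         for i, line in enumerate(raw.splitlines()) if i != 0]

# The csv module with the same delimiter respects the quoting.
reader = csv.reader(io.StringIO(raw, newline=""), delimiter=";")
next(reader)  # skip the header row
parsed = list(reader)

print(naive)
print(parsed)
```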
Update
You can also try replacing unicode characters with ASCII equivalents:
import codecs
import csv
import unicodedata
from StringIO import StringIO
...
try:
    # Decompose accented characters (NFKD), then drop anything non-ASCII.
    self.FichierE = StringIO(
        unicodedata.normalize(
            'NFKD', codecs.open(self.CheminFichierE, "r", "utf-8").read()
        ).encode('ascii', 'ignore'))
    self.ReaderFichierE = csv.reader(self.FichierE, delimiter=';')
except IOError:
    # Message: "File E was not found"
    self.TextCtrl.AppendText(u"Fichier E n'a pas été trouvé")
    return
try:
    DataFichierE = [ligne for ligne in self.ReaderFichierE]
except UnicodeDecodeError:
    # Message: "<file name> is not readable"
    self.TextCtrl.AppendText(self.NomFichierE + u" n'est pas lisible")
    return
except UnicodeEncodeError:
    # Message: "<file name> is not readable (ASCII)"
    self.TextCtrl.AppendText(self.NomFichierE + u" n'est pas lisible (ASCII)")
    return
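The normalization step on its own can be sketched as follows, assuming Python 3. The helper name to_ascii is hypothetical. Note that the conversion is lossy: characters without a decomposition, such as Ø, are dropped rather than replaced.

```python
import unicodedata


def to_ascii(text):
    """Strip accents via NFKD decomposition and drop remaining non-ASCII."""
    decomposed = unicodedata.normalize("NFKD", text)
    return decomposed.encode("ascii", "ignore").decode("ascii")


message = to_ascii("Fichier E n'a pas été trouvé")
print(message)
```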
Trouble with UTF-8 CSV input in Python
Your first snippet won't work. You are feeding unicode data to the csv reader, which (as documented) can't handle it.
Your 2nd and 3rd snippets are confused. Something like the following is all that you need:
import csv

f = open('your_utf8_encoded_file.csv', 'rb')
reader = csv.reader(f)
for utf8_row in reader:
    # Each cell is a UTF-8 byte string; decode it to unicode.
    unicode_row = [x.decode('utf8') for x in utf8_row]
    print unicode_row
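The snippet above relies on Python 2, where csv reads bytes. If only a binary stream is available under Python 3, one possible approach (an assumption, not part of the original answer) is to wrap it in io.TextIOWrapper so that csv receives decoded text. The in-memory stream here stands in for a file opened with 'rb'.

```python
import csv
import io

# Stand-in for open('your_utf8_encoded_file.csv', 'rb').
binary_stream = io.BytesIO("id,name\r\n1,Zoë\r\n".encode("utf-8"))

# Decode on the fly; newline="" leaves line-ending handling to csv.
text_stream = io.TextIOWrapper(binary_stream, encoding="utf-8", newline="")

unicode_rows = list(csv.reader(text_stream))
print(unicode_rows)
```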