Pandas read csv column values as list
You can try using pickle, which preserves the list objects as-is. Ex:
import pandas as pd

filename = "data.pkl"  # any writable path
df = pd.DataFrame({"Col": [[1, 2, 3], [4, 5, 6]]})
df.to_pickle(filename)
#Read the pickle file
df = pd.read_pickle(filename)
print(df["Col"])
print(df["Col"][0][0])
Output:
0 [1, 2, 3]
1 [4, 5, 6]
Name: Col, dtype: object
1
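For contrast, a plain CSV round-trip stringifies the lists, so you would have to parse them back yourself, e.g. with ast.literal_eval. A minimal sketch using an in-memory buffer in place of a file:

```python
import ast
import io

import pandas as pd

df = pd.DataFrame({"Col": [[1, 2, 3], [4, 5, 6]]})

# Round-trip through CSV: the list cells come back as plain strings.
buf = io.StringIO()
df.to_csv(buf, index=False)
buf.seek(0)
raw = pd.read_csv(buf)
print(type(raw["Col"][0]))  # <class 'str'>

# Parse the strings back into Python lists with a converter.
buf.seek(0)
parsed = pd.read_csv(buf, converters={"Col": ast.literal_eval})
print(parsed["Col"][0][0])  # 1
```

This is why pickle (or a converter, as below) is needed: CSV has no native list type.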
Pandas read_csv dtype read all columns but few as string
For Pandas 1.5.0+, there's an easy way to do this. If you use a defaultdict instead of a normal dict for the dtype argument, any columns which aren't explicitly listed in the dictionary will use the default as their type. E.g.
from collections import defaultdict
types = defaultdict(str, A="int", B="float")
df = pd.read_csv("/path/to/file.csv", dtype=types, keep_default_na=False)
(I haven't tested this, but I assume you still need keep_default_na=False.)
For older versions of Pandas:
You can read the entire csv as strings then convert your desired columns to other types afterwards like this:
df = pd.read_csv('/path/to/file.csv', dtype=str, keep_default_na=False)
# example df; yours will be from pd.read_csv() above
df = pd.DataFrame({'A': ['1', '3', '5'], 'B': ['2', '4', '6'], 'C': ['x', 'y', 'z']})
types_dict = {'A': int, 'B': float}
for col, col_type in types_dict.items():
    df[col] = df[col].astype(col_type)
keep_default_na=False is necessary if some of the columns contain empty strings or values like NA, which pandas converts to NaN of type float by default; that would leave you with a mixed str/float column.
Another approach, if you really want to specify the proper types for all columns when reading the file in (and not change them afterwards): read in just the column names (no rows), then use those to fill in which columns should be strings:
col_names = pd.read_csv('file.csv', nrows=0).columns
types_dict = {'A': int, 'B': float}
types_dict.update({col: str for col in col_names if col not in types_dict})
df = pd.read_csv('file.csv', dtype=types_dict)
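A quick end-to-end check of that header-first idea, with an in-memory CSV standing in for file.csv (the column names and values here are invented for illustration):

```python
import io

import pandas as pd

csv_text = "A,B,C\n1,2.5,007\n3,4.5,042\n"

# Pass 1: read only the header to learn the column names.
col_names = pd.read_csv(io.StringIO(csv_text), nrows=0).columns

# Pass 2: known columns get real types; everything else stays str.
types_dict = {'A': int, 'B': float}
types_dict.update({col: str for col in col_names if col not in types_dict})

df = pd.read_csv(io.StringIO(csv_text), dtype=types_dict,
                 keep_default_na=False)
print(df['C'].tolist())  # leading zeros survive: ['007', '042']
```

Column C keeps its leading zeros because it was read as str, which is exactly what silently breaks when the type sniffer guesses int.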
Reading csv containing a list in Pandas
One option is to use ast.literal_eval
as converter:
>>> import ast
>>> df = pd.read_clipboard(header=None, quotechar='"', sep=',',
... converters={1:ast.literal_eval})
>>> df
0 1
0 HK [5328.1, 5329.3, 2013-12-27 13:58:57.973614]
1 HK [5328.1, 5329.3, 2013-12-27 13:58:59.237387]
2 HK [5328.1, 5329.3, 2013-12-27 13:59:00.346325]
And convert those lists to a DataFrame if needed, for example with:
>>> df = pd.DataFrame.from_records(df[1].tolist(), index=df[0],
... columns=list('ABC')).reset_index()
>>> df['C'] = pd.to_datetime(df['C'])
>>> df
0 A B C
0 HK 5328.1 5329.3 2013-12-27 13:58:57.973614
1 HK 5328.1 5329.3 2013-12-27 13:58:59.237387
2 HK 5328.1 5329.3 2013-12-27 13:59:00.346325
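read_clipboard is just a convenience here; the same converters argument works with read_csv on a real file. A sketch with an in-memory buffer in place of the file (note the timestamps are quoted inside the list so that ast.literal_eval can parse them as strings; the sample values are invented):

```python
import ast
import io

import pandas as pd

data = '''HK,"[5328.1, 5329.3, '2013-12-27 13:58:57.973614']"
HK,"[5328.1, 5329.3, '2013-12-27 13:58:59.237387']"
'''

# Column 1 holds the list-as-text; the converter turns it into a real list.
df = pd.read_csv(io.StringIO(data), header=None, quotechar='"',
                 converters={1: ast.literal_eval})

# Expand the lists into their own columns, as in the answer above.
out = pd.DataFrame.from_records(df[1].tolist(), index=df[0],
                                columns=list('ABC')).reset_index()
out['C'] = pd.to_datetime(out['C'])
print(out)
```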
Pandas reading csv as string type
Update: this has been fixed: from 0.11.1, passing str/np.str is equivalent to using object.
Use the object dtype:
In [11]: pd.read_csv('a', dtype=object, index_col=0)
Out[11]:
A B
1A 0.35633069074776547 0.745585398803751
1B 0.20037376323337375 0.013921830784260236
or better yet, just don't specify a dtype:
In [12]: pd.read_csv('a', index_col=0)
Out[12]:
A B
1A 0.356331 0.745585
1B 0.200374 0.013922
but bypassing the type sniffer and truly returning only strings requires a hacky use of converters:
In [13]: pd.read_csv('a', converters={i: str for i in range(100)})
Out[13]:
A B
1A 0.35633069074776547 0.745585398803751
1B 0.20037376323337375 0.013921830784260236
where 100 is some number equal to or greater than your total number of columns.
It's best to avoid the str dtype on those older versions, for the reason noted in the update above.
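To avoid guessing an upper bound like 100, you can read the header first and build the converters dict from the actual column names. A sketch with an in-memory CSV standing in for the file:

```python
import io

import pandas as pd

csv_text = "A,B\n0.35633069074776547,0.745585398803751\n"

# Read only the header, then key the converters by real column names.
cols = pd.read_csv(io.StringIO(csv_text), nrows=0).columns
df = pd.read_csv(io.StringIO(csv_text),
                 converters={col: str for col in cols})
print(df['A'][0])  # full precision kept as text
```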
How to drop a specific column of csv file while reading it using pandas?
If you know the column names in advance, you can do it by setting the usecols parameter.
When you know which columns to use
Suppose you have a csv file with columns ['id','name','last_name'] and you want just ['name','last_name']. You can do it as below:
import pandas as pd
df = pd.read_csv("sample.csv", usecols = ['name','last_name'])
When you want the first N columns
If you don't know the column names but want the first n columns of the dataframe, you can do it by:
import pandas as pd
n = 3  # number of leading columns to keep; adjust as needed
df = pd.read_csv("sample.csv", usecols=list(range(n)))
Edit
When you know the name of the column to be dropped
# Read column names from file
cols = list(pd.read_csv("sample_data.csv", nrows=1))
print(cols)
# Use a list comprehension to remove the unwanted column in usecols
df = pd.read_csv("sample_data.csv", usecols=[i for i in cols if i != 'name'])
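A self-contained check of the drop-by-name recipe, with an in-memory CSV standing in for sample_data.csv (the column names are invented for illustration):

```python
import io

import pandas as pd

csv_text = "id,name,last_name\n1,Ada,Lovelace\n2,Alan,Turing\n"

# Read column names only, then exclude the unwanted one via usecols.
cols = list(pd.read_csv(io.StringIO(csv_text), nrows=1))
df = pd.read_csv(io.StringIO(csv_text),
                 usecols=[i for i in cols if i != 'name'])
print(list(df.columns))  # ['id', 'last_name']
```

This reads the file twice, but the first pass fetches only one row, so the cost is negligible.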
Setting column types while reading csv with pandas
In your loop you are doing:
for col in dp.columns:
    print('column', col, ':', type(col[0]))
and you are correctly seeing str as the output everywhere, because col[0] is the first letter of the name of the column, which is a string.
For example, if you run this loop:
for col in dp.columns:
    print('column', col, ':', col[0])
you will see the first letter of each column name printed out - this is what col[0] is.
Your loop only iterates on the column names, not on the series data.
What you really want is to check the type of each column's data (not its header or part of its header) in a loop.
So do this instead to get the types of the column data (non-header data):
for col in dp.columns:
    print('column', col, ':', type(dp[col][0]))
This is similar to what you did when printing the type of the rating column separately.
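For per-column types, pandas also offers DataFrame.dtypes, which reports each column's dtype directly instead of inspecting one element. A small sketch (the column names and values here are made up):

```python
import pandas as pd

dp = pd.DataFrame({"name": ["a", "b"], "rating": [4.5, 3.0]})

# Element-level check, as in the loop above:
for col in dp.columns:
    print('column', col, ':', type(dp[col][0]))

# Column-level dtypes in one call:
print(dp.dtypes)
```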