Import CSV file as a Pandas DataFrame
pandas.read_csv to the rescue:
import pandas as pd
df = pd.read_csv("data.csv")
print(df)
This outputs a pandas DataFrame:
         Date    price  factor_1  factor_2
0  2012-06-11  1600.20     1.255     1.548
1  2012-06-12  1610.02     1.258     1.554
2  2012-06-13  1618.07     1.249     1.552
3  2012-06-14  1624.40     1.253     1.556
4  2012-06-15  1626.15     1.258     1.552
5  2012-06-16  1626.15     1.263     1.558
6  2012-06-17  1626.15     1.264     1.572
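With the plain call above, the Date column is read as ordinary strings. The following is a minimal sketch of asking read_csv to parse it into datetimes while loading. It assumes the file is comma-separated with the header shown above; the inline text is only a stand-in for data.csv.

```python
from io import StringIO

import pandas as pd

# Stand-in for data.csv; the comma-separated layout is an assumption.
csv_text = """Date,price,factor_1,factor_2
2012-06-11,1600.20,1.255,1.548
2012-06-12,1610.02,1.258,1.554
"""

# parse_dates converts the named column to a datetime dtype while reading.
df = pd.read_csv(StringIO(csv_text), parse_dates=["Date"])
print(df.dtypes)
```

With a real file, pass the path instead of StringIO(csv_text).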
How to import csv as a pandas dataframe?
I think this is a path problem rather than a pandas issue. Try opening the same file with the built-in open() function. To get the correct path, navigate to the directory containing the csv file and run pwd in the terminal (on macOS). Copy that path and append the file name, <filename>.csv, to it.
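One way to check the path from inside Python, rather than the terminal, is sketched below. The helper name describe_csv_path and the file name data.csv are made up for illustration.

```python
from pathlib import Path


def describe_csv_path(filename):
    """Return the absolute path Python would try to open and whether it is a file."""
    path = Path(filename).expanduser().resolve()
    return path, path.is_file()


# "data.csv" is a placeholder; use the name you pass to pd.read_csv.
path, exists = describe_csv_path("data.csv")
print(f"Working directory: {Path.cwd()}")
print(f"Resolved path:     {path} (exists: {exists})")
```

If exists is False, the relative name does not point where you expect, and pandas will fail in the same way.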
Possible Solutions
Move file.csv to the same folder as the Python script or Jupyter Notebook, then simply use pd.read_csv("file.csv", sep=";").
The URL which you shared redirects to a page but doesn't download the csv file directly. If you have the file available in s3 or gs, try using that link.
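To see why the sep=";" argument matters, here is a small sketch comparing the default separator with an explicit one. The column names and values are invented, and the inline text stands in for file.csv.

```python
from io import StringIO

import pandas as pd

# Stand-in for file.csv; the contents are made up for illustration.
csv_text = "name;score\nalice;10\nbob;20\n"

# Default sep="," finds no commas, so every row lands in one fused column.
wrong = pd.read_csv(StringIO(csv_text))

# sep=";" splits each row on semicolons, giving two proper columns.
right = pd.read_csv(StringIO(csv_text), sep=";")

print(wrong.shape, list(wrong.columns))
print(right.shape, list(right.columns))
```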
How to read csv file into dataframe using pandas
Use read_csv with header=None first:
from io import StringIO
import pandas as pd
temp = u"""Name=John,Gender=M,BloodType=A,Location=New York,Age=18
Name=Mary,Gender=F,BloodType=AB,Location=Seatle,Age=30"""
# pd.compat.StringIO was removed from pandas; use io.StringIO instead
# after testing, replace 'StringIO(temp)' with 'filename.csv'
df = pd.read_csv(StringIO(temp), header=None)
print(df)
           0         1             2                  3       4
0  Name=John  Gender=M   BloodType=A  Location=New York  Age=18
1  Name=Mary  Gender=F  BloodType=AB    Location=Seatle  Age=30
Then use DataFrame.apply with Series.str.split, select the second element of each split list, and finally set the column names:
df1 = df.apply(lambda x: x.str.split('=').str[1])
df1.columns = df.iloc[0].str.split('=').str[0].rename(None)
#if necessary
df1['Age'] = df1['Age'].astype(int)
print (df1)
   Name Gender BloodType  Location  Age
0  John      M         A  New York   18
1  Mary      F        AB    Seatle   30
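A different route to the same table, not the approach used above, is to parse each line into a dictionary with plain Python and build the DataFrame from those. This is only a sketch: it assumes no value contains a comma, and the helper name parse_line is made up.

```python
import pandas as pd

# Same two records as above, held as a list of lines instead of a file.
lines = [
    "Name=John,Gender=M,BloodType=A,Location=New York,Age=18",
    "Name=Mary,Gender=F,BloodType=AB,Location=Seatle,Age=30",
]


def parse_line(line):
    """Turn 'k1=v1,k2=v2' into {'k1': 'v1', 'k2': 'v2'}."""
    # split("=", 1) splits only on the first '=', so values may contain '='.
    return dict(field.split("=", 1) for field in line.split(","))


df1 = pd.DataFrame([parse_line(line) for line in lines])
df1["Age"] = df1["Age"].astype(int)
print(df1)
```

For a real file, the list of lines could come from iterating over the open file and stripping the trailing newline from each line.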
Creating list from imported CSV file with pandas
So for this you can just add the following line after the code you've posted:
company_name = df[companies_column].tolist()
This will get the data in the companies column as a pandas Series (essentially a Series is just a fancy list) and then convert it to a regular Python list.
Or, if you were to start from scratch, you can also just use these lines:
import pandas as pd
# raw string so the backslash in the Windows path is not treated as an escape
df = pd.read_csv(r'Downloads\Dropped_Companies.csv')
company_name = df[df.columns[4]].tolist()
Another option: If this is the only thing you need to do with your csv file, you can also get away with just using the csv library that comes with Python instead of installing pandas, using this approach.
If you want to learn more about how to get data out of your pandas DataFrame (the df variable in your code), you might find this blog post helpful.
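For the csv-library option mentioned above, a minimal sketch could look like the following. The "Company" header, the sample rows, and the helper name column_to_list are hypothetical; the inline text stands in for Dropped_Companies.csv.

```python
import csv
from io import StringIO

# Stand-in for Dropped_Companies.csv; header and rows are hypothetical.
csv_text = "Id,Company\n1,Acme\n2,Globex\n"


def column_to_list(handle, column):
    """Collect one named column from an open CSV file handle into a list."""
    reader = csv.DictReader(handle)  # uses the first row as field names
    return [row[column] for row in reader]


company_name = column_to_list(StringIO(csv_text), "Company")
print(company_name)
```

With a file on disk, pass open(path, newline="") as the handle, ideally inside a with block so the file is closed afterwards.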
pandas adding .0 when I import from CSV
You have hit the worst pandas wart of all time. But it's 2022, and missing values for integers are finally supported! Check this out. Here is a csv file with an integer column a that has a missing value:
a,b
1,y
2,m
,c
3,a
If you read it in a default manner you get the annoying conversion to float:
pd.read_csv('test.csv'):
a b
--------------
0 1.0 y
1 2.0 m
2 NaN c
3 3.0 a
But, if you tell pandas that you want new experimental integers with missing values, you get the good stuff:
pd.read_csv('test.csv', dtype={'a': 'Int64'}):
a b
---------
0 1 y
1 2 m
2 <NA> c
3 3 a
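The comparison above can be reproduced without a file on disk. The sketch below feeds the same four rows through io.StringIO and checks the resulting dtypes. The convert_dtypes call at the end is an extra that the answer above does not mention; it is shown as one possible way to get nullable dtypes after loading.

```python
from io import StringIO

import pandas as pd

# Same content as test.csv above: column a has one missing value.
csv_text = "a,b\n1,y\n2,m\n,c\n3,a\n"

# Default read: the missing value forces column a to float64.
default = pd.read_csv(StringIO(csv_text))

# Nullable integer dtype: values stay integers, the gap becomes <NA>.
nullable = pd.read_csv(StringIO(csv_text), dtype={"a": "Int64"})

# Alternative after the fact: let pandas pick nullable dtypes itself.
converted = default.convert_dtypes()

print(default["a"].dtype, nullable["a"].dtype, converted["a"].dtype)
```

Note the capital I in 'Int64': the lowercase 'int64' is the ordinary NumPy integer type, which cannot hold missing values.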
Create SAS Data Step to import csv from pandas dataframe in python
This doesn't answer the question of why the loop wasn't printing in the one instance, but it is a much better way to do what I was originally trying to do anyway. Thanks @Tom for the guidance.
import numpy as np
from pandas.api.types import is_datetime64_any_dtype as is_datetime, is_object_dtype as is_object

def sas_import_csv(df, sas_date_fmt='yymmddn8.', filePath='', outName='X'):
    '''Takes a dataframe and prepares a data step to import the csv file to SAS.
    '''
    # unused below; the builtin float replaces np.float, which was removed in NumPy 1.24
    value_fmts = [float, np.int32, np.int64]
    opening = f"%let infile = '{filePath}';\ndata {outName}; %let _EFIERR_ = 0; /* set the ERROR detection macro variable */ \ninfile &infile delimiter = ',' MISSOVER DSD TRUNCOVER lrecl=32767 firstobs=2 ;"
    inp = 'input '
    fmt = 'format '
    infmt = 'informat '
    closing = "if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */\nrun;"
    # longest string representation in each column, used as the SAS character width
    measurer = np.vectorize(len)
    dfLen = measurer(df.values.astype(str)).max(axis=0)
    for l, col in zip(dfLen, df.columns):
        if is_object(df[col]):
            # character column: read with a $ informat of the measured width
            inp = inp + f'{col} :${l}. '
        elif is_datetime(df[col]):
            # date column: needs both a display format and an informat
            inp = inp + f'{col} '
            fmt = fmt + f'{col} {sas_date_fmt} '
            infmt = infmt + f'{col} yymmdd10. '
        else:
            # numeric column: list input needs no format
            inp = inp + f'{col} '
    return f'{opening} {inp} ;\n{fmt} ;\n{infmt} ;\n{closing}'
Now you can read the dataframe into SAS by simply copying and pasting the output from print(c) after running the code below:
import numpy as np
import pandas as pd
dates = pd.date_range(start='1/1/2018', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))
df['E'] = "some string"
df = df.reset_index().rename(columns = {'index':'Date'})
# raw string: single backslashes are enough and are not treated as escapes
f = r'C:\Users\user\example.csv'
c = sas_import_csv(df,filePath=f)
df.to_csv(f,index=False)
print(c)