Import CSV File as a Pandas DataFrame

pandas.read_csv to the rescue:

import pandas as pd
df = pd.read_csv("data.csv")
print(df)

This outputs a pandas DataFrame:

         Date    price  factor_1  factor_2
0  2012-06-11  1600.20     1.255     1.548
1  2012-06-12  1610.02     1.258     1.554
2  2012-06-13  1618.07     1.249     1.552
3  2012-06-14  1624.40     1.253     1.556
4  2012-06-15  1626.15     1.258     1.552
5  2012-06-16  1626.15     1.263     1.558
6  2012-06-17  1626.15     1.264     1.572

How to import csv as a pandas dataframe?

I think this is a path problem rather than a pandas issue. Try opening the same file with the built-in open() function. To get the correct path, navigate to the directory containing the CSV file in the terminal and run pwd (on macOS). Copy that path and append /<filename>.csv to it.

Possible Solutions

  1. Move file.csv to the same folder as the Python script or Jupyter Notebook, then simply use pd.read_csv("file.csv", sep=";").

  2. The URL you shared redirects to a web page rather than downloading the CSV file directly. If the file is available in S3 or Google Cloud Storage (gs), try using that link instead.
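The path check suggested above can be sketched as follows. This is a minimal illustration, not a definitive fix: the helper name load_csv, the file name file.csv, and the semicolon delimiter are assumptions made for the example.

```python
from pathlib import Path
import tempfile

import pandas as pd


def load_csv(path, sep=";"):
    """Read a CSV after confirming the path exists (hypothetical helper)."""
    path = Path(path).expanduser().resolve()
    if not path.is_file():
        # A missing file is a path problem, not a pandas problem.
        raise FileNotFoundError(f"No CSV found at {path} (cwd: {Path.cwd()})")
    return pd.read_csv(path, sep=sep)


# Demo with a throwaway file so the sketch runs anywhere.
with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "file.csv"
    demo_path.write_text("a;b\n1;2\n3;4\n")
    demo_df = load_csv(demo_path)
    print(demo_df)
```

Printing the resolved path and the current working directory in the error message usually makes it obvious whether the script is looking in the folder you expect.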

How to read csv file into dataframe using pandas

Use read_csv with header=None first:

import io
import pandas as pd

temp = u"""Name=John,Gender=M,BloodType=A,Location=New York,Age=18
Name=Mary,Gender=F,BloodType=AB,Location=Seatle,Age=30"""
# after testing, replace 'io.StringIO(temp)' with 'filename.csv'
# (pd.compat.StringIO was removed from pandas; use io.StringIO instead)
df = pd.read_csv(io.StringIO(temp), header=None)
print(df)
           0         1             2                  3       4
0  Name=John  Gender=M   BloodType=A  Location=New York  Age=18
1  Name=Mary  Gender=F  BloodType=AB    Location=Seatle  Age=30

Then use DataFrame.apply with Series.str.split, select the second element of each resulting list, and finally set the column names:

# keep the value after '=' in every cell
df1 = df.apply(lambda x: x.str.split('=').str[1])
# take the column names from the part before '=' in the first row
df1.columns = df.iloc[0].str.split('=').str[0].rename(None)
# if necessary
df1['Age'] = df1['Age'].astype(int)
print(df1)
   Name Gender BloodType  Location  Age
0  John      M         A  New York   18
1  Mary      F        AB    Seatle   30
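The two steps above can be combined into one function. The sketch below is only an illustration: the helper name read_key_value_csv and the shortened sample data are assumptions, and n=1 is added so that a value containing '=' is not split further.

```python
import io

import pandas as pd


def read_key_value_csv(source):
    """Parse rows like 'Name=John,Gender=M' into a regular DataFrame."""
    raw = pd.read_csv(source, header=None, dtype=str)
    # Keep the part after the first '=' in every cell.
    values = raw.apply(lambda col: col.str.split("=", n=1).str[1])
    # Take column names from the part before '=' in the first row.
    values.columns = raw.iloc[0].str.split("=", n=1).str[0].tolist()
    return values


sample = "Name=John,Gender=M,Age=18\nName=Mary,Gender=F,Age=30"
parsed = read_key_value_csv(io.StringIO(sample))
parsed["Age"] = parsed["Age"].astype(int)
print(parsed)
```

This assumes every row lists the same keys in the same order, as in the sample above.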

Creating list from imported CSV file with pandas

  1. You can just add the following line after the code you've posted:

    company_name = df[companies_column].tolist()

    This gets the data in the companies column as a pandas Series (essentially a Series is just a fancy list) and then converts it to a regular Python list.

  2. Or, if you were to start from scratch, you can also just use these lines:

    import pandas as pd

    # raw string so the backslash in the Windows path is not treated as an escape
    df = pd.read_csv(r'Downloads\Dropped_Companies.csv')
    company_name = df[df.columns[4]].tolist()

  3. Another option: if this is the only thing you need to do with your CSV file, you can get away with the csv module that ships with Python instead of installing pandas.
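A minimal sketch of the standard-library route from option 3. The column name company and the sample rows are assumptions for illustration. In real use you would replace the io.StringIO stand-in with open(..., newline="") on your own file.

```python
import csv
import io

# Stand-in for an opened CSV file; the header and rows are made up.
sample = io.StringIO("id,company\n1,Acme\n2,Globex\n")

# DictReader uses the first row as field names.
reader = csv.DictReader(sample)
company_name = [row["company"] for row in reader]
print(company_name)
```

Note that the csv module returns every value as a string, so any numeric columns need converting by hand.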

If you want to learn more about how to get data out of your pandas DataFrame (the df variable in your code), you might find this blog post helpful.

pandas adding .0 when I import from CSV

You have hit the worst pandas wart of all time. But it's 2022, and missing values for integers are finally supported! Here is a CSV file with an integer column a that has a missing value:

a,b
1,y
2,m
,c
3,a

If you read it with the defaults, you get the annoying conversion to float:

pd.read_csv('test.csv'):

     a  b
0  1.0  y
1  2.0  m
2  NaN  c
3  3.0  a

But if you tell pandas that you want the new experimental integers with missing values (the nullable Int64 dtype), you get the good stuff:

pd.read_csv('test.csv', dtype={'a': 'Int64'}):

      a  b
0     1  y
1     2  m
2  <NA>  c
3     3  a
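The comparison above can be reproduced without a file on disk. This is a small sketch that feeds the same CSV text through io.StringIO. The variable names are illustrative.

```python
import io

import pandas as pd

csv_text = "a,b\n1,y\n2,m\n,c\n3,a\n"

# Default parsing: the missing value forces column a to float64.
default_df = pd.read_csv(io.StringIO(csv_text))

# Nullable integer dtype: column a stays integer, the gap becomes <NA>.
nullable_df = pd.read_csv(io.StringIO(csv_text), dtype={"a": "Int64"})

print(default_df["a"].dtype)   # float64
print(nullable_df["a"].dtype)  # Int64
print(nullable_df["a"].isna().tolist())
```

Note the capital I: 'Int64' is the nullable extension dtype, while lowercase 'int64' is the plain NumPy integer type, which cannot hold missing values.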

Create SAS Data Step to import csv from pandas dataframe in python

This doesn't answer the question of why the loop wasn't printing in the one instance, but it is a much better way to do what I was originally trying to do anyway. Thanks @Tom for the guidance.

import numpy as np
from pandas.api.types import is_datetime64_any_dtype as is_datetime, is_object_dtype as is_object

def sas_import_csv(df, sas_date_fmt='yymmddn8.', filePath='', outName='X'):
    '''Takes a dataframe and prepares a data step to import the csv file to SAS.
    '''
    opening = f"%let infile = '{filePath}';\ndata {outName}; %let _EFIERR_ = 0; /* set the ERROR detection macro variable */ \ninfile &infile delimiter = ',' MISSOVER DSD TRUNCOVER lrecl=32767 firstobs=2 ;"
    inp = 'input '
    fmt = 'format '
    infmt = 'informat '
    closing = "if _ERROR_ then call symputx('_EFIERR_',1); /* set ERROR detection macro variable */\nrun;"
    # longest string representation per column, used as the SAS character width
    measurer = np.vectorize(len)
    dfLen = measurer(df.values.astype(str)).max(axis=0)
    for l, col in zip(dfLen, df.columns):
        if is_object(df[col]):
            # character column: read with a $ informat of the measured width
            inp = inp + f'{col} :${l}. '
        elif is_datetime(df[col]):
            # date column: needs both a display format and an informat
            inp = inp + f'{col} '
            fmt = fmt + f'{col} {sas_date_fmt} '
            infmt = infmt + f'{col} yymmdd10. '
        else:
            # numeric column
            inp = inp + f'{col} '
    return f'{opening} {inp} ;\n{fmt} ;\n{infmt} ;\n{closing}'

Now you can read the dataframe into SAS by simply copying and pasting the output from print(c) after running the below code:

import numpy as np
import pandas as pd

dates = pd.date_range(start='1/1/2018', periods=6)
df = pd.DataFrame(np.random.randn(6, 4), index=dates, columns=list("ABCD"))
df['E'] = "some string"
df = df.reset_index().rename(columns={'index': 'Date'})
# raw string: single backslashes are enough
f = r'C:\Users\user\example.csv'
c = sas_import_csv(df, filePath=f)
df.to_csv(f, index=False)
print(c)

