How to deal with multi-level column names downloaded with yfinance
Download all tickers into single dataframe with single level column headers
Option 1
- When downloading single stock ticker data, the returned dataframe column names are a single level, but don't have a ticker column.
- This will download data for each ticker, add a ticker column, and create a single dataframe from all desired tickers.
import yfinance as yf
import pandas as pd
tickerStrings = ['AAPL', 'MSFT']
df_list = list()
for ticker in tickerStrings:
data = yf.download(ticker, group_by="Ticker", period='2d')
data['ticker'] = ticker # add this column because the dataframe doesn't contain a column with the ticker
df_list.append(data)
# combine all dataframes into a single dataframe
df = pd.concat(df_list)
# save to csv
df.to_csv('ticker.csv')
Option 2
- Download all the tickers and unstack the levels
group_by='Ticker'
puts the ticker atlevel=0
of the column name
tickerStrings = ['AAPL', 'MSFT']
df = yf.download(tickerStrings, group_by='Ticker', period='2d')
df = df.stack(level=0).rename_axis(['Date', 'Ticker']).reset_index(level=1)
Read yfinance
csv already stored with multi-level column names
- If you wish to keep, and read in a file with a multi-level column index, use the following code, which will return the dataframe to its original form.
df = pd.read_csv('test.csv', header=[0, 1])
df.drop([0], axis=0, inplace=True) # drop this row because it only has one column with Date in it
df[('Unnamed: 0_level_0', 'Unnamed: 0_level_1')] = pd.to_datetime(df[('Unnamed: 0_level_0', 'Unnamed: 0_level_1')], format='%Y-%m-%d') # convert the first column to a datetime
df.set_index(('Unnamed: 0_level_0', 'Unnamed: 0_level_1'), inplace=True) # set the first column as the index
df.index.name = None # rename the index
- The issue is,
tickerStrings
is a list of tickers, which results in a final dataframe with multi-level column names
AAPL MSFT
Open High Low Close Adj Close Volume Open High Low Close Adj Close Volume
Date
1980-12-12 0.513393 0.515625 0.513393 0.513393 0.405683 117258400 NaN NaN NaN NaN NaN NaN
1980-12-15 0.488839 0.488839 0.486607 0.486607 0.384517 43971200 NaN NaN NaN NaN NaN NaN
1980-12-16 0.453125 0.453125 0.450893 0.450893 0.356296 26432000 NaN NaN NaN NaN NaN NaN
1980-12-17 0.462054 0.464286 0.462054 0.462054 0.365115 21610400 NaN NaN NaN NaN NaN NaN
1980-12-18 0.475446 0.477679 0.475446 0.475446 0.375698 18362400 NaN NaN NaN NaN NaN NaN
- When this is saved to a csv, it looks like the following example, and results in a dataframe like you're having issues with.
,AAPL,AAPL,AAPL,AAPL,AAPL,AAPL,MSFT,MSFT,MSFT,MSFT,MSFT,MSFT
,Open,High,Low,Close,Adj Close,Volume,Open,High,Low,Close,Adj Close,Volume
Date,,,,,,,,,,,,
1980-12-12,0.5133928656578064,0.515625,0.5133928656578064,0.5133928656578064,0.40568336844444275,117258400,,,,,,
1980-12-15,0.4888392984867096,0.4888392984867096,0.4866071343421936,0.4866071343421936,0.3845173120498657,43971200,,,,,,
1980-12-16,0.453125,0.453125,0.4508928656578064,0.4508928656578064,0.3562958240509033,26432000,,,,,,
Flatten multi-level columns into a single level and add a ticker column
- If the ticker symbol is
level=0
(top) of the column names- When
group_by='Ticker'
is used
- When
df.stack(level=0).rename_axis(['Date', 'Ticker']).reset_index(level=1)
- If the ticker symbol is
level=1
(bottom) of the column names
df.stack(level=1).rename_axis(['Date', 'Ticker']).reset_index(level=1)
Download each ticker and save it to a separate file
- I recommend downloading and saving each ticker individually, which would look something like the following:
import yfinance as yf
import pandas as pd
tickerStrings = ['AAPL', 'MSFT']
for ticker in tickerStrings:
data = yf.download(ticker, group_by="Ticker", period=prd, interval=intv)
data['ticker'] = ticker # add this column because the dataframe doesn't contain a column with the ticker
data.to_csv(f'ticker_{ticker}.csv') # ticker_AAPL.csv for example
data
will look like
Open High Low Close Adj Close Volume ticker
Date
1986-03-13 0.088542 0.101562 0.088542 0.097222 0.062205 1031788800 MSFT
1986-03-14 0.097222 0.102431 0.097222 0.100694 0.064427 308160000 MSFT
1986-03-17 0.100694 0.103299 0.100694 0.102431 0.065537 133171200 MSFT
1986-03-18 0.102431 0.103299 0.098958 0.099826 0.063871 67766400 MSFT
1986-03-19 0.099826 0.100694 0.097222 0.098090 0.062760 47894400 MSFT
- the resulting csv will look like
Date,Open,High,Low,Close,Adj Close,Volume,ticker
1986-03-13,0.0885416641831398,0.1015625,0.0885416641831398,0.0972222238779068,0.0622050017118454,1031788800,MSFT
1986-03-14,0.0972222238779068,0.1024305522441864,0.0972222238779068,0.1006944477558136,0.06442664563655853,308160000,MSFT
1986-03-17,0.1006944477558136,0.1032986119389534,0.1006944477558136,0.1024305522441864,0.0655374601483345,133171200,MSFT
1986-03-18,0.1024305522441864,0.1032986119389534,0.0989583358168602,0.0998263880610466,0.06387123465538025,67766400,MSFT
1986-03-19,0.0998263880610466,0.1006944477558136,0.0972222238779068,0.0980902761220932,0.06276042759418488,47894400,MSFT
Read in multiple files saved with the previous section and create a single dataframe
import pandas as pd
from pathlib import Path
# set the path to the files
p = Path('c:/path_to_files')
# find the files; this is a generator, not a list
files = p.glob('ticker_*.csv')
# read the files into a dataframe
df = pd.concat([pd.read_csv(file) for file in files])
How to add a column in multilevel Dataframe using pandas and yfinance?
Option 1: Multi-Level Column Names
- Multi-level columns are accessed by passing a tuple
df[('WMB', 'High')]
- Package versions used
print(pd.__version__)
at least'1.0.5'
print(yf.__version__)
is'0.1.54'
import yfinance as yf
import pandas as pd
from datetime import datetime, timedelta
end = datetime.today()
start = end - timedelta(59)
tickers = ['WBA', 'HD']
df = yf.download(tickers,group_by=tickers,start=start,end=end,interval='5m')
# iterate over level 0 ticker names
for ticker in tickers:
df[(ticker, 'h-l')] = abs(df[(ticker, 'High')] - df[(ticker, 'Low')])
df[(ticker, 'h-pc')] = abs(df[(ticker, 'High')] - df[(ticker, 'Adj Close')].shift(1))
df[(ticker, 'l-pc')] = abs(df[(ticker, 'Low')] - df[(ticker, 'Adj Close')].shift(1))
df[(ticker, 'tr')] = df[[(ticker, 'h-l'), (ticker, 'h-pc'), (ticker, 'l-pc')]].max(axis=1)
# df[(ticker, 'atr')] = df[(ticker, 'tr')].rolling(window=n, min_periods=n).mean() # not included becasue n is not defined
# sort the columns
df = df.reindex(sorted(df.columns), axis=1)
# display(df.head())
HD WBA
Adj Close Close High Low Open Volume h-l h-pc l-pc tr Adj Close Close High Low Open Volume h-l h-pc l-pc tr
Datetime
2020-06-08 09:30:00-04:00 253.937500 253.937500 253.960007 252.360001 252.490005 210260.0 1.600006 NaN NaN 1.600006 46.049999 46.049999 46.070000 45.490002 45.490002 239860.0 0.579998 NaN NaN 0.579998
2020-06-08 09:35:00-04:00 253.470001 253.470001 254.339996 253.220093 253.990005 95906.0 1.119904 0.402496 0.717407 1.119904 46.330002 46.330002 46.330002 46.040001 46.070000 104259.0 0.290001 0.280003 0.009998 0.290001
2020-06-08 09:40:00-04:00 253.580002 253.580002 253.829895 252.955002 253.429993 55868.0 0.874893 0.359894 0.514999 0.874893 46.610001 46.610001 46.660000 46.240002 46.330002 113174.0 0.419998 0.329998 0.090000 0.419998
2020-06-08 09:45:00-04:00 253.740005 253.740005 253.929993 253.289993 253.529999 61892.0 0.639999 0.349991 0.290009 0.639999 46.880001 46.880001 46.950001 46.624100 46.624100 121388.0 0.325901 0.340000 0.014099 0.340000
2020-06-08 09:50:00-04:00 253.703400 253.703400 253.910004 253.419998 253.740005 60809.0 0.490005 0.169998 0.320007 0.490005 46.919998 46.919998 46.990002 46.820000 46.880001 154239.0 0.170002 0.110001 0.060001 0.170002
Option 2: Single-Level Column Names
- As demonstrated in How to deal with multi-level column names downloaded with yfinance?, it's easier to deal with single-level column names.
- With the tickers in a column instead of a multi-level column headers, use
pandas.DataFrame.gropuby
on theTicker
column.
import yfinance as yf
import pandas as pd
from datetime import datetime, timedelta
tickerStrings = ['WBA', 'HD']
df = yf.download(tickers, group_by='Ticker', start=start ,end=end, interval='5m')
# create single level column names
df = df.stack(level=0).rename_axis(['Date', 'Ticker']).reset_index(level=1)
# function with calculations
def my_calculations(df):
df['h-l']=abs(df.High-df.Low)
df['h-pc']=abs(df.High-df['Adj Close'].shift(1))
df['l-pc']=abs(df.Low-df['Adj Close'].shift(1))
df['tr']=df[['h-l','h-pc','l-pc']].max(axis=1)
# df['atr']=df['tr'].rolling(window=n, min_periods=n).mean() # n is not defined in the question
return df
# apply the function
df_updated = df.reset_index().groupby('Ticker').apply(my_calculations).sort_values(['Ticker', 'Date'])
Can yfinance download 2 columns only?
df = yf.Ticker("IBM").history(start="2017-01-01", end="2017-04-30", frequency='1dy')['Close']
Returns only one price
Output:
Date
2017-01-03 125.567116
2017-01-04 127.121788
2017-01-05 126.701187
2017-01-06 127.324532
2017-01-09 125.912628
...
2017-04-24 121.685005
2017-04-25 121.412483
2017-04-26 121.162666
2017-04-27 121.359474
2017-04-28 121.336777
Also, if you request all the data, you can try to delete the unnecessary immediately
data = yf.download("AAPL", start="2017-01-01", end="2017-04-30", auto_adjust=True)
data = data[['Close']]
print(data)
Not able to get data, downloaded in group by format using package yfinance, into pandas dataframe only for specific tickers in my sub list
As answered here: pandas dataframe select columns in multiindex
mydf.iloc[:, mydf.columns.get_level_values(1)=='Close']
@EDIT: remove useless column level
mydf.columns = mydf.columns.droplevel(1)
How to organise multiple stock data in pandas dataframe for plotting
It maybe better if you transform the data as shown in example belo.
df = pd.DataFrame({
'stock': ['A', 'B'],
'price': [[10,20,30,40], [1,2,3,4]],
'time': [[1,2,3,4], [1,2,3,4]]
})
df = df.set_index(['stock']).apply(pd.Series.explode).reset_index()
df
stock | price | time |
---|---|---|
A | 10 | 1 |
A | 20 | 2 |
A | 30 | 3 |
A | 40 | 4 |
B | 1 | 1 |
B | 2 | 2 |
B | 3 | 3 |
B | 4 | 4 |
How to import csv with multi level column index
Try the following changes-
stocks_sample = stocks_sample.reset_index() #will create the multi index as separate columns
stocks_sample.to_csv('stocks_sample.csv')
#while importing
stocks_sample_csv = pd.read_csv('stocks_sample.csv')
stocks_sample_csv.set_index(['index0', 'index1']) #change the list to the list of columns which contain the previous indexes.
downloading yahoofinance data for date ranges
You can use iterrows
:
data_stocks = {}
for _, row in data_shortened.iterrows():
placeholder = yf.download(row['symbol'], row['date_start'], row['date_end'])
data_stocks[row['symbol']] = placeholder
stocks = pd.concat(data_stocks)
Output:
>>> stocks
Open High Low Close Adj Close Volume
Date
USAU 2020-02-27 7.1 7.5 6.5 6.9 6.9 23460.0
2020-02-28 6.5 6.8 6.0 6.3 6.3 22010.0
2020-03-02 6.5 6.6 6.0 6.1 6.1 7300.0
2020-03-03 6.4 6.5 6.0 6.5 6.5 9700.0
2020-03-04 6.5 7.0 6.1 7.0 7.0 13340.0
... ... ... ... ... ... ...
DTYS 2020-04-20 120.0 120.0 120.0 120.0 120.0 0.0
2020-04-21 120.0 120.0 120.0 120.0 120.0 0.0
2020-04-22 120.0 120.0 120.0 120.0 120.0 0.0
2020-04-23 120.0 120.0 120.0 120.0 120.0 0.0
2020-04-24 120.0 120.0 120.0 120.0 120.0 0.0
[358 rows x 6 columns]
Iterate though list loop error help - Yfinance
If you use for
-loop then using normal assigment a = ...
you can get only last value.
It is standard rule: if you use for
-loop then you have to use list
to get all results.
BTW: You could run Ticker(t)
only once in loop - maybe it will run faster.
import pandas as pd
import yfinance as yf
tickers = ['AAPL', 'GOOGL', 'FB']
column_a = [] # <-- list for all results
column_b = [] # <-- list for all results
column_c = [] # <-- list for all results
for t in tickers:
print('---', t, '---')
all_info = yf.Ticker(t).info
a = all_info.get('ebitda', 'NaN')
b = all_info.get('enterpriseValue', 'NaN')
c = all_info.get('totalAssets', 'NaN')
column_a.append(a) # <-- list for all results
column_b.append(b) # <-- list for all results
column_c.append(c) # <-- list for all results
#pd.set_option("display.max_rows", None)
df = pd.DataFrame({
'Tickets': tickers,
'Ebitda': column_a, # <-- list for all results
'EnterpriseValue' :column_b, # <-- list for all results
'TotalAssets': column_c, # <-- list for all results
})
print(df)
EDIT:
Eventually you could append directly to DataFrame
First you have to create empty DataFrame
but with all columns.
Next you can append dictionary with ignore_index=True
. And in DataFrame
you have to assign it again to DataFrame
- df = df.append(...)
import pandas as pd
import yfinance as yf
tickers = ['AAPL', 'GOOGL', 'FB']
df = pd.DataFrame(columns=['Tickets', 'Ebitda', 'EnterpriseValue', 'TotalAssets'])
for t in tickers:
print('---', t, '---')
all_info = yf.Ticker(t).info
a = all_info.get('ebitda', 'NaN')
b = all_info.get('enterpriseValue', 'NaN')
c = all_info.get('totalAssets', 'NaN')
df = df.append({
'Tickets': t,
'Ebitda': a,
'EnterpriseValue': b,
'TotalAssets': c,
}, ignore_index=True)
#pd.set_option("display.max_rows", None)
print(df)
Related Topics
Limit Number of Threads in Numpy
How to Recursively Find Specific Key in Nested JSON
How to Have Assignment in a Condition
Passing Table Name as a Parameter in Psycopg2
Problem with Multi Threaded Python App and Socket Connections
Repeat Rows in a Pandas Dataframe Based on Column Value
Get a Function Argument's Default Value
Pycharm Doesn't Recognise Installed Module
Splitting a List Based on a Delimiter Word
Python: Urlerror: <Urlopen Error [Errno 10060]
Can a Lambda Function Call Itself Recursively in Python
Difference Between .String and .Text Beautifulsoup
Remove Quotes from String in Python
Prepend a Line to an Existing File in Python
Dangers of Sys.Setdefaultencoding('Utf-8')
How to Install MySQLdb (Python Data Access Library to MySQL) on MAC Os X