How to Deal with Multi-Level Column Names Downloaded with Yfinance

How to deal with multi-level column names downloaded with yfinance

Download all tickers into single dataframe with single level column headers

Option 1

When downloading single stock ticker data, the returned dataframe column names are a single level, but don't have a ticker column.
This will download data for each ticker, add a ticker column, and create a single dataframe from all desired tickers.

import yfinance as yf
import pandas as pd

tickerStrings = ['AAPL', 'MSFT']
df_list = list()
for ticker in tickerStrings:
    data = yf.download(ticker, group_by="Ticker", period='2d')
    data['ticker'] = ticker  # add this column because the dataframe doesn't contain a column with the ticker
    df_list.append(data)

# combine all dataframes into a single dataframe
df = pd.concat(df_list)

# save to csv
df.to_csv('ticker.csv')

Option 2

Download all the tickers and unstack the levels
- group_by='Ticker' puts the ticker at level=0 of the column name

tickerStrings = ['AAPL', 'MSFT']
df = yf.download(tickerStrings, group_by='Ticker', period='2d')
df = df.stack(level=0).rename_axis(['Date', 'Ticker']).reset_index(level=1)

Read `yfinance` csv already stored with multi-level column names

If you wish to keep, and read in a file with a multi-level column index, use the following code, which will return the dataframe to its original form.

df = pd.read_csv('test.csv', header=[0, 1])
df.drop([0], axis=0, inplace=True)  # drop this row because it only has one column with Date in it
df[('Unnamed: 0_level_0', 'Unnamed: 0_level_1')] = pd.to_datetime(df[('Unnamed: 0_level_0', 'Unnamed: 0_level_1')], format='%Y-%m-%d')  # convert the first column to a datetime
df.set_index(('Unnamed: 0_level_0', 'Unnamed: 0_level_1'), inplace=True)  # set the first column as the index
df.index.name = None  # rename the index

The issue is, tickerStrings is a list of tickers, which results in a final dataframe with multi-level column names

                AAPL                                                    MSFT                                
                Open      High       Low     Close Adj Close     Volume Open High Low Close Adj Close Volume
Date                                                                                                        
1980-12-12  0.513393  0.515625  0.513393  0.513393  0.405683  117258400  NaN  NaN NaN   NaN       NaN    NaN
1980-12-15  0.488839  0.488839  0.486607  0.486607  0.384517   43971200  NaN  NaN NaN   NaN       NaN    NaN
1980-12-16  0.453125  0.453125  0.450893  0.450893  0.356296   26432000  NaN  NaN NaN   NaN       NaN    NaN
1980-12-17  0.462054  0.464286  0.462054  0.462054  0.365115   21610400  NaN  NaN NaN   NaN       NaN    NaN
1980-12-18  0.475446  0.477679  0.475446  0.475446  0.375698   18362400  NaN  NaN NaN   NaN       NaN    NaN

When this is saved to a csv, it looks like the following example, and results in a dataframe like you're having issues with.

,AAPL,AAPL,AAPL,AAPL,AAPL,AAPL,MSFT,MSFT,MSFT,MSFT,MSFT,MSFT
,Open,High,Low,Close,Adj Close,Volume,Open,High,Low,Close,Adj Close,Volume
Date,,,,,,,,,,,,
1980-12-12,0.5133928656578064,0.515625,0.5133928656578064,0.5133928656578064,0.40568336844444275,117258400,,,,,,
1980-12-15,0.4888392984867096,0.4888392984867096,0.4866071343421936,0.4866071343421936,0.3845173120498657,43971200,,,,,,
1980-12-16,0.453125,0.453125,0.4508928656578064,0.4508928656578064,0.3562958240509033,26432000,,,,,,

Flatten multi-level columns into a single level and add a ticker column

If the ticker symbol is level=0 (top) of the column names
- When group_by='Ticker' is used

df.stack(level=0).rename_axis(['Date', 'Ticker']).reset_index(level=1)

If the ticker symbol is level=1 (bottom) of the column names

df.stack(level=1).rename_axis(['Date', 'Ticker']).reset_index(level=1)

Download each ticker and save it to a separate file

I recommend downloading and saving each ticker individually, which would look something like the following:

import yfinance as yf
import pandas as pd

tickerStrings = ['AAPL', 'MSFT']
for ticker in tickerStrings:
    data = yf.download(ticker, group_by="Ticker", period=prd, interval=intv)
    data['ticker'] = ticker  # add this column because the dataframe doesn't contain a column with the ticker
    data.to_csv(f'ticker_{ticker}.csv')  # ticker_AAPL.csv for example

data will look like

                Open      High       Low     Close  Adj Close      Volume ticker
Date                                                                            
1986-03-13  0.088542  0.101562  0.088542  0.097222   0.062205  1031788800   MSFT
1986-03-14  0.097222  0.102431  0.097222  0.100694   0.064427   308160000   MSFT
1986-03-17  0.100694  0.103299  0.100694  0.102431   0.065537   133171200   MSFT
1986-03-18  0.102431  0.103299  0.098958  0.099826   0.063871    67766400   MSFT
1986-03-19  0.099826  0.100694  0.097222  0.098090   0.062760    47894400   MSFT

the resulting csv will look like

Date,Open,High,Low,Close,Adj Close,Volume,ticker
1986-03-13,0.0885416641831398,0.1015625,0.0885416641831398,0.0972222238779068,0.0622050017118454,1031788800,MSFT
1986-03-14,0.0972222238779068,0.1024305522441864,0.0972222238779068,0.1006944477558136,0.06442664563655853,308160000,MSFT
1986-03-17,0.1006944477558136,0.1032986119389534,0.1006944477558136,0.1024305522441864,0.0655374601483345,133171200,MSFT
1986-03-18,0.1024305522441864,0.1032986119389534,0.0989583358168602,0.0998263880610466,0.06387123465538025,67766400,MSFT
1986-03-19,0.0998263880610466,0.1006944477558136,0.0972222238779068,0.0980902761220932,0.06276042759418488,47894400,MSFT

Read in multiple files saved with the previous section and create a single dataframe

import pandas as pd
from pathlib import Path

# set the path to the files
p = Path('c:/path_to_files')

# find the files; this is a generator, not a list
files = p.glob('ticker_*.csv')

# read the files into a dataframe
df = pd.concat([pd.read_csv(file) for file in files])

How to add a column in multilevel Dataframe using pandas and yfinance?

Option 1: Multi-Level Column Names

Multi-level columns are accessed by passing a tuple
- df[('WMB', 'High')]
Package versions used
- print(pd.__version__) at least '1.0.5'
- print(yf.__version__) is '0.1.54'

import yfinance as yf
import pandas as pd
from datetime import datetime, timedelta

end = datetime.today()
start = end - timedelta(59)
tickers = ['WBA', 'HD']

df = yf.download(tickers,group_by=tickers,start=start,end=end,interval='5m')

# iterate over level 0 ticker names
for ticker in tickers:
    df[(ticker, 'h-l')] = abs(df[(ticker, 'High')] - df[(ticker, 'Low')])
    df[(ticker, 'h-pc')] = abs(df[(ticker, 'High')] - df[(ticker, 'Adj Close')].shift(1))
    df[(ticker, 'l-pc')] = abs(df[(ticker, 'Low')] - df[(ticker, 'Adj Close')].shift(1))
    df[(ticker, 'tr')] = df[[(ticker, 'h-l'), (ticker, 'h-pc'), (ticker, 'l-pc')]].max(axis=1)
#     df[(ticker, 'atr')] = df[(ticker, 'tr')].rolling(window=n, min_periods=n).mean()  # not included becasue n is not defined

# sort the columns
df = df.reindex(sorted(df.columns), axis=1)

# display(df.head())
                                   HD                                                                                                          WBA                                                                                              
                            Adj Close       Close        High         Low        Open    Volume       h-l      h-pc      l-pc        tr  Adj Close      Close       High        Low       Open    Volume       h-l      h-pc      l-pc        tr
Datetime                                                                                                                                                                                                                                        
2020-06-08 09:30:00-04:00  253.937500  253.937500  253.960007  252.360001  252.490005  210260.0  1.600006       NaN       NaN  1.600006  46.049999  46.049999  46.070000  45.490002  45.490002  239860.0  0.579998       NaN       NaN  0.579998
2020-06-08 09:35:00-04:00  253.470001  253.470001  254.339996  253.220093  253.990005   95906.0  1.119904  0.402496  0.717407  1.119904  46.330002  46.330002  46.330002  46.040001  46.070000  104259.0  0.290001  0.280003  0.009998  0.290001
2020-06-08 09:40:00-04:00  253.580002  253.580002  253.829895  252.955002  253.429993   55868.0  0.874893  0.359894  0.514999  0.874893  46.610001  46.610001  46.660000  46.240002  46.330002  113174.0  0.419998  0.329998  0.090000  0.419998
2020-06-08 09:45:00-04:00  253.740005  253.740005  253.929993  253.289993  253.529999   61892.0  0.639999  0.349991  0.290009  0.639999  46.880001  46.880001  46.950001  46.624100  46.624100  121388.0  0.325901  0.340000  0.014099  0.340000
2020-06-08 09:50:00-04:00  253.703400  253.703400  253.910004  253.419998  253.740005   60809.0  0.490005  0.169998  0.320007  0.490005  46.919998  46.919998  46.990002  46.820000  46.880001  154239.0  0.170002  0.110001  0.060001  0.170002

Option 2: Single-Level Column Names

As demonstrated in How to deal with multi-level column names downloaded with yfinance?, it's easier to deal with single-level column names.
With the tickers in a column instead of a multi-level column headers, use pandas.DataFrame.gropuby on the Ticker column.

import yfinance as yf
import pandas as pd
from datetime import datetime, timedelta

tickerStrings = ['WBA', 'HD']
df = yf.download(tickers, group_by='Ticker', start=start ,end=end, interval='5m')

# create single level column names
df = df.stack(level=0).rename_axis(['Date', 'Ticker']).reset_index(level=1)

# function with calculations
def my_calculations(df):
    df['h-l']=abs(df.High-df.Low)
    df['h-pc']=abs(df.High-df['Adj Close'].shift(1))
    df['l-pc']=abs(df.Low-df['Adj Close'].shift(1))
    df['tr']=df[['h-l','h-pc','l-pc']].max(axis=1)
#     df['atr']=df['tr'].rolling(window=n, min_periods=n).mean()  # n is not defined in the question
    return df

# apply the function
df_updated = df.reset_index().groupby('Ticker').apply(my_calculations).sort_values(['Ticker', 'Date'])

Can yfinance download 2 columns only?

df = yf.Ticker("IBM").history(start="2017-01-01", end="2017-04-30", frequency='1dy')['Close']

Returns only one price

Output:

Date
2017-01-03    125.567116
2017-01-04    127.121788
2017-01-05    126.701187
2017-01-06    127.324532
2017-01-09    125.912628
                 ...    
2017-04-24    121.685005
2017-04-25    121.412483
2017-04-26    121.162666
2017-04-27    121.359474
2017-04-28    121.336777

Also, if you request all the data, you can try to delete the unnecessary immediately

data = yf.download("AAPL", start="2017-01-01", end="2017-04-30", auto_adjust=True)
data = data[['Close']]
print(data)

Not able to get data, downloaded in group by format using package yfinance, into pandas dataframe only for specific tickers in my sub list

As answered here: pandas dataframe select columns in multiindex

mydf.iloc[:, mydf.columns.get_level_values(1)=='Close']

@EDIT: remove useless column level

mydf.columns = mydf.columns.droplevel(1)

How to organise multiple stock data in pandas dataframe for plotting

It maybe better if you transform the data as shown in example belo.

df = pd.DataFrame({
    'stock': ['A', 'B'],
    'price': [[10,20,30,40], [1,2,3,4]],
    'time': [[1,2,3,4], [1,2,3,4]]
})

df = df.set_index(['stock']).apply(pd.Series.explode).reset_index()
df

stock	price	time
A	10	1
A	20	2
A	30	3
A	40	4
B	1	1
B	2	2
B	3	3
B	4	4

How to import csv with multi level column index

Try the following changes-

stocks_sample = stocks_sample.reset_index() #will create the multi index as separate columns
stocks_sample.to_csv('stocks_sample.csv')

#while importing 
stocks_sample_csv = pd.read_csv('stocks_sample.csv')
stocks_sample_csv.set_index(['index0', 'index1']) #change the list to the list of columns which contain the previous indexes.

downloading yahoofinance data for date ranges

You can use iterrows:

data_stocks = {}

for _, row in data_shortened.iterrows():
        placeholder = yf.download(row['symbol'], row['date_start'], row['date_end'])
        data_stocks[row['symbol']] = placeholder

stocks = pd.concat(data_stocks)

Output:

>>> stocks
                  Open   High    Low  Close  Adj Close   Volume
     Date                                                      
USAU 2020-02-27    7.1    7.5    6.5    6.9        6.9  23460.0
     2020-02-28    6.5    6.8    6.0    6.3        6.3  22010.0
     2020-03-02    6.5    6.6    6.0    6.1        6.1   7300.0
     2020-03-03    6.4    6.5    6.0    6.5        6.5   9700.0
     2020-03-04    6.5    7.0    6.1    7.0        7.0  13340.0
...                ...    ...    ...    ...        ...      ...
DTYS 2020-04-20  120.0  120.0  120.0  120.0      120.0      0.0
     2020-04-21  120.0  120.0  120.0  120.0      120.0      0.0
     2020-04-22  120.0  120.0  120.0  120.0      120.0      0.0
     2020-04-23  120.0  120.0  120.0  120.0      120.0      0.0
     2020-04-24  120.0  120.0  120.0  120.0      120.0      0.0

[358 rows x 6 columns]

Iterate though list loop error help - Yfinance

If you use for-loop then using normal assigment a = ... you can get only last value.

It is standard rule: if you use for-loop then you have to use list to get all results.

BTW: You could run Ticker(t) only once in loop - maybe it will run faster.

import pandas as pd
import yfinance as yf

tickers = ['AAPL', 'GOOGL', 'FB']

column_a = []  # <-- list for all results
column_b = []  # <-- list for all results
column_c = []  # <-- list for all results

for t in tickers:
    print('---', t, '---')
    
    all_info = yf.Ticker(t).info

    a = all_info.get('ebitda', 'NaN')
    b = all_info.get('enterpriseValue', 'NaN')
    c = all_info.get('totalAssets', 'NaN')
    
    column_a.append(a)  # <-- list for all results
    column_b.append(b)  # <-- list for all results
    column_c.append(c)  # <-- list for all results

#pd.set_option("display.max_rows", None)

df = pd.DataFrame({
    'Tickets': tickers,
    'Ebitda': column_a,           # <-- list for all results
    'EnterpriseValue' :column_b,  # <-- list for all results
    'TotalAssets': column_c,      # <-- list for all results
})
print(df)

EDIT:

Eventually you could append directly to DataFrame

First you have to create empty DataFrame but with all columns.

Next you can append dictionary with ignore_index=True. And in DataFrame you have to assign it again to DataFrame - df = df.append(...)

import pandas as pd
import yfinance as yf

tickers = ['AAPL', 'GOOGL', 'FB']

df = pd.DataFrame(columns=['Tickets', 'Ebitda', 'EnterpriseValue', 'TotalAssets'])

for t in tickers:
    print('---', t, '---')
    
    all_info = yf.Ticker(t).info
    
    a = all_info.get('ebitda', 'NaN')
    b = all_info.get('enterpriseValue', 'NaN')
    c = all_info.get('totalAssets', 'NaN')
  
    df = df.append({
        'Tickets': t,
        'Ebitda': a,
        'EnterpriseValue': b,
        'TotalAssets': c,
    }, ignore_index=True)  

#pd.set_option("display.max_rows", None)
    
print(df)

How to Deal with Multi-Level Column Names Downloaded with Yfinance