Webscraping Financial Data from Morningstar

How to scrape data off Morningstar

It seems the data can be pulled from an API (the timeseries_cumulativereturn endpoint). The only catch is that the values it returns are relative to the start date entered in the payload: the output for the start date is set to 0, and the numbers after it are relative to that date.

import requests
import pandas as pd
from datetime import datetime
from dateutil import relativedelta

userInput = input('Choose:\n\t1. 3 Month\n\t2. 6 Month\n\t3. 1 Year\n\t4. 3 Year\n\t5. 5 Year\n\t6. 10 Year\n\n -->: ')
userDict = {'1':3,'2':6,'3':12,'4':36,'5':60,'6':120}


n = datetime.now()
n = n - relativedelta.relativedelta(days=1)
n = n - relativedelta.relativedelta(months=userDict[userInput])
dateStr = n.strftime('%Y-%m-%d')


url = 'https://tools.morningstar.co.uk/api/rest.svc/timeseries_cumulativereturn/t92wz0sj7c'

data = []
idDict = {
        'Schroder Managed Balanced Instl Acc':'F0GBR050AQ]2]0]FOGBR$$ALL',
        'GBP Moderately Adventurous Allocation':'EUCA000916]8]0]CAALL$$ALL',
        'Mixed Investment 40-85% Shares':'LC00000012]8]0]CAALL$$ALL',
        '':'F00000ZOR1]7]0]IXALL$$ALL',}


for k, v in idDict.items():
    payload = {
        'currencyId': 'GBP',
        'idtype': 'Morningstar',
        'frequency': 'daily',
        'startDate': dateStr,
        'performanceType': '',
        'outputType': 'COMPACTJSON',
        'id': v,
        'decPlaces': '8',
        'applyTrackRecordExtension': 'false'}
    
    
    # COMPACTJSON output: a list of [epoch_milliseconds, value] pairs
    temp_data = requests.get(url, params=payload).json()
    df = pd.DataFrame(temp_data)
    df['timestamp'] = pd.to_datetime(df[0], unit='ms')
    df['date'] = df['timestamp'].dt.date
    df = df[['date', 1]]
    df.columns = ['date', k]   # name the value column after the fund
    data.append(df)

final_df = pd.concat(
    (iDF.set_index('date') for iDF in data),
    axis=1, join='inner'
).reset_index()


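import matplotlib.pyplot as plt  # pandas' .plot() draws with matplotlib

final_df.plot(x="date", y=list(idDict.keys()), kind="line")
plt.show()  # required when run as a script rather than in a notebook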

Output:

print(final_df.head(5).to_string())
         date  Schroder Managed Balanced Instl Acc  GBP Moderately Adventurous Allocation  Mixed Investment 40-85% Shares          
0  2019-12-22                             0.000000                               0.000000                        0.000000  0.000000
1  2019-12-23                             0.357143                               0.406784                        0.431372  0.694508
2  2019-12-24                             0.714286                               0.616217                        0.632422  0.667586
3  2019-12-25                             0.714286                               0.616217                        0.632422  0.655917
4  2019-12-26                             0.714286                               0.612474                        0.629152  0.664124
....
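Because every series is reset to 0 on the start date, comparing funds from some later date means rebasing them. The following is a minimal sketch that assumes the values are cumulative returns in percent. The endpoint name suggests cumulative returns, but treating the values as percentages is an assumption. `rebase_cumulative` is a hypothetical helper and is not part of any Morningstar API.

```python
def rebase_cumulative(cum_returns_pct, base_index):
    """Re-express cumulative % returns so the value at base_index becomes 0.

    Assumes each value is a cumulative return in percent measured from the
    original start date (e.g. 0.357143 is read as +0.357143%).
    """
    base_growth = 1 + cum_returns_pct[base_index] / 100
    rebased = []
    for r in cum_returns_pct:
        growth = 1 + r / 100  # growth factor since the original start date
        rebased.append((growth / base_growth - 1) * 100)
    return rebased


# First three values of the first fund in the output above
series = [0.0, 0.357143, 0.714286]
print(rebase_cumulative(series, base_index=1))
```

With the `final_df` from above, you can apply the same formula per column, e.g. `((1 + final_df[col] / 100) / (1 + final_df[col].iloc[i] / 100) - 1) * 100`.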

To get those IDs, I had to investigate the requests a little. By searching through them, I found the corresponding ID values, and with a bit of trial and error I worked out what each value meant.


Those requests use the "alternate" IDs, and they are also where the line graphs get their data from: in those four requests, look at the Preview pane and you'll see the data.


[Image: final output/graph]

Cannot scrape dataid from Morningstar - How can I access the Network inspection tool from Python?

The dataid may not be that important. I varied the code F00000412E (associated with AADR) while keeping the dataid constant.

I got a list of all those codes from here:

https://www.firstrade.com/scripts/free_etfs/io.php

Then put the code of your choice into the URL, e.g.:

[
    "AIA",
    "iShares Asia 50 ETF",
    "FOUSA06MPQ"
  ]

Use FOUSA06MPQ
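If you save that list, a small lookup makes it easy to go from ticker to code. This is a sketch that assumes the list is a JSON array of `[ticker, name, code]` triples like the entries shown above. The actual response format of io.php may differ, and `build_code_lookup` is a hypothetical name.

```python
import json


def build_code_lookup(raw_json):
    """Map ticker -> Morningstar security code.

    Assumes raw_json is a JSON array of [ticker, name, code] triples.
    """
    return {ticker: code for ticker, _name, code in json.loads(raw_json)}


sample = '''[
  ["AIA", "iShares Asia 50 ETF", "FOUSA06MPQ"],
  ["ALD", "WisdomTree Asia Local Debt ETF", "F00000M8TW"]
]'''
codes = build_code_lookup(sample)
print(codes["AIA"])  # FOUSA06MPQ
```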

https://mschart.morningstar.com/chartweb/defaultChart?type=getcc&secids=FOUSA06MPQ;FE&dataid=8225&startdate=2017-01-01&enddate=2018-12-30
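To try several codes while keeping dataid fixed, you can build the URL programmatically. This is a sketch using only the standard library. The `;FE` suffix and `dataid=8225` are copied verbatim from the example URL, and their meaning is not explained here. `build_chart_url` is a hypothetical helper.

```python
from urllib.parse import urlencode

CHART_URL = "https://mschart.morningstar.com/chartweb/defaultChart"


def build_chart_url(secid, start, end, dataid=8225):
    """Build a getcc chart URL for one security code, mirroring the example above."""
    params = {
        "type": "getcc",
        "secids": f"{secid};FE",  # suffix copied as-is from the example URL
        "dataid": dataid,
        "startdate": start,
        "enddate": end,
    }
    # keep ';' unescaped so the URL matches the one seen in the browser
    return CHART_URL + "?" + urlencode(params, safe=";")


for code in ["FOUSA06MPQ", "F00000M8TW"]:
    print(build_chart_url(code, "2017-01-01", "2018-12-30"))
```

Each URL can then be fetched with `requests.get(url).json()`, assuming the endpoint still returns JSON as it did when this was written.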

You can verify the values by adding the other fund as a benchmark to your chart, e.g. XNAS:AIA.


28 December has a value of 55.32. Compare this with the value in the retrieved JSON.

I repeated this with

[
    "ALD",
    "WisdomTree Asia Local Debt ETF",
    "F00000M8TW"
  ]

https://mschart.morningstar.com/chartweb/defaultChart?type=getcc&secids=F00000M8TW;FE&dataid=8225&startdate=2017-01-01&enddate=2018-12-30

Webscraping with VBA Morningstar financial

You can just do it with XHR and RegEx instead of cumbersome IE:

Sub Test()
    Dim sContent
    ' Download the page synchronously (third argument False = not async)
    With CreateObject("MSXML2.XMLHTTP")
        .Open "GET", "http://investors.morningstar.com/ownership/shareholders-overview.html?t=TWTR&region=usa&culture=en-US", False
        .Send
        sContent = .ResponseText
    End With
    ' Pull the value out of the JavaScript data embedded in the HTML
    With CreateObject("VBScript.RegExp")
        .Pattern = ",""currInsiderVal"":(.*?),"
        Range("A30").Value = .Execute(sContent).Item(0).SubMatches(0)
    End With
End Sub

Here is a description of how the code works:

First, an MSXML2.XMLHTTP ActiveX instance is created, and a GET request to the target URL is opened in synchronous mode (execution pauses until the response is received). After .Send, the response HTML is stored in sContent from .ResponseText.

Then a VBScript.RegExp instance is created. By default its .IgnoreCase, .Global and .MultiLine properties are False. The pattern is ,"currInsiderVal":(.*?), (in the VBA string literal each "" stands for one double quote). Here (.*?) is a capturing group: . means any character except a newline, .* means zero or more characters, and .*? means as few characters as possible (lazy matching). All other characters in the pattern are matched literally. The .Execute method returns a collection of matches, and there is only one match object in it because .Global is False. That match object has a collection of submatches, and there is only one submatch in it because the pattern contains a single capturing group.

There are some helpful MSDN articles on regex:

Microsoft Beefs Up VBScript with Regular Expressions

Introduction to Regular Expressions
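For readers not using Excel, here is a rough Python equivalent of the same two steps using the standard library's `re` and `urllib`. It is a sketch: the helper names are hypothetical, and the page URL from the VBA example may no longer serve this data.

```python
import re
from urllib.request import urlopen

# Same pattern as the VBA .Pattern: a literal ,"currInsiderVal": followed by a
# lazy capture group (.*?) that stops at the first following comma.
CURR_INSIDER = re.compile(r',"currInsiderVal":(.*?),')


def fetch(url):
    """Synchronous GET, comparable to MSXML2.XMLHTTP opened with async=False."""
    with urlopen(url) as resp:
        return resp.read().decode("utf-8", errors="replace")


def extract_curr_insider_val(html):
    """Return the first capture, like .Execute(...).Item(0).SubMatches(0)."""
    match = CURR_INSIDER.search(html)  # first match only, like .Global = False
    if match is None:
        raise ValueError("currInsiderVal not found in response")
    return match.group(1)


sample = '"currInstVal":4370.46,"currInsiderVal":143.51,"use10YearData":"false"'
print(extract_curr_insider_val(sample))  # 143.51
```

Against the live page you would call `extract_curr_insider_val(fetch(url))` with the URL from the VBA code.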

Here is a description of how I created the code:

First, using the browser, I found the element in the webpage DOM that contains the target value.


The corresponding node is:

<td align="right" id="currrentInsiderVal">143.51</td>

Then I made an XHR and found this node in the response HTML, but it didn't contain the value. (You can see the response in the browser developer tools, on the Network tab, after you refresh the page.)

<td align="right" id="currrentInsiderVal">
</td>

Such behavior is typical of DHTML: dynamic HTML content is generated by scripts after the webpage has loaded, either from data retrieved from the web via XHR or by processing data already embedded in the page. So I searched the response for the value 143.51 and found the snippet ,"currInsiderVal":143.51, inside a JS function:

            fundsArr = {"fundTotalHistVal":132.61,"mutualFunds":[[1,89,"#a71620"],[2,145,"#a71620"],[3,152,"#a71620"],[4,198,"#a71620"],[5,155,"#a71620"],[6,146,"#a71620"],[7,146,"#a71620"],[8,132,"#a71620"]],"insiderHisMaxVal":3.535,"institutions":[[1,273,"#283862"],[2,318,"#283862"],[3,351,"#283862"],[4,369,"#283862"],[5,311,"#283862"],[6,298,"#283862"],[7,274,"#283862"],[8,263,"#283862"]],"currFundData":[2,2202,"#a6001d"],"currInstData":[1,4370,"#283864"],"instHistMaxVal":369,"insiders":[[5,0.042,"#ff6c21"],[6,0.057,"#ff6c21"],[7,0.057,"#ff6c21"],[8,3.535,"#ff6c21"],[5,0],[6,0],[7,0],[8,0]],"currMax":4370,"histLineQuars":[[1,"Q2"],[2,"Q3"],[3,"Q4"],[4,"Q1<br>2015"],[5,"Q2"],[6,"Q3"],[7,"Q4"],[8,"Q1<br>2016"]],"fundHisMaxVal":198,"currInsiderData":[3,143,"#ff6900"],"currFundVal":2202.85,"quarters":[[1,"Q2"],[2,""],[3,""],[4,"Q1<br>2015"],[5,""],[6,""],[7,""],[8,"Q1<br>2016"]],"insiderTotalHistVal":3.54,"currInstVal":4370.46,"currInsiderVal":143.51,"use10YearData":"false","instTotalHistVal":263.74,"maxValue":369};
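The search step described above, looking for a value you can see on the rendered page inside the raw response, can be sketched like this. `find_context` is a hypothetical helper, and the sample response is a shortened stand-in for the real HTML.

```python
def find_context(text, needle, width=40):
    """Return every occurrence of needle with up to `width` characters either side."""
    snippets = []
    pos = text.find(needle)
    while pos != -1:
        start = max(0, pos - width)
        snippets.append(text[start:pos + len(needle) + width])
        pos = text.find(needle, pos + 1)
    return snippets


response = ('<td align="right" id="currrentInsiderVal">\n</td>\n'
            '<script>fundsArr = {"currInstVal":4370.46,"currInsiderVal":143.51,'
            '"use10YearData":"false"};</script>')
for snippet in find_context(response, "143.51", width=20):
    print(snippet)
```

The printed context shows the key that precedes the value, which is what the regex pattern is then built around.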

So the regex pattern was built on that basis: it should find the snippet ,"currInsiderVal":<some text>, where <some text> is our target value.
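Because the fundsArr object shown above happens to be valid JSON (double-quoted keys, no trailing commas), another option is to capture the whole object and parse it. This swaps in a different technique from the single-field regex used in the VBA answer. It is a sketch that works only while the object stays valid JSON and contains no `};` of its own. `extract_funds_arr` is a hypothetical name.

```python
import json
import re

# fundsArr = {...};  -> capture the object literal up to the first "};"
FUNDS_ARR = re.compile(r'fundsArr\s*=\s*(\{.*?\});', re.DOTALL)


def extract_funds_arr(html):
    """Parse the whole fundsArr object instead of a single field."""
    match = FUNDS_ARR.search(html)
    if match is None:
        raise ValueError("fundsArr not found")
    return json.loads(match.group(1))


html = ('<script>function draw(){ fundsArr = {"currInstVal":4370.46,'
        '"insiders":[[5,0.042,"#ff6c21"],[5,0]],"currInsiderVal":143.51,'
        '"use10YearData":"false","maxValue":369}; }</script>')
data = extract_funds_arr(html)
print(data["currInsiderVal"], data["currInstVal"])  # 143.51 4370.46
```

The trade-off is that one regex gives you every field (quarter history, fund and institution values) at once, but it breaks if the site ever emits JavaScript that is not strict JSON.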

Scraping financial data with R and rvest

read.csv("http://financials.morningstar.com/ajax/ReportProcess4CSV.html?&t=XNAS:MSFT&region=usa&culture=en-US&cur=&reportType=is&period=12&dataType=A&order=asc&columnYear=5&curYearPart=1st5year&rounding=3&view=raw&r=865827&denominatorView=raw&number=3", skip=1)

   Fiscal.year.ends.in.June..USD.in.millions.except.per.share.data. X2011.06 X2012.06 X2013.06 X2014.06 X2015.06      TTM
1                                                           Revenue 69943.00 73723.00 77849.00 86833.00 93580.00 90758.00
2                                                   Cost of revenue 15577.00 17530.00 20249.00 26934.00 33038.00 31972.00
3                                                      Gross profit 54366.00 56193.00 57600.00 59899.00 60542.00 58786.00
4                                                Operating expenses       NA       NA       NA       NA       NA       NA
5                                          Research and development  9043.00  9811.00 10411.00 11381.00 12046.00 11943.00
6                                 Sales, General and administrative 18162.00 18426.00 20425.00 20632.00 20324.00 19862.00
7                             Restructuring, merger and acquisition       NA       NA       NA   127.00       NA       NA
8                                          Other operating expenses       NA  6193.00       NA       NA 10011.00  8871.00
9                                          Total operating expenses 27205.00 34430.00 30836.00 32140.00 42381.00 40676.00
10                                                 Operating income 27161.00 21763.00 26764.00 27759.00 18161.00 18110.00
11                                                 Interest Expense   295.00   380.00   429.00   597.00   781.00   869.00
12                                           Other income (expense)  1205.00   884.00   717.00   658.00  1127.00   883.00
13                                              Income before taxes 28071.00 22267.00 27052.00 27820.00 18507.00 18124.00
14                                       Provision for income taxes  4921.00  5289.00  5189.00  5746.00  6314.00  5851.00
15                            Net income from continuing operations 23150.00 16978.00 21863.00 22074.00 12193.00 12273.00
16                                                       Net income 23150.00 16978.00 21863.00 22074.00 12193.00 12273.00
17                      Net income available to common shareholders 23150.00 16978.00 21863.00 22074.00 12193.00 12273.00
18                                               Earnings per share       NA       NA       NA       NA       NA       NA
19                                                            Basic     2.73     2.02     2.61     2.66     1.49     1.51
20                                                          Diluted     2.69     2.00     2.58     2.63     1.48     1.50
21                              Weighted average shares outstanding       NA       NA       NA       NA       NA       NA
22                                                            Basic  8490.00  8396.00  8375.00  8299.00  8177.00  8114.00
23                                                          Diluted  8593.00  8506.00  8470.00  8399.00  8254.00  8183.00
24                                                           EBITDA 31132.00 25614.00 31236.00 33629.00 25245.00 24983.00

It's super-helpful to make the browser Developer Tools "Network" tab your BFF.

(That URL came from inspecting what the "Export" button does.)
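The same export can be sketched in Python with only the standard library. This assumes the endpoint still returns CSV with one title line before the header row, which is why R uses `skip=1`. The `r=865827` parameter looks like a cache-buster and is omitted. The sample title line below is invented for illustration, and the helper names are hypothetical. Rows are kept as a list rather than a dict because labels such as "Basic" and "Diluted" appear twice.

```python
import csv
import io
from urllib.parse import urlencode

REPORT_URL = "http://financials.morningstar.com/ajax/ReportProcess4CSV.html"


def build_report_url(ticker, report_type="is", period=12):
    """Rebuild the export URL from the R example (cache-buster r=... omitted)."""
    params = {
        "t": ticker, "region": "usa", "culture": "en-US", "cur": "",
        "reportType": report_type, "period": period, "dataType": "A",
        "order": "asc", "columnYear": 5, "curYearPart": "1st5year",
        "rounding": 3, "view": "raw", "denominatorView": "raw", "number": 3,
    }
    return REPORT_URL + "?" + urlencode(params, safe=":")


def parse_report(csv_text):
    """Skip the title line (like skip=1), return (columns, [(label, values)])."""
    rows = list(csv.reader(io.StringIO(csv_text)))[1:]
    header, body = rows[0], rows[1:]
    table = []
    for row in body:
        if not row:
            continue
        # empty cells (section headings such as "Operating expenses") become None
        table.append((row[0], [float(c) if c else None for c in row[1:]]))
    return header[1:], table


sample = (
    "Hypothetical title line\n"
    "Fiscal year ends in June. USD in millions except per share data.,2014-06,2015-06,TTM\n"
    "Revenue,86833,93580,90758\n"
    "Operating expenses,,,\n"
)
cols, rows = parse_report(sample)
print(cols)
for label, values in rows:
    print(label, values)
```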
