Webscraping Financial Data from Morningstar

How to scrape data off morningstar

Seems like the data can be pulled form API. Only thing is the values it returns is relative to the start date entered in the payload. It'll set the out put of the start date to 0, then the numbers after are relative to that date.

import requests
import pandas as pd
from datetime import datetime
from dateutil import relativedelta

userInput = input('Choose:\n\t1. 3 Month\n\t2. 6 Month\n\t3. 1 Year\n\t4. 3 Year\n\t5. 5 Year\n\t6. 10 Year\n\n -->: ')
userDict = {'1':3,'2':6,'3':12,'4':36,'5':60,'6':120}


n = datetime.now()
n = n - relativedelta.relativedelta(days=1)
n = n - relativedelta.relativedelta(months=userDict[userInput])
dateStr = n.strftime('%Y-%m-%d')


url = 'https://tools.morningstar.co.uk/api/rest.svc/timeseries_cumulativereturn/t92wz0sj7c'

data = []
idDict = {
'Schroder Managed Balanced Instl Acc':'F0GBR050AQ]2]0]FOGBR$$ALL',
'GBP Moderately Adventurous Allocation':'EUCA000916]8]0]CAALL$$ALL',
'Mixed Investment 40-85% Shares':'LC00000012]8]0]CAALL$$ALL',
'':'F00000ZOR1]7]0]IXALL$$ALL',}


for k, v in idDict.items():
payload = {
'encyId': 'GBP',
'idtype': 'Morningstar',
'frequency': 'daily',
'startDate': dateStr,
'performanceType': '',
'outputType': 'COMPACTJSON',
'id': v,
'decPlaces': '8',
'applyTrackRecordExtension': 'false'}


temp_data = requests.get(url, params=payload).json()
df = pd.DataFrame(temp_data)
df['timestamp'] = pd.to_datetime(df[0], unit='ms')
df['date'] = df['timestamp'].dt.date
df = df[['date',1]]
df.columns = ['date', k]
data.append(df)

final_df = pd.concat(
(iDF.set_index('date') for iDF in data),
axis=1, join='inner'
).reset_index()


final_df.plot(x="date", y=list(idDict.keys()), kind="line")

Output:

print (final_df.head(5).to_string())
date Schroder Managed Balanced Instl Acc GBP Moderately Adventurous Allocation Mixed Investment 40-85% Shares
0 2019-12-22 0.000000 0.000000 0.000000 0.000000
1 2019-12-23 0.357143 0.406784 0.431372 0.694508
2 2019-12-24 0.714286 0.616217 0.632422 0.667586
3 2019-12-25 0.714286 0.616217 0.632422 0.655917
4 2019-12-26 0.714286 0.612474 0.629152 0.664124
....

To get those Ids, it took a little investigating of the requests. Searching through those, I was able to find the corresponding id values and with a little bit of trial and error to work out what values meant what.

Sample Image

Those "alternate" ids used. And where those line graphs get the data from (inthose 4 request, look at the Preview pane, and you'll see the data in there.

Sample Image

Here's the final output/graph:

Sample Image

Cannot scrape dataid from Morningstar - How can I access the Network inspection tool from Python?

The data id may not be that important. I varied the code F00000412E that is associated with AADR whilst keeping the data id constant.

I got a list of all those codes from here:

https://www.firstrade.com/scripts/free_etfs/io.php

Then add the code of choice into your url e.g.

[
"AIA",
"iShares Asia 50 ETF",
"FOUSA06MPQ"
]

Use FOUSA06MPQ

https://mschart.morningstar.com/chartweb/defaultChart?type=getcc&secids=FOUSA06MPQ;FE&dataid=8225&startdate=2017-01-01&enddate=2018-12-30

You can verify the values by adding the other fund as a benchmark to your chart e.g. XNAS:AIA

enter image description here

28th december has value of 55.32. Compare this with JSON retrieved:

I repeated this with

[
"ALD",
"WisdomTree Asia Local Debt ETF",
"F00000M8TW"
]

https://mschart.morningstar.com/chartweb/defaultChart?type=getcc&secids=F00000M8TW;FE&dataid=8225&startdate=2017-01-01&enddate=2018-12-30

Webscraping with VBA morningstar financial

You can just do it with XHR and RegEx instead of cumbersome IE:

Sub Test()
Dim sContent
With CreateObject("MSXML2.XMLHTTP")
.Open "GET", "http://investors.morningstar.com/ownership/shareholders-overview.html?t=TWTR®ion=usa&culture=en-US", False
.Send
sContent = .ResponseText
End With
With CreateObject("VBScript.RegExp")
.Pattern = ",""currInsiderVal"":(.*?),"
Range("A30").Value = .Execute(sContent).Item(0).SubMatches(0)
End With
End Sub

Here is the description how the code works:

First of all MSXML2.XMLHTTP ActiveX instance is created. GET request opened with target URL in synchronous mode (execution interrupts until response received).

Then VBScript.RegExp is created. By default .IgnoreCase, .Global and .MultiLine properties are False. The pattern is ,"currInsiderVal":(.*?),, where (.*?) is a capturing group, . means any character, .* - zero or more characters, .*? - as few as possible characters (lazy matching). Other characters in pattern to be found as is. .Execute method returns a collection of matches, there is only one match object in it since .Global is False. This match object has a collection of submatches, there is only one submatch in it since the pattern contains the only capturing group.
There are some helpful MSDN articles on regex:

Microsoft Beefs Up VBScript with Regular Expressions

Introduction to Regular Expressions

Here is the description how I created the code:

First I found an element containing the target value on the webpage DOM using browser:

target value

The corresponding node is:

<td align="right" id="currrentInsiderVal">143.51</td>

Then I made XHR and found this node in the response HTML, but it didn't contain the value (you can find response in the browser developer tools on network tab after you refresh the page):

<td align="right" id="currrentInsiderVal">
</td>

Such behavior is typical for DHTML. Dynamic HTML content is generated by scripts after the webpage loaded, either after retrieving a data from web via XHR or just processing already loaded withing webpage data. Then I just searched for the value 143.51 in the response, the snippet ,"currInsiderVal":143.51, located within JS function:

            fundsArr = {"fundTotalHistVal":132.61,"mutualFunds":[[1,89,"#a71620"],[2,145,"#a71620"],[3,152,"#a71620"],[4,198,"#a71620"],[5,155,"#a71620"],[6,146,"#a71620"],[7,146,"#a71620"],[8,132,"#a71620"]],"insiderHisMaxVal":3.535,"institutions":[[1,273,"#283862"],[2,318,"#283862"],[3,351,"#283862"],[4,369,"#283862"],[5,311,"#283862"],[6,298,"#283862"],[7,274,"#283862"],[8,263,"#283862"]],"currFundData":[2,2202,"#a6001d"],"currInstData":[1,4370,"#283864"],"instHistMaxVal":369,"insiders":[[5,0.042,"#ff6c21"],[6,0.057,"#ff6c21"],[7,0.057,"#ff6c21"],[8,3.535,"#ff6c21"],[5,0],[6,0],[7,0],[8,0]],"currMax":4370,"histLineQuars":[[1,"Q2"],[2,"Q3"],[3,"Q4"],[4,"Q1<br>2015"],[5,"Q2"],[6,"Q3"],[7,"Q4"],[8,"Q1<br>2016"]],"fundHisMaxVal":198,"currInsiderData":[3,143,"#ff6900"],"currFundVal":2202.85,"quarters":[[1,"Q2"],[2,""],[3,""],[4,"Q1<br>2015"],[5,""],[6,""],[7,""],[8,"Q1<br>2016"]],"insiderTotalHistVal":3.54,"currInstVal":4370.46,"currInsiderVal":143.51,"use10YearData":"false","instTotalHistVal":263.74,"maxValue":369};

So the regex pattern created based on that it should find the snippet ,"currInsiderVal":<some text>, where <some text> is our target value.

Scraping financial data with R and rvest

read.csv("http://financials.morningstar.com/ajax/ReportProcess4CSV.html?&t=XNAS:MSFT®ion=usa&culture=en-US&cur=&reportType=is&period=12&dataType=A&order=asc&columnYear=5&curYearPart=1st5year&rounding=3&view=raw&r=865827&denominatorView=raw&number=3", skip=1)

Fiscal.year.ends.in.June..USD.in.millions.except.per.share.data. X2011.06 X2012.06 X2013.06 X2014.06 X2015.06 TTM
1 Revenue 69943.00 73723.00 77849.00 86833.00 93580.00 90758.00
2 Cost of revenue 15577.00 17530.00 20249.00 26934.00 33038.00 31972.00
3 Gross profit 54366.00 56193.00 57600.00 59899.00 60542.00 58786.00
4 Operating expenses NA NA NA NA NA NA
5 Research and development 9043.00 9811.00 10411.00 11381.00 12046.00 11943.00
6 Sales, General and administrative 18162.00 18426.00 20425.00 20632.00 20324.00 19862.00
7 Restructuring, merger and acquisition NA NA NA 127.00 NA NA
8 Other operating expenses NA 6193.00 NA NA 10011.00 8871.00
9 Total operating expenses 27205.00 34430.00 30836.00 32140.00 42381.00 40676.00
10 Operating income 27161.00 21763.00 26764.00 27759.00 18161.00 18110.00
11 Interest Expense 295.00 380.00 429.00 597.00 781.00 869.00
12 Other income (expense) 1205.00 884.00 717.00 658.00 1127.00 883.00
13 Income before taxes 28071.00 22267.00 27052.00 27820.00 18507.00 18124.00
14 Provision for income taxes 4921.00 5289.00 5189.00 5746.00 6314.00 5851.00
15 Net income from continuing operations 23150.00 16978.00 21863.00 22074.00 12193.00 12273.00
16 Net income 23150.00 16978.00 21863.00 22074.00 12193.00 12273.00
17 Net income available to common shareholders 23150.00 16978.00 21863.00 22074.00 12193.00 12273.00
18 Earnings per share NA NA NA NA NA NA
19 Basic 2.73 2.02 2.61 2.66 1.49 1.51
20 Diluted 2.69 2.00 2.58 2.63 1.48 1.50
21 Weighted average shares outstanding NA NA NA NA NA NA
22 Basic 8490.00 8396.00 8375.00 8299.00 8177.00 8114.00
23 Diluted 8593.00 8506.00 8470.00 8399.00 8254.00 8183.00
24 EBITDA 31132.00 25614.00 31236.00 33629.00 25245.00 24983.00

It's super-helpful to make browser Developer Tools "Network" tab your BFF.

(that URL came from inspecting what the "Export" button does).



Related Topics



Leave a reply



Submit