Data Scraping from Published Power BI Visual

I would try the exportData method from the JavaScript library for embedding Power BI:

https://github.com/microsoft/PowerBI-JavaScript/wiki/Export-Data

Your screenshot implies that you are accessing the report through the Power BI web service app.powerbi.com. Once you have opened the report using that portal, the menu option Share / Embed report / Website or portal will give you the secure token you need to get started.

Scraping Data from a website which uses Power BI - retrieving data from Power BI on a website

Putting the scroll part and the JSON aside, I managed to read the data. The key is to read all of the elements inside the parent (which is done in the question):

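from selenium.webdriver.common.by import By
# Selenium 4 syntax; the older find_element_by_xpath helpers are no longer available
parent = driver.find_element(By.XPATH, '//*[@id="pvExplorationHost"]/div/div/div/div[2]/div/div[2]/div[2]/visual-container[4]/div/div[3]/visual/div')
children = parent.find_elements(By.XPATH, './/*')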

Then sort them using their location:

import numpy as np
x = [child.location['x'] for child in children]
y = [child.location['y'] for child in children]
index = np.lexsort((x, y))  # sort by y first, then by x

To split the sorted elements into rows, start a new row whenever the y coordinate changes:

rows = []
row = []
last_line = y[index[0]]
for i in index:
    if last_line == y[i]:
        # same y coordinate: still on the current row
        row.append(children[i].get_attribute('title'))
    else:
        # y changed: close the current row and start a new one
        rows.append(row)
        row = [children[i].get_attribute('title')]
    last_line = y[i]
rows.append(row)
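
The sort-then-group idea can be tried without a browser. The following is only a sketch: the (title, x, y) tuples are hypothetical stand-ins for the Selenium elements, and sorting on the key (y, x) plays the role of np.lexsort((x, y)).

```python
from itertools import groupby

# Hypothetical stand-ins for Selenium elements: (title, x, y) tuples.
cells = [
    ("B2", 120, 40),
    ("A1", 10, 10),
    ("B1", 120, 10),
    ("A2", 10, 40),
]

# Sort by y first, then by x: the same ordering np.lexsort((x, y)) produces.
ordered = sorted(cells, key=lambda cell: (cell[2], cell[1]))

# Consecutive cells that share a y coordinate form one row.
rows = [
    [title for title, _, _ in group]
    for _, group in groupby(ordered, key=lambda cell: cell[2])
]

print(rows)  # [['A1', 'B1'], ['A2', 'B2']]
```

On a real page the y coordinates of cells in one row may differ by a pixel or so, in which case rounding y before grouping might be needed.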

Scraping data from an online Power BI dashboard

Instead of

await page.click(".pbi-glyph-chevronrightmedium");

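use the following, which calls the element's click() from inside the page instead of simulating a mouse click at its position, so it still works when the element is covered or off-screen: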

await page.$eval(".pbi-glyph-chevronrightmedium", el => el.click());

Python scraping of a site that contains PowerBI graphs

It was fun to figure out how to get info out of this site:

import re
import json
import base64
import requests
from bs4 import BeautifulSoup

url = 'https://msdh.ms.gov/msdhsite/_static/14,21995,420,873.html'

soup = BeautifulSoup(requests.get(url).content, 'html.parser')
html_data = requests.get(soup.iframe['src']).text

d = json.loads(base64.b64decode(soup.iframe['src'].split('=')[-1]).decode('utf-8'))

tenantId = d['t']
resourceKey = d['k']
resolvedClusterUri = re.search(r"var resolvedClusterUri = '(.*?)'", html_data)[1].replace('-redirect', '-api')
requestId = re.search(r"var requestId = '(.*?)'", html_data)[1]
activityId = re.search(r"var telemetrySessionId = '(.*?)'", html_data)[1]

url = resolvedClusterUri + "/public/reports/" + resourceKey + "/modelsAndExploration?preferReadOnlySession=true"
query_url = resolvedClusterUri + "/public/reports/querydata?synchronous=true"
headers={'ActivityId': activityId, 'RequestId': requestId, 'X-PowerBI-ResourceKey': resourceKey}
data = requests.get(url, headers=headers).json()

# each report section (page) may carry a query in its first visual container
for s in data['exploration']['sections']:
    if 'query' in s['visualContainers'][0]:

        payload = {
            "version": "1.0.0",
            "queries": [
                {
                    "Query": json.loads(s['visualContainers'][0]['query']),
                    "CacheKey": '',
                    "QueryId": "",
                    "ApplicationContext": {
                        "DatasetId": data['models'][0]['dbName'],
                        "Sources": [
                            {
                                "ReportId": data['exploration']['report']['objectId']
                            }
                        ]
                    }
                }
            ],
            "cancelQueries": [],
            "modelId": data['models'][0]['id']
        }

        # run the visual's query against the public querydata endpoint
        section_data = requests.post(query_url, json=payload, headers=headers).json()

        print(s['displayName'])
        print(section_data['results'][0]['result']['data']['dsr']['DS'][0]['PH'])
        print('-' * 80)

Prints:

Gender
[{'DM0': [{'S': [{'N': 'G0', 'T': 1}, {'N': 'M0', 'T': 4}], 'C': ['Female', 12029]}, {'C': ['Male', 8469]}, {'C': ['Unknown', 143]}]}]
--------------------------------------------------------------------------------
Cases and Deaths by Age
[{'DM0': [{'S': [{'N': 'G0', 'T': 1}, {'N': 'M0', 'T': 4}, {'N': 'M1', 'T': 4}], 'C': ['<18', 1480, 0]}, {'C': ['18-29', 3876, 5]}, {'C': ['30-39', 3272, 16]}, {'C': ['40-49', 3387, 40]}, {'C': ['50-59', 3106, 73]}, {'C': ['60-69', 2550, 199]}, {'C': ['70-79', 1555, 259]}, {'C': ['80-89', 993, 217]}, {'C': ['90+', 420, 129]}]}]
--------------------------------------------------------------------------------
Pediatric
[{'DM0': [{'S': [{'N': 'G0', 'T': 1}, {'N': 'M0', 'T': 4}], 'C': ['<1', 95]}, {'C': ['1-5', 299]}, {'C': ['6-10', 340]}, {'C': ['11-17', 746]}]}]
--------------------------------------------------------------------------------
Hospitalized by Age Group
[{'DM0': [{'S': [{'N': 'G0', 'T': 1}, {'N': 'M0', 'T': 4}], 'C': ['<18', 31]}, {'C': ['18-29', 114]}, {'C': ['30-39', 176]}, {'C': ['40-49', 328]}, {'C': ['50-59', 461]}, {'C': ['60-69', 635]}, {'C': ['70-79', 553]}, {'C': ['80-89', 332]}, {'C': ['90+', 137]}]}]
--------------------------------------------------------------------------------
Hospitalized
[{'DM0': [{'S': [{'N': 'G0', 'T': 1}, {'N': 'M0', 'T': 4}], 'C': ['No', 13849]}, {'C': ['Yes', 2767]}, {'C': ['Unknown', 1236]}]}]
--------------------------------------------------------------------------------
Deaths by Race and Ethicity
[{'DM0': [{'S': [{'N': 'G0', 'T': 1}, {'N': 'M0', 'T': 4}], 'C': ['Black (NH)', 463]}, {'C': ['White (NH)', 372]}, {'C': ['American Indian or Alaska Native (NH)', 35]}, {'C': ['Hispanic**', 15]}, {'C': ['Other (NH)', 1]}, {'C': ['Asian (NH)', 0]}]}]
--------------------------------------------------------------------------------
Underlying Condition
[{'DM0': [{'S': [{'N': 'G0', 'T': 1}, {'N': 'M0', 'T': 4}, {'N': 'M1', 'T': 4}, {'N': 'M2', 'T': 4}, {'N': 'M3', 'T': 4}, {'N': 'M4', 'T': 4}, {'N': 'M5', 'T': 4}], 'C': ['Hypertension', 311, 242, 21, 9, 1, 0]}, {'C': ['Cardiovascular Disease', 248, 197, 13, 6], 'R': 96}, {'C': ['Diabetes', 243, 129, 20, 3], 'R': 96}, {'C': ['Obesity', 168, 83, 9, 4, 0], 'R': 64}, {'C': ['Renal Disease', 129, 64, 13, 2], 'R': 96}, {'C': ['Lung Disease', 123, 108, 3, 1], 'R': 96}, {'C': ['Neurologic Conditions', 110, 143, 5, 2], 'R': 96}, {'C': ['Immunocompromised', 63, 49, 3, 1], 'R': 96}, {'C': ['Liver Disease', 14, 19, 2], 'R': 112}, {'C': ['None Noted', 2, 2, 0, 0, 1], 'R': 64}]}]
--------------------------------------------------------------------------------
Epi Curve
[{'DM0': [{'S': [{'N': 'G0', 'T': 7}, {'N': 'M0', 'T': 4}, {'N': 'M1', 'T': 3}], 'C': [1580515200000, 1], 'Ø': 4}, {'C': [1580688000000], 'R': 6}, {'C': [1581120000000], 'R': 6}, {'C': [1581379200000], 'R': 6}, {'C': [1581465600000], 'R': 6}, {'C': [1581638400000, 2], 'R': 4}, {'C': [1581811200000, 1], 'R': 4}, {'C': [1582156800000, '1.1428571428571428'], 'R': 2}, {'C': [1582243200000], 'R': 6}, {'C': [1582416000000], 'R': 6}, {'C': [1582675200000, 2, '1.2857142857142858']}, {'C': [1582848000000, 3, '1.5714285714285714']}, {'C': [1582934400000, '1.7142857142857142'], 'R': 2}, {'C': [1583020800000, 12, '3.2857142857142856']}, {'C': [1583107200000, 5, '3.8571428571428572']}, {'C': [1583280000000, '4.4285714285714288'], 'R': 2}, {'C': [1583366400000, 1], 'R': 4}, {'C': [1583452800000, 12, '5.8571428571428568']}, {'C': [1583539200000, 8, '6.5714285714285712']}, {'C': [1583625600000, 11, '7.7142857142857144']}, {'C': [1583712000000, 35, 11]}, {'C': [1583798400000, 25, '13.857142857142858']}, {'C': [1583884800000, 23, '16.428571428571427']}, {'C': [1583971200000, 36, '21.428571428571427']}, {'C': [1584057600000, '24.857142857142858'], 'R': 2}, {'C': [1584144000000, 40, '29.428571428571427']}, {'C': [1584230400000, 71, 38]}, {'C': [1584316800000, 94, '46.428571428571431']}, {'C': [1584403200000, 81, '54.428571428571431']}, {'C': [1584489600000, 112, '67.142857142857139']}, {'C': [1584576000000, 94, '75.428571428571431']}, {'C': [1584662400000, 123, '87.857142857142861']}, {'C': [1584748800000, 98, '96.142857142857139']}, {'C': [1584835200000, 92, '99.142857142857139']}, {'C': [1584921600000, 164, '109.14285714285714']}, {'C': [1585008000000, 137, '117.14285714285714']}, {'C': [1585094400000, 132, 120]}, {'C': [1585180800000, 119, '123.57142857142857']}, {'C': [1585267200000, 164, '129.42857142857142']}, {'C': [1585353600000, 101, '129.85714285714286']}, {'C': [1585440000000, 103, '131.42857142857142']}, {'C': [1585526400000, 162, '131.14285714285714']}, {'C': 
[1585612800000, 150, 133]}, {'C': [1585699200000, '135.57142857142858'], 'R': 2}, {'C': [1585785600000, 149, '139.85714285714286']}, {'C': [1585872000000, 156, '138.71428571428572']}, {'C': [1585958400000, 143, '144.71428571428572']}, {'C': [1586044800000, 127, '148.14285714285714']}, {'C': [1586131200000, 206, '154.42857142857142']}, {'C': [1586217600000, 135, '152.28571428571428']}, {'C': [1586304000000, 178, '156.28571428571428']}, {'C': [1586390400000, 183, '161.14285714285714']}, {'C': [1586476800000, 178, '164.28571428571428']}, {'C': [1586563200000, 146, '164.71428571428572']}, {'C': [1586649600000, 141, '166.71428571428572']}, {'C': [1586736000000, 218, '168.42857142857142']}, {'C': [1586822400000, 214, '179.71428571428572']}, {'C': [1586908800000, 236, 188]}, {'C': [1586995200000, 191, '189.14285714285714']}, {'C': [1587081600000, 215, '194.42857142857142']}, {'C': [1587168000000, 171, 198]}, {'C': [1587254400000, 181, '203.71428571428572']}, {'C': [1587340800000, 307, '216.42857142857142']}, {'C': [1587427200000, 259, '222.85714285714286']}, {'C': [1587513600000, 280, '229.14285714285714']}, {'C': [1587600000000, 235, '235.42857142857142']}, {'C': [1587686400000, 276, '244.14285714285714']}, {'C': [1587772800000, 193, '247.28571428571428']}, {'C': [1587859200000, 170, '245.71428571428572']}, {'C': [1587945600000, 317, '247.14285714285714']}, {'C': [1588032000000, 278, '249.85714285714286']}, {'C': [1588118400000, 298, '252.42857142857142']}, {'C': [1588204800000, 245, '253.85714285714286']}, {'C': [1588291200000, 282, '254.71428571428572']}, {'C': [1588377600000, 193], 'R': 4}, {'C': [1588464000000, 148, '251.57142857142858']}, {'C': [1588550400000, 310, '250.57142857142858']}, {'C': [1588636800000, 290, '252.28571428571428']}, {'C': [1588723200000, 365, '261.85714285714283']}, {'C': [1588809600000, 274, 266]}, {'C': [1588896000000, 273, '264.71428571428572']}, {'C': [1588982400000, 168, '261.14285714285717']}, {'C': [1589068800000, 194, 
'267.71428571428572']}, {'C': [1589155200000, 356, '274.28571428571428']}, {'C': [1589241600000, 308, '276.85714285714283']}, {'C': [1589328000000, 274, '263.85714285714283']}, {'C': [1589414400000, 267, '262.85714285714283']}, {'C': [1589500800000, 332, '271.28571428571428']}, {'C': [1589587200000, 224, '279.28571428571428']}, {'C': [1589673600000, 172, '276.14285714285717']}, {'C': [1589760000000, 366, '277.57142857142856']}, {'C': [1589846400000, 302, '276.71428571428572']}, {'C': [1589932800000, 333, '285.14285714285717']}, {'C': [1590019200000, 355, '297.71428571428572']}, {'C': [1590105600000, 311, '294.71428571428572']}, {'C': [1590192000000, 236, '296.42857142857144']}, {'C': [1590278400000, 250, '307.57142857142856']}, {'C': [1590364800000, 291, '296.85714285714283']}, {'C': [1590451200000, 384, '308.57142857142856']}, {'C': [1590537600000, 383, '315.71428571428572']}, {'C': [1590624000000, 404, '322.71428571428572']}, {'C': [1590710400000, 306, 322]}, {'C': [1590796800000, 201, 317]}, {'C': [1590883200000, 179, '306.85714285714283']}, {'C': [1590969600000, 290, '306.71428571428572']}, {'C': [1591056000000, 256, '288.42857142857144']}, {'C': [1591142400000, 289], 'Ø': 4}, {'C': [1591228800000, 218], 'R': 4}, {'C': [1591315200000, 260], 'R': 4}, {'C': [1591401600000, 169], 'R': 4}, {'C': [1591488000000, 120], 'R': 4}, {'C': [1591574400000, 227], 'R': 4}, {'C': [1591660800000, 222], 'R': 4}, {'C': [1591747200000, 234], 'R': 4}, {'C': [1591833600000, 263], 'R': 4}, {'C': [1591920000000, 251], 'R': 4}, {'C': [1592006400000, 204], 'R': 4}, {'C': [1592092800000, 3], 'R': 4}, {'C': [1592179200000, 1], 'R': 4}, {'C': [1592265600000, 0], 'R': 4}]}]
--------------------------------------------------------------------------------
Ethnicity/Race
[{'DM0': [{'S': [{'N': 'G0', 'T': 1}, {'N': 'M0', 'T': 4}], 'C': ['Black (NH)', 9865]}, {'C': ['White (NH)', 5003]}, {'C': ['Hispanic**', 1134]}, {'C': ['American Indian or Alaska Native (NH)', 298]}, {'C': ['Other (NH)', 270]}, {'C': ['Asian (NH)', 58]}]}]
--------------------------------------------------------------------------------
Deaths Gender x Race
[{'DM0': [{'S': [{'N': 'G0', 'T': 1}, {'N': 'M0', 'T': 4}, {'N': 'M1', 'T': 4}, {'N': 'M2', 'T': 4}, {'N': 'M3', 'T': 4}, {'N': 'M4', 'T': 4}], 'C': ['Male', 247, 182, 8, 0, 23]}, {'C': ['Female', 229, 213, 4, 29], 'R': 16}]}]
--------------------------------------------------------------------------------
Gender by Ethnicity/Race
[{'DM0': [{'S': [{'N': 'G0', 'T': 1}, {'N': 'M0', 'T': 4}, {'N': 'M1', 'T': 4}, {'N': 'M2', 'T': 4}, {'N': 'M3', 'T': 4}, {'N': 'M4', 'T': 4}, {'N': 'M5', 'T': 4}], 'C': ['Female', 6191, 2776, 494, 177, 141, 28]}, {'C': ['Male', 3641, 2209, 637, 121, 127, 30]}]}]
--------------------------------------------------------------------------------
LTCF
[{'DM0': [{'S': [{'N': 'G0', 'T': 7}, {'N': 'M0', 'T': 4}, {'N': 'M1', 'T': 4}, {'N': 'M2', 'T': 3}], 'C': [1584316800000, 0, 1], 'Ø': 8}, {'C': [1584576000000], 'R': 14}, {'C': [1584662400000], 'R': 14}, {'C': [1584748800000, 1, 2], 'R': 8}, {'C': [1585008000000, 0, 3], 'R': 8}, {'C': [1585094400000, 1], 'R': 10}, {'C': [1585180800000, 4], 'R': 10}, {'C': [1585267200000, 3, '2.2857142857142856'], 'R': 2}, {'C': [1585353600000, 1, 5, 3]}, {'C': [1585440000000, 0, 3, '3.2857142857142856']}, {'C': [1585526400000, 7, '3.8571428571428572'], 'R': 2}, {'C': [1585612800000, 2, 4, '4.2857142857142856']}, {'C': [1585699200000, 1, 12, 6]}, {'C': [1585785600000, 4, '6.1428571428571432'], 'R': 2}, {'C': [1585872000000, 0, 7, '6.7142857142857144']}, {'C': [1585958400000, 3, 8, '7.4285714285714288']}, {'C': [1586044800000, 2, 7, '8.2857142857142865']}, {'C': [1586131200000, 1, 5, '8.1428571428571423']}, {'C': [1586217600000, 2, 8, '8.7142857142857135']}, {'C': [1586304000000, 6, 3, '8.1428571428571423']}, {'C': [1586390400000, 1, 8, '8.7142857142857135']}, {'C': [1586476800000, 3, 3, '8.5714285714285712']}, {'C': [1586563200000, 2, 6, '8.1428571428571423']}, {'C': [1586649600000, 3, 3, '7.7142857142857144']}, {'C': [1586736000000, 2, 6, 8]}, {'C': [1586822400000, 3, 9, '8.2857142857142865']}, {'C': [1586908800000, 4, 8], 'R': 2}, {'C': [1586995200000, 9, 9, '9.2857142857142865']}, {'C': [1587081600000, 5, 7, '10.142857142857142']}, {'C': [1587168000000, 4, 4], 'R': 8}, {'C': [1587254400000, 6, '10.714285714285714'], 'R': 2}, {'C': [1587340800000, 8, 4, '11.285714285714286']}, {'C': [1587427200000, 6, 8, '11.571428571428571']}, {'C': [1587513600000, 0, 3, 11]}, {'C': [1587600000000, 8, 8, '10.714285714285714']}, {'C': [1587686400000, 6, 3, '10.285714285714286']}, {'C': [1587772800000, 3, 1, '9.7142857142857135']}, {'C': [1587859200000, 4, 3, '9.2857142857142865']}, {'C': [1587945600000, 12, 5, 10]}, {'C': [1588032000000, 7, 8, '10.142857142857142']}, {'C': [1588118400000, 8, 3, 
'11.285714285714286']}, {'C': [1588204800000, 9, '10.714285714285714'], 'R': 4}, {'C': [1588291200000, 7, '11.714285714285714'], 'R': 2}, {'C': [1588377600000, 11, 8, '13.857142857142858']}, {'C': [1588464000000, 9, 4, '14.714285714285714']}, {'C': [1588550400000, 3, 6, '13.571428571428571']}, {'C': [1588636800000, 13, 9, '14.571428571428571']}, {'C': [1588723200000, 11, 2, '14.857142857142858']}, {'C': [1588809600000, 8, 1, '14.428571428571429']}, {'C': [1588896000000, 3, 7, '13.571428571428571']}, {'C': [1588982400000, 8, 2, '12.285714285714286']}, {'C': [1589068800000, 1, '11.714285714285714'], 'R': 2}, {'C': [1589155200000, 6, 4, '11.857142857142858']}, {'C': [1589241600000, 9, 10, '11.428571428571429']}, {'C': [1589328000000, 11, 4, '11.714285714285714']}, {'C': [1589414400000, 12, 8, '13.285714285714286']}, {'C': [1589500800000, 13, 5, '14.428571428571429']}, {'C': [1589587200000, 9, 9, '15.571428571428571']}, {'C': [1589673600000, 14, 5, 17]}, {'C': [1589760000000, 5, 8, '17.428571428571427']}, {'C': [1589846400000, 12, 5, '17.142857142857142']}, {'C': [1589932800000, 8, 4, '16.714285714285715']}, {'C': [1590019200000, 7, 7, '15.857142857142858']}, {'C': [1590105600000, 6, '15.142857142857142'], 'R': 4}, {'C': [1590192000000, 8, 5, '14.428571428571429']}, {'C': [1590278400000, 11, 4, '13.857142857142858']}, {'C': [1590364800000, 8, 12, '14.857142857142858']}, {'C': [1590451200000, 10, 5, '14.571428571428571']}, {'C': [1590537600000, 8, 8, '15.142857142857142']}, {'C': [1590624000000, 7, 7], 'R': 8}, {'C': [1590710400000, 14, 9, '16.571428571428573']}, {'C': [1590796800000, 3, 4, '15.714285714285714']}, {'C': [1590883200000, 11, 2, '15.428571428571429']}, {'C': [1590969600000, 9, 7, '14.857142857142858']}, {'C': [1591056000000, 13, 4, '15.142857142857142']}, {'C': [1591142400000, 2, '13.714285714285714'], 'R': 4}, {'C': [1591228800000, 9, 6, '13.857142857142858']}, {'C': [1591315200000, 2, '11.714285714285714'], 'R': 4}, {'C': [1591401600000, 5, 2], 'R': 8}, 
{'C': [1591488000000, 6, 5, '11.428571428571429']}, {'C': [1591574400000, 5, '10.571428571428571'], 'R': 4}, {'C': [1591660800000, 4, '9.4285714285714288'], 'R': 2}, {'C': [1591747200000, 2], 'R': 4, 'Ø': 8}, {'C': [1591833600000, 1, 2], 'R': 8}, {'C': [1591920000000, 4, 6], 'R': 8}, {'C': [1592006400000, 5, 3], 'R': 8}, {'C': [1592092800000, 3, 5], 'R': 8}, {'C': [1592179200000, 2, 4], 'R': 8}, {'C': [1592265600000, 1, 0], 'R': 8}]}]
--------------------------------------------------------------------------------
Recovery
[{'DM0': [{'S': [{'N': 'M0', 'T': 3}], 'M0': 15323}]}]
--------------------------------------------------------------------------------
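
The rows in this output are compressed: most entries after the first carry fewer values in 'C' than there are columns in 'S'. The sketch below expands them under an assumed encoding that is inferred from the printed output only, not from any documentation: 'R' is treated as a bitmask of columns that repeat the previous row's value, and 'Ø' as a bitmask of columns that are null. The helper name decode_rows is made up for this example, and single-value results such as "Recovery" (which have no 'C' key) are not handled.

```python
from datetime import datetime, timezone


def decode_rows(dm0):
    """Expand compressed 'DM0' entries into full-width rows.

    Assumed encoding (inferred from the printed output, not documented):
      'S' on the first entry lists the columns,
      'C' holds the values that are actually transmitted,
      'R' is a bitmask of columns repeating the previous row's value,
      'Ø' is a bitmask of columns that are null.
    """
    n_cols = len(dm0[0]["S"])
    rows = []
    previous = [None] * n_cols
    for entry in dm0:
        values = iter(entry.get("C", []))
        repeat_mask = entry.get("R", 0)
        null_mask = entry.get("Ø", 0)
        row = []
        for col in range(n_cols):
            if repeat_mask & (1 << col):
                row.append(previous[col])
            elif null_mask & (1 << col):
                row.append(None)
            else:
                row.append(next(values))
        rows.append(row)
        previous = row
    return rows


# First entries of "Underlying Condition" and "Epi Curve", copied from the output above.
underlying = [
    {"S": [{"N": "G0", "T": 1}, {"N": "M0", "T": 4}, {"N": "M1", "T": 4},
           {"N": "M2", "T": 4}, {"N": "M3", "T": 4}, {"N": "M4", "T": 4},
           {"N": "M5", "T": 4}],
     "C": ["Hypertension", 311, 242, 21, 9, 1, 0]},
    {"C": ["Cardiovascular Disease", 248, 197, 13, 6], "R": 96},
    {"C": ["Diabetes", 243, 129, 20, 3], "R": 96},
    {"C": ["Obesity", 168, 83, 9, 4, 0], "R": 64},
]
epi_curve = [
    {"S": [{"N": "G0", "T": 7}, {"N": "M0", "T": 4}, {"N": "M1", "T": 3}],
     "C": [1580515200000, 1], "Ø": 4},
    {"C": [1580688000000], "R": 6},
]

for row in decode_rows(underlying):
    print(row)

# Values in the 'T': 7 column look like Unix epoch milliseconds (an assumption).
for ms, m0, m1 in decode_rows(epi_curve):
    day = datetime.fromtimestamp(ms / 1000, tz=timezone.utc).date()
    print(day, m0, m1)
```

Under these assumptions every decoded row has as many values as there are columns in 'S'; for example, 'R': 96 sets bits 5 and 6, so the last two columns are taken from the row before.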

How to export data from published Power BI report?

If you don't see the visual header (ellipsis) where you can click Export data, then no. This means that the owner of the report hid it on purpose or the admin disabled it. Also, exporting data requires a Pro or Premium license and edit permissions on the dataset and report, which you may not have.

If you see the visual header, then you can, but keep in mind that there are some limitations:

  • The maximum number of rows that can be exported using API to .csv is 30,000.

  • Export using Underlying data will not work if the data source is an Analysis Services live connection and the version is older than 2016 and the tables in the model do not have a unique key.

  • Export using Underlying data will not work if the Show items with no data option is enabled for the visualization being exported.

  • If you have applied filters to the visualization, the exported data will export as filtered.

  • When using DirectQuery, the maximum amount of data that can be exported is 16 MB. This may result in exporting less than the maximum number of rows, especially if there are many columns, data that is difficult to compress, and other factors that increase file size and decrease number of rows exported.

  • Power BI only supports export in visuals that use basic aggregates. Export is not available for visuals using model or report measures.

  • Custom visuals, and R visuals, are not currently supported.

  • Power BI admins have the ability to disable the export of data.

  • Concurrent export data requests from the same session are not supported. Multiple requests should be run synchronously.

Power BI Queries with Data Not Used in Visual

If you are using Import mode: Yes, the data is completely pulled into the data model and RAM, no matter whether it is used to calculate a visual or not.

If you are using DirectQuery mode: No, only the data needed for the visualizations (this includes slicers) is pulled into memory.

Read the docs: https://learn.microsoft.com/en-us/power-bi/connect-data/desktop-use-directquery

Scrape website's Power BI dashboard using R

The problem is that the site you want to analyze relies on JavaScript to run and fetch the content for you. In such a case, httr::GET is of no help to you.

However, since manual work is also not an option, we have Selenium.

The following does what you're looking for:

library(dplyr)
library(purrr)
library(readr)

library(wdman)
library(RSelenium)
library(xml2)
library(selectr)

# using wdman to start a selenium server
selServ <- selenium(
  port = 4444L,
  version = 'latest',
  chromever = '84.0.4147.30' # set this to a chrome version that's available on your machine
)

# using RSelenium to start chrome on the selenium server
remDr <- remoteDriver(
  remoteServerAddr = 'localhost',
  port = 4444L,
  browserName = 'chrome'
)

# open a new Tab on Chrome
remDr$open()

# navigate to the site you wish to analyze
report_url <- "https://app.powerbigov.us/view?r=eyJrIjoiZDFmN2ViMGEtNzQzMC00ZDU3LTkwZjUtOWU1N2RiZmJlOTYyIiwidCI6IjNiMTg1MTYzLTZjYTMtNDA2NS04NDAwLWNhNzJiM2Y3OWU2ZCJ9&pageName=ReportSectionb438b98829599a9276e2&pageName=ReportSectionb438b98829599a9276e2"
remDr$navigate(report_url)

# find and click the button leading to the Zip Code data
zipCodeBtn <- remDr$findElement('.//button[descendant::span[text()="Zip Code"]]', using="xpath")
zipCodeBtn$clickElement()

# fetch the page source and parse it
zipcode_data_table <- read_html(remDr$getPageSource()[[1]]) %>%
  querySelector("div.pivotTable")

Now we have the page source read into R, probably what you had in mind when you started your scraping task.

From here on it's smooth sailing and merely about converting that xml to a useable table:

col_headers <- zipcode_data_table %>%
  querySelectorAll("div.columnHeaders div.pivotTableCellWrap") %>%
  map_chr(xml_text)

rownames <- zipcode_data_table %>%
  querySelectorAll("div.rowHeaders div.pivotTableCellWrap") %>%
  map_chr(xml_text)

zipcode_data <- zipcode_data_table %>%
  querySelectorAll("div.bodyCells div.pivotTableCellWrap") %>%
  map(xml_parent) %>%
  unique() %>%
  map(~ .x %>% querySelectorAll("div.pivotTableCellWrap") %>% map_chr(xml_text)) %>%
  setNames(col_headers) %>%
  bind_cols()

# tadaa
df_final <- tibble(zipcode = rownames, zipcode_data) %>%
  type_convert(trim_ws = TRUE, na = c(""))

The resulting df looks like this:

> df_final
# A tibble: 15 x 5
   zipcode `Confirmed Cases ` `% of Total Cases ` `Deaths ` `% of Total Deaths `
   <chr>                <dbl> <chr>                   <dbl> <chr>
 1 63301                 1549 17.53%                     40 28.99%
 2 63366                 1364 15.44%                     38 27.54%
 3 63303                 1160 13.13%                     21 15.22%
 4 63385                 1091 12.35%                     12 8.70%
 5 63304                 1046 11.84%                      3 2.17%
 6 63368                  896 10.14%                     12 8.70%
 7 63367                  882 9.98%                       9 6.52%
 8                        534 6.04%                       1 0.72%
 9 63348                  105 1.19%                       0 0.00%
10 63341                   84 0.95%                       1 0.72%
11 63332                   64 0.72%                       0 0.00%
12 63373                   25 0.28%                       1 0.72%
13 63386                   17 0.19%                       0 0.00%
14 63357                   13 0.15%                       0 0.00%
15 63376                    5 0.06%                       0 0.00%
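
As a side note, the r= parameter in report_url has the same shape as the iframe token that the Python answer above decodes with base64, so the request-based approach might also be worth trying for this report. The sketch below only decodes the token; the function name decode_report_token is made up for this example, and the padding step is a precaution in case a token's length is not a multiple of four.

```python
import base64
import json
from urllib.parse import parse_qs, urlparse

# Same token as in report_url above, without the pageName parameters.
report_url = (
    "https://app.powerbigov.us/view?r="
    "eyJrIjoiZDFmN2ViMGEtNzQzMC00ZDU3LTkwZjUtOWU1N2RiZmJlOTYy"
    "IiwidCI6IjNiMTg1MTYzLTZjYTMtNDA2NS04NDAwLWNhNzJiM2Y3OWU2ZCJ9"
)


def decode_report_token(url):
    """Return the JSON object encoded in the 'r' query parameter of a report URL."""
    token = parse_qs(urlparse(url).query)["r"][0]
    token += "=" * (-len(token) % 4)  # restore base64 padding if it was stripped
    return json.loads(base64.b64decode(token).decode("utf-8"))


info = decode_report_token(report_url)
print(info["k"])  # the value the Python answer calls resourceKey
print(info["t"])  # the value the Python answer calls tenantId
```

Parsing the query string with urllib.parse avoids the split('=') shortcut, which would pick the wrong piece when extra parameters such as pageName follow the token.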

