Excel crashes when attempting to inspect DispStaticNodeList
If the name is DispStaticNodeList, we can be pretty sure it's an array..(or at least has array semantics).
Arrays can normally be iterated with a For Each
loop, however it's more efficient to iterate them using a For
loop. Looks like what you're getting isn't exactly an array, and while it appears to support indexing, it apparently doesn't support enumeration, which would explain the blowing up when you attempt to enumerate it with a For Each
loop.
Looks like the locals toolwindow might be using For Each
semantics to list the items in the collection.
I'm not familiar with that particular library so this is a bit of (educated) guesswork, but it's pretty easy to make a custom COM collection type that can't be iterated with a For Each
loop in VBA - normally the error is caught on the VBA side though... Seems there might be a bug in the library's enumerator implementation (assuming there's an enumerator for it) causing it to throw an exception that ends up unhandled and somehow takes everything down with it... thing is, you can't fix & recompile that library... so the only thing you can do is to avoid iterating that type with a For Each
loop, and avoid expanding it in the locals toolwindow (and so, ...save your work often!).
This article gives a good idea from a C#/.NET perspective, of how COM enumeration works. Of course that library isn't managed code (.NET), but the COM concepts at play are the same.
TL;DR: it's not because you can For...Next
that you can For Each
; the COM type involved must explicitly support enumeration. If the VBA code compiles with a For Each
loop, then it does, so it must be a bug in the type's enumerator.
Excel Crashes When Opening Trusted Document With Macros Or Enabling Macros
This is caused by a macro with a compile error. To fix it, use the following steps (all credit to alulla at Ozgrid, whose directions I've lightly adapted for clarity):
- Open a new Excel workbook, not your crashing workbook
- Go to the Developer Tab and click “Macro Security” in the Code section of the Ribbon
- Click the bullet option entitled “Disable all macros with notification”
- Go to the Trusted Locations tab on the left and check the “Disable all Trusted Locations” box at the bottom
- Go to Trusted Documents and check the “Disable all Trusted Documents” box
- Click OK and close the new workbook
- Open your Excel file that was crashing
- Do not click "Enable Macros" - instead go to the Developer Tab and
open the Visual Basic Editor- In the VB Editor, save the project and then click Debug > Compile VBAProject
- Correct any compile errors that occur
- Save the VB project again and then save the Excel workbook
- Revert any changes you made to Macro Security in steps 3-5
- Close Excel, re-open the previously crashing workbook, and Enable Macros
I experienced today that this can also resolve "Can't find project or library" crashes where no reference is actually missing and Excel crashes before you can even get to the reference dialog in the first place.
Get data from a web table with table tag
There is not enough HTML to determine if the TOlinha2
is a consistent class name for all the tds within the table of interest; and is limited only to this table. If it is then you can indeed use .querySelectorAll
You could use the CSS selector:
ie.document.querySelectorAll(".TOlinha2")
Where "."
stands for className.
You cannot iterate over the returned NodeList with a For Each Loop
. See my question Excel crashes when attempting to inspect DispStaticNodeList. Excel will crash and you will lose any unsaved data.
You have to loop the length of the nodeList e.g.
Dim i As Long
For i = 0 To Len(nodeList) -1
Debug.Print nodeList(i).innerText
Next i
Sometimes you need different syntax which is:
Debug.Print nodeList.Item(i).innerText
You can seek to further narrow this CSS selector down with more qualifying elements such as, the element must be within tbody i.e. a table
, and preceeded by a tr
(table row) and have classname .TOLinha2
ie.document.querySelectorAll("tbody tr .TOlinha2")
How to hold a reference to the items matched by `querySelectorAll`, in a variable, that allows you to access its methods?
Do not despair VBA web-scrapers (I know there are a few!) We can still have the luxury of css selectors and the benefits, though admittedly somewhat limited in VBA, that they bring.
To the rescue:
MSHTML
, gratias IE, offers a number of scripting object interfaces . One of which is the IHTMLDOMChildrenCollection
interface, which inherits from IDispatch
, and which:
provides methods to access items in the collection.
This includes the .Length
property and access to items via .item(index)
.
Dim nodeList2 As MSHTML.IHTMLDOMChildrenCollection
Set nodeList2 = html.querySelectorAll("a")
Debug.Print nodeList2.Length ' => n
Debug.Print nodeList2.Item(0).innerText
This is supported on clients Windows XP +, and servers from Windows 2000 Server onwards.
VBA:
Public Sub ReviewingNodeListMethods()
'' References (VBE > Tools > References):
''Microsoft HTML object Library
''Microsoft XML library (v.6 for me)
Dim http As MSXML2.XMLHTTP60, html As MSHTML.HTMLDocument 'XMLHTTP60 is for Excel 2016. Change according to your version e.g. XMLHTTP for 2013)
Set http = New MSXML2.XMLHTTP60: Set html = New MSHTML.HTMLDocument
With http
.Open "GET", "http://books.toscrape.com/", False
.send
html.body.innerHTML = .responseText
End With
Dim nodeList1 As Object, nodeList2 As MSHTML.IHTMLDOMChildrenCollection
Set nodeList1 = html.querySelectorAll("a")
Set nodeList2 = html.querySelectorAll("a")
Debug.Print nodeList1.Length ' => Null
Debug.Print nodeList2.Length ' => 94
Debug.Print nodeList2.Item(0).innerText
' Dim i As Long
'
' With html.querySelectorAll("a")
' For i = 0 To .Length - 1
' Debug.Print .Item(i).innerText
' Next
' End With
'' ================Warning: This will crash Excel -============================
' Dim node As MSHTML.IHTMLDOMNode
'
' For Each node In nodeList2
' Debug.Print node.innerText
' Next
'' ================Warning: This will crash Excel -============================
End Sub
N.B. There is still the underlying problem of the collection enumeration method; it causes Excel to crash if you attempt a For Each
e.g.
Dim node As MSHTML.IHTMLDOMNode
For Each node In nodeList2
Debug.Print node.innerText
Next
Updating your old Questions/Answers:
- You can use this SEDE query to identify potential candidates for revision. Enter your userid and the search term "querySelectorAll"
- Or simply use the following in the search bar:
querySelectorAll user:<userid> is:answer
;querySelectorAll user:<userid> is:question
Extract innerText from HTML div tags un a local file
Don't use late-bound HTMLFile
. It's like going for a run in clogs. It uses IE version below 9, which doesn't support .getElementsByClassName
method or querySelector/querySelectorAll
.
The early bound MSHTML.HTMLDocument
is accessible via the Microsoft HTML Object Library
; a standard shipped library that Microsoft paid a lot of money in an anti-trust settlement to ensure is present in all installs. The only scenario I can think of where this would be a problem is with a "cut-off their nose to spite their face" IT dept refusing to install for a rights-restricted user on a work machine. You aren't going to have version issues either.
Then you can use the class
attribute to target the nodes of interest with either of the methods I mentioned above. A likely bug in the provision of an enumeration method means that you can't For Each
over the DispStaticNodeList
returned by querySelectorAll
(unlike with the collection returned by getElementsByClassName
); you can, however, use a For Loop
to run the .Length
of the nodeList.
Alter the below to write out where you want in sheet:
Dim nodeList As Object, i As Long
Set nodeList = HTML_Content.querySelectorAll(".report-row-label") 'or use HTML_Content.getElementsByClassName("report-row-label") method and value
For i = 0 To nodeList.Length - 1
Worksheets(1).Cells(i + 1, 1) = nodeList.Item(i).innertext
Next
Of course, the above is with adding reference to Microsoft HTML Object Library
via VBE>Tools>References and then declaring and instantiating a MSHTML.HTMLDocument
instance.
Dim HTML_Content As MSHTML.HTMLDocument
Set HTML_Content = New MSHTML.HTMLDocument
Then populating the variable's .body.innerHTML
with your file content as before.
With getElementsByClassName
you should be able to do something like as follows:
Dim nodeList As Object, node As Object, i As Long 'change these to use explicit types here for the element and element collection types
Set nodeList = HTML_Content.getElementsByClassName("report-row-label")
For Each node In nodeList
Worksheets(1).Cells(i + 1, 1) = node.innerText
i = i + 1
Next
Extract list of all input boxes on webpage vba
- For learning purposes maybe choose a website that has more obvious inputboxes, rather than dropdowns.
- Many inputboxes won't be pre-populated so maybe consider reading other properties of the retrieved elements. Or even writing to them and then retrieving those values.
- Selecting by tag name can bring back a host of items that you might not have expected.
Bearing all of the above in mind. Try running the following, which generates a collection of <input>
tag elements.
Code:
Option Explicit
Public Sub PrintTagInfo()
'Tools > references > Microsoft XML and HTML Object library
Dim http As New XMLHTTP60 '<== this will be specific to your excel version
Dim html As New HTMLDocument
With http
.Open "GET", "https://www.mrexcel.com/forum/register.php", False
.send
html.body.innerHTML = .responseText
End With
Dim inputBoxes As MSHTML.IHTMLElementCollection, iBox As MSHTML.IHTMLElement, i As Long
Set inputBoxes = html.getElementsByTagName("input") '<== the collection of input tags on the page
'<== These are input boxes i.e. you are putting info into them so perhaps populate and then try to read what is in the entry box?
For Each iBox In inputBoxes
Debug.Print "Result #" & i + 1
Debug.Print vbNewLine
Debug.Print "ID: " & iBox.ID '<== select a sample of properties to print out as some maybe empty
Debug.Print "ClassName: " & iBox.className,
Debug.Print "Title: " & iBox.Title
Debug.Print String$(20, Chr$(61))
Debug.Print vbNewLine
i = i + 1
Next iBox
End Sub
Sample output:
From the above, it looks like class name might be in some ways more informative if you are looking to target boxes to input information into.
An initial inspection of the page source, selecting an inputbox and right-click > inspect... will help you refine your choices.
I noticed that a lot of the boxes of interest had the Input
tag and then type = "text"
This means you can target elements matching this pattern using CSS selectors. In this case using the selector input[type=""text""]
.
Adjusting the former code to factor this in gives a smaller set of more targeted results. Note, using .querySelectorAll
, to apply the CSS selector, returns a NodeList
object which requires a different method of iterating over. A For Each
Loop will cause Excel to crash as described here.
Code:
Option Explicit
Public Sub PrintTagInfo()
'Tools > references > Microsoft XML and HTML Object library
Dim http As New XMLHTTP60 '<== this will be specific to your excel version
Dim html As New HTMLDocument
With http
.Open "GET", "https://www.mrexcel.com/forum/register.php", False
.send
html.body.innerHTML = .responseText
End With
Dim inputBoxes As Object, i As Long
Set inputBoxes = html.querySelectorAll("input[type=""text""]") '<== the collection of text input boxes on page. Returned as a NodeList
'<== These are input boxes i.e. you are putting info into them so perhaps populate and then try to read what is in the entry box?
For i = 0 To inputBoxes.Length - 1
Debug.Print "Result #" & i + 1
Debug.Print vbNewLine
Debug.Print "ID: " & inputBoxes.Item(i).ID '<== select a sample of properties to print out as some maybe empty
Debug.Print "ClassName: " & inputBoxes.Item(i).className,
Debug.Print "Title: " & inputBoxes.Item(i).Title
Debug.Print String$(20, Chr$(61))
Debug.Print vbNewLine
Next i
End Sub
Sample results:
Note: I have edited the spacing to fit more into the image.
References added via VBE > Tools > References
Last two are those of interest. The bottom one will be version specific and you will need to re-write XMLHTTP60
which is for XML 6.0
to target your version of Excel if not using Excel 2016.
Loop through elements in list LI tag
querySelectorAll returns a nodeList which due to a likely bug cannot be For Each'd over. Instead, you need to traverse using the .Length property
Dim i As Long
For i = 0 To post.Length -1
Debug.Print post.item(i).innerText
Next
You need to SET the object reference and test the nodeType of the NextSiblings if you wish to retrieve the list of headers and associated values:
Option Explicit
Sub GetInfo()
Dim http As msxml2.XMLHTTP60, html As MSHTML.HTMLDocument, post As Object
Set http = New msxml2.XMLHTTP60
Set html = New MSHTML.HTMLDocument
With http
.Open "GET", "https://www.amazon.de/sp?_encoding=UTF8&asin=&isAmazonFulfilled=1&isCBA=&marketplaceID=A1PA6795UKMFR9&orderID=&seller=A2PGPJL0BBLHLX&tab=&vasStoreID=", False
.setRequestHeader "User-Agent", "Mozilla/5.0"
.send
html.body.innerHTML = .responseText
End With
Set post = html.querySelectorAll(".a-list-item .a-text-bold")
Dim i As Long, sibling As Object, val As Variant
For i = 0 To post.Length - 1
Set sibling = post.item(i).NextSibling
Debug.Print post.item(i).innerText, " = "
Select Case sibling.NodeType
Case 3
val = sibling.NodeValue
Case 1
val = sibling.innerText
End Select
Debug.Print val
Next
Stop
End Sub
Can MSXML2.XMLHTTP retrieve ALL of the HTML data for a given webpage?
Why Json? Because the page is loaded using json data
To View: Use Google Chrome --> Press F12 --> Load URL --> Goto Network tab
Code:
Sub getHTTP()
Dim Url As String, data As String
Dim xml As Object, JSON As Object, colObj, item
Url = "https://www.tiff.net/data/films-events-2018.json?q=1513263947586"
Set xml = CreateObject("MSXML2.ServerXMLHTTP")
With xml
.Open "GET", Url, False
.send
data = .responseText
End With
Set JSON = JsonConverter.ParseJson(data)
Set colObj = JSON("items")
For Each item In colObj
Debug.Print item("title")
Debug.Print item("description")
For Each c1 In item("cast")
Debug.Print c1
Next
For Each c2 In item("countries")
Debug.Print c2
Next
Next
End Sub
Output
Installation of JsonConverter
- Download the latest release
- Import JsonConverter.bas into your project (Open VBA Editor, Alt + F11; File > Import File)
Add Dictionary reference/class - For Windows-only, include a reference to "Microsoft Scripting Runtime"
- For Windows and Mac, include VBA-Dictionary
Tree View of Data
Related Topics
CSS Circular Cropping of Rectangle Image
Adjusting and Image Size to Fit a Div with Bootstrap
Extract Content of Div from Google Translate with Vba
CSS Animation from Left to Right
Textarea to Fill a Parent Container Exactly, with Padding
Parallax Scrolling with CSS Only
Context.Getimagedata() on Localhost
Fill a Div with an Image Respecting Aspect Ratio
How to Serve HTML File from Another Directory as Actionresult
Content of Div Is Longer Then Div Itself When Width Is Set to 100%
Open Local Files(File://) Using Chrome
How to Make a Div Take The Full Width of The Page When It Is Inside Another Div That Have 90% Width
100% Width Background Image with an 'Auto' Height
Set Div to Remaining Height Using CSS with Unknown Height Divs Above and Below
How to Create an H264 Video with an Alpha Channel for Use with HTML5 Canvas
Flexbox Width Variation with Content, Should Be Fixed Width
Force Ie Contenteditable Element to Create Line Breaks on Enter Key, Without Breaking Undo