Excel Crashes When Attempting to Inspect Dispstaticnodelist

Excel crashes when attempting to inspect DispStaticNodeList

If the name is DispStaticNodeList, we can be pretty sure it's an array..(or at least has array semantics).

Arrays can normally be iterated with a For Each loop, however it's more efficient to iterate them using a For loop. Looks like what you're getting isn't exactly an array, and while it appears to support indexing, it apparently doesn't support enumeration, which would explain the blowing up when you attempt to enumerate it with a For Each loop.

Looks like the locals toolwindow might be using For Each semantics to list the items in the collection.

I'm not familiar with that particular library so this is a bit of (educated) guesswork, but it's pretty easy to make a custom COM collection type that can't be iterated with a For Each loop in VBA - normally the error is caught on the VBA side though... Seems there might be a bug in the library's enumerator implementation (assuming there's an enumerator for it) causing it to throw an exception that ends up unhandled and somehow takes everything down with it... thing is, you can't fix & recompile that library... so the only thing you can do is to avoid iterating that type with a For Each loop, and avoid expanding it in the locals toolwindow (and so, ...save your work often!).

This article gives a good idea from a C#/.NET perspective, of how COM enumeration works. Of course that library isn't managed code (.NET), but the COM concepts at play are the same.

TL;DR: it's not because you can For...Next that you can For Each; the COM type involved must explicitly support enumeration. If the VBA code compiles with a For Each loop, then it does, so it must be a bug in the type's enumerator.

Excel Crashes When Opening Trusted Document With Macros Or Enabling Macros

This is caused by a macro with a compile error. To fix it, use the following steps (all credit to alulla at Ozgrid, whose directions I've lightly adapted for clarity):

  1. Open a new Excel workbook, not your crashing workbook
  2. Go to the Developer Tab and click “Macro Security” in the Code section of the Ribbon
  3. Click the bullet option entitled “Disable all macros with notification”
  4. Go to the Trusted Locations tab on the left and check the “Disable all Trusted Locations” box at the bottom
  5. Go to Trusted Documents and check the “Disable all Trusted Documents” box
  6. Click OK and close the new workbook
  7. Open your Excel file that was crashing
  8. Do not click "Enable Macros" - instead go to the Developer Tab and
    open the Visual Basic Editor
  9. In the VB Editor, save the project and then click Debug > Compile VBAProject
  10. Correct any compile errors that occur
  11. Save the VB project again and then save the Excel workbook
  12. Revert any changes you made to Macro Security in steps 3-5
  13. Close Excel, re-open the previously crashing workbook, and Enable Macros

I experienced today that this can also resolve "Can't find project or library" crashes where no reference is actually missing and Excel crashes before you can even get to the reference dialog in the first place.

Get data from a web table with table tag

There is not enough HTML to determine if the TOlinha2 is a consistent class name for all the tds within the table of interest; and is limited only to this table. If it is then you can indeed use .querySelectorAll

You could use the CSS selector:

ie.document.querySelectorAll(".TOlinha2")

Where "." stands for className.

You cannot iterate over the returned NodeList with a For Each Loop. See my question Excel crashes when attempting to inspect DispStaticNodeList. Excel will crash and you will lose any unsaved data.

You have to loop the length of the nodeList e.g.

Dim i As Long
For i = 0 To Len(nodeList) -1
Debug.Print nodeList(i).innerText
Next i

Sometimes you need different syntax which is:

Debug.Print nodeList.Item(i).innerText 

You can seek to further narrow this CSS selector down with more qualifying elements such as, the element must be within tbody i.e. a table, and preceeded by a tr (table row) and have classname .TOLinha2

ie.document.querySelectorAll("tbody tr .TOlinha2")

How to hold a reference to the items matched by `querySelectorAll`, in a variable, that allows you to access its methods?

Do not despair VBA web-scrapers (I know there are a few!) We can still have the luxury of css selectors and the benefits, though admittedly somewhat limited in VBA, that they bring.

To the rescue:

MSHTML, gratias IE, offers a number of scripting object interfaces . One of which is the IHTMLDOMChildrenCollection interface, which inherits from IDispatch, and which:

provides methods to access items in the collection.

This includes the .Length property and access to items via .item(index).

Dim nodeList2 As MSHTML.IHTMLDOMChildrenCollection

Set nodeList2 = html.querySelectorAll("a")
Debug.Print nodeList2.Length ' => n
Debug.Print nodeList2.Item(0).innerText

This is supported on clients Windows XP +, and servers from Windows 2000 Server onwards.


VBA:

Public Sub ReviewingNodeListMethods()
'' References (VBE > Tools > References):
''Microsoft HTML object Library
''Microsoft XML library (v.6 for me)

Dim http As MSXML2.XMLHTTP60, html As MSHTML.HTMLDocument 'XMLHTTP60 is for Excel 2016. Change according to your version e.g. XMLHTTP for 2013)

Set http = New MSXML2.XMLHTTP60: Set html = New MSHTML.HTMLDocument

With http
.Open "GET", "http://books.toscrape.com/", False
.send
html.body.innerHTML = .responseText
End With

Dim nodeList1 As Object, nodeList2 As MSHTML.IHTMLDOMChildrenCollection

Set nodeList1 = html.querySelectorAll("a")
Set nodeList2 = html.querySelectorAll("a")

Debug.Print nodeList1.Length ' => Null
Debug.Print nodeList2.Length ' => 94

Debug.Print nodeList2.Item(0).innerText

' Dim i As Long
'
' With html.querySelectorAll("a")
' For i = 0 To .Length - 1
' Debug.Print .Item(i).innerText
' Next
' End With

'' ================Warning: This will crash Excel -============================

' Dim node As MSHTML.IHTMLDOMNode
'
' For Each node In nodeList2
' Debug.Print node.innerText
' Next
'' ================Warning: This will crash Excel -============================

End Sub

N.B. There is still the underlying problem of the collection enumeration method; it causes Excel to crash if you attempt a For Each e.g.

Dim node As MSHTML.IHTMLDOMNode

For Each node In nodeList2
Debug.Print node.innerText
Next

Updating your old Questions/Answers:

  1. You can use this SEDE query to identify potential candidates for revision. Enter your userid and the search term "querySelectorAll"
  2. Or simply use the following in the search bar:
    querySelectorAll user:<userid> is:answer ; querySelectorAll user:<userid> is:question

Extract innerText from HTML div tags un a local file

Don't use late-bound HTMLFile. It's like going for a run in clogs. It uses IE version below 9, which doesn't support .getElementsByClassName method or querySelector/querySelectorAll.

The early bound MSHTML.HTMLDocument is accessible via the Microsoft HTML Object Library; a standard shipped library that Microsoft paid a lot of money in an anti-trust settlement to ensure is present in all installs. The only scenario I can think of where this would be a problem is with a "cut-off their nose to spite their face" IT dept refusing to install for a rights-restricted user on a work machine. You aren't going to have version issues either.

Then you can use the classattribute to target the nodes of interest with either of the methods I mentioned above. A likely bug in the provision of an enumeration method means that you can't For Each over the DispStaticNodeList returned by querySelectorAll (unlike with the collection returned by getElementsByClassName); you can, however, use a For Loop to run the .Length of the nodeList.

Alter the below to write out where you want in sheet:

Dim nodeList As Object, i As Long

Set nodeList = HTML_Content.querySelectorAll(".report-row-label") 'or use HTML_Content.getElementsByClassName("report-row-label") method and value

For i = 0 To nodeList.Length - 1
Worksheets(1).Cells(i + 1, 1) = nodeList.Item(i).innertext
Next

Of course, the above is with adding reference to Microsoft HTML Object Library via VBE>Tools>References and then declaring and instantiating a MSHTML.HTMLDocument instance.

Dim HTML_Content As MSHTML.HTMLDocument

Set HTML_Content = New MSHTML.HTMLDocument

Then populating the variable's .body.innerHTML with your file content as before.

With getElementsByClassName you should be able to do something like as follows:

Dim nodeList As Object, node As Object, i As Long 'change these to use explicit types here for the element and element collection types

Set nodeList = HTML_Content.getElementsByClassName("report-row-label")

For Each node In nodeList
Worksheets(1).Cells(i + 1, 1) = node.innerText
i = i + 1
Next

Extract list of all input boxes on webpage vba

  1. For learning purposes maybe choose a website that has more obvious inputboxes, rather than dropdowns.
  2. Many inputboxes won't be pre-populated so maybe consider reading other properties of the retrieved elements. Or even writing to them and then retrieving those values.
  3. Selecting by tag name can bring back a host of items that you might not have expected.

Bearing all of the above in mind. Try running the following, which generates a collection of <input> tag elements.

Code:

Option Explicit

Public Sub PrintTagInfo()
'Tools > references > Microsoft XML and HTML Object library
Dim http As New XMLHTTP60 '<== this will be specific to your excel version
Dim html As New HTMLDocument

With http
.Open "GET", "https://www.mrexcel.com/forum/register.php", False
.send
html.body.innerHTML = .responseText
End With

Dim inputBoxes As MSHTML.IHTMLElementCollection, iBox As MSHTML.IHTMLElement, i As Long

Set inputBoxes = html.getElementsByTagName("input") '<== the collection of input tags on the page
'<== These are input boxes i.e. you are putting info into them so perhaps populate and then try to read what is in the entry box?
For Each iBox In inputBoxes
Debug.Print "Result #" & i + 1
Debug.Print vbNewLine
Debug.Print "ID: " & iBox.ID '<== select a sample of properties to print out as some maybe empty
Debug.Print "ClassName: " & iBox.className,
Debug.Print "Title: " & iBox.Title
Debug.Print String$(20, Chr$(61))
Debug.Print vbNewLine
i = i + 1
Next iBox
End Sub

Sample output:

Sample results

From the above, it looks like class name might be in some ways more informative if you are looking to target boxes to input information into.

An initial inspection of the page source, selecting an inputbox and right-click > inspect... will help you refine your choices.

I noticed that a lot of the boxes of interest had the Input tag and then type = "text"

Username box

This means you can target elements matching this pattern using CSS selectors. In this case using the selector input[type=""text""].

Adjusting the former code to factor this in gives a smaller set of more targeted results. Note, using .querySelectorAll, to apply the CSS selector, returns a NodeList object which requires a different method of iterating over. A For Each Loop will cause Excel to crash as described here.

Code:

Option Explicit
Public Sub PrintTagInfo()
'Tools > references > Microsoft XML and HTML Object library
Dim http As New XMLHTTP60 '<== this will be specific to your excel version
Dim html As New HTMLDocument

With http
.Open "GET", "https://www.mrexcel.com/forum/register.php", False
.send
html.body.innerHTML = .responseText
End With

Dim inputBoxes As Object, i As Long

Set inputBoxes = html.querySelectorAll("input[type=""text""]") '<== the collection of text input boxes on page. Returned as a NodeList
'<== These are input boxes i.e. you are putting info into them so perhaps populate and then try to read what is in the entry box?
For i = 0 To inputBoxes.Length - 1
Debug.Print "Result #" & i + 1
Debug.Print vbNewLine
Debug.Print "ID: " & inputBoxes.Item(i).ID '<== select a sample of properties to print out as some maybe empty
Debug.Print "ClassName: " & inputBoxes.Item(i).className,
Debug.Print "Title: " & inputBoxes.Item(i).Title
Debug.Print String$(20, Chr$(61))
Debug.Print vbNewLine
Next i
End Sub

Sample results:

Note: I have edited the spacing to fit more into the image.

Refined results

References added via VBE > Tools > References

References

Last two are those of interest. The bottom one will be version specific and you will need to re-write XMLHTTP60 which is for XML 6.0 to target your version of Excel if not using Excel 2016.

Loop through elements in list LI tag

querySelectorAll returns a nodeList which due to a likely bug cannot be For Each'd over. Instead, you need to traverse using the .Length property

Dim i As Long

For i = 0 To post.Length -1
Debug.Print post.item(i).innerText
Next

You need to SET the object reference and test the nodeType of the NextSiblings if you wish to retrieve the list of headers and associated values:

Option Explicit

Sub GetInfo()
Dim http As msxml2.XMLHTTP60, html As MSHTML.HTMLDocument, post As Object

Set http = New msxml2.XMLHTTP60
Set html = New MSHTML.HTMLDocument

With http
.Open "GET", "https://www.amazon.de/sp?_encoding=UTF8&asin=&isAmazonFulfilled=1&isCBA=&marketplaceID=A1PA6795UKMFR9&orderID=&seller=A2PGPJL0BBLHLX&tab=&vasStoreID=", False
.setRequestHeader "User-Agent", "Mozilla/5.0"
.send
html.body.innerHTML = .responseText
End With

Set post = html.querySelectorAll(".a-list-item .a-text-bold")

Dim i As Long, sibling As Object, val As Variant

For i = 0 To post.Length - 1
Set sibling = post.item(i).NextSibling

Debug.Print post.item(i).innerText, " = "

Select Case sibling.NodeType
Case 3
val = sibling.NodeValue
Case 1
val = sibling.innerText
End Select

Debug.Print val
Next

Stop

End Sub

Can MSXML2.XMLHTTP retrieve ALL of the HTML data for a given webpage?

Why Json? Because the page is loaded using json data

To View: Use Google Chrome --> Press F12 --> Load URL --> Goto Network tab

Sample Image


Code:

Sub getHTTP()

Dim Url As String, data As String
Dim xml As Object, JSON As Object, colObj, item

Url = "https://www.tiff.net/data/films-events-2018.json?q=1513263947586"

Set xml = CreateObject("MSXML2.ServerXMLHTTP")
With xml
.Open "GET", Url, False
.send
data = .responseText
End With

Set JSON = JsonConverter.ParseJson(data)
Set colObj = JSON("items")

For Each item In colObj
Debug.Print item("title")
Debug.Print item("description")

For Each c1 In item("cast")
Debug.Print c1
Next

For Each c2 In item("countries")
Debug.Print c2
Next
Next
End Sub

Output

Sample Image


Installation of JsonConverter

  1. Download the latest release
  2. Import JsonConverter.bas into your project (Open VBA Editor, Alt + F11; File > Import File)
    Add Dictionary reference/class
  3. For Windows-only, include a reference to "Microsoft Scripting Runtime"
  4. For Windows and Mac, include VBA-Dictionary

Tree View of Data

Sample Image



Related Topics



Leave a reply



Submit