Convert HTML to Plain Text in VBA

Convert HTML to plain text in VBA

Set a reference to "Microsoft HTML Object Library" (Tools > References in the VBA editor).

Function HtmlToText(sHTML) As String
    ' Load the markup into an in-memory document and read back its text
    Dim oDoc As HTMLDocument
    Set oDoc = New HTMLDocument
    oDoc.body.innerHTML = sHTML
    HtmlToText = oDoc.body.innerText
End Function

Tim

Decode HTML entities into plain text

You could create an HTMLDocument object, write the HTML into it, and read back its text, which comes out with the entities decoded:

Function HtmlDecode(str)
    Dim dom

    ' Late-bound "htmlfile" document, so no library reference is needed
    Set dom = CreateObject("htmlfile")
    dom.Open
    dom.Write str
    dom.Close
    HtmlDecode = dom.body.innerText
End Function

decoded = HtmlDecode("&plusmn;") ' = "±"

How to convert a column of text with HTML tags to formatted text in VBA in Excel

Your code only works for the first row because it reads and writes only the first row:

'get the A1 cell value
.document.body.InnerHTML = Sheets("Sheet1").Range("A1").Value
'set the B1 cell value
ActiveSheet.Paste Destination:=Sheets("Sheet1").Range("B1")

To apply your code to every row, you have to run it inside a loop.

Your code then becomes:

Sub Sample()

    Dim Ie As Object
    Dim lastRow As Long
    Dim Row As Long

    'get the last filled row in column A
    lastRow = Sheets("Sheet1").Range("A" & Sheets("Sheet1").Rows.Count).End(xlUp).Row
    'loop to apply the code to every filled row
    For Row = 1 To lastRow
        Set Ie = CreateObject("InternetExplorer.Application")
        With Ie
            .Visible = False
            .Navigate "about:blank"
            'wait until the blank page has loaded before touching .document
            '(4 = READYSTATE_COMPLETE)
            Do While .Busy Or .ReadyState <> 4
                DoEvents
            Loop
            'load the HTML from the cell you want converted
            .document.body.InnerHTML = Sheets("Sheet1").Range("A" & Row).Value
            'select all contents in the browser
            .ExecWB 17, 0
            'copy them
            .ExecWB 12, 2
            'paste into the cell that should receive the converted HTML
            ActiveSheet.Paste Destination:=Sheets("Sheet1").Range("B" & Row)
            .Quit
        End With
        Set Ie = Nothing
    Next Row

End Sub

VBA MS Word macro: convert an embedded HTML link into plain text

This will print an HTML anchor tag, in the format you specified, for each hyperlink in your document.

Sub PrintHyperlinksAsHtml()
    Dim hlink As Hyperlink
    Dim htmlLink As String

    ' ThisDocument is the document that contains this macro
    For Each hlink In ThisDocument.Hyperlinks
        With hlink
            htmlLink = "<a target=""_blank"" href=""" & .Address & """>" & _
                       .TextToDisplay & "</a>"

            Debug.Print htmlLink
        End With
    Next hlink
End Sub

Of course, you'll want to do something more useful with them than just print them in the Immediate window.

As an aside, I prefer to use DuckDuckGo in my examples due to its much better privacy policy than Google's...

How do you convert HTML to plain text?

If you are talking about tag stripping, it is relatively straightforward as long as you don't have to worry about things like <script> tags. If all you need to do is display the text without the tags, you can accomplish that with a regular expression:

<[^>]*>

If you do have to worry about <script> tags and the like, then you'll need something a bit more powerful than regular expressions because you need to track state, something more like a context-free grammar (CFG). That said, you might be able to accomplish it with 'left to right' or non-greedy matching.
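
As a rough illustration of the non-greedy idea in VBA, here is a minimal sketch that first drops <script> and <style> elements together with their contents and then strips the remaining tags. It assumes the late-bound VBScript.RegExp object is available on the machine; the function name StripTagsAndScripts is made up for this example. It is not a real HTML parser and will not cope with every malformed document.

```vba
' Hypothetical helper: two regex passes, not a full HTML parser.
Function StripTagsAndScripts(ByVal html As String) As String
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")   ' late-bound, no reference needed
    re.Global = True                            ' replace every match
    re.IgnoreCase = True                        ' also match <SCRIPT>, <Style>, ...

    ' Pass 1: remove <script>/<style> elements including their contents.
    ' [\s\S]*? is non-greedy, so each match stops at the first closing tag.
    re.Pattern = "<(script|style)\b[^>]*>[\s\S]*?</\1\s*>"
    html = re.Replace(html, "")

    ' Pass 2: remove whatever tags are left.
    re.Pattern = "<[^>]*>"
    StripTagsAndScripts = re.Replace(html, "")
End Function
```

Running the script pass first matters: a bare tag-stripping pass would otherwise leave the JavaScript source in the output as if it were text.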

If you can use regular expressions there are many web pages out there with good info:

  • http://weblogs.asp.net/rosherove/archive/2003/05/13/6963.aspx
  • http://www.google.com/search?hl=en&q=html+tag+stripping+&btnG=Search
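
For the simple case, the tag-stripping pattern shown earlier can be applied from VBA roughly as follows. This is only a sketch: it assumes the VBScript.RegExp object can be created on the machine, StripTags is a name invented for this example, and it neither decodes entities such as &amp; nor removes the contents of <script> blocks.

```vba
' Hypothetical helper: removes anything that looks like a tag.
Function StripTags(ByVal html As String) As String
    Dim re As Object
    Set re = CreateObject("VBScript.RegExp")   ' late-bound, no reference needed
    re.Global = True                            ' replace every match, not just the first
    re.Pattern = "<[^>]*>"
    StripTags = re.Replace(html, "")
End Function
```

For example, Debug.Print StripTags("<p>Hello <b>world</b></p>") prints Hello world in the Immediate window.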

If you need the more complex behaviour of a CFG, I would suggest using a third-party tool. Unfortunately, I don't know of a good one to recommend.


