Converting HTML to XML
I was successful using the tidy command-line utility. On Linux I installed it quickly with apt-get install tidy. Then the command:
tidy -q -asxml --numeric-entities yes source.html >file.xml
gave an XML file, which I was able to process with an XSLT processor. However, I needed to set up the XHTML1 DTDs correctly.
This is their homepage: html-tidy.org (and the legacy one: HTML Tidy)
Parse HTML into Clean XML
Parsing the fake Excel / HTML input may have some issues:
- HTML is not well-formed.
- HTML entities are used that XML does not define.
Assuming your HTML example above takes care of the first issue, you can brute force the second issue by decoding the input like this:
[xml]$html = [System.Net.WebUtility]::HtmlDecode(@'
<table class="c41">
<tr class="c5">
<td valign="top" class="c6"><p class="c7"><span class="c8">Cash Activity </span>
</p>
</td>
<td valign="top" class="c9"><p class="c10"><br/><span class="c2">FRIDAY </span><br/><span class="c2"> </span></p>
</td>
</tr>
<tr class="c5">
<td valign="top" class="c6"><p class="c11"><br/></p>
</td>
<td valign="top" class="c9"><p class="c10"><br/><span class="c2">05-JAN-18</span><br/><span class="c2"> </span></p>
</td>
</tr>
<tr class="c12">
<td valign="top" class="c13"><p class="c7"><span class="c14">Prior Day Available Balance</span></p>
</td>
<td valign="top" class="c15"><p class="c10"><span class="c16">6,472,679.45
</span></p>
</td>
</tr>
</table>
'@);
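One caveat with decoding everything: HtmlDecode also turns &amp;amp; and &amp;lt; into bare & and < characters, which would make the input ill-formed again if the text happens to contain them. A gentler alternative is to rewrite only the named entities XML does not know into numeric character references. Below is a minimal Python sketch of that idea (not the PowerShell answer's method); the function name html_entities_to_numeric and the sample string are made up for illustration.

```python
import re
import xml.etree.ElementTree as ET
from html.entities import name2codepoint

# The five entities XML predefines; these must stay escaped.
XML_PREDEFINED = {"amp", "lt", "gt", "quot", "apos"}

def html_entities_to_numeric(markup: str) -> str:
    """Replace named HTML entities with numeric character references."""
    def replace(match):
        name = match.group(1)
        if name in XML_PREDEFINED or name not in name2codepoint:
            return match.group(0)  # leave XML's own and unknown entities alone
        return f"&#{name2codepoint[name]};"
    return re.sub(r"&([A-Za-z][A-Za-z0-9]*);", replace, markup)

# Hypothetical sample: &nbsp; is an HTML entity, &amp; is valid in XML too.
sample = '<td><span class="c2">A&nbsp;&amp;&nbsp;B</span></td>'
cleaned = html_entities_to_numeric(sample)
root = ET.fromstring(cleaned)  # parses; the raw sample would not
print(cleaned)
print(repr(root.find("span").text))
```

The escaped ampersand survives the rewrite, so the result stays well-formed even when the cell text contains & or <.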
Now it's just a matter of some simple XPath to select the nodes you want to get the desired XML you specified above (tested and working):
# XML element names cannot contain spaces, hence CashActivities and CashActivity.
$xml = @'
<?xml version="1.0" encoding="utf-8" ?>
<CashActivities>
'@;
$rows = $html.DocumentElement.SelectNodes('//tr');
foreach ($row in $rows) {
    # Only rows with class "c12" carry an activity/balance pair
    if ($row.GetAttribute('class') -eq 'c12') {
        $xml += "`t<CashActivity>`n";
        $spans = $row.SelectNodes('.//descendant::span[@class]');
        if ($spans.Count -eq 2) {
            $xml += "`t`t<Activity>$($spans[0].InnerText.Trim())</Activity>`n";
            $xml += "`t`t<Balance>$($spans[1].InnerText.Trim())</Balance>`n";
        }
        $xml += "`t</CashActivity>`n";
    }
}
$xml += @'
</CashActivities>
'@;
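For comparison, here is a rough Python sketch of the same row selection that builds the output with an XML API instead of string concatenation, so the text gets escaped automatically. It is only an illustration under a few assumptions: the table is shortened, the element names CashActivities and CashActivity are placeholders written without spaces (XML names cannot contain them), and rows without exactly two spans are skipped rather than emitted empty.

```python
import xml.etree.ElementTree as ET

# Shortened, already well-formed version of the table from the question.
html = """<table class="c41">
<tr class="c5"><td class="c6"><p><span class="c8">Cash Activity</span></p></td>
<td class="c9"><p><span class="c2">FRIDAY</span></p></td></tr>
<tr class="c12"><td class="c13"><p><span class="c14">Prior Day Available Balance</span></p></td>
<td class="c15"><p><span class="c16">6,472,679.45
</span></p></td></tr>
</table>"""

def rows_to_xml(markup, row_class="c12"):
    """Turn every <tr> of the given class into an Activity/Balance pair."""
    table = ET.fromstring(markup)
    root = ET.Element("CashActivities")
    for row in table.iterfind(f".//tr[@class='{row_class}']"):
        spans = row.findall(".//span[@class]")
        if len(spans) != 2:
            continue  # not an activity/balance row
        item = ET.SubElement(root, "CashActivity")
        ET.SubElement(item, "Activity").text = "".join(spans[0].itertext()).strip()
        ET.SubElement(item, "Balance").text = "".join(spans[1].itertext()).strip()
    return ET.tostring(root, encoding="unicode")

result = rows_to_xml(html)
print(result)
```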
Convert html to xml using java
Try JTidy.
JTidy can be used as a tool for cleaning up malformed and faulty HTML.
How can I convert HTML to XML (which conforms with XML schema or DTD)
Tidy can convert HTML to XHTML (the same structure of elements and attributes but meeting the rules for XML well-formedness), but it can't convert it to meet the requirements of some arbitrary DTD.
You'll need to write an explicit mapping between the two data formats for that. XSLT is a popular language for doing that.
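To make "explicit mapping" concrete without XSLT, here is a small Python sketch of the same idea: a lookup table from XHTML element names to a target vocabulary, applied recursively. The TAG_MAP entries and the target names (List, Item, Strong) are invented for illustration, attributes are dropped, and validating the result against your DTD or schema would still be a separate step.

```python
import xml.etree.ElementTree as ET

# Hypothetical mapping from XHTML element names to a target vocabulary.
TAG_MAP = {"ul": "List", "li": "Item", "b": "Strong"}

def convert(source):
    """Recursively copy `source`, renaming elements according to TAG_MAP."""
    # Drop a namespace prefix such as {http://www.w3.org/1999/xhtml} if present.
    local_name = source.tag.rsplit("}", 1)[-1]
    target = ET.Element(TAG_MAP.get(local_name, local_name))
    target.text = source.text
    target.tail = source.tail
    for child in source:
        target.append(convert(child))
    return target

xhtml = ('<ul xmlns="http://www.w3.org/1999/xhtml">'
         '<li>Hello <b>World</b></li><li>Happy Coding</li></ul>')
result = ET.tostring(convert(ET.fromstring(xhtml)), encoding="unicode")
print(result)
```

Elements missing from the table keep their local name, which makes gaps in the mapping easy to spot in the output.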
Convert html list to xml through jQuery
Here is a simple way to convert the HTML list to XML.
There is no need to use val(): span elements have no value, so read their content with .text() or .html() instead.
A working example is below:
$('#go').click(function() {
  var xml = '<List>';
  $("ul#list li").each(function() {
    var name = $(this).children('.name-block').text();
    var value = $(this).children(".value-block").text();
    if (name && value) {
      xml += "<Item>\n";
      xml += "<Name>" + name + "</Name>\n";
      xml += "<Value>" + value + "</Value>\n";
      xml += "</Item>\n";
    }
  });
  // Close with the same name that was opened; XML tag names are case-sensitive
  xml += "</List>";
  // Use .text() so the markup is shown literally instead of being parsed as HTML
  $('.modal-body').text(xml);
  $("#myModal").modal('show');
  console.log(xml);
});
<!DOCTYPE html>
<html>
<head>
  <title></title>
  <script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script>
  <link href="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.2/css/bootstrap.min.css" rel="stylesheet" />
  <script src="https://maxcdn.bootstrapcdn.com/bootstrap/3.3.2/js/bootstrap.min.js"></script>
</head>
<body>
  <div class="form-group">
    <div class="col-sm-offset-6 col-sm-3">
      <button type="button" id="go" class="btn btn-primary">Open XML Modal Box</button>
    </div>
  </div>
  <!--Modal if input is empty-->
  <div class="modal fade" id="myModal">
    <div class="modal-dialog">
      <div class="modal-content">
        <div class="modal-header">
          <button type="button" class="close" data-dismiss="modal" aria-label="Close"><span aria-hidden="true">×</span></button>
          <h4 class="modal-title">you xml value printed below</h4>
        </div>
        <div class="modal-body"> </div>
        <div class="modal-footer">
          <button type="button" class="btn btn-default" data-dismiss="modal">Close</button>
        </div>
      </div> <!-- /.modal-content -->
    </div> <!-- /.modal-dialog -->
  </div>
  <ul id="list">
    <li>
      <span class="name name-block">Hello</span><span>=</span><span class="name value-block">World</span> <span class="btn delete">Delete</span>
    </li>
    <li>
      <span class="name name-block">Happy</span><span>=</span><span class="name value-block">Coding</span> <span class="btn delete">Delete</span>
    </li>
  </ul>
  <!-- /.modal --> <!--End Modal-->
</body>
</html>
HTML to XML conversion using XSLT 2.0
You can try this:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="body">
    <book>
      <xsl:for-each-group select="p" group-starting-with="p[@class='h1']">
        <sectionA>
          <title>
            <xsl:value-of select="node()"/>
          </title>
          <xsl:for-each-group select="current-group() except ." group-starting-with="p[@class='h2']">
            <xsl:choose>
              <xsl:when test="self::p[@class='h2']">
                <sectionB>
                  <title>
                    <xsl:value-of select="node()"/>
                  </title>
                  <xsl:for-each-group select="current-group() except ." group-starting-with="p[@class='h3']">
                    <xsl:choose>
                      <xsl:when test="self::p[@class='h3']">
                        <sectionC>
                          <title>
                            <xsl:value-of select="node()"/>
                          </title>
                          <xsl:apply-templates select="current-group() except ."></xsl:apply-templates>
                        </sectionC>
                      </xsl:when>
                      <xsl:otherwise>
                        <xsl:apply-templates select="current-group()"></xsl:apply-templates>
                      </xsl:otherwise>
                    </xsl:choose>
                  </xsl:for-each-group>
                </sectionB>
              </xsl:when>
              <xsl:otherwise>
                <xsl:apply-templates select="current-group()"></xsl:apply-templates>
              </xsl:otherwise>
            </xsl:choose>
          </xsl:for-each-group>
        </sectionA>
      </xsl:for-each-group>
    </book>
  </xsl:template>
  <xsl:template match="p">
    <xsl:copy>
      <xsl:copy-of select="@*"/>
      <xsl:apply-templates select="node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
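If the nesting is hard to follow, the grouping logic can be sketched outside XSLT. The Python below is a loose analogue, not a translation: group_starting_with imitates what group-starting-with does (a new group begins at each matching item, and any leading non-matching items form their own group), and build applies it once per heading level. The helper names, the LEVELS table and the sample body are assumptions made for the example; unlike the stylesheet, paragraphs before the first h1 are copied as-is rather than wrapped in a section.

```python
import copy
import xml.etree.ElementTree as ET

def group_starting_with(items, is_start):
    """Split items into runs, opening a new run at every item where is_start is true."""
    groups = []
    for item in items:
        if is_start(item) or not groups:
            groups.append([item])
        else:
            groups[-1].append(item)
    return groups

# Heading class at each depth and the section element it produces.
LEVELS = [("h1", "sectionA"), ("h2", "sectionB"), ("h3", "sectionC")]

def copy_paragraphs(parent, paragraphs):
    for p in paragraphs:
        clone = copy.deepcopy(p)
        clone.tail = None
        parent.append(clone)

def build(parent, paragraphs, depth=0):
    if depth == len(LEVELS):
        copy_paragraphs(parent, paragraphs)
        return
    cls, section_name = LEVELS[depth]
    for group in group_starting_with(paragraphs, lambda p: p.get("class") == cls):
        head = group[0]
        if head.get("class") == cls:
            section = ET.SubElement(parent, section_name)
            ET.SubElement(section, "title").text = "".join(head.itertext())
            build(section, group[1:], depth + 1)
        else:
            # Paragraphs that precede the first heading of this level
            copy_paragraphs(parent, group)

body = ET.fromstring(
    "<body><p class='h1'>Part</p><p>intro</p><p class='h2'>Chapter</p>"
    "<p class='h3'>Topic</p><p>text</p></body>")
book = ET.Element("book")
build(book, body.findall("p"))
result = ET.tostring(book, encoding="unicode")
print(result)
```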
How to convert HTML to XML, Import parsed XML to google sheets
Issue:
"Looking for guidance on setting the DOCTYPE, I do not have access to change the body of the emails."
[Document: No DOCTYPE declaration, Root is [Element: ]]
If you want to declare a DocType for the newly created XML document, you can do so with the appropriate methods (setDocType(), createDocType()).
Once you declare the DocType, though, you'll still have to work on your parsing, because all the current code will append is the same string, now with a DocType declared ;)
Here's a working example:
function createXml() {
  // Get sheet
  var ss = SpreadsheetApp.getActiveSheet();
  // Create XML document with root element "threads"
  var doc = XmlService.createDocument(XmlService.createElement('threads'));
  // Declare DocType "whatever-you-like" for the XML document
  doc.setDocType(XmlService.createDocType('threads'));
  // Get the root element
  var root = doc.getRootElement();
  // Get some threads from Gmail
  var threads = GmailApp.getStarredThreads();
  // For each thread
  for (var i = 0; i < threads.length; i++) {
    // Get messages
    var messages = threads[i].getMessages();
    // And for each message
    for (var j = 0; j < messages.length; j++) {
      // Get the plain-text body (HTML formatting stripped)
      var msg = messages[j].getPlainBody();
      // Create a child element "thread"
      var child = XmlService.createElement('thread')
        // Set "messageCount" attr
        .setAttribute('messageCount', threads[i].getMessageCount())
        // Set "isUnread" attr
        .setAttribute('isUnread', threads[i].isUnread())
        // Set the element text
        .setText(threads[i].getFirstMessageSubject() + msg);
      // Add the child element to root
      root.addContent(child);
    }
    // Get prettified XML document
    var xml = XmlService.getPrettyFormat().format(doc);
    // Log the prettified XML doc
    Logger.log(xml);
    // Create list of parsed children to append to sheet row
    var parsed = [];
    // Add the parsed text from children elements of root
    XmlService.parse(xml).getRootElement().getChildren().forEach((child) => {parsed.push(child.getText())});
    // Append the row with parsed data
    ss.appendRow(parsed);
  }
}
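The same build, declare, serialize, re-parse round trip can be tried outside Apps Script. Here is a minimal Python sketch using the standard library's minidom; it only mirrors the shape of the example above, and the function name build_threads_xml, the sample subjects and the fixed isUnread value are placeholders, not Gmail data.

```python
from xml.dom.minidom import getDOMImplementation, parseString

def build_threads_xml(subjects):
    """Create a <threads> document with a DOCTYPE and one <thread> per subject."""
    impl = getDOMImplementation()
    # DOCTYPE name matches the root element; no public or system identifier.
    doctype = impl.createDocumentType("threads", None, None)
    doc = impl.createDocument(None, "threads", doctype)
    root = doc.documentElement
    for subject in subjects:
        child = doc.createElement("thread")
        child.setAttribute("isUnread", "false")  # placeholder attribute value
        child.appendChild(doc.createTextNode(subject))
        root.appendChild(child)
    return doc.toxml()

xml = build_threads_xml(["First subject", "Second subject"])
# Parse the serialized string again and collect each child's text.
parsed = [node.firstChild.data
          for node in parseString(xml).getElementsByTagName("thread")]
print(xml)
print(parsed)
```

As in the Apps Script version, the DOCTYPE only changes the serialized header; extracting useful fields from the message text is still up to your own parsing.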