How do I get an HTML comment with JavaScript?
Using a NodeIterator (IE >= 9)
The best method is to use a dedicated NodeIterator instance iterating all comments contained in a given root element.
function filterNone() {
    return NodeFilter.FILTER_ACCEPT;
}

function getAllComments(rootElem) {
    var comments = [];
    // Fourth argument, which is actually obsolete according to the DOM4 standard, is required in IE 11
    var iterator = document.createNodeIterator(rootElem, NodeFilter.SHOW_COMMENT, filterNone, false);
    var curNode;
    while (curNode = iterator.nextNode()) {
        comments.push(curNode.nodeValue);
    }
    return comments;
}

window.addEventListener("load", function() {
    console.log(getAllComments(document.body));
});
Using a custom-made DOM traversal (to support IE < 9 as well)
If you have to support older browsers (e.g. IE < 9), you need to traverse the DOM yourself and collect those nodes whose node type is Node.COMMENT_NODE.
// Thanks to Yoshi for the hint!
// Polyfill for IE < 9
if (!Node) {
    var Node = {};
}
if (!Node.COMMENT_NODE) {
    // numeric value according to the DOM spec
    Node.COMMENT_NODE = 8;
}

function getComments(elem) {
    var children = elem.childNodes;
    var comments = [];
    for (var i = 0, len = children.length; i < len; i++) {
        if (children[i].nodeType == Node.COMMENT_NODE) {
            comments.push(children[i]);
        }
    }
    return comments;
}
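Note that getComments only inspects the direct children of elem. If comments nested deeper in the tree are needed as well, a recursive variant might look like the minimal sketch below. The name getCommentsDeep is hypothetical and not part of the original answer; it relies only on nodeType and childNodes, so it should also work in old browsers.

```javascript
// Hypothetical helper (not part of the original answer): collects comment
// nodes at any depth below `elem`, not only its direct children.
var COMMENT_NODE = 8; // numeric value of Node.COMMENT_NODE in the DOM spec

function getCommentsDeep(elem, found) {
    found = found || [];
    var children = elem.childNodes || [];
    for (var i = 0, len = children.length; i < len; i++) {
        var child = children[i];
        if (child.nodeType === COMMENT_NODE) {
            found.push(child);
        } else {
            // Comments have no children, so only other nodes are descended into.
            getCommentsDeep(child, found);
        }
    }
    return found;
}
```

In a browser this would be called as getCommentsDeep(document.body), returning the comment nodes in document order.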
Extracting a node's contents and deleting it
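Whichever approach you choose from above, you end up working with the same DOM comment node objects. (Note that getAllComments collects each node's nodeValue string; push curNode itself if you need the node.)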
Accessing a comment's contents is as easy as commentObject.nodeValue.
Deleting a comment is a bit more verbose: commentObject.parentNode.removeChild(commentObject).
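Both steps can be combined into one small function. This is only a sketch: the name takeComment is hypothetical, and it assumes the argument is a comment node obtained by one of the approaches above.

```javascript
// Hypothetical helper: returns the comment's text and detaches the node.
function takeComment(commentObject) {
    var text = commentObject.nodeValue; // the text between <!-- and -->
    var parent = commentObject.parentNode;
    if (parent) { // an already detached node has no parent
        parent.removeChild(commentObject);
    }
    return text;
}
```

Reading nodeValue before removing the node is not strictly required, since a detached node keeps its value, but it keeps the order of operations obvious.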
Extracting Comments from HTML code using XPath
library(rvest)
library(tidyverse)
library(stringi) # provides stri_replace_all_regex(), used below
pg <- read_html("http://agriexchange.apeda.gov.in/ExportersDirectory/exporters_list.aspx?letter=Z")
html_nodes(pg, xpath=".//comment()[contains(., 'IE CODE')]/../../..") %>% # target the comment then back up to the table
  map_df(~{
    # extract the <td> (column 1)
    html_nodes(.x, xpath=".//td[1]") %>%
      html_text(trim=TRUE) %>%
      str_replace_all("[[:space:]]+", " ") -> tmp
    # add in the comment to the "missing" <td> value
    html_node(.x, xpath=".//comment()") %>%
      html_text() %>%
      stri_replace_all_regex("<b>|</b>", "") -> tmp[1]
    # set it up for data frame-ing
    set_names(as.list(tmp), sprintf("X%s", 1:8))
  })
## # A tibble: 196 x 8
## X1 X2 X3
## <chr> <chr> <chr>
## 1 IE CODE : 0514026049 Z A M PRODUCTS 54 DAROOD GRAN SHAHPEER GATE MEERUT
## 2 IE CODE : AQDPV0923E Z CONNECT H-302, AIRFORCE NAVAL, ATHIPALAYAM PIRIVU, GANAPATHY, COIMBATORE
## 3 IE CODE : 2912000459 Z K INTERNATIONAL MUGHALPURA IST NEAR ISMAIL BEG KI MASJID MORADABAD
## 4 IE CODE : 0307069753 Z K R INTERNATIONAL CO. 4084, PLAZA SHOPPING CENTRE,104/142, SHERIF DEVJI STREET, MUMBAI,
## 5 IE CODE : 3117507531 Z S ENTERPRISES SURVEY NO 12,PLOT NO.64,FLAT NO 1, KAUSARBAUGH NIBM ROAD KONDHWA KHURD PUNE
## 6 IE CODE : 0500009503 Z. EXPORTS T-283, NEAR GURUDWARA BHAIJI B AHATA KIDARA,
## 7 IE CODE : 0713030658 Z. K. MANGO MANDI APMC YARD, RMC CHANNAPATNA, RAMANAGARA DISTRICT
## 8 IE CODE : 0599037351 Z.A. CRAFTS, A-56, GALI NO. 6, CHOUHAN BANGER, NEW SEELAM PUR, DELHI
## 9 IE CODE : 0609001353 Z.B.INTERNATIONAL 1ST FLOOR,25TH MILE STONE,AGRA MATHURA ROAD,VILL CHUMURA, POST-FARAH MATHURA
## 10 IE CODE : 0501009256 Z.D. EXPORTS J-51, EXTENSION, STREET NO. 12/3, RAMESH PARK, LAXMI NAGAR DELHI
## # ... with 186 more rows, and 5 more variables: X4 <chr>, X5 <chr>, X6 <chr>, X7 <chr>, X8 <chr>
Is it possible to find all comments in a HTML document and wrap them in a div?
Try this: it grabs all comments in the HTML document. If the document contains no other comments unrelated to your plan, this can do the job.
$(function() {
    $("*").contents().filter(function(){
        return this.nodeType == 8;
    }).each(function(i, e){
        alert(e.nodeValue);
    });
});
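The snippet above only alerts each comment's text. For the wrapping part of the question, a minimal sketch using plain DOM calls (createElement, insertBefore, appendChild) could look like this. The name wrapComment and its doc parameter are hypothetical; in a browser you would pass document.

```javascript
// Hypothetical helper: wraps a single comment node in a new <div>.
// `doc` is the owning document (pass `document` in a browser).
function wrapComment(comment, doc) {
    var wrapper = doc.createElement("div");
    // Put the wrapper where the comment currently sits...
    comment.parentNode.insertBefore(wrapper, comment);
    // ...then move the comment inside it (appendChild relocates an attached node).
    wrapper.appendChild(comment);
    return wrapper;
}
```

Inside the .each callback above this would be called as wrapComment(e, document). The wrapped comment is still a comment, so the div stays visually empty unless you also copy e.nodeValue into it as text.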
How can I read a HTML comment located outside of html using JavaScript?
I did some testing on this and discovered that any HTML beyond </html> is appended to the <body> during parsing. So, in all browsers I tested (IE6/7, FF3, Opera10, Chrome2), the comment was accessible as document.body.lastChild and its contents can be retrieved via document.body.lastChild.data.
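That observation dates from the browsers listed. As far as I understand the current HTML Standard, a comment that follows </html> is attached to the Document itself rather than to <body>, so a more defensive sketch checks both places. The name findTrailingComment is hypothetical.

```javascript
// Hypothetical helper (not from the original answer): looks for a comment
// after </html> in both places a parser may put it.
var COMMENT_NODE = 8; // numeric value of Node.COMMENT_NODE

function findTrailingComment(doc) {
    var candidates = [
        doc.lastChild,                  // parsers that attach it to the Document
        doc.body && doc.body.lastChild  // the behaviour described above
    ];
    for (var i = 0; i < candidates.length; i++) {
        var node = candidates[i];
        if (node && node.nodeType === COMMENT_NODE) {
            return node.data; // the text between <!-- and -->
        }
    }
    return null; // no trailing comment found in either place
}
```

In a browser this would be called as findTrailingComment(document).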
Nokogiri query for HTML comments contained in JavaScript?
If you parse the document as XML, it will find the comment. However, if you parse it as HTML, Nokogiri will put the entire contents of the script tag into a CDATA section. You could then parse the comment out of that text yourself.
require 'rubygems'
require 'nokogiri'
body = DATA.read
doc = Nokogiri::XML(body)
puts doc.search('/html/head/script/comment()').text.strip
# puts "url = 'http://someurl.com';"
doc = Nokogiri::HTML(body)
puts doc.search('/html/head/script').text.strip
# puts "<!--\n url = 'http://someurl.com';\n -->"
__END__
<html>
<head>
<script language="JavaScript" type="text/javascript">
<!--
url = 'http://someurl.com';
-->
</script>
</head>
</html>
How to remove all content between two HTML comments using BeautifulSoup
First of all, be careful with snippets of HTML taken out of context. If you print your soupified snippet, you'll get:
<!-- Top Plans & Programs: Most Common User Phrases - List Bucket 6 -->
<html>
<body>
<div>
<span id="company">
...
Whoops--BS added the comment above the <html> tag, pretty clearly not your intent as an algorithm to remove elements between the two tags would inevitably remove the entire document (that's why including your code is important...).
As for the main task, element.decompose() or element.extract() will remove an element from the tree (the minor difference is that extract() returns the removed element, while decompose() destroys it). Elements to be removed in a walk need to be kept in a separate list and removed after the traversal ends.
from bs4 import BeautifulSoup, Comment
html = """
<body>
<!-- Top Plans & Programs: Most Common User Phrases - List Bucket 6 -->
<div><span id="company">Apple</span> Chats:</div>
<div>abcdefg<span>xvfdadsad</span>sdfsdfsdf</div>
<div>
<li>(<span>7</span>sadsafasf<span>vdvdsfdsfds</span></li>
<li>(<span>8</span>) <span>Reim</span></li>
</div>
<!-- Ad -->
<a href="#">
"""
start_comment = " Top Plans & Programs: Most Common User Phrases - List Bucket 6 "
end_comment = " Ad "
soup = BeautifulSoup(html, "lxml")
to_extract = []
between_comments = False

# .descendants is the current name for the older recursiveChildGenerator()
for x in soup.descendants:
    if between_comments and not isinstance(x, str):
        to_extract.append(x)

    if isinstance(x, Comment):
        if start_comment == x:
            between_comments = True
        elif end_comment == x:
            break

for x in to_extract:
    x.decompose()
print(soup.prettify())
Output:
<html>
<body>
<!-- Top Plans & Programs: Most Common User Phrases - List Bucket 6 -->
<!-- Ad -->
<a href="#">
</a>
</body>
</html>
Note that if the ending comment isn't at the same level as the starting comment, this will destroy all parent elements of the ending comment. If you don't want that, you'll need to walk back up the parent chain until you reach the level of the starting comment.
Another solution using .find and .next_element (same imports/HTML string/output as above):
start_comment = " Top Plans & Programs: Most Common User Phrases - List Bucket 6 "
end_comment = " Ad "
soup = BeautifulSoup(html, "lxml")
# string= is the current name of the older text= argument
el = soup.find(string=lambda x: isinstance(x, Comment) and start_comment == x)
end = el.find_next(string=lambda x: isinstance(x, Comment) and end_comment == x)
to_extract = []

while el and end and el is not end:
    if not isinstance(el, str):
        to_extract.append(el)

    el = el.next_element

for x in to_extract:
    x.decompose()
print(soup.prettify())
Extracting comments from html using Jsoup
I found a way to remove comments using Jsoup at: https://gist.github.com/jhy/491407
Based on that code, you can write a method that extracts the comments instead of removing them. I tried to implement this functionality and came up with this:
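import java.util.ArrayList;
import java.util.List;
import org.jsoup.Jsoup;
import org.jsoup.nodes.Comment;
import org.jsoup.nodes.Document;
import org.jsoup.nodes.Node;

// Recursively collects every comment node below the given node.
private List<Comment> getComments(Node node) {
    List<Comment> comments = new ArrayList<Comment>();
    int i = 0;
    while (i < node.childNodes().size()) {
        Node child = node.childNode(i);
        if (child.nodeName().equals("#comment"))
            comments.add((Comment) child);
        else {
            comments.addAll(getComments(child));
        }
        i++;
    }
    return comments;
}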
Example usage:
String page = "...."; //your page body
Document doc = Jsoup.parse(page);
List<Comment> comments = getComments(doc);