XPath to select between two HTML comments?
I would look for elements that are preceded by the first comment and followed by the second comment:
doc.xpath("//*[preceding::comment()[. = ' begin content ']]
[following::comment()[. = ' end content ']]")
#=> <div>some text</div>
#=> <div>
#=> <p>Some more elements</p>
#=> </div>
#=> <p>Some more elements</p>
Note that the above gives you every element in between, at any depth. This means that if you iterate over the returned nodes, you will get some nested nodes twice, e.g. the "Some more elements" paragraph appears once inside its parent div and once on its own.
I think you might actually want just the top-level nodes in between, i.e. the siblings of the comments. This can be done using the preceding-sibling and following-sibling axes instead.
doc.xpath("//*[preceding-sibling::comment()[. = ' begin content ']]
[following-sibling::comment()[. = ' end content ']]")
#=> <div>some text</div>
#=> <div>
#=> <p>Some more elements</p>
#=> </div>
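A minimal, self-contained sketch of the sibling-based selection above. It assumes REXML (distributed with standard Ruby installs) rather than whatever library provides doc.xpath in the answer, and the sample markup is made up for illustration.

```ruby
require 'rexml/document'

# Hypothetical input: the comment text includes its surrounding spaces,
# so the XPath string literals must include them too.
xml = <<~XML
  <root>
    <p>before</p>
    <!-- begin content -->
    <div>some text</div>
    <div><p>Some more elements</p></div>
    <!-- end content -->
    <p>after</p>
  </root>
XML

doc = REXML::Document.new(xml)

# Elements that have the begin comment as an earlier sibling
# and the end comment as a later sibling.
xpath = "//*[preceding-sibling::comment()[. = ' begin content ']]" \
        "[following-sibling::comment()[. = ' end content ']]"

between = REXML::XPath.match(doc, xpath)
between.each { |el| puts el }
```

The nested p is not selected on its own because neither comment is one of its siblings; it only shows up as part of its parent div.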
Update - Including comments
Using //* only returns element nodes, which does not include comments (and some other node types). You could change * to node() to return everything.
puts doc.xpath("//node()[preceding-sibling::comment()[. = 'begin content']]
[following-sibling::comment()[. = 'end content']]")
#=>
#=> <!--keywords1: first_keyword-->
#=>
#=> <div>html</div>
#=>
If you just want element nodes and comments (i.e. not everything), you can use the self axis:
doc.xpath("//node()[self::* or self::comment()]
[preceding-sibling::comment()[. = 'begin content']]
[following-sibling::comment()[. = 'end content']]")
#=> <!--keywords1: first_keyword-->
#=> <div>html</div>
XPath selecting text between comments
You can get the required output with the XPath expressions below:
//p/text()[string-length(.)>0] # for date
//p/a/@href # for link
//p/a/text() # for link text
If you still want to use those comments in XPath:
//p/text()[preceding-sibling::comment()[1][contains(., 'start template: articleLists/indexHeadline.html')]]
[following-sibling::comment()[1][contains(., 'end template: articleLists/indexHeadline.html')]] # for date
//p/a[preceding-sibling::comment()[1][contains(., 'start template: articleLists/indexHeadline.html')]]
[following-sibling::comment()[1][contains(., 'end template: articleLists/indexHeadline.html')]]/@href # for links
//p/a[preceding-sibling::comment()[1][contains(., 'start template: articleLists/indexHeadline.html')]]
[following-sibling::comment()[1][contains(., 'end template: articleLists/indexHeadline.html')]]/text() # for link text
XPath selecting between comments multiple times
Add a predicate that states that you want the first preceding comment and the first following comment.
So, for example, to get the contents between the comment containing "comment 1" and the next "end content" comment:
//*[preceding-sibling::comment()[1][contains(., 'comment 1')]]
[following-sibling::comment()[1][contains(., 'end content')]]
Similarly, to get the contents between the comment containing "comment 2" and the next "end content" comment:
//*[preceding-sibling::comment()[1][contains(., 'comment 2')]]
[following-sibling::comment()[1][contains(., 'end content')]]
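A rough sketch of how the [1] predicates keep the blocks apart when several of them share the same end marker. It assumes REXML as the XPath engine; the sample markup and the between_xpath helper are hypothetical and exist only for this illustration.

```ruby
require 'rexml/document'

# Hypothetical input with two blocks that share the same end marker.
xml = <<~XML
  <root>
    <!-- comment 1 -->
    <p>one</p>
    <!-- end content -->
    <!-- comment 2 -->
    <p>two</p>
    <!-- end content -->
  </root>
XML

doc = REXML::Document.new(xml)

# Build the expression for a given start marker (illustrative helper).
# [1] on preceding-sibling picks the nearest earlier comment,
# [1] on following-sibling picks the nearest later comment.
def between_xpath(start_marker)
  "//*[preceding-sibling::comment()[1][contains(., '#{start_marker}')]]" \
  "[following-sibling::comment()[1][contains(., 'end content')]]"
end

first_block  = REXML::XPath.match(doc, between_xpath('comment 1')).map(&:text)
second_block = REXML::XPath.match(doc, between_xpath('comment 2')).map(&:text)

puts first_block.inspect
puts second_block.inspect
```

Without the [1] predicates, the second paragraph would also match the "comment 1" expression, since "comment 1" is still one of its preceding siblings.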
XPath - extracting text between two nodes
You should be able to just test the first preceding sibling h5
...
//text()[preceding-sibling::h5[1][normalize-space()='SecondHeader']]
XPATH substring before and after to return text between two html tags
You want the nodes that directly follow a particular <h4>, where "directly follow" can be expressed as "the first preceding <h4> is the one we started at" (well, and the node in question is not an <h4> itself, of course).
This expression(*)
//h4[. = 'Start here']/following-sibling::*[not(self::h4) and preceding-sibling::h4[1][. = 'Start here']]
selects from this document
<body>
<h4>Not relevant</h4>
<p>Other stuff</p>
<h4>Start here</h4>
<p>Text stuff 1</p>
<p>Text stuff 2</p>
<h4>Stop here</h4>
<p>Other stuff</p>
</body>
these nodes
<p>Text stuff 1</p>
<p>Text stuff 2</p>
You can extract/join their text values in the host application.
(*) could also be written as //*[not(self::h4) and preceding-sibling::h4[1][. = 'Start here']], but that one has to check more nodes, i.e. all nodes in the document, as opposed to only the following-sibling axis of one particular node.
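Joining the text values in the host application could look roughly like this in Ruby. The sketch assumes REXML and uses a simplified variant of the expression: because the wanted nodes in the sample document are all p elements, it selects p directly, which makes the not(self::h4) test unnecessary.

```ruby
require 'rexml/document'

xml = <<~XML
  <body>
    <h4>Not relevant</h4>
    <p>Other stuff</p>
    <h4>Start here</h4>
    <p>Text stuff 1</p>
    <p>Text stuff 2</p>
    <h4>Stop here</h4>
    <p>Other stuff</p>
  </body>
XML

doc = REXML::Document.new(xml)

# Simplified variant (an assumption, not the answer's exact expression):
# keep each <p> whose nearest preceding <h4> sibling is "Start here".
xpath = "//p[preceding-sibling::h4[1][. = 'Start here']]"

paragraphs = REXML::XPath.match(doc, xpath)

# Extract and join the text values outside of XPath.
joined = paragraphs.map(&:text).join(' ')
puts joined
```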
Xpath to select commented code
Use the union operator "|":
descendant-or-self::link|/*/comment()
This will return the text of the comment, which cannot be used for further parsing even though it contains markup-like text. It's just a string so you'll have to treat it like one.
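One way to treat the comment "like a string" is to hand its text to the parser a second time. The sketch below assumes REXML and a made-up document, and it only works when the commented-out text is itself well-formed markup.

```ruby
require 'rexml/document'

# Hypothetical input: one <link> is commented out, one is live.
xml = <<~XML
  <html>
    <!-- <link rel="stylesheet" href="old.css"/> -->
    <head><link rel="stylesheet" href="main.css"/></head>
  </html>
XML

doc = REXML::Document.new(xml)

# The comment node only carries a string, not child elements.
comment = REXML::XPath.first(doc, '/*/comment()')
comment_text = comment.string

# Parse that string as its own document to reach the markup inside it.
inner = REXML::Document.new(comment_text.strip)
puts inner.root.attributes['href']
```

If the commented-out markup is not well-formed, the second parse will fail and plain string handling (for example a regular expression) is the fallback.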
Xpath. How to select all text between two tags?
<a> is a sibling of <pre>, not of the text() node. You can use preceding::a instead (and similarly for following).