Xpath to Find All Following Siblings Up Until the Next Sibling of a Particular Type

Xpath to extract from following siblings until the next specified node

If you select p[preceding-sibling::h2[1][contains(., 'Summary')] you will select all p children of the context node which have (the or a) h2 containing Summary as the immediately preceding h2 sibling.

If you want all such elements (e.g. the ul too) then use *[not(self::h2)][preceding-sibling::h2[1][contains(., 'Summary')].

Or you could try .//h2[contains(., 'Summary')]/following-sibling::*[preceding-sibling::h2[1][contains(., 'Summary')]].

XPath to find all following siblings up until the next sibling of a particular type

One possible solution:

dl.xpath('dt').each_with_index do |dt, i|
  dds = dt.xpath("following-sibling::dd[not(../dt[#{i + 2}]) or " +
                     "following-sibling::dt[1]=../dt[#{i + 2}]]")
  puts "#{dt.text}: #{dds.map(&:text).join(', ')}"
end

This relies on a value comparison of dt elements and will fail when there are duplicates. The following (much more complicated) expression does not depend on unique dt values:

following-sibling::dd[not(../dt[$n]) or 
    (following-sibling::dt[1] and count(following-sibling::dt[1]|../dt[$n])=1)]

Note: Your use of self fails because you're not properly using it as an axis (self::). Also, self always contains just the context node, so it would refer to each dd inspected by the expression, not back to the original dt

XPath : select all following siblings until another sibling

You could do it this way:


../node[not(text()) and preceding-sibling::node[@id][1][@id='1']]

where '1' is the id of the current node (generate the expression dynamically).

The expression says:

from the current context go to the parent
select those child nodes that
have no text and
from all "preceding sibling nodes that have an id" the first one must have an id of 1

If you are in XSLT you can select from the following-sibling axis because you can use the current() function:

<!-- the for-each is merely to switch the current node -->
<xsl:for-each select="node[@id='1']">
  <xsl:copy-of select="
    following-sibling::node[
      not(text()) and
      generate-id(preceding-sibling::node[@id][1])
      =
      generate-id(current())
    ]
  " />
</xsl:for-each>

or simpler (and more efficient) with a key:

<xsl:key 
  name="kNode" 
  match="node[not(text())]" 
  use="generate-id(preceding-sibling::node[@id][1])"
/>

<xsl:copy-of select="key('kNode', generate-id(node[@id='1']))" />

XPath: Select following siblings until certain class

Good question!

The following expression will give you 1..2, 3..5 or 6..7, depending on input X + 1, where X is the set you want (2 gives 1-2, 3 gives 3-.5 etc). In the example, I select the third set, hence it has [4]:

/table/tr[1]
  /td[not(@class = 'foo')]
  [
     generate-id(../td[@class='foo'][4]) 
     = generate-id(
         preceding-sibling::td[@class='foo'][1]
        /following-sibling::td[@class='foo'][1])
  ]

The beauty of this expression (imnsho) is that you can index by the given set (as opposed to index by relative position) and that is has only one place where you need to update the expression. If you want the sixth set, just type [7].

This expression works for any situation where you have siblings where you need the siblings between any two nodes of the same requirement (@class = 'foo'). I'll update with an explanation.

Replace the [4] in the expression with whatever set you need, plus 1. In oXygen, the above expression shows me the following selection:

Sample Image

Explanation

/table/tr[1]

Selects the first tr.

/td[not(@class = 'foo')]

Selects any td not foo

generate-id(../td[@class='foo'][4])

Gets the identity of the xth foo, in this case, this selects empty, and returns empty. In all other cases, it will return the identity of the next foo that we are interested in.

generate-id(
    preceding-sibling::td[@class='foo'][1]
    /following-sibling::td[@class='foo'][1])

Gets the identity of the first previous foo (counting backward from any non-foo element) and from there, the first following foo. In the case of node 7, this returns the identity of nothingness, resulting in true for our example case of [4]. In the case of node 3, this will result in c, which is not equal to nothingness, resulting in false.

If the example would have value [2], this last bit would return node b for nodes 1 and 2, which is equal to the identity of ../td[@class='foo'][2], returning true. For nodes 4 and 7 etc, this will return false.

Update, alternative #1

We can replace the generate-id function with a count-preceding-sibling function. Since the count of the siblings before the two foo nodes is different for each, this works as an alternative for generate-id.

By now it starts to grow just as wieldy as GSerg's answer, though:

/table/tr[1]
  /td[not(@class = 'foo')]
  [
     count(../td[@class='foo'][4]/preceding-sibling::*) 
     = count(
         preceding-sibling::td[@class='foo'][1]
        /following-sibling::td[@class='foo'][1]/preceding-sibling::*)
  ]

The same "indexing" method applies. Where I write [4] above, replace it with the n^th + 1 of the intersection position you are interested in.

Xpath following siblings until another sibling

I may be misinterpreting the question, but if, for each <tr><td class="DT">xx-xx-xx</td>, you want all <tr> after it, and before the next <tr><td class="DT">xx-xx-xx</td>, one pattern is to loop on these "boundary" <tr><td class="DT">xx-xx-xx</td> elements, and selecting following sibling rows with a condition on how many "boundaries" are found before.

Let's use lxml to illustrate. First, we create a document from your sample input:

>>> import lxml.html
>>> t = '''<table>
...     <tr>
...         <td class="DT">29-04-14</td>
...         <td class="Regio">Text</td>
...         <td class="Md">Text</td>
...     </tr>
...     <tr>
...         <td></td>
...         <td></td>
...         <td class="SomeClass">Some other text</td>
...     </tr>
...     <tr>
...         <td></td>
...         <td></td>
...         <td class="SomeOtherClass">Some more text</td>
...     </tr>
...     <tr>
...         <td class="DT">22-04-14</td>
...         <td class="Regio">Text</td>
...         <td class="Md">Text</td>
...     </tr>
...     <tr>
...         <td></td>
...         <td></td>
...         <td class="OmsAm">more text</td>
...     </tr>
...     <tr>
...         <td class="DT">30-04-14</td>
...         <td class="Regio">Text</td>
...         <td class="Md">Text</td>
...     </tr>
...     <tr>
...         <td></td>
...         <td></td>
...         <td class="OmsBr">Some other Text</td>
...     </tr>
...     <tr>
...         <td></td>
...         <td></td>
...         <td class="OmsBr">More Text</td>
...     </tr>
...     <tr>
...         <td></td>
...         <td></td>
...         <td class="OmsBr">Some different text</td>
...     </tr>
... </table>'''
>>> doc = lxml.html.fromstring(t)

Now, let's count these <tr><td class="DT">xx-xx-xx</td>:

>>> doc.xpath('//table/tr[td/@class="DT"]')
[<Element tr at 0x7f948ab00548>, <Element tr at 0x7f948ab005e8>, <Element tr at 0x7f948ab00638>]
>>> doc.xpath('count(//table/tr[td/@class="DT"])')
3.0
>>> list(enumerate(doc.xpath('//table/tr[td/@class="DT"]'), start=1))
[(1, <Element tr at 0x7f948ab00548>), (2, <Element tr at 0x7f948ab005e8>), (3, <Element tr at 0x7f948ab00638>)]

We can loop on these rows and select the rows that come after in the document (we'll select text nodes to "see" which row these are:

>>> for cnt, row in enumerate(doc.xpath('//table/tr[td/@class="DT"]'), start=1):
...     print( row.xpath('./following-sibling::tr/td/text()') )
... 
['Some other text', 'Some more text', '22-04-14', 'Text', 'Text', 'more text', '30-04-14', 'Text', 'Text', 'Some other Text', 'More Text', 'Some different text']
['more text', '30-04-14', 'Text', 'Text', 'Some other Text', 'More Text', 'Some different text']
['Some other Text', 'More Text', 'Some different text']

We're selecting too many rows in each iteration, all the rows until the end of the <table>. We need an additional "end" condition for following rows.

We're counting the tr[td/@class="DT"] in the loop, so we can check how many preceding tr[td/@class="DT"] each row has:

For the 1st set:

row.xpath('./following-sibling::tr[count(./preceding-sibling::tr[td/@class="DT"])=1]

For the 2nd:

row.xpath('./following-sibling::tr[count(./preceding-sibling::tr[td/@class="DT"])=2]

etc.

So, in the loop, we can use the current count with an XPath variable with lxml (an underrated XPath feature supported by lxml):

>>> for cnt, row in enumerate(doc.xpath('//table/tr[td/@class="DT"]'), start=1):
...     print( row.xpath('./following-sibling::tr[count(./preceding-sibling::tr[td/@class="DT"])=$count]', count=cnt) )
... 
[<Element tr at 0x7f948ab00548>, <Element tr at 0x7f948ab005e8>, <Element tr at 0x7f948ec02f98>]
[<Element tr at 0x7f948ab00548>, <Element tr at 0x7f948ab00638>]
[<Element tr at 0x7f948ab00548>, <Element tr at 0x7f948ab005e8>, <Element tr at 0x7f948ab00688>]
>>>

Hm, we're selecting 1 row too much in each iteration.

That's because <tr><td class="DT">30-04-14</td> also has 1 preceding <tr><td class="DT">

We can add an extra predicate for selecting rows that do NOT have a <td class="DT">

>>> for cnt, row in enumerate(doc.xpath('//table/tr[td/@class="DT"]'), start=1):
...     print( row.xpath('''
...         ./following-sibling::tr[count(./preceding-sibling::tr[td/@class="DT"])=$count]
...                                [not(td/@class="DT")]''', count=cnt) )
... 
[<Element tr at 0x7f948ab00548>, <Element tr at 0x7f948ab005e8>]
[<Element tr at 0x7f948ab00548>]
[<Element tr at 0x7f948ab00548>, <Element tr at 0x7f948ab005e8>, <Element tr at 0x7f948ab00688>]
>>>

The number of results per iteration looks right.
Let's finally check using text nodes:

>>> for cnt, row in enumerate(doc.xpath('//table/tr[td/@class="DT"]'), start=1):
...     print( row.xpath('''
...         ./following-sibling::tr[count(./preceding-sibling::tr[td/@class="DT"])=$count]
...                                [not(td/@class="DT")]
...             /td/text()''', count=cnt) )
... 
['Some other text', 'Some more text']
['more text']
['Some other Text', 'More Text', 'Some different text']
>>>

How to find all immediately adjacent siblings with XPath

The simplest one I can find was this:

//start/following-sibling::a intersect //start/following-sibling::*[name()!='a'][1]/preceding-sibling::a

What this does is:

Take all the a siblings following start: //start/following-sibling::a. (Result: a2, a3, a4.) Set this to one side for now.
Then take the first non-a sibling following start: //start/following-sibling::*[name()!='a'][1] (Result: b.)
And find all the a nodes that precede it: /preceding-sibling::a. (Result: a1, a2, a3)
Take the intersection of 1 and 3. (Result: a2, a3)

Update: Another way to phrase it is //start/following-sibling::*[name()!='a'][1]/preceding-sibling::a[preceding-sibling::start], this roughly translates to: take the first non-a sibling following start, count backwards but only choose elements that are still preceded by start.

Update 2: If you know that b will always be called b, you can of course replace the rather hard to read following-sibling::*[name()!='a'][1] part with following-sibling::b[1].

How to select following siblings until a certain sibling

For every row that has a Record_type of 512, create a Header element.

In order to find the row elements for the relevant group of Line elements, you want to select the row elements that are following-sibling from the 512 who's Record_type = 513 and who's first preceding-sibling is the current header.

for $header in $doc/root/row[Record_type = 512]
let $lines := $header/following-sibling::row[Record_type = 513]
                [preceding-sibling::row[Record_type = 512][1] = $header] 
return 
  <Header>{ 
    $header/*,
    for $line in $lines
    return <Line>{ $line/* }</Line>
  }</Header>