Extracting Content in ::after Using XPath

You are using By.xpath, but i[@class='tree-branch-head']::after is not a valid XPath expression. It mixes XPath notation (i[@class='tree-branch-head']) with a CSS pseudo-element (::after).

The CSS counterpart would be By.cssSelector with a valid CSS selector, for example i.tree-branch-head:after. That would only work if Selenium accepted pseudo-elements in locators, which it does not.

To work around this problem, you can either use Chromium, which generates extra fake elements for ::after and ::before, or use a JavaScript extractor as described in https://stackoverflow.com/a/28265738/449288.
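The JavaScript route can be sketched in Python as follows. This is a minimal sketch, not the linked answer's exact code: the helper name pseudo_element_content is hypothetical, and it assumes only that driver offers Selenium's execute_script() and that element is an already located WebElement. It asks the browser for the computed content property of the pseudo-element.

```python
def pseudo_element_content(driver, element, pseudo="::after"):
    """Return the CSS `content` of a pseudo-element, or None if it has none.

    `driver` is assumed to be a Selenium WebDriver and `element` a WebElement
    that was located with an ordinary (non-pseudo) selector.
    """
    script = (
        "return window.getComputedStyle(arguments[0], arguments[1])"
        ".getPropertyValue('content');"
    )
    raw = driver.execute_script(script, element, pseudo)

    # 'none' / 'normal' mean the pseudo-element generates no content.
    if raw is None or raw in ("", "none", "normal"):
        return None

    # Browsers report string content wrapped in quotes, e.g. '"Hello"'.
    if len(raw) >= 2 and raw[0] == raw[-1] and raw[0] in "\"'":
        return raw[1:-1]
    return raw
```

A call might look like pseudo_element_content(driver, driver.find_element(By.CSS_SELECTOR, "i.tree-branch-head")). The element itself is found with a plain selector; only the pseudo-element part is handled in JavaScript.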

Can I use XPath to find an element that contains ::after?

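It's impossible with XPath, at least with XPath version 1.0, which is what Selenium supports.

::before and ::after are CSS pseudo-elements rather than DOM nodes, so with Selenium they can only be addressed in CSS terms (for example, by reading the element's computed style from JavaScript), never through XPath.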

XPath to extract the text in Selenium

You can use the following-sibling axis to get the text of the p tag as follows:

//*[@class="note-title"]/following-sibling::p

OR

Using css selector

.note.note-info h4 + p

Example with selenium

txt = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, '//*[@class="note-title"]/following-sibling::p'))).text

OR selenium with css selector

txt = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, '.note.note-info h4 + p'))).text

#imports

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
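To see what the following-sibling::p step selects without starting a browser, here is a small sketch using only Python's standard library. The markup is hypothetical, inferred from the selectors above. ElementTree does not support the following-sibling axis, so the helper (a made-up name, following_sibling_p) emulates it by scanning each parent's children.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup, inferred from the selectors used above.
HTML = """
<div class="note note-info">
  <h4 class="note-title">Note</h4>
  <p>Text to extract</p>
</div>
"""

def following_sibling_p(root, title_class="note-title"):
    """Emulate //*[@class=title_class]/following-sibling::p by hand."""
    results = []
    for parent in root.iter():
        children = list(parent)
        for i, child in enumerate(children):
            if child.get("class") == title_class:
                # Every later sibling that is a <p> belongs to the axis.
                for sibling in children[i + 1:]:
                    if sibling.tag == "p" and sibling not in results:
                        results.append(sibling)
    return results

root = ET.fromstring(HTML)
texts = [p.text for p in following_sibling_p(root)]
print(texts)
```

In Selenium itself the XPath engine of the browser handles the axis directly, so this manual scan is only an illustration of the selection logic.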

How to extract text between ::before and ::after

The text i sits between the ::before and ::after pseudo-elements, so it is ordinary text of the element itself. To extract it, you can use either of the following locator strategies:

  • Using css_selector:

    print(driver.find_element(By.CSS_SELECTOR, "div.kbkey.button.red").text)
  • Using xpath:

    print(driver.find_element(By.XPATH, "//div[@class='kbkey button red']").text)

Ideally, you should wait for the element with WebDriverWait and visibility_of_element_located(), using either of the following locator strategies:

  • Using CSS-SELECTOR:

    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.kbkey.button.red"))).text)
  • Using XPATH:

    print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='kbkey button red']"))).text)
  • Note: You have to add the following imports:

    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC

You can find a relevant discussion in "How to retrieve the text of a WebElement using Selenium - Python".



References

Links to useful documentation:

  • get_attribute() method gets the given attribute or property of the element.
  • text attribute returns the text of the element.
  • Difference between text and innerHTML using Selenium

XPath extract text after using attribute selectors

  1. //*[@id="olpOfferListColumn"]/text() returns only child text nodes. The #olpOfferListColumn element has no child text nodes, only descendant text nodes. To get all descendant text nodes, use //*[@id="olpOfferListColumn"]//text().

  2. //*[@id="olpOfferListColumn"]/::text() - invalid XPath

Try

string(//*[@id="olpOfferListColumn"])

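to get the concatenated text of all descendant text nodes of #olpOfferListColumn (roughly what the textContent property returns; innerText additionally takes CSS rendering into account).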
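The difference between child text nodes, descendant text nodes, and the string value can be illustrated with Python's standard library. This is only an analogy under stated assumptions: the markup is invented, ElementTree does not implement text() or string(), and child_text_nodes is a hypothetical helper that rebuilds the child text nodes from .text and .tail.

```python
import xml.etree.ElementTree as ET

# Invented markup: all text lives inside nested elements.
HTML = (
    '<div id="olpOfferListColumn">'
    "<div><span>Price:</span> <b>9.99</b></div>"
    "</div>"
)
root = ET.fromstring(HTML)

def child_text_nodes(el):
    """Text nodes that are direct children of el (what /text() selects)."""
    parts = [el.text] + [child.tail for child in el]
    return [t for t in parts if t]

# Analogue of //*[@id="olpOfferListColumn"]/text(): nothing at this level.
direct = child_text_nodes(root)

# Analogue of //*[@id="olpOfferListColumn"]//text(): every descendant text node.
descendants = list(root.itertext())

# Analogue of string(//*[@id="olpOfferListColumn"]): the nodes joined together.
string_value = "".join(descendants)

print(direct, descendants, string_value)
```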

Extract content with XPath and data attribute

Since you are matching a single element, do not add // after //*[contains(@id, "line")]; that would search its descendants instead. Also, the attribute is named data-visible, not visible.

You can use an XPath expression like this:

'//div[contains(@id, "line") and @data-visible="1"]'

Or

'//div[contains(@id, "line")][@data-visible="1"]'
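A quick offline check of this logic, as a sketch on invented markup: Python's built-in ElementTree supports the [@data-visible='1'] predicate but not contains(), so the id test is done in plain Python here. A full XPath 1.0 engine, such as the browser behind Selenium, evaluates the expressions above as written.

```python
import xml.etree.ElementTree as ET

# Invented markup for illustration.
HTML = """
<body>
  <div id="line1" data-visible="1">shown</div>
  <div id="line2" data-visible="0">hidden</div>
  <div id="other" data-visible="1">unrelated</div>
</body>
"""
root = ET.fromstring(HTML)

# The attribute predicate is evaluated by ElementTree's XPath subset.
candidates = root.findall(".//div[@data-visible='1']")

# contains(@id, "line") is not available in that subset, so filter in Python.
matches = [div for div in candidates if "line" in div.get("id", "")]

print([div.text for div in matches])
```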

Selenium not extracting info using XPath

//div[@class='a-section a-spacing-none a-spacing-top-base']//span[@class='a-size-small a-color-secondary']

The XPath could be something like this; note that attribute tests need the @ prefix. You can shorten it.

The equivalent CSS selectors for the div and the span could be:

.a-section.a-spacing-none.a-spacing-top-base
.a-size-small.a-color-secondary
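In XPath, an attribute test needs the @ prefix; without it, class='...' is read as a test on a child element named class. The sketch below shows the effect on invented markup with Python's standard-library ElementTree, which follows the same rule for these two predicate forms.

```python
import xml.etree.ElementTree as ET

# Invented markup using the class names from the selectors above.
HTML = """
<body>
  <div class="a-section a-spacing-none a-spacing-top-base">
    <span class="a-size-small a-color-secondary">by Some Author</span>
  </div>
</body>
"""
root = ET.fromstring(HTML)

# No '@': looks for a <class> child element with that text, so nothing matches.
without_at = root.findall(
    ".//div[class='a-section a-spacing-none a-spacing-top-base']"
)

# With '@': compares the class attribute and finds the span.
with_at = root.findall(
    ".//div[@class='a-section a-spacing-none a-spacing-top-base']"
    "//span[@class='a-size-small a-color-secondary']"
)

print(len(without_at), [span.text for span in with_at])
```

Matching on the full class string is exact, so it breaks if the page adds or reorders classes; the CSS class selectors above do not have that limitation.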

Scrapy XPath: extract text after element is assigned

You can continue the XPath query by calling xpath() on the header variable:

header.xpath('./text()').get()

