Extracting Content in :After Using Xpath

Extracting content in :after using XPath

You are using By.xpath, but i[@class='tree-branch-head']::after is not valid XPath: it mixes XPath notation (i[@class='tree-branch-head']) with CSS (::after).

You should use By.cssSelector with a valid CSS selector, for example i.tree-branch-head:after. That would work if Selenium accepted pseudo-elements, which it does not.

To work around this problem, you can either use Chromium, which generates extra fake elements ::after and ::before, or use a JavaScript extractor as described in https://stackoverflow.com/a/28265738/449288.
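A minimal sketch of the JavaScript route, assuming the content is set through the CSS content property. The helper name pseudo_content and the PSEUDO_CONTENT_JS constant are made up for illustration; the only Selenium call relied on is driver.execute_script, and the browser-side call is the standard window.getComputedStyle.

```python
# Script executed in the browser: read the computed `content` property
# of a pseudo-element that belongs to the element passed as arguments[0].
PSEUDO_CONTENT_JS = """
return window.getComputedStyle(arguments[0], arguments[1])
             .getPropertyValue('content');
"""


def pseudo_content(driver, element, pseudo="::after"):
    """Return the CSS `content` of `element`'s pseudo-element, or ''.

    `driver` is anything exposing execute_script(script, *args),
    such as a Selenium WebDriver (hypothetical helper, not a Selenium API).
    """
    raw = driver.execute_script(PSEUDO_CONTENT_JS, element, pseudo)
    if raw is None or raw in ("none", "normal"):
        return ""  # the pseudo-element has no generated content
    # Browsers report string content wrapped in quotes, e.g. '"Hello"'.
    if len(raw) >= 2 and raw[0] == raw[-1] and raw[0] in "\"'":
        return raw[1:-1]
    return raw
```

With a real driver, the element would first be located normally, for example with By.CSS_SELECTOR and "i.tree-branch-head", and then passed to pseudo_content(driver, element).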

Can I use XPath to find an element that contains ::after?

It's impossible with XPath, at least with XPath version 1.0 supported by Selenium.

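With Selenium, the ::before and ::after pseudo-elements are reachable only on the CSS side, not through XPath: they are not part of the DOM, so their content has to be read from the computed style of the element that owns them.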


xpath to extract the text in selenium

You can use the following-sibling axis to get the text of the p tag, as follows:

//*[@class="note-title"]/following-sibling::p

Using a CSS selector:

.note.note-info h4 + p

Example with selenium

txt = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, '//*[@class="note-title"]/following-sibling::p'))).text

Or Selenium with a CSS selector:

txt = WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, '.note.note-info h4 + p'))).text

Required imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

How to extract text between ::before and ::after

The text i sits between the ::before and ::after pseudo-elements, so it belongs to the element itself. To extract it, you can use either of the following locator strategies:

Using css_selector:

print(driver.find_element(By.CSS_SELECTOR, "div.kbkey.button.red").text)

Using xpath:

print(driver.find_element(By.XPATH, "//div[@class='kbkey button red']").text)

Ideally, you should wait for the element with WebDriverWait and visibility_of_element_located() before reading its text, using either of the following locator strategies:

Using CSS_SELECTOR:

print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.CSS_SELECTOR, "div.kbkey.button.red"))).text)

Using XPATH:

print(WebDriverWait(driver, 20).until(EC.visibility_of_element_located((By.XPATH, "//div[@class='kbkey button red']"))).text)

Note: You have to add the following imports:

from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

You can find a relevant discussion in How to retrieve the text of a WebElement using Selenium - Python

References

Links to useful documentation:

get_attribute() method gets the given attribute or property of the element.
text attribute returns the text of the element.
Difference between text and innerHTML using Selenium

Xpath extract text after using attribute selectors

//*[@id="olpOfferListColumn"]/text() returns only the text nodes that are direct children of the element. The #olpOfferListColumn element has no child text nodes, only descendant text nodes (to get all of those, use //*[@id="olpOfferListColumn"]//text()).
//*[@id="olpOfferListColumn"]/::text() is not valid XPath.

Try

string(//*[@id="olpOfferListColumn"])

to get all the text content of #olpOfferListColumn as a single string (roughly analogous to the textContent property).
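The difference between direct child text nodes and the full string value can be reproduced offline. The sketch below uses Python's standard xml.etree.ElementTree, which does not support text() or string() in its XPath subset, so the two helpers only emulate them; the markup and the helper names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical markup: the column has no text of its own, only nested text.
column = ET.fromstring(
    '<div id="olpOfferListColumn"><p>Price: <b>$10</b></p><p>In stock</p></div>'
)


def child_text_nodes(elem):
    """Emulate elem/text(): non-blank text nodes that are direct children."""
    nodes = [elem.text] + [child.tail for child in elem]
    return [t for t in nodes if t and t.strip()]


def string_value(elem):
    """Emulate string(elem): all descendant text, concatenated in order."""
    return "".join(elem.itertext())


print(child_text_nodes(column))  # []
print(string_value(column))      # Price: $10In stock
```

The empty list mirrors what /text() returns for an element whose text lives only in nested tags, while the concatenated string mirrors string().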

Extract content with Xpath and data attribute

Both conditions apply to the same element, so you should not add // after //*[contains(@id, "line")]. Also, the attribute there is data-visible, not visible.

You can use either of these XPath expressions:

'//div[contains(@id, "line") and @data-visible="1"]'

'//div[contains(@id, "line")][@data-visible="1"]'

Selenium not extracting info using xpath

The XPath could be something like this (attributes need the @ prefix, and you can shorten it):

//div[@class='a-section a-spacing-none a-spacing-top-base']//span[@class='a-size-small a-color-secondary']

The CSS selectors could be the following, and so forth:

.a-section.a-spacing-none.a-spacing-top-base
.a-size-small.a-color-secondary

scrapy xpath extract text after element is assigned

You can continue the XPath from that element by calling xpath() on the header variable:

header.xpath('./text()').get()
