How to Interact with the Recaptcha Audio Element Using Selenium and Python

Find the reCAPTCHA element and click on it -- Python + Selenium

Solution update (11-Feb-2020)

Using the following set of binaries:

  • Selenium v3.141.0
  • ChromeDriver v80.0
  • Chrome Version 80.0

You can use the following updated block of code as a solution:

from selenium import webdriver
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC

options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
driver.get("https://www.inipec.gov.it/cerca-pec/-/pecs/companies")
WebDriverWait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"iframe[name^='a-'][src^='https://www.google.com/recaptcha/api2/anchor?']")))
WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//span[@id='recaptcha-anchor']"))).click()

Original solution

To invoke click() on the reCAPTCHA checkbox at https://www.inipec.gov.it/cerca-pec/-/pecs/companies you need to:

  • Induce WebDriverWait for the desired frame to be available and switch to it.
  • Induce WebDriverWait for the desired element to be clickable.
  • You can use the following solution:

    from selenium import webdriver
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.support import expected_conditions as EC

    options = webdriver.ChromeOptions()
    options.add_argument("start-maximized")
    options.add_argument('disable-infobars')
    # 'chrome_options' is deprecated in Selenium 3.141.0; pass 'options' instead
    driver = webdriver.Chrome(executable_path=r'C:\WebDrivers\chromedriver.exe', options=options)
    driver.get("https://www.inipec.gov.it/cerca-pec/-/pecs/companies")
    WebDriverWait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR,"iframe[name^='a-'][src^='https://www.google.com/recaptcha/api2/anchor?']")))
    WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//span[@class='recaptcha-checkbox goog-inline-block recaptcha-checkbox-unchecked rc-anchor-checkbox']/div[@class='recaptcha-checkbox-checkmark']"))).click()

ReCaptcha download audio file?

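# Click the reCAPTCHA checkbox (assumes the driver has already switched into the anchor iframe).
WebDriverWait(driver, 10).until(EC.element_to_be_clickable(
    (By.CSS_SELECTOR, "span#recaptcha-anchor"))).click()
# Return to the top-level document, then switch into the challenge iframe.
driver.switch_to.default_content()
WebDriverWait(driver, 10).until(
    EC.frame_to_be_available_and_switch_to_it((By.CSS_SELECTOR, "iframe[title='recaptcha challenge']")))
# Request the audio challenge and wait for its play button to be present.
WebDriverWait(driver, 10).until(
    EC.element_to_be_clickable((By.CSS_SELECTOR, "button#recaptcha-audio-button"))).click()
WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, ".rc-audiochallenge-play-button button")))

# Get the URL of the mp3 audio file.
# find_element(By.ID, ...) works in Selenium 3 and 4; find_element_by_id was removed in Selenium 4.
src = driver.find_element(By.ID, "audio-source").get_attribute("src")
print(src)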

Just add one more wait, for the .rc-audiochallenge-play-button button, before reading the audio source.

To download the file, you can use:

import urllib.request
urllib.request.urlretrieve(src, "src.mp3")
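
As a minimal sketch, the download step could be wrapped in a small helper that refuses an empty URL and reports how many bytes were saved. The name save_audio and its destination parameter are hypothetical and only illustrate the idea; src is assumed to be the value read from the audio-source element.

```python
import urllib.request
from pathlib import Path


def save_audio(src, destination="src.mp3"):
    """Download src to destination and return the size of the saved file in bytes.

    save_audio is a hypothetical helper name, not part of Selenium or urllib.
    """
    if not src:
        # get_attribute("src") returns None when the attribute is missing.
        raise ValueError("audio source URL is empty; the audio challenge may not have loaded")
    # urlretrieve returns a (filename, headers) tuple.
    saved_path, _headers = urllib.request.urlretrieve(src, destination)
    return Path(saved_path).stat().st_size
```

It would be called as save_audio(src) once the audio source URL has been read.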

How can I bypass the Google CAPTCHA with Selenium and Python?

To start with, when using Selenium's Python client you should avoid solving or bypassing Google CAPTCHA.



Selenium

Selenium automates browsers. What you do with that power is entirely up to you. It is primarily used to automate web applications through browser clients for testing purposes, but it is certainly not limited to that.



CAPTCHA

On the other hand, CAPTCHA (an acronym for "Completely Automated Public Turing test to tell Computers and Humans Apart") is a type of challenge–response test used in computing to determine whether the user is human.

So Selenium and CAPTCHA serve two completely different purposes and ideally shouldn't be used for interrelated tasks.

Having said that, reCAPTCHA can easily detect the network traffic and identify your program as a Selenium-driven bot.



Generic Solution

However, there are some generic approaches to avoid getting detected while web scraping:

  • One of the first attributes a website can use to identify your script/program is your monitor size, so it is recommended not to use the conventional viewport.
  • If you need to send multiple requests to a website, keep changing the User Agent on each request. For a detailed discussion, see Way to change Google Chrome user agent in Selenium?
  • To simulate human-like behavior, you may need to slow down script execution even beyond WebDriverWait and expected_conditions by inducing time.sleep(secs). For a detailed discussion, see How to sleep Selenium WebDriver in Python for milliseconds
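
The three points above could be sketched roughly as follows. This is an illustration under stated assumptions, not a tested recipe: the helper names (build_chrome_arguments, random_pause, make_driver), the window size, and the user-agent strings are all hypothetical, and the user agent is chosen once per browser session rather than per request. The --window-size and --user-agent arguments are standard Chrome command-line switches.

```python
import random
import time

# Hypothetical pool of user-agent strings; substitute values that suit your case.
USER_AGENTS = [
    "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/80.0.3987.87 Safari/537.36",
    "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_3) AppleWebKit/537.36 "
    "(KHTML, like Gecko) Chrome/80.0.3987.87 Safari/537.36",
]


def build_chrome_arguments(width, height, user_agent):
    """Return Chrome switches for a custom window size and user agent."""
    return [f"--window-size={width},{height}", f"--user-agent={user_agent}"]


def random_pause(min_secs=1.0, max_secs=3.0):
    """Sleep for a random duration between min_secs and max_secs and return it."""
    delay = random.uniform(min_secs, max_secs)
    time.sleep(delay)
    return delay


def make_driver():
    """Start Chrome with a chosen window size and a randomly picked user agent."""
    # Imported here so the helpers above can be used without Selenium installed.
    from selenium import webdriver

    options = webdriver.ChromeOptions()
    for argument in build_chrome_arguments(1366, 768, random.choice(USER_AGENTS)):
        options.add_argument(argument)
    # Assumes chromedriver can be found on the PATH.
    return webdriver.Chrome(options=options)
```

A script would call make_driver() once, then call random_pause() between page interactions in addition to its WebDriverWait calls.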


This use case

However, in a couple of use cases we were able to interact with the reCAPTCHA using Selenium and you can find more details in the following discussions:

  • How to click on the reCAPTCHA using Selenium and Java
  • CSS selector for reCAPTCHA checkbox using Selenium and VBA Excel
  • Find the reCAPTCHA element and click on it -- Python + Selenium


References

You can find a couple of related discussions in:

  • How can I make a Selenium script undetectable using GeckoDriver and Firefox through Python?
  • Is there a version of Selenium WebDriver that is not detectable?


tl; dr

  • How does reCAPTCHA 3 know I'm using Selenium/chromedriver?

How to bypass ReCaptcha with buster extension using Selenium and Python

The Buster icon is within another sibling <iframe>. So you have to:

  • Switch back to the default_content().

  • Induce WebDriverWait for the desired frame to be available and switch to it.

  • Induce WebDriverWait for the desired element to be clickable.

  • You can use the following Locator Strategies:

  • Code Block:

    from selenium import webdriver
    from selenium.webdriver.support.ui import WebDriverWait
    from selenium.webdriver.common.by import By
    from selenium.webdriver.support import expected_conditions as EC

    driver.switch_to.default_content()
    # The locator is an XPath expression, so it must be paired with By.XPATH, not By.CSS_SELECTOR
    WebDriverWait(driver, 10).until(EC.frame_to_be_available_and_switch_to_it((By.XPATH,"//iframe[@title='recaptcha challenge']")))
    WebDriverWait(driver, 10).until(EC.element_to_be_clickable((By.XPATH, "//button[@id='solver-button']"))).click()



Reference

You can find a couple of relevant discussions in:

  • How to interact with the reCAPTCHA audio element using Selenium and Python
  • How to send text to the Password field within https://mail.protonmail.com registration page?


Outro

Ways to deal with #document under iframe


