How to Use Selenium Webdriver Without Informing the Document That It Is Controlled by Webdriver

Is there a way to use Selenium WebDriver without informing the document that it is controlled by WebDriver?

No, there is no way to conceal that you are runing automated test.

WebDriver Interface

When using the WebDriver interface the webdriver-active flag is set to true as the user agent is under remote control. It is initially false.

WebIDL

Navigator includes NavigatorAutomationInformation;

Note that the NavigatorAutomationInformation interface should not be exposed on WorkerNavigator.

WebIDL

interface mixin NavigatorAutomationInformation {
readonly attribute boolean webdriver;
};

webdriver

  • Returns true if webdriver-active flag is set, false otherwise.

Example

  • For web authors :

    navigator.webdriver

    Defines a standard way for co-operating user agents to inform the document that it is controlled by WebDriver, for example so that alternate code paths can be triggered during automation.

The above mentioned implementation is based on a couple of Security Considerations as follows:

  • A user agent can rely on a command-line flag or a configuration option to test whether to enable WebDriver, or alternatively make the user agent initiate or confirm the connection through a privileged content document or control widget, in case the user agent does not directly implement the HTTP endpoints.

  • It is strongly suggested that user agents require users to take explicit action to enable WebDriver, and that WebDriver remains disabled in publicly consumed versions of the user agent.

  • It is also suggested that user agents make an effort to visually distinguish a user agent session that is under control of WebDriver from those used for normal browsing sessions. This can be done through a browser chrome element such as a door hanger, colorful decoration of the OS window, or some widget element that is prevalent in the window so that it easy to identify automation windows.


Reference

You can find a couple of detailed discussion in:

  • Distil detects WebDriver driven Chrome Browsing Context
  • Selenium webdriver: Modifying navigator.webdriver flag to prevent selenium detection
  • Akamai Bot Manager detects WebDriver driven Chrome Browsing Context

Unable to use Selenium to automate Chase site login

I took your code and simplified the structure and ran the test with minimal lines of code as follows:

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_argument("--disable-extensions")
driver = webdriver.Chrome(chrome_options=options, executable_path=r'C:\Utility\BrowserDrivers\chromedriver.exe')
driver.get("https://secure07c.chase.com/web/auth/#/logon/logon/chaseOnline?")
WebDriverWait(driver, 20).until(EC.element_to_be_clickable((By.CSS_SELECTOR, "input.jpui.input.logon-xs-toggle.clientSideError"))).send_keys("jsmiao")
driver.find_element_by_css_selector("input.jpui.input.logon-xs-toggle#password-input-field").send_keys("hello")
driver.find_element_by_css_selector("button#signin-button>span.label").click()

Similarly, as per your observation I have hit the same roadblock with the error as:

Chase Login Failure Image

It seems the click() on the element with text as Sign in does happens. Though the username / password lookup is initiated but the process is interupted. While inspecting the DOM Tree of the webpage you will find that some of the <script> tag refers to JavaScripts having keyword dist. As an example:

  • <script src="https://static.chasecdn.com/web/library/blue-boot/dist/2.20.3/blue-boot/js/main-ver.js"></script>
  • <script type="text/javascript" charset="utf-8" async="" data-requirecontext="_" data-requiremodule="blue-vendor/main" src="https://static.chasecdn.com/web/library/blue-vendor/dist/2.11.1/blue-vendor/js/main.js"></script>
  • <script type="text/javascript" charset="utf-8" async="" data-requirecontext="_" data-requiremodule="blue/main" src="https://static.chasecdn.com/web/library/blue-core/dist/2.16.3/blue/js/main.js"></script>
  • <script type="text/javascript" charset="utf-8" async="" data-requirecontext="_" data-requiremodule="blue-app/main" src="https://static.chasecdn.com/web/library/blue-app/dist/2.15.1/blue-app/js/main.js"></script>

Which is a clear indication that the website is protected by Bot Management service provider Distil Networks and the navigation by ChromeDriver gets detected and subsequently blocked.


Distil

As per the article There Really Is Something About Distil.it...:

Distil protects sites against automatic content scraping bots by observing site behavior and identifying patterns peculiar to scrapers. When Distil identifies a malicious bot on one site, it creates a blacklisted behavioral profile that is deployed to all its customers. Something like a bot firewall, Distil detects patterns and reacts.

Further,

"One pattern with Selenium was automating the theft of Web content", Distil CEO Rami Essaid said in an interview last week. "Even though they can create new bots, we figured out a way to identify Selenium the a tool they're using, so we're blocking Selenium no matter how many times they iterate on that bot. We're doing that now with Python and a lot of different technologies. Once we see a pattern emerge from one type of bot, then we work to reverse engineer the technology they use and identify it as malicious".


Reference

You can find a couple of detailed discussion in:

  • Is there a way to use Selenium WebDriver without informing the document that it is controlled by WebDriver?
  • Selenium webdriver: Modifying navigator.webdriver flag to prevent selenium detection
  • Akamai Bot Manager detects WebDriver driven Chrome Browsing Context
  • Is there a version of selenium webdriver that is not detectable?

Selenium webdriver: Modifying navigator.webdriver flag to prevent selenium detection

First the update 1

execute_cdp_cmd(): With the availability of execute_cdp_cmd(cmd, cmd_args) command now you can easily execute google-chrome-devtools commands using Selenium. Using this feature you can modify the navigator.webdriver easily to prevent Selenium from getting detected.



Preventing Detection 2

To prevent Selenium driven WebDriver getting detected a niche approach would include either / all of the below mentioned steps:

  • Adding the argument --disable-blink-features=AutomationControlled

    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_argument('--disable-blink-features=AutomationControlled')
    driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
    driver.get("https://www.website.com")

You can find a relevant detailed discussion in Selenium can't open a second page

  • Rotating the user-agent through execute_cdp_cmd() command as follows:

    #Setting up Chrome/83.0.4103.53 as useragent
    driver.execute_cdp_cmd('Network.setUserAgentOverride', {"userAgent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.53 Safari/537.36'})
  • Change the property value of the navigator for webdriver to undefined

    driver.execute_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined})")
  • Exclude the collection of enable-automation switches

    options.add_experimental_option("excludeSwitches", ["enable-automation"])
  • Turn-off useAutomationExtension

    options.add_experimental_option('useAutomationExtension', False)


Sample Code 3

Clubbing up all the steps mentioned above and effective code block will be:

from selenium import webdriver

options = webdriver.ChromeOptions()
options.add_argument("start-maximized")
options.add_experimental_option("excludeSwitches", ["enable-automation"])
options.add_experimental_option('useAutomationExtension', False)
driver = webdriver.Chrome(options=options, executable_path=r'C:\WebDrivers\chromedriver.exe')
driver.execute_script("Object.defineProperty(navigator, 'webdriver', {get: () => undefined})")
driver.execute_cdp_cmd('Network.setUserAgentOverride', {"userAgent": 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.53 Safari/537.36'})
print(driver.execute_script("return navigator.userAgent;"))
driver.get('https://www.httpbin.org/headers')


History

As per the W3C Editor's Draft the current implementation strictly mentions:

The webdriver-active flag is set to true when the user agent is under remote control which is initially set to false.

Further,

Navigator includes NavigatorAutomationInformation;

It is to be noted that:

The NavigatorAutomationInformation interface should not be exposed on WorkerNavigator.

The NavigatorAutomationInformation interface is defined as:

interface mixin NavigatorAutomationInformation {
readonly attribute boolean webdriver;
};

which returns true if webdriver-active flag is set, false otherwise.

Finally, the navigator.webdriver defines a standard way for co-operating user agents to inform the document that it is controlled by WebDriver, so that alternate code paths can be triggered during automation.

Caution: Altering/tweaking the above mentioned parameters may block the navigation and get the WebDriver instance detected.



Update (6-Nov-2019)

As of the current implementation an ideal way to access a web page without getting detected would be to use the ChromeOptions() class to add a couple of arguments to:

  • Exclude the collection of enable-automation switches
  • Turn-off useAutomationExtension

through an instance of ChromeOptions as follows:

  • Java Example:

    System.setProperty("webdriver.chrome.driver", "C:\\Utility\\BrowserDrivers\\chromedriver.exe");
    ChromeOptions options = new ChromeOptions();
    options.setExperimentalOption("excludeSwitches", Collections.singletonList("enable-automation"));
    options.setExperimentalOption("useAutomationExtension", false);
    WebDriver driver = new ChromeDriver(options);
    driver.get("https://www.google.com/");
  • Python Example

    from selenium import webdriver

    options = webdriver.ChromeOptions()
    options.add_experimental_option("excludeSwitches", ["enable-automation"])
    options.add_experimental_option('useAutomationExtension', False)
    driver = webdriver.Chrome(options=options, executable_path=r'C:\path\to\chromedriver.exe')
    driver.get("https://www.google.com/")
  • Ruby Example

      options = Selenium::WebDriver::Chrome::Options.new
    options.add_argument("--disable-blink-features=AutomationControlled")
    driver = Selenium::WebDriver.for :chrome, options: options


Legends

1: Applies to Selenium's Python clients only.

2: Applies to Selenium's Python clients only.

3: Applies to Selenium's Python clients only.

How to Conceal WebDriver in Geckodriver from BotD in Java?

When using Selenium driven GeckoDriver initiated firefox Browsing Context

The webdriver-active flag is set to true when the user agent is under remote control. It is initially false.

webdriver-active flag

where, webdriver returns true if webdriver-active flag is set, false otherwise.

As:

navigator.webdriver Defines a standard way for co-operating user agents to inform the document that it is controlled by WebDriver, for
example so that alternate code paths can be triggered during
automation.

Further @whimboo in his comments confirmed:

This implementation have to be conformant to this requirement. As such
we will not provide a way to circumvent that.



Conclusion

So, the bottom line is:

Selenium identifies itself

and there is no way to conceal the fact that the browser is WebDriver driven.



Recommendations

However some pundits have suggested some different approaches which can conceal the fact that the Mozilla Firefox browser is WebDriver controled through the usage of Firefox Profiles and Proxies as follows:

selenium4 compatible python code

from selenium.webdriver import Firefox
from selenium import webdriver
from selenium.webdriver.firefox.service import Service
from selenium.webdriver.firefox.options import Options

profile_path = r'C:\Users\Admin\AppData\Roaming\Mozilla\Firefox\Profiles\s8543x41.default-release'
options=Options()
options.set_preference('profile', profile_path)
options.set_preference('network.proxy.type', 1)
options.set_preference('network.proxy.socks', '127.0.0.1')
options.set_preference('network.proxy.socks_port', 9050)
options.set_preference('network.proxy.socks_remote_dns', False)
service = Service('C:\\BrowserDrivers\\geckodriver.exe')
driver = Firefox(service=service, options=options)
driver.get("https://www.google.com")
driver.quit()


Potential Solution

A potential solution would be to use the tor browser as follows:

selenium4 compatible python code

from selenium.webdriver import Firefox  
from selenium import webdriver
from selenium.webdriver.firefox.service import Service
from selenium.webdriver.firefox.options import Options
import os

torexe = os.popen(r'C:\Users\username\Desktop\Tor Browser\Browser\TorBrowser\Tor\tor.exe')
profile_path = r'C:\Users\username\Desktop\Tor Browser\Browser\TorBrowser\Data\Browser\profile.default'
firefox_options=Options()
firefox_options.set_preference('profile', profile_path)
firefox_options.set_preference('network.proxy.type', 1)
firefox_options.set_preference('network.proxy.socks', '127.0.0.1')
firefox_options.set_preference('network.proxy.socks_port', 9050)
firefox_options.set_preference("network.proxy.socks_remote_dns", False)
firefox_options.binary_location = r'C:\Users\username\Desktop\Tor Browser\Browser\firefox.exe'
service = Service('C:\\BrowserDrivers\\geckodriver.exe')
driver = webdriver.Firefox(service=service, options=firefox_options)
driver.get("https://www.tiktok.com/")


References

You can find a couple of relevant detailed discussions in

  • How to initiate a Tor Browser 9.5 which uses the default Firefox to 68.9.0esr using GeckoDriver and Selenium through Python
  • How to connect to Tor browser using Python
  • How to use Tor with Chrome browser through Selenium

How can I make a Selenium script undetectable using GeckoDriver and Firefox through Python?

The fact that selenium driven Firefox / GeckoDriver gets detected doesn't depends on any specific GeckoDriver or Firefox version. The Websites themselves can detect the network traffic and can identify the Browser Client i.e. Web Browser as WebDriver controled.

As per the documentation of the WebDriver Interface in the latest editor's draft of WebDriver - W3C Living Document the webdriver-active flag which is initially set as false, is set to true when the user agent is under remote control i.e. when controlled through Selenium.

NavigatorAutomationInformation

Now that the NavigatorAutomationInformation interface should not be exposed on WorkerNavigator.

mixin NavigatorAutomationInformation

So,

webdriver
Returns true if webdriver-active flag is set, false otherwise.

where as,

navigator.webdriver
Defines a standard way for co-operating user agents to inform the document that it is controlled by WebDriver, for example so that alternate code paths can be triggered during automation.

So, the bottom line is:

Selenium identifies itself


However some generic approaches to avoid getting detected while web-scraping are as follows:

  • The first and foremost attribute a website can determine your script/program is through your monitor size. So it is recommended not to use the conventional Viewport.
  • If you need to send multiple requests to a website, you need to keep on changing the User Agent on each request. Here you can find a detailed discussion on Way to change Google Chrome user agent in Selenium?
  • To simulate human like behavior you may require to slow down the script execution even beyond WebDriverWait and expected_conditions inducing time.sleep(secs). Here you can find a detailed discussion on How to sleep webdriver in python for milliseconds


Related Topics



Leave a reply



Submit