How to update the Proxy Server within the same session using Selenium and Python
No, you won't be able to change the proxy server using Selenium once the driver and the Browsing Context have been started.
When you configure an instance of ChromeDriver with ChromeOptions() to spawn a new Chrome Browsing Context, the configuration gets baked into the chromedriver executable, persists for the lifetime of the WebDriver, and remains uneditable. So you can't modify or add any configuration through the ChromeOptions() class to a WebDriver instance that is already in execution.
Even if you are able to extract ChromeDriver and Chrome session attributes, e.g. the Session ID, Cookies, UserAgent, and other session attributes, from the already initiated ChromeDriver and Chrome Browsing Session, you still won't be able to change those attributes of the running ChromeDriver.
A cleaner way would be to quit() the existing ChromeDriver and Chrome Browser instances gracefully and then spawn a new set of ChromeDriver and Chrome Browser instances with the new proxy configuration, as follows:
import random
from selenium import webdriver

urls_to_visit = ['https://www.google.com/', 'https://stackoverflow.com/']
proxies = open("proxy.txt", "r", encoding="utf-8", errors="ignore").readlines()
for url in urls_to_visit:
    # build a fresh set of options per iteration so stale --proxy-server
    # arguments don't accumulate across loops
    options = webdriver.ChromeOptions()
    options.add_argument('--no-sandbox')
    options.add_argument('--disable-setuid-sandbox')
    options.add_argument('--disable-dev-shm-usage')
    options.add_argument('--window-size=600,400')
    options.add_argument('--ignore-certificate-errors')
    options.add_argument('--disable-accelerated-2d-canvas')
    options.add_argument('--disable-gpu')
    options.add_argument('--headless')
    proxy = random.choice(proxies).replace("\n", "")
    options.add_argument('--proxy-server=%s' % proxy)
    browser = webdriver.Chrome(Path, options=options)
    browser.get(url)
    # perform the tasks
    browser.quit()
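The proxy-picking step inside the loop above can be pulled out into a small helper; the function name here is illustrative, not part of the original answer:

```python
import random

def pick_proxy_argument(proxies):
    """Choose one proxy line (e.g. 'host:port\\n') at random and return
    the matching Chrome command-line switch."""
    proxy = random.choice(proxies).strip()  # drop the trailing newline
    return '--proxy-server=%s' % proxy

# e.g. lines as read from proxy.txt
proxies = ["203.0.113.5:3128\n", "198.51.100.7:8080\n"]
arg = pick_proxy_argument(proxies)
print(arg)
```

Stripping the newline before building the switch matters: a trailing `\n` inside `--proxy-server=` silently breaks the proxy setting.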
References
You can find a couple of relevant discussions in:
- How to rotate Selenium webrowser IP address
Selenium with proxy not working / wrong options?
There are a couple of things you need to look into:
First of all, there seems to be a typo: there is a space character between get and () which may cause: IndexError: list index out of range
Not sure what the following lines do, as I'm able to execute without them. You may like to comment them out:
from dotenv import load_dotenv, find_dotenv
load_dotenv(find_dotenv())
If you want to stop using SCRAPER_API, comment out the following line as well:
SCRAPER_API = os.environ.get("SCRAPER_API")
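As a minimal sketch of what those lines do: load_dotenv() copies the entries of a .env file into os.environ, from which the proxy URL is assembled. The setdefault call and the "demo-key" value below only simulate that step for illustration; they are not part of the original code:

```python
import os

# Simulate what load_dotenv(find_dotenv()) would do for a .env file
# containing SCRAPER_API=demo-key (illustrative value only).
os.environ.setdefault("SCRAPER_API", "demo-key")

SCRAPER_API = os.environ.get("SCRAPER_API")
PROXY = f'http://scraperapi:{SCRAPER_API}@proxy-server.scraperapi.com:8001'
print(PROXY)
```

If SCRAPER_API is missing from the environment, os.environ.get() returns None and the resulting URL contains the literal string "None", which is why the answer suggests printing the URL before use.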
Making those minor tweaks and optimizing your code:
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager
from fake_useragent import UserAgent
from bs4 import BeautifulSoup

WAIT = 10
srv = Service(ChromeDriverManager().install())
ua = UserAgent()
userAgent = ua.random
options = Options()
options.add_argument('--headless')
options.add_experimental_option('excludeSwitches', ['enable-logging'])
options.add_argument("start-maximized")
options.add_argument('window-size=1920x1080')
options.add_argument('--no-sandbox')
options.add_argument('--disable-gpu')
options.add_argument(f'user-agent={userAgent}')
driver = webdriver.Chrome(service=srv, options=options)
waitWebDriver = WebDriverWait(driver, WAIT)
link = "https://whatismyipaddress.com/"
driver.get(link)
driver.save_screenshot("whatismyipaddress.png")
time.sleep(WAIT)
soup = BeautifulSoup(driver.page_source, 'html.parser')
tmpIP = soup.find("span", {"id": "ipv4"})
tmpP = soup.find_all("p", {"class": "information"})
for e in tmpP:
    tmpSPAN = e.find_all("span")
    for e2 in tmpSPAN:
        print(e2.text)
print(tmpIP.text)
driver.quit()
Console Output:
[WDM] -
[WDM] - ====== WebDriver manager ======
[WDM] - Current google-chrome version is 96.0.4664
[WDM] - Get LATEST driver version for 96.0.4664
[WDM] - Driver [C:\Users\Admin\.wdm\drivers\chromedriver\win32\96.0.4664.45\chromedriver.exe] found in cache
ISP:
Jio
City:
Pune
Region:
Maharashtra
Country:
India
123.12.234.23
Saved Screenshot:
Using the proxy
import os
import time
from dotenv import load_dotenv, find_dotenv
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
from selenium.webdriver.chrome.service import Service
from selenium.webdriver.support.ui import WebDriverWait
from webdriver_manager.chrome import ChromeDriverManager
from fake_useragent import UserAgent
from bs4 import BeautifulSoup

WAIT = 10
load_dotenv(find_dotenv())
SCRAPER_API = os.environ.get("SCRAPER_API")
PROXY = f'http://scraperapi:{SCRAPER_API}@proxy-server.scraperapi.com:8001'
srv = Service(ChromeDriverManager().install())
ua = UserAgent()
userAgent = ua.random
options = Options()
options.add_argument('--headless')
options.add_experimental_option('excludeSwitches', ['enable-logging'])
options.add_argument("start-maximized")
options.add_argument('window-size=1920x1080')
options.add_argument('--no-sandbox')
options.add_argument('--disable-gpu')
options.add_argument(f'user-agent={userAgent}')
options.add_argument('--proxy-server={}'.format(PROXY))
driver = webdriver.Chrome(service=srv, options=options)
waitWebDriver = WebDriverWait(driver, WAIT)
link = "https://whatismyipaddress.com/"
driver.get(link)
driver.save_screenshot("whatismyipaddress.png")
time.sleep(WAIT)
soup = BeautifulSoup(driver.page_source, 'html.parser')
tmpIP = soup.find("span", {"id": "ipv4"})
tmpP = soup.find_all("p", {"class": "information"})
for e in tmpP:
    tmpSPAN = e.find_all("span")
    for e2 in tmpSPAN:
        print(e2.text)
print(tmpIP.text)
driver.quit()
Note: Before executing the script, print the proxy URL, e.g.:
print(f'http://scraperapi:{SCRAPER_API}@proxy-server.scraperapi.com:8001')
and ensure that SCRAPER_API returns a result.
References
You can find a couple of relevant detailed discussions in:
- How to rotate Selenium webrowser IP address
- using http proxy with selenium Geckodriver
Rotating IP with selenium and Tor
To achieve this I use another proxy; selenium-wire is very good, but it needs to be fixed. I used BrowserMob Proxy and set an upstream proxy for it to work with. The result is that you can catch every HTTP request or response and parse it, the IP rotates every time, and you can use the Tor HTTPTunnelPort configuration:
proxy_params = {'httpProxy': 'localhost:8088', 'httpsProxy': 'localhost:8088'}
proxy_b = server.create_proxy(params=proxy_params)
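Once BrowserMob hands back a proxy object, Chrome still has to be pointed at that proxy's local listening port. A small helper for building that switch is sketched below; the function name and the default host are assumptions for illustration, and in the BrowserMob Python client the port would come from the created proxy object:

```python
def chrome_proxy_switch(proxy_port, host='localhost'):
    """Build the --proxy-server switch for a local BrowserMob listener."""
    return '--proxy-server={}:{}'.format(host, proxy_port)

# e.g. the port of the proxy returned by server.create_proxy(...)
print(chrome_proxy_switch(8090))
```

Chrome then talks to BrowserMob, which forwards through the Tor HTTPTunnelPort listener given in proxy_params, so every request is both captured and tunneled.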
Errors in ChromeDriver logs using a proxy through Selenium and Python
To initiate Chrome browser using a proxy you can try the following solution:
import time
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

PROXY = "164.68.123.119:9300"
options = Options()
options.add_argument('--proxy-server={}'.format(PROXY))
options.add_argument("start-maximized")
driver = webdriver.Chrome(executable_path=r'C:\WebDriver\ChromeDriver\chromedriver.exe', options=options)
print(type(driver))
driver.get("https://www.google.com")
myPageTitle = driver.title
print(myPageTitle)
assert "Google" in myPageTitle
time.sleep(10)
driver.quit()
PS: chrome_browser_main_extra_parts_metrics.cc, device_event_log_impl.cc, usb_descriptors.cc, etc. are the result of a generic bug due to Chrome/ChromeDriver compatibility, which you can ignore as of now. For details check Parametrized tests are flaky due to a timeout in recording expensive metrics on startup.
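If those lines clutter captured ChromeDriver output, one option is to filter the known-benign sources out after the fact. This helper is a sketch, not part of the original answer; the source filenames come from the list above:

```python
# Sources named above that emit benign compatibility noise.
NOISY_SOURCES = (
    'chrome_browser_main_extra_parts_metrics.cc',
    'device_event_log_impl.cc',
    'usb_descriptors.cc',
)

def filter_chromedriver_log(lines):
    """Keep only log lines that do not come from the known-noisy sources."""
    return [ln for ln in lines if not any(src in ln for src in NOISY_SOURCES)]

log = [
    '[ERROR:device_event_log_impl.cc(214)] USB: failed to read descriptor',
    '[INFO] ChromeDriver was started successfully.',
]
print(filter_chromedriver_log(log))
```

Alternatively, passing `options.add_experimental_option('excludeSwitches', ['enable-logging'])`, as the earlier scripts do, suppresses much of this noise at the source.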
Reference
You can find a relevant discussion in:
- How to rotate Selenium webrowser IP address