Using a proxy with phantomjs (selenium webdriver)
I suspect that the proxy you are using is the problem. I tried the following on Windows 8 with a proxy that behaves correctly:
from selenium import webdriver

phantomjs_path = r"E:\Software & Tutorial\Phantom\phantomjs-2.1.1-windows\bin\phantomjs.exe"
service_args = [
    '--proxy=217.156.252.118:8080',
    # PhantomJS documents http, socks5 and none as proxy types
    '--proxy-type=http',
]
driver = webdriver.PhantomJS(executable_path=phantomjs_path, service_args=service_args)
driver.get("https://www.google.com.bd/?gws_rd=ssl#q=what+is+my+ip")
# print() with parentheses works on both Python 2 and Python 3
print(driver.page_source)
print("=" * 70)
print(driver.title)
driver.save_screenshot(r"E:\Software & Tutorial\Phantom\test.png")
driver.quit()
Open the saved image (test.png) to check the result. If the proxy IP is blacklisted, Google shows a CAPTCHA box instead; see that image. The IP has been changed.
Setting a proxy using python, selenium, and phantomJS
Whelp... Made a small mistake.
The following line
phan_args = ['--proxy=88.157.149.250:8080', 'proxy-type=http']
should be
phan_args = ['--proxy=88.157.149.250:8080', '--proxy-type=http']
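A missing `--` prefix like the one above is easy to overlook. As a minimal sketch, a small helper can build the argument list and reject any entry that does not start with `--`. The name `build_service_args` and its parameters are hypothetical and not part of Selenium or PhantomJS.

```python
def build_service_args(proxy, proxy_type="http", extra=()):
    """Build a PhantomJS service_args list (hypothetical helper).

    proxy      -- "host:port" string
    proxy_type -- value passed to --proxy-type
    extra      -- any additional command-line flags
    """
    args = [
        "--proxy={}".format(proxy),
        "--proxy-type={}".format(proxy_type),
    ]
    args.extend(extra)
    # Catch the mistake of writing 'proxy-type=http' without the leading dashes
    bad = [arg for arg in args if not arg.startswith("--")]
    if bad:
        raise ValueError("arguments missing '--' prefix: {}".format(bad))
    return args


phan_args = build_service_args("88.157.149.250:8080")
print(phan_args)
```

The resulting list can then be passed as `service_args` when creating the driver.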
Python selenium PhantomJS proxy
The try catch for time out was something like:
try:
    driver.set_page_load_timeout(1)
    driver.get("http://www.example.com")
except TimeoutException as ex:
    print("Exception has been thrown. " + str(ex))
For your code, adding it would look something like this:
from selenium import webdriver
from selenium.common.exceptions import TimeoutException

proxylist = ['58.12.12.12:80', '69.12.12.12:80']
weblist = ['https://www.google.com', 'https://www.facebook.com', 'https://www.yahoo.com', 'https://aol.com']

def test():
    temp_count_proxy = 0
    driver_opened = 0
    for url in weblist:
        # Use >= so the index never runs past the end of proxylist
        if temp_count_proxy >= len(proxylist):
            print("Out of proxy")
            return
        if driver_opened == 0:
            service_args = ['--proxy={}'.format(proxylist[temp_count_proxy]), '--proxy-type=socks5']
            driver = webdriver.PhantomJS('phantomjs.exe', service_args=service_args)
            driver_opened = 1
        try:
            driver.set_page_load_timeout(2)
            driver.get(url)
        except TimeoutException as ex:
            # quit() also shuts down the PhantomJS process; close() only closes the window
            driver.quit()
            driver_opened = 0
            temp_count_proxy += 1
            continue
    # Shut down the last driver once every URL has been visited
    if driver_opened:
        driver.quit()

test()
Just be careful: if it fails to get one URL, it changes proxy and moves on to the next URL (as you requested) rather than retrying the same one. If you want it to change proxy on failure and then retry the current URL, use the following:
from selenium import webdriver
from selenium.common.exceptions import TimeoutException

proxylist = ['58.12.12.12:80', '69.12.12.12:80']
weblist = ['https://www.google.com', 'https://www.facebook.com', 'https://www.yahoo.com', 'https://aol.com']

def test():
    temp_count_proxy = 0
    driver_opened = 0
    for url in weblist:
        while True:
            # Use >= so the index never runs past the end of proxylist
            if temp_count_proxy >= len(proxylist):
                print("Out of proxy")
                return
            if driver_opened == 0:
                service_args = ['--proxy={}'.format(proxylist[temp_count_proxy]), '--proxy-type=socks5']
                driver = webdriver.PhantomJS('phantomjs.exe', service_args=service_args)
                driver_opened = 1
            try:
                driver.set_page_load_timeout(2)
                driver.get(url)
                # Your code to process here
            except TimeoutException as ex:
                # quit() also shuts down the PhantomJS process; close() only closes the window
                driver.quit()
                driver_opened = 0
                temp_count_proxy += 1
                continue
            break
    # Shut down the last driver once every URL has been visited
    if driver_opened:
        driver.quit()
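The retry-with-next-proxy logic can also be separated from Selenium so it is easier to test. The following is only a sketch under stated assumptions: `fetch_with_rotation` is a hypothetical name, and `fetch` is a caller-supplied function that takes `(url, proxy)` and raises the built-in `TimeoutError` on failure. With Selenium, that function would need to start a driver for the given proxy and translate Selenium's `TimeoutException` into `TimeoutError`.

```python
def fetch_with_rotation(urls, proxies, fetch):
    """Fetch each URL, moving to the next proxy whenever a fetch times out.

    Retries the same URL with the new proxy. Stops early and returns
    whatever was collected once the proxy list is exhausted.
    """
    results = {}
    proxy_index = 0
    for url in urls:
        while True:
            if proxy_index >= len(proxies):
                # Out of proxies: give up on the remaining URLs
                return results
            try:
                results[url] = fetch(url, proxies[proxy_index])
                break  # success, go on to the next URL
            except TimeoutError:
                # This proxy failed; retry the same URL with the next one
                proxy_index += 1
    return results


def demo_fetch(url, proxy):
    """Stand-in for a real page load: the first proxy always times out."""
    if proxy == "58.12.12.12:80":
        raise TimeoutError(proxy)
    return "{} via {}".format(url, proxy)


print(fetch_with_rotation(
    ["https://www.google.com", "https://www.yahoo.com"],
    ["58.12.12.12:80", "69.12.12.12:80"],
    demo_fetch,
))
```

Keeping the proxy index outside the URL loop means a proxy that has failed once is never tried again, which matches the behaviour of the loop above.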
Set Proxy using PhantomJS in Python
You could declare your service_args without the proxy variable, then append it afterward:
service_args = [
    '--proxy-type=http',
    '--ignore-ssl-errors=true',
]
# proxy is a complete flag string, e.g. '--proxy=127.0.0.1:8080'
service_args.append(proxy)
Proxy would need to be a string, as service_args is a list of strings.
phantomjs + selenium in python proxy-auth not working
I'm compiling answers from:
How to correctly pass basic auth (every click) using Selenium and phantomjs webdriver
as well as:
base64.b64encode error
from selenium import webdriver
from selenium.webdriver.common.desired_capabilities import DesiredCapabilities
import base64
service_args = [
'--proxy=http://fr.proxymesh.com:31280',
'--proxy-type=http',
]
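# b64encode returns bytes on Python 3, so decode before concatenating with a str
authentication_token = "Basic " + base64.b64encode(b'username:password').decode('ascii')
# Copy so the shared DesiredCapabilities.PHANTOMJS dict is not mutated
capa = DesiredCapabilities.PHANTOMJS.copy()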
capa['phantomjs.page.customHeaders.Proxy-Authorization'] = authentication_token
driver = webdriver.PhantomJS(desired_capabilities=capa, service_args=service_args)
driver.get("http://...")
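The header value itself can be built and checked without starting a browser. This is a minimal sketch using only the standard library; `basic_auth_header` is a hypothetical helper name, and the credentials are the placeholders from the snippet above.

```python
import base64


def basic_auth_header(username, password):
    """Return the value for a Basic Proxy-Authorization header as a str."""
    raw = "{}:{}".format(username, password).encode("utf-8")
    # b64encode returns bytes, so decode to get a str that can be concatenated
    return "Basic " + base64.b64encode(raw).decode("ascii")


token = basic_auth_header("username", "password")
print(token)
```

The returned string can be assigned to `capa['phantomjs.page.customHeaders.Proxy-Authorization']` as in the code above.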
PhantomJS Proxy when using Remote Webdriver?
Since the PhantomJS instance is already running, it wouldn't make sense to pass command-line options to the RemoteDriver constructor. There is a way, though.
PhantomJS itself supports configuring a proxy programmatically through phantom.setProxy(ip, port, type, un, pw) (not documented, but available since PhantomJS 2). This has to be executed in the phantom context, so driver.execute_script() won't work here.
GhostDriver accepts scripts that are to be executed in the phantom context through a special command, which you can invoke like this:
driver.command_executor._commands['executePhantomScript'] = ('POST', '/session/$sessionId/phantom/execute')
driver.execute('executePhantomScript', {'script': '''phantom.setProxy("10.0.0.1", 12345);''', 'args' : [] })
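When the proxy address comes from a variable, the script string has to be assembled safely. The sketch below builds the payload dictionary for the `executePhantomScript` command shown above, using `json.dumps` to quote string arguments. `set_proxy_payload` is a hypothetical helper; the argument order follows the `phantom.setProxy(ip, port, type, un, pw)` signature quoted earlier, and only the first three arguments are handled here.

```python
import json


def set_proxy_payload(host, port, proxy_type=None):
    """Build the payload for executePhantomScript that calls phantom.setProxy."""
    # json.dumps yields a correctly quoted and escaped JavaScript string literal
    js_args = [json.dumps(host), str(int(port))]
    if proxy_type is not None:
        js_args.append(json.dumps(proxy_type))
    script = "phantom.setProxy({});".format(", ".join(js_args))
    return {"script": script, "args": []}


payload = set_proxy_payload("10.0.0.1", 12345)
print(payload["script"])
```

The payload would then be passed as `driver.execute('executePhantomScript', payload)` after registering the command as above.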
Using proxy with Selenium PhantomJS
I figured it out. Here is how to do it, for anyone who runs into the same question:
ArrayList<String> cliArgsCap = new ArrayList<String>();
cliArgsCap.add("--proxy=127.0.0.1:1024");
// cliArgsCap.add("--proxy-auth=username:password");
cliArgsCap.add("--proxy-type=socks5");
DesiredCapabilities caps = new DesiredCapabilities();
caps.setJavascriptEnabled(true);
caps.setCapability("takesScreenshot", true);
caps.setCapability("screen-resolution", "1280x1024");
caps.setCapability(PhantomJSDriverService.PHANTOMJS_EXECUTABLE_PATH_PROPERTY, "C:\\Users\\USER\\Downloads\\phantomjs-2.0.0-windows\\bin\\phantomjs.exe");
caps.setCapability(PhantomJSDriverService.PHANTOMJS_CLI_ARGS, cliArgsCap);
Logger.getLogger(PhantomJSDriverService.class.getName()).setLevel(Level.OFF);
driver = new PhantomJSDriver(caps);