Login to website using urllib2 - Python 2.7
I'll preface this by saying I haven't done logging in in this way for a while, so I could be missing some of the more 'accepted' ways to do it.
I'm not sure if this is what you're after, but without a library like mechanize or a more robust framework like Selenium, in the basic case you just look at the form itself and seek out the inputs. For instance, looking at www.reddit.com and then viewing the source of the rendered page, you will find this form:
<form method="post" action="https://ssl.reddit.com/post/login" id="login_login-main" class="login-form login-form-side">
    <input type="hidden" name="op" value="login-main" />
    <input name="user" placeholder="username" type="text" maxlength="20" tabindex="1" />
    <input name="passwd" placeholder="password" type="password" tabindex="1" />
    <div class="status"></div>
    <div id="remember-me">
        <input type="checkbox" name="rem" id="rem-login-main" tabindex="1" />
        <label for="rem-login-main">remember me</label>
        <a class="recover-password" href="/password">reset password</a>
    </div>
    <div class="submit">
        <button class="btn" type="submit" tabindex="1">login</button>
    </div>
    <div class="clear"></div>
</form>
Here we see a few inputs: op, user, passwd and rem. Also, notice the action attribute - that is the URL to which the form will be posted, and it is therefore our target. So now the last step is packing the parameters into a payload and sending it as a POST request to the action URL. Below, we also create a new opener, add the ability to handle cookies, and add headers as well, giving us a slightly more robust opener to execute the requests:
import cookielib
import urllib
import urllib2
# Store the cookies and create an opener that will hold them
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
# Add our headers
opener.addheaders = [('User-agent', 'RedditTesting')]
# Install our opener (note that this changes the global opener to the one
# we just made, but you can also just call opener.open() if you want)
urllib2.install_opener(opener)
# The action/target from the form
authentication_url = 'https://ssl.reddit.com/post/login'
# Input parameters we are going to send
payload = {
    'op': 'login-main',
    'user': '<username>',
    'passwd': '<password>'
}
# Use urllib to encode the payload
data = urllib.urlencode(payload)
# Build our Request object (supplying 'data' makes it a POST)
req = urllib2.Request(authentication_url, data)
# Make the request and read the response
resp = urllib2.urlopen(req)
contents = resp.read()
Note that this can get much more complicated - you can also do this with Gmail, for instance, but you need to pull in parameters that change on every request (such as the GALX parameter). Again, I'm not sure if this is what you wanted, but I hope it helps.
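Those rotating values can themselves be scraped from the login page before you build the payload. A minimal sketch, assuming the token sits in a hidden input named GALX; the HTML string below is a made-up stand-in for a fetched login page:

```python
import re

# Made-up sample of a login page's hidden token field
html = '<input type="hidden" name="GALX" value="abc123XYZ" />'

# Pull the rotating value out so it can be added to the POST payload
match = re.search(r'name="GALX"\s+value="([^"]+)"', html)
token = match.group(1) if match else None
print(token)  # abc123XYZ
```

In a real run you would fetch the page with the same opener first, so that the HTML and the cookies belong to the session you then log in with.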
Login to a website using python and urllib2
I would use requests to post the form data; the following shows you how:
import requests

# post url for logon
url = "http://mediaweather.meteogroup.co.uk/login/process"
data = {"txtusername": "username",
        "txtpassword": "pass"}

with requests.Session() as s:
    # login
    s.post(url, data=data)
    # get the page you want
    html = s.get('http://mediaweather.meteogroup.co.uk/observations/table/6040?country_id=6040&d=01&m=07&y=2016&h=11&i=00&sort=landstat_name_wmo&order=asc').content
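As an aside, the long query string above does not have to be written by hand: requests will build it from a dict passed as params. A sketch of the same key/value pairs, shown here with the standard library's urlencode (Python 3's urllib.parse) so it runs offline:

```python
from urllib.parse import urlencode

# The same query parameters as in the hand-written URL above
params = {"country_id": "6040", "d": "01", "m": "07", "y": "2016",
          "h": "11", "i": "00", "sort": "landstat_name_wmo", "order": "asc"}

query = urlencode(params)
print(query)
```

With requests, `s.get(base_url, params=params)` produces the same request.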
How do I use Python to Login to a website using urllib2 and then open a browser page to the logged in site?
You can use either requests or urllib2 to log in, or you can use mechanize to simulate the browser, which is a better way, I think. Here are links you can refer to: requests or mechanize.
Python: Log in a website using urllib
You're forgetting the hidden fields of the form:
<form id="loginForm" class="validate-enabled failure form" method="post" action="https://www.fitbit.com/login" name="login">
    <input type="hidden" value="Log In" name="login">
    <input type="hidden" value="" name="includeWorkflow">
    <input id="loginRedirect" type="hidden" value="" name="redirect">
    <input id="disableThirdPartyLogin" type="hidden" value="false" name="disableThirdPartyLogin">
    <input class="field email" type="text" tabindex="23" name="email" placeholder="E-mail">
    <input class="field password" type="password" tabindex="24" name="password" placeholder="Mot de passe">
</form>
so you may want to update your payload:
acc_pwd = {'login': 'Log In',
           'email': 'username',
           'password': 'pwd',
           'disableThirdPartyLogin': 'false',
           'loginRedirect': '',
           'includeWorkflow': ''}
which might get checked by their service. Though, given the name of the field disableThirdPartyLogin, I wonder whether some JavaScript bound to the form's submit action adds a value before actually doing the POST. You might want to check that with the developer tools, analyzing the POSTed values.
Testing shows that it does not, though the JavaScript does add some values, which may come from cookies:
__fp                     w686jv_O1ZZztQ7FkK21Ry2MI7JbqWTf
_sourcePage              tJvTQfA5dkvGrJMFkFsv6XbX0f6OV1Ndj1zeGcz7OKzA3gkNXMXGnj27D-H9WXS-
disableThirdPartyLogin   false
email                    foo@example.org
includeWorkflow          (empty)
login                    Log In
password                 aeou
redirect                 (empty)
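In other words, the full POST body is the scraped hidden fields merged with what the user types. A small sketch of that merge, using a subset of the illustrative values from the table above:

```python
# Hidden fields taken from the form (and the page's JavaScript)
hidden = {
    "disableThirdPartyLogin": "false",
    "includeWorkflow": "",
    "login": "Log In",
    "redirect": "",
}

# What the user actually enters
credentials = {"email": "foo@example.org", "password": "aeou"}

# Merge both into the payload that gets POSTed
payload = dict(hidden)
payload.update(credentials)
print(sorted(payload))
```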
Here's my take on doing this using requests (which has a nicer API than urllib ;-)):
>>> import requests
>>> import cookielib
>>> jar = cookielib.CookieJar()
>>> login_url = 'https://www.fitbit.com/login'
>>> acc_pwd = {'login': 'Log In',
...            'email': 'username',
...            'password': 'pwd',
...            'disableThirdPartyLogin': 'false',
...            'loginRedirect': '',
...            'includeWorkflow': ''}
>>> r = requests.get(login_url, cookies=jar)
>>> r = requests.post(login_url, cookies=jar, data=acc_pwd)
And don't forget to hit the login page with a GET first, to fill your cookie jar!
Finally, I can't help you further, as I don't have a valid account on fitbit.com and I don't need/want one, so I can only reach the login-failure page in my tests.
edit:
To parse the output, you can then use:
>>> from lxml import etree
>>> p = etree.HTML(r.text)
for example to get the error messages:
>>> p.xpath('//ul[@class="errorList"]/li/text()')
["L'utilisateur n'existe pas ou le mot de passe est incorrect."]
(French for "The user does not exist or the password is incorrect.")
resources:
- lxml: http://lxml.de
- requests: http://python-requests.org
and they're both on PyPI:
pip install lxml requests
HTH
Login on a site using urllib
It's not using urllib directly, but you may find it easier to work with the requests package. requests has a session object; see this answer.
import requests
url = 'http://cheese.formice.com/forum/login/login'
login_data = dict(login='Cfmaccount', password='tfmdev321')
session = requests.session()
r = session.post(url, data=login_data)
That will log you in to the site. You can verify with:
print r.text #prints the <html> response.
Once logged in, you can call the specific url you want.
r2 = session.get('http://cheese.formice.com/maps/@5865339')
print r2.content #prints the raw html you can now parse and scrape
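For the parsing step, the standard library is enough if you don't want extra dependencies; a Python 3 sketch with html.parser that collects link targets from a made-up HTML sample:

```python
from html.parser import HTMLParser

# Collect every <a href="..."> target seen while parsing
class LinkCollector(HTMLParser):
    def __init__(self):
        HTMLParser.__init__(self)
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href":
                    self.links.append(value)

parser = LinkCollector()
parser.feed('<a href="/maps/@5865339">map</a> <a href="/forum">forum</a>')
print(parser.links)  # ['/maps/@5865339', '/forum']
```

For anything beyond trivial extraction, a dedicated parser such as BeautifulSoup or lxml is the more comfortable choice.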
Website form login using Python urllib2
If you inspect the request that is sent to the server when you enter the number and submit the form, you will notice that it is a POST request with pnr and _token parameters. You are missing the _token parameter, which you need to extract from the HTML source of the page. It is a hidden input element:
<input name="_token" type="hidden" value="WRbJ5x05vvDlzMgzQydFxkUfcFSjSLDhknMHtU6m">
I suggest looking into tools like Mechanize, MechanicalSoup or RoboBrowser that would ease the form submission. You may also parse the HTML yourself with an HTML parser like BeautifulSoup, extract the token, and send the form via urllib2 or requests:
import requests
from bs4 import BeautifulSoup
PNR = "00000000"
url = "http://reg.maths.lth.se/"
login_url = "http://reg.maths.lth.se/login/student"
with requests.Session() as session:
    # extract token
    response = session.get(url)
    soup = BeautifulSoup(response.content, "html.parser")
    token = soup.find("input", {"name": "_token"})["value"]

    # submit form
    session.post(login_url, data={
        "_token": token,
        "pnr": PNR
    })

    # navigate to the main page again (should be logged in)
    response = session.get(url)
    soup = BeautifulSoup(response.content, "html.parser")
    print(soup.title)
Using urllib/urllib2 get a session cookie and use it to login to a final page
The reason why the 'final page' was rejecting the cookies is that Python was adding 'User-agent: Python-urllib/2.7' to the headers. After removing this element, I was able to log in to the website:
opener.addheaders.pop(0)
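An alternative to popping the header is to overwrite the opener's header list with a browser-like User-Agent up front; a sketch with Python 3's urllib.request (in Python 2 the same calls live in urllib2):

```python
import urllib.request

# Replace the default ('User-agent', 'Python-urllib/x.y') header entirely
opener = urllib.request.build_opener()
opener.addheaders = [('User-agent', 'Mozilla/5.0')]
print(opener.addheaders)  # [('User-agent', 'Mozilla/5.0')]
```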