Login to Website Using Urllib2 - Python 2.7

Login to website using urllib2 - Python 2.7

I'll preface this by saying I haven't done this kind of login for a while, so I could be missing some of the more 'accepted' ways to do it.

I'm not sure if this is what you're after, but without a library like mechanize or a more robust framework like selenium, in the basic case you just look at the form itself and seek out the inputs. For instance, looking at www.reddit.com, and then viewing the source of the rendered page, you will find this form:

<form method="post" action="https://ssl.reddit.com/post/login" id="login_login-main"
class="login-form login-form-side">
<input type="hidden" name="op" value="login-main" />
<input name="user" placeholder="username" type="text" maxlength="20" tabindex="1" />
<input name="passwd" placeholder="password" type="password" tabindex="1" />

<div class="status"></div>

<div id="remember-me">
<input type="checkbox" name="rem" id="rem-login-main" tabindex="1" />
<label for="rem-login-main">remember me</label>
<a class="recover-password" href="/password">reset password</a>
</div>

<div class="submit">
<button class="btn" type="submit" tabindex="1">login</button>
</div>

<div class="clear"></div>
</form>

Here we see a few inputs - op, user, passwd and rem. Also, notice the action parameter - that is the URL to which the form will be posted, and will therefore be our target. So the last step is packing the parameters into a payload and sending it as a POST request to the action URL. Below, we also create a new opener, add the ability to handle cookies and add headers as well, giving us a slightly more robust opener to execute the requests:

import cookielib
import urllib
import urllib2

# Store the cookies and create an opener that will hold them
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

# Add our headers
opener.addheaders = [('User-agent', 'RedditTesting')]

# Install our opener (note that this changes the global opener to the one
# we just made, but you can also just call opener.open() if you want)
urllib2.install_opener(opener)

# The action/ target from the form
authentication_url = 'https://ssl.reddit.com/post/login'

# Input parameters we are going to send
payload = {
    'op': 'login-main',
    'user': '<username>',
    'passwd': '<password>'
}

# Use urllib to encode the payload
data = urllib.urlencode(payload)

# Build our Request object (supplying 'data' makes it a POST)
req = urllib2.Request(authentication_url, data)

# Make the request and read the response
resp = urllib2.urlopen(req)
contents = resp.read()

Note that this can get much more complicated - you can also do this with GMail, for instance, but you need to pull in parameters that will change every time (such as the GALX parameter). Again, not sure if this is what you wanted, but hope it helps.
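To illustrate handling such a changing hidden parameter, here is a rough sketch of how you might scrape it from the login page before posting. The Google URLs and the Email/Passwd/GALX field names are only assumptions based on how that form looked at the time, and a real HTML parser would be more robust than the regex:

import cookielib
import re
import urllib
import urllib2

# Cookie-aware opener, same idea as above
cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))
urllib2.install_opener(opener)

# Fetch the login page first so we can pull out the per-request hidden field
login_page = urllib2.urlopen('https://accounts.google.com/ServiceLogin').read()

# Naive regex extraction of the hidden GALX value (an HTML parser is safer)
match = re.search(r'name="GALX"[^>]*value="([^"]+)"', login_page)
galx = match.group(1) if match else ''

payload = {
    'Email': '<username>',
    'Passwd': '<password>',
    'GALX': galx
}

# POST the credentials together with the freshly scraped token
data = urllib.urlencode(payload)
resp = urllib2.urlopen(urllib2.Request('https://accounts.google.com/ServiceLoginAuth', data))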

Login to a website using python and urllib2

I would use requests to post the form data; the following shows you how:

import requests

# post url for logon
url = "http://mediaweather.meteogroup.co.uk/login/process"

data = {"txtusername": "username",
        "txtpassword": "pass"}

with requests.Session() as s:
    # login
    s.post(url, data=data)
    # get the page you want
    html = s.get('http://mediaweather.meteogroup.co.uk/observations/table/6040?country_id=6040&d=01&m=07&y=2016&h=11&i=00&sort=landstat_name_wmo&order=asc').content

How do I use Python to Login to a website using urllib2 and then open a browser page to the logged in site?

You can use either requests or urllib2 to log in, or you can use mechanize to simulate the browser, which I think is the better way.
Here are links you can refer to:
requests or mechanize
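For what it's worth, a minimal mechanize sketch might look something like this - the login URL, the form index and the user/passwd field names are placeholders you would need to adapt to the actual site:

import mechanize

br = mechanize.Browser()
br.set_handle_robots(False)                      # don't bail out on robots.txt
br.addheaders = [('User-agent', 'Mozilla/5.0')]  # present a browser-like user agent

br.open('http://example.com/login')              # placeholder login URL
br.select_form(nr=0)                             # assume the login form is the first form on the page
br['user'] = '<username>'                        # field names depend on the site
br['passwd'] = '<password>'
response = br.submit()
print response.read()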

Python: Log in a website using urllib

You're forgetting the hidden fields of the form:

<form id="loginForm" class="validate-enabled failure form" method="post" action="https://www.fitbit.com/login" name="login">
<input type="hidden" value="Log In" name="login">
<input type="hidden" value="" name="includeWorkflow">
<input id="loginRedirect" type="hidden" value="" name="redirect">
<input id="disableThirdPartyLogin" type="hidden" value="false" name="disableThirdPartyLogin">
<input class="field email" type="text" tabindex="23" name="email" placeholder="E-mail">
<input class="field password" type="password" tabindex="24" name="password" placeholder="Mot de passe">
</form>

so you may want to update:

acc_pwd = {'login': 'Log In',
           'email': 'username',
           'password': 'pwd',
           'disableThirdPartyLogin': 'false',
           'loginRedirect': '',
           'includeWorkflow': ''
           }

which might get checked by their service. Though, given the name of the disableThirdPartyLogin field, I'm wondering whether there's some dirty javascript bound to the form's submit action that adds a value before the POST actually happens. You might want to check that with your browser's developer tools by analyzing the POSTed values.

Testing shows that it does not, though the javascript does add some values, which may come from cookies:

__fp                     w686jv_O1ZZztQ7FkK21Ry2MI7JbqWTf
_sourcePage              tJvTQfA5dkvGrJMFkFsv6XbX0f6OV1Ndj1zeGcz7OKzA3gkNXMXGnj27D-H9WXS-
disableThirdPartyLogin   false
email                    foo@example.org
includeWorkflow
login                    Log In
password                 aeou
redirect

here's my take on doing this using requests (which has a better API than urllib ;-) )

>>> import requests
>>> import cookielib
>>> jar = cookielib.CookieJar()
>>> login_url = 'https://www.fitbit.com/login'
>>> acc_pwd = {'login': 'Log In',
...            'email': 'username',
...            'password': 'pwd',
...            'disableThirdPartyLogin': 'false',
...            'loginRedirect': '',
...            'includeWorkflow': ''
...            }
>>> r = requests.get(login_url, cookies=jar)
>>> r = requests.post(login_url, cookies=jar, data=acc_pwd)

and don't forget to first hit the login page with a GET to fill your cookie jar!

Finally, I can't help you further, as I don't have a valid account on fitbit.com and I don't need/want one. So I can only get to the login failure page for my tests.

edit:

To parse the output, you can then use:

>>> from lxml import etree
>>> p = etree.HTML(r.text)

for example to get the error messages:

>>> p.xpath('//ul[@class="errorList"]/li/text()')
["L'utilisateur n'existe pas ou le mot de passe est incorrect."]

(in English: "The user does not exist or the password is incorrect.")

resources:

  • lxml: http://lxml.de
  • requests: http://python-requests.org

and they are both on PyPI:

pip install lxml requests

HTH

Login on a site using urllib

It's not using urllib directly, but you may find it easier to work with the requests package. requests has a session object; see this answer.

import requests

url = 'http://cheese.formice.com/forum/login/login'
login_data = dict(login='Cfmaccount', password='tfmdev321')
session = requests.session()

r = session.post(url, data=login_data)

That will log you in to the site. You can verify with:

print r.text #prints the <html> response.

Once logged in, you can call the specific url you want.

r2 = session.get('http://cheese.formice.com/maps/@5865339')
print r2.content #prints the raw html you can now parse and scrape

Website form login using Python urllib2

If you inspect the request that is sent to the server when you enter the number and submit the form, you will notice that it is a POST request with pnr and _token parameters:

(Screenshot: browser developer tools showing the POST request with its pnr and _token form parameters.)

You are missing the _token parameter which you need to extract from the HTML source of the page. It is a hidden input element:

<input name="_token" type="hidden" value="WRbJ5x05vvDlzMgzQydFxkUfcFSjSLDhknMHtU6m">

I suggest looking into tools like Mechanize, MechanicalSoup or RoboBrowser that would ease the form submission. You may also parse the HTML yourself with an HTML parser like BeautifulSoup, extract the token and send it via urllib2 or requests:

import requests
from bs4 import BeautifulSoup

PNR = "00000000"

url = "http://reg.maths.lth.se/"
login_url = "http://reg.maths.lth.se/login/student"
with requests.Session() as session:
    # extract token
    response = session.get(url)
    soup = BeautifulSoup(response.content, "html.parser")
    token = soup.find("input", {"name": "_token"})["value"]

    # submit form
    session.post(login_url, data={
        "_token": token,
        "pnr": PNR
    })

    # navigate to the main page again (should be logged in)
    response = session.get(url)

soup = BeautifulSoup(response.content, "html.parser")
print(soup.title)

Using urllib/urllib2 get a session cookie and use it to login to a final page

The reason the 'final page' was rejecting the cookies is that Python was adding ('User-agent', 'Python-urllib/2.7') to the headers. After removing this element I was able to log in to the website:

opener.addheaders.pop(0)
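For context, a minimal sketch of the whole flow with that fix might look like the following - the URLs and form field names are placeholders, and you could instead override the default header with a browser-like one rather than dropping it:

import cookielib
import urllib
import urllib2

cj = cookielib.CookieJar()
opener = urllib2.build_opener(urllib2.HTTPCookieProcessor(cj))

# build_opener() pre-populates addheaders with ('User-agent', 'Python-urllib/2.7'),
# which this particular site rejected - drop it (or override it instead):
opener.addheaders.pop(0)
# opener.addheaders = [('User-agent', 'Mozilla/5.0')]

urllib2.install_opener(opener)

# Log in first so the session cookie lands in the jar, then fetch the final page
data = urllib.urlencode({'user': '<username>', 'passwd': '<password>'})  # placeholder field names
urllib2.urlopen('http://example.com/login', data)
final_page = urllib2.urlopen('http://example.com/final').read()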

