In Mechanize (Ruby), how to login then scrape?
The agent variable retains the session and cookies. So you first do your login, as you did, and then you write agent.get(---your-pdf-link-here--).
There is a small error in your example code: the result of the submit is stored in search_results, but you then continue to use page to search for the links. So in your case, I guess it should look like this (untested, of course):
require 'mechanize'
# step 1, login:
agent = Mechanize.new
# save PDF responses to disk instead of parsing them
agent.pluggable_parser.pdf = Mechanize::FileSaver
page = agent.get("http://elwatan.com/sso/inscription/inscription_payant.php")
form = page.form_with(:id => 'form-login-page')
form.login = "my_mail"
form.password = "my_password"
# keep the page returned by the submit; it is the one that holds the links
page = form.submit
# step 2, get the PDF:
page.parser.xpath('//th/a').each do |link|
  agent.get link['href']
end
How to fill out login form with mechanize in Ruby?
The following code should work:
require 'mechanize'
agent = Mechanize.new
page = agent.get("your_page_url")
# locate the login form by its id attribute
form = page.form_with(:id => 'form-login-page')
form.login = "my_login"
form.password = "my_password"
form.submit
Ruby Mechanize Login form submit error
Here's what the browser sends:
- currentTime: MTQ5MTg3OTYzNDAwMA==
- userId: foo
- password: 37b51d194a7513e45b56f6524f2d51f2
- password1: bar
The password looks like an MD5 hash, and currentTime is the Base64 encoding of a timestamp (1491879634000 in this case).
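If you want to check that reading of the parameters, something like the following might help. It is only a sketch: the helper names (encode_time, decode_time, md5_hex) are made up for illustration, and the idea that password is the MD5 of the password1 value ("bar") is a guess to test, not something the site confirms.

```ruby
require "digest"

# Values copied from the browser request quoted above.
OBSERVED_TIME     = "MTQ5MTg3OTYzNDAwMA=="
OBSERVED_PASSWORD = "37b51d194a7513e45b56f6524f2d51f2"

# Base64-encode a millisecond timestamp, the way currentTime appears to be built.
def encode_time(millis)
  [millis.to_s].pack("m0") # "m0" is Base64 without line breaks
end

# Decode a currentTime value back into an integer timestamp.
def decode_time(encoded)
  encoded.unpack1("m").to_i
end

# MD5 hex digest of a string (hypothetical source of the password field).
def md5_hex(text)
  Digest::MD5.hexdigest(text)
end

puts decode_time(OBSERVED_TIME)                     # the decoded timestamp
puts encode_time(1491879634000) == OBSERVED_TIME    # does the round trip match?
puts md5_hex("bar") == OBSERVED_PASSWORD            # is password the MD5 of password1?
puts encode_time((Time.now.to_f * 1000).to_i)       # a fresh currentTime for a new request
```

If the comparisons print true, you can build the same fields yourself and set them on the form before submitting.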
Using Ruby with Mechanize to log into a website
This is the approach I usually take. It hasn't failed me:
username_field = form.field_with(:name => "user_session[username]")
username_field.value = "whatever_user"
password_field = form.field_with(:name => "user_session[password]")
password_field.value = "whatever_pwd"
form.submit
Google login with mechanize on ruby
I dug around a bit more and found that it's not that simple: I could not log in with Mechanize in any way.
So I ended up using Watir, which was fairly simple and straightforward. Here's an example:
browser.goto LOGIN_URL
browser.text_field(:id, 'Email').set @config.email
browser.button(:id, 'next').click
browser.text_field(:id, 'Passwd').wait_until_present
browser.text_field(:id, 'Passwd').set @config.password
browser.button(:id, 'signIn').click
# Here I wait until an element on my target page is visible and then continue
browser.link(:href, '#SOMETHING').wait_until_present
Hope it helps.
Targeting a form using Mechanize and Ruby
Inspecting the elements, we find the following:
Username:
<input aria-invalid="false" tabindex="0" class="ng-pristine ng-valid md-input ng-touched" id="username" name="username" data-ng-model="credentials.username" type="text">
Password:
<input id="password" class="ng-pristine ng-valid md-input ng-touched" type="password" data-ng-model="credentials.password" name="password" tabindex="0" aria-invalid="false"></input>
Button:
<button class="md-button md-default-theme" ng-transclude="" type="submit">
Clicking the button after filling in the username and password submits the following JSON (as a POST to https://use-manager.com/auth/signin):
{"username":"testing","password":"123"}
So if I were you, I would skip worrying about filling in the form and just do a JSON post to the signin page and take it from there.
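A rough sketch of that JSON post with Mechanize could look like this (untested against the real site). The helper names signin_body and json_signin are mine, and I am assuming the server only needs the JSON body plus a Content-Type header; it may well expect extra headers or tokens. Mechanize sends a String passed as the second argument to post as the raw request body.

```ruby
require "json"

SIGNIN_URL = "https://use-manager.com/auth/signin"

# Build the JSON body the browser was seen sending.
def signin_body(username, password)
  JSON.generate("username" => username, "password" => password)
end

# POST the credentials as JSON through a Mechanize agent.
# The agent keeps any cookies the server sets, so later agent.get calls
# reuse the same session.
def json_signin(agent, username, password)
  agent.post(SIGNIN_URL, signin_body(username, password),
             "Content-Type" => "application/json")
end

# Usage (needs the mechanize gem and network access):
#   require "mechanize"
#   agent = Mechanize.new
#   page  = json_signin(agent, "testing", "123")
#   puts page.body
```

Passing the agent in keeps the helper easy to try with a stub before pointing it at the live site.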
There also appears to be an alternate login page (linked as "Login") that has a form:
<form method="post" action="login;jsessionid=5B1002FE04FDBDC6B0D00C0949EDE75F?service=https%3A%2F%2Fwww.use-manager.com%2Fauth%2Fsignin" name="login">
<input name="action" value="do_login" type="hidden">
<input name="urlparams" value="" type="hidden">
<div class="row">
<label>Username </label>
<input id="username" class="textbox" name="username" size="32" tabindex="1" accesskey="n">
</div>
<div class="row">
<label>Password </label>
<input class="textbox" id="password" name="password" size="32" tabindex="2" accesskey="p" type="password">
</div>
<div class="remember">
<input name="rememberMe" id="rememberMe" value="true" type="checkbox">
<label for="rememberMe">Remember Me</label>
</div>
<ul>
<li><a href="http://www.use-group.com/use-manager-support.html" target="_blank">Trouble logging in?</a></li>
<li><a href="javascript:forgotpwd()">Forgot Password?</a></li>
</ul>
<div class="row btn-row">
<input name="lt" value="_c316E91C9-E9CC-CB77-9C3F-1D94EC1A5AF3_kB493A777-3227-2796-6E52-DC8BA896B36D" type="hidden">
<input name="_eventId" value="submit" type="hidden">
<input class="submit" name="bttn1" onclick="callBldg(document.forms[0].username.value)" accesskey="l" value="LOGIN" tabindex="4" type="button">
<input class="submit" name="reset" accesskey="c" value="CLEAR" tabindex="5" type="reset">
</div>
</form>
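If you would rather go through this form, a minimal Mechanize sketch might be the following. The login_url argument is a placeholder, since the address of the alternate page is not shown here, and login_via_form is just a name I picked. Note that the LOGIN button is type="button" with an onclick handler, and Mechanize does not run JavaScript, so whether a plain submit is accepted by the server is an assumption you would have to verify.

```ruby
# Log in through the form named "login" using a Mechanize agent.
def login_via_form(agent, login_url, username, password)
  # Fetching the page first picks up the session cookie and the fresh
  # hidden fields (lt, _eventId) that are part of the form.
  page = agent.get(login_url)
  form = page.form_with(name: "login")
  raise "login form not found" if form.nil?

  form["username"] = username
  form["password"] = password
  # Posts to the form's action URL; the button's onclick JavaScript is skipped.
  form.submit
end

# Usage (needs the mechanize gem and network access):
#   require "mechanize"
#   agent = Mechanize.new
#   page  = login_via_form(agent, "---alternate-login-url-here---", "testing", "123")
#   puts page.uri
```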