Clicking link with JavaScript in Mechanize
That's a JavaScript link. Mechanize will not be able to click it, since it does not evaluate JavaScript. Sorry!
Try to find out what happens in your browser when you click that link. Does it create a POST or GET request? What parameters are sent to the server? Once you know that, you can emulate the same action in your Mechanize script. Chrome dev tools / Firebug will help out.
If that doesn't work, try switching to a library that supports JavaScript evaluation. I've used watir-webdriver to great success, but you could also try out phantomjs, casperjs, pjscrape, or other tools.
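As a rough illustration of "emulate the same action", here is a minimal Python sketch that rebuilds a form POST using only the standard library. The URL and the parameter names are hypothetical placeholders; substitute whatever your browser's network tab actually shows. The request is only built here, not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request


def build_post_request(url, params):
    """Build (but do not send) a form-encoded POST request."""
    body = urlencode(params).encode("ascii")
    return Request(
        url,
        data=body,
        headers={"Content-Type": "application/x-www-form-urlencoded"},
        method="POST",
    )


# Hypothetical URL and parameters, as if copied from the network tab.
request = build_post_request(
    "https://example.com/search",
    {"page": "2", "sort": "name"},
)
# urllib.request.urlopen(request) would actually send it.
```

The same idea carries over to Mechanize: once you know the method, URL and parameters, you issue that request directly instead of clicking the link.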
Click javascript tab using mechanize and ruby
I just ran this in my console, and you're getting this error:
NoMethodError: undefined method `add_field!' for nil:NilClass
because this line returns nil:
form = page.form("aspnetForm.add_field!('__EVENTTARGET','')")
The add_field! call has ended up inside the string, so Mechanize looks for a form with that whole string as its name and finds none. Change it to this to fix that error:
form = page.form("aspnetForm")
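For context on why __EVENTTARGET matters: on ASP.NET pages, a __doPostBack(target, argument) link copies its two arguments into the hidden fields __EVENTTARGET and __EVENTARGUMENT and then submits the form. The sketch below (in Python rather than Ruby, purely for illustration) pulls those two values out of a link's href. The sample href and the helper name postback_fields are made up, and HTML-escaped quotes are not handled.

```python
import re

# Matches hrefs such as: javascript:__doPostBack('ctl00$Tab2','')
POSTBACK_RE = re.compile(r"__doPostBack\('([^']*)'\s*,\s*'([^']*)'\)")


def postback_fields(href):
    """Return the hidden-field values a __doPostBack link would set, or None."""
    match = POSTBACK_RE.search(href)
    if match is None:
        return None
    target, argument = match.groups()
    return {"__EVENTTARGET": target, "__EVENTARGUMENT": argument}


# Hypothetical href, for illustration only.
fields = postback_fields("javascript:__doPostBack('ctl00$MainContent$Tab2','')")
```

Setting these two values on the page's form and submitting it is usually what "clicking" such a tab amounts to.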
Mechanize and Python, clicking href=javascript:void(0); links and getting the response back
First of all, I would still stick to selenium since this is quite a "javascript-heavy" website. Note that you can use a headless browser (PhantomJS or with a virtual display) if needed.
The idea here would be to paginate by 100 rows per page and click the "»" (next page) link until it is no longer present on the page, which would mean we've hit the last page and there are no more results to process. To make the solution reliable we need to use Explicit Waits: every time we proceed to the next page, wait for the loading spinner to become invisible.
Working implementation:
# -*- coding: utf-8 -*-
from selenium.common.exceptions import NoSuchElementException
from selenium.webdriver.common.by import By
from selenium import webdriver
from selenium.webdriver.support.select import Select
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

driver = webdriver.Firefox()
driver.maximize_window()
driver.get('https://polon.nauka.gov.pl/opi/aa/drh/zestawienie?execution=e1s1')

wait = WebDriverWait(driver, 30)

# paginate by 100
# find_element(By..., ...) is used because the find_element_by_* helpers were removed in Selenium 4
select = Select(driver.find_element(By.ID, "drhPageForm:drhPageTable:j_idt211:j_idt214:j_idt220"))
select.select_by_visible_text("100")

while True:
    # wait until there is no loading spinner
    wait.until(EC.invisibility_of_element_located((By.ID, "loadingPopup_content_scroller")))

    # .text is a string, so format it with %s rather than %d
    current_page = driver.find_element(By.CLASS_NAME, "rf-ds-act").text
    print("Current page: %s" % current_page)

    # TODO: collect the results

    # proceed to the next page; a missing link means this was the last page
    try:
        next_page = driver.find_element(By.LINK_TEXT, u"»")
        next_page.click()
    except NoSuchElementException:
        break
Follow a javascript link with mechanize and python
When I needed to do something similar, I looked at the links I was trying to follow.
Some of them were static links generated with JavaScript. They were predictable/consistent enough that I could manually generate a list beforehand.
Others were just constructed URLs with parameters. These too could be analyzed beforehand, generated on the Python side, and sent as a request instead of a "click on this link."
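A minimal sketch of that second approach, generating the URLs in Python instead of clicking. The base URL and the page / per_page parameter names are assumptions for illustration; use whatever pattern the site's JavaScript actually builds.

```python
from urllib.parse import urlencode


def build_page_urls(base_url, last_page, per_page=100):
    """Generate one URL per results page instead of clicking 'next' links."""
    return [
        base_url + "?" + urlencode({"page": page, "per_page": per_page})
        for page in range(1, last_page + 1)
    ]


# Hypothetical base URL and parameter names.
urls = build_page_urls("https://example.com/results", last_page=3)
```

Each generated URL can then be opened with Mechanize like any ordinary link.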
If you need to actually execute the JavaScript, you could run a PyV8 + Mechanize hybrid. I've been playing with this a bit and it seems pretty cool. PyV8 bridges Python with the V8 JavaScript engine, allowing you to create JS environments and execute arbitrary code. It does a great job going back and forth between the two languages.
I don't have any sample code, but one of these 3 approaches should work for you :) Good luck!
going to a javascript link with mechanize-firefox
Without the page and its HTML and JS one can only guess. Note that the follow_link() methods don't work with JS links. The method below does, but of course I cannot test without the page.
Probably the best bet is to get link(s) as DOM object(s) for the click method
use WWW::Mechanize::Firefox;
my $mech = WWW::Mechanize::Firefox->new();
# Get to your page with the link(s), e.g. via $mech->get(...)
my $link = $mech->find_link_dom( text_regex => 'abc' ); # Or use find_all_links_dom()
$link->click();
# $mech->click( { dom => $link } ) # works as well
There are also the relevant text and text_contains options (instead of text_regex), along with a number of others. Note that the click method will wait, on a list of events, before returning. See for example this recent post. This is critical for pages that take longer to complete.
See docs for find_link_dom() and click methods. They aren't very detailed or rich in examples but do provide enough to play with and figure it out.
If you need to interrogate links use find_all_links_dom(), which returns an array or a reference to array (depending on context) of Firefox's DOM as MozRepl::RemoteObject instances.
my @links_dom = $mech->find_all_links_dom( text_contains => 'abc' );
# Example from docs for find_link_dom()
for my $ln (@links_dom) {
    print $ln->{innerHTML} . "\n"
}
See the page for MozRepl::RemoteObject to see what you can do with it. If you only need to find out which link to click, the options for find_link_dom() should be sufficient.
This has been tested only with a toy page that uses a __doPostBack link, with a <span> in the link.