Mechanize How to Get Current Url

mechanize how to get current url

next_page.uri.to_s 

uri returns a URI object, and to_s converts it to a plain string. See http://www.rubydoc.info/gems/mechanize/Mechanize/Page/Link#uri-instance_method and http://ruby-doc.org/stdlib-2.4.1/libdoc/uri/rdoc/URI.html

For testing purposes, I did the following in irb:

require 'mechanize'
@agent = Mechanize.new

page = @agent.get('http://news.ycombinator.com/news')
=> #<Mechanize::Page
{url #<URI::HTTP:0x00000001ad3198 URL:http://news.ycombinator.com/news>}
{meta_refresh}
{title "Hacker News"}
{iframes}
{frames}
{links
#<Mechanize::Page::Link "" "http://ycombinator.com">
#<Mechanize::Page::Link "Hacker News" "news">
#<Mechanize::Page::Link "new" "newest">
#<Mechanize::Page::Link "comments" "newcomments">
#<Mechanize::Page::Link "ask" "ask">
#<Mechanize::Page::Link "jobs" "jobs">
#<Mechanize::Page::Link "submit" "submit">
#<Mechanize::Page::Link "login" "newslogin?whence=%6e%65%77%73">
#<Mechanize::Page::Link "" "vote?for=3803568&dir=up&whence=%6e%65%77%73">
#<Mechanize::Page::Link
"Don’t Be Evil: How Google Screwed a Startup"
"http://blog.hatchlings.com/post/20171171127/dont-be-evil-how-google-screwed-a-startup">
#<Mechanize::Page::Link "mikeknoop" "user?id=mikeknoop">
#<Mechanize::Page::Link "64 comments" "item?id=3803568">
#<Mechanize::Page::Link "" "vote?for=3802515&dir=up&whence=%6e%65%77%73">
# Omitted for brevity...

next_page.uri
=> #<URI::HTTP:0x00000001fa7818 URL:http://news.ycombinator.com/news2>

next_page.uri.to_s
=> "http://news.ycombinator.com/news2"

How to find the current URL in python mechanize?

This may not be very thorough, but it does what you need:

import mechanize

br = mechanize.Browser()
br.open("http://www.example.com/")
# follow second link with element text matching regular expression
response1 = br.follow_link(text_regex=r"cheese\s*shop", nr=1)

print(response1.geturl())

As a side note, when I'm looking for a method like this and can't find it in the docs, I usually open an IPython shell and use tab completion to see which of the available methods look promising.

How to get current URL from Mechanize in Python?

br.geturl() should do it. Using httpbin.org's redirect endpoint to test:

import mechanize

br = mechanize.Browser()
url = 'http://httpbin.org/redirect-to?url=http%3A%2F%2Fstackoverflow.com'
br.open(url)

>>> print(br.geturl())
http://stackoverflow.com
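
The point of the test above is that geturl() reports the URL you ended up at after redirects, not the one you asked for. As a rough, offline illustration of the same idea, here is a sketch that uses only the standard library instead of mechanize: urllib.request responses also expose geturl(). The local server, the /start and /final paths, and the function name are all made up for this example.

```python
import threading
import urllib.request
from http.server import BaseHTTPRequestHandler, HTTPServer


class RedirectHandler(BaseHTTPRequestHandler):
    """Hypothetical local server: /start redirects to /final."""

    def do_GET(self):
        if self.path == "/start":
            self.send_response(302)
            self.send_header("Location", "/final")
            self.end_headers()
        else:
            body = b"done"
            self.send_response(200)
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)

    def log_message(self, *args):
        pass  # suppress request logging to keep the example quiet


def final_url_after_redirect():
    """Return (requested_url, url_reported_by_geturl) for a redirected request."""
    server = HTTPServer(("127.0.0.1", 0), RedirectHandler)  # port 0: pick a free port
    port = server.server_address[1]
    threading.Thread(target=server.serve_forever, daemon=True).start()
    # An empty ProxyHandler makes sure the loopback request bypasses any proxy.
    opener = urllib.request.build_opener(urllib.request.ProxyHandler({}))
    try:
        requested = f"http://127.0.0.1:{port}/start"
        with opener.open(requested) as response:
            return requested, response.geturl()
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    requested, final = final_url_after_redirect()
    print("requested:", requested)
    print("current:  ", final)
```

The requested URL ends in /start while the reported one ends in /final, which is the same behaviour the httpbin test shows for br.geturl().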

How to get the current URL for a HTML page

I'm assuming you're using the open_uri_redirections gem because :allow_redirections is not necessary in Ruby 2.4+.

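Save the result of OpenURI's open; its base_uri is the final URI after any redirects:

require 'open-uri'
require 'nokogiri'

# URI.open is available from Ruby 2.5; calling plain open() on a URL was removed in Ruby 3.0.
r = URI.open('http://www.google.com/gmail')
r.base_uri
# #<URI::HTTPS https://accounts.google.com/ServiceLogin?service=mail&passive=true&rm=false&continue=https://mail.google.com/mail/&ss=1&scc=1&ltmpl=default&ltmplcache=2&emr=1&osid=1#>
page = Nokogiri::HTML(r)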

Python Mechanize, how to get URL parameters

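from urllib.parse import urlparse

# Example URL, reconstructed from the ParseResult output shown below
url = 'https://example.com/something.php?sid=123456789'
parsed = urlparse(url)
print(parsed)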

The output:

ParseResult(scheme='https', netloc='example.com', path='/something.php', params='', query='sid=123456789', fragment='')

Then, you can access:

print(parsed.query)

The output:

sid=123456789

Then, you can extract:

sid = parsed.query.split('sid=')[-1]
print(sid)

The output:

123456789
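
Note that splitting on 'sid=' only works while sid is the last parameter in the query string. A slightly more robust sketch, assuming you want a single value per parameter, uses parse_qs from the same module. The helper name get_query_param and the extra page=2 parameter are invented for illustration.

```python
from urllib.parse import urlparse, parse_qs


def get_query_param(url, name, default=None):
    """Return the first value of query parameter `name`, or `default` if absent."""
    query = urlparse(url).query
    values = parse_qs(query).get(name)  # parse_qs maps each name to a list of values
    return values[0] if values else default


if __name__ == "__main__":
    # Hypothetical URL with a second parameter after sid
    url = "https://example.com/something.php?sid=123456789&page=2"
    print(get_query_param(url, "sid"))
    print(get_query_param(url, "missing", default="n/a"))
```

With this URL the split approach would return '123456789&page=2', while the helper returns only the sid value.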

