Clicking Links With Python BeautifulSoup

Clicking a link using BeautifulSoup in Python

BeautifulSoup is just an HTML parser - it has no concept of clicking.

Further discussion really depends on the concrete situation you are in and the complexity of the particular web page.

If you need to interact with a web page - submit forms, click buttons, scroll, etc. - you need a tool that drives a real browser, like Selenium.

In certain situations - for example, if there is no JavaScript involved in submitting a form - mechanize would also work for you.

And sometimes you can handle it by simply following the link with urllib2 (urllib.request in Python 3) or requests.
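As a minimal sketch of that last option: extract the link's href with BeautifulSoup, then "click" it by simply fetching that URL with another request. The HTML and URL below are made-up placeholders for illustration:

```python
from bs4 import BeautifulSoup

# A made-up page standing in for response.text from requests.get(...)
# or urllib.request.urlopen(...).
html = '<html><body><a href="http://example.com/next">Next page</a></body></html>'

soup = BeautifulSoup(html, "html.parser")
link = soup.find("a")
next_url = link.get("href")
print(next_url)  # http://example.com/next

# The actual "click" is just another fetch of the extracted href:
# next_page = requests.get(next_url)
```

This only works when clicking the link causes a plain navigation; if the click triggers JavaScript, you are back to needing a real browser.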

Clicking links with Python BeautifulSoup

So with help from the comments, I decided to just use urlopen like this:

from bs4 import BeautifulSoup
import urllib.request
import re

def getLinks(url):
    html_page = urllib.request.urlopen(url)
    soup = BeautifulSoup(html_page, "html.parser")
    links = []
    for link in soup.findAll('a', attrs={'href': re.compile("^http://")}):
        links.append(link.get('href'))
    return links

anchors = getLinks("http://madisonmemorial.org/")
for anchor in anchors:
    happens = urllib.request.urlopen(anchor)
    if happens.getcode() == 404:  # getcode() returns an int, not the string "404"
        pass  # Do stuff

# Click on links and return responses
countMe = len(anchors)
for anchor in anchors:
    i = getLinks(anchor)
    countMe += len(i)
    for sub_link in i:  # i is a list of links, so open each one in turn
        happens = urllib.request.urlopen(sub_link)
        if happens.getcode() == 404:
            pass  # Do some stuff

print(countMe)

I've got my own arguments in the if statements

How to click/use a link parsed from Beautiful Soup in python

You can use Selenium to click the links; you can see how to do this here. Or, after fetching the page with requests (forget urllib) and extracting the URLs with bs4, you can requests.get('your_example_url') and parse the results again.

How to Click on a Hidden Link with BeautifulSoup or Selenium

I used F12 => Network tab and found this page, so here you go:

from bs4 import BeautifulSoup
import requests
import datetime

BASE_API_URL = 'https://www.theice.com'
r = requests.get(f'https://www.theice.com/marginrates/ClearUSMarginParameterFiles.shtml?getParameterFileTable&category=Current&_={int(datetime.datetime.now().timestamp() * 1000)}')
soup = BeautifulSoup(r.content, features='lxml')
margin_scanning_link = BASE_API_URL + soup.find_all("a", string="Margin Scanning")[0].attrs['href']
margin_scanning_file = requests.get(margin_scanning_link)

Click on link using Beautifulsoup/Selenium

A solution with BeautifulSoup that gets all selling prices and seller names from the View All Offers tab could look like this:

from bs4 import BeautifulSoup
from requests import get

url = 'https://www.noon.com/uae-en/iphone-11-with-facetime-black-128gb-4g-lte-international-specs/N29884715A/p?o=eaf72ceb0dd3bc9f'
resp = get(url).text
soup = BeautifulSoup(resp, 'lxml')
for offer in soup.find_all("li", class_="item"):
    print(offer.find("span", class_="sellingPrice").find("span", class_="value").text)
    print(offer.find("div", class_="sellerDetails").strong.text)

A solution in Scrapy could look like this:

import scrapy


class noonSpider(scrapy.Spider):
    name = "noon"
    start_urls = ['https://www.noon.com/uae-en/iphone-11-with-facetime-black-128gb-4g-lte-international-specs/N29884715A/p?o=eaf72ceb0dd3bc9f/p?o=b478235d26032e5a']

    def parse(self, response):
        yield {
            'sellingPrice': response.css('.offersList .sellingPrice .value::text').getall(),
            'seller': response.css('.offersList .sellerDetails strong::text').getall(),
        }

Is it possible to click on a link using Beautiful soup?

Yes, it is possible, but the approach is different. You will need to understand the GET/POST request that occurs when you click that link manually in a browser. You can inspect it using the Network tab of the browser's developer console. You may also need to maintain a session, i.e., receive, store, and send cookies. You can use Requests for that.
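A hedged sketch of the session part: requests.Session stores cookies and attaches them to every subsequent request automatically. The cookie name and URL below are hypothetical; here the request is only prepared, not sent, so we can inspect the Cookie header offline:

```python
import requests

s = requests.Session()
# Pretend the server set this cookie on a previous response;
# normally s.get(login_url) would store it in s.cookies for you.
s.cookies.set("sessionid", "abc123")

# Prepare (without sending) a follow-up request to confirm the
# session's cookie gets attached to it.
req = requests.Request("GET", "https://example.com/protected")
prepped = s.prepare_request(req)
print(prepped.headers["Cookie"])  # sessionid=abc123
```

Sending the request with s.send(prepped) (or just s.get(...)) would then replay the cookie, mimicking what the browser does after you click the link.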

Python Beautifulsoup - click load more button

BeautifulSoup doesn't have a click function. You could do this through Selenium, which does. But there is another option that lets you use just BeautifulSoup.

When you click the button the url changes to https://reelgood.com/movies/source/netflix?offset=50.

The offset increments by 50 up to 3750 as far as I can tell.

https://reelgood.com/movies/source/netflix?offset=3750 however doesn't show you the whole table, just the last page. So you could loop through the pages and collect all titles on that page and append it to your list.

something like:

import requests
from bs4 import BeautifulSoup

# Open the file once, outside the loop; "w" mode inside the loop
# would truncate the file on every page.
with open("C:/Downloaders/test/Scrape/movies_netflix.txt", "w") as f:
    for i in range(0, 3800, 50):
        URL = "https://reelgood.com/movies/source/netflix?offset=" + str(i)
        page = requests.get(URL)
        soup = BeautifulSoup(page.content, "html.parser")

        for link in soup.select('[itemprop=itemListElement] [itemprop=url]'):
            data = link.get('content')
            f.write(data)
            f.write("\n")

You might also consider appending all movies on a page to a list and then writing the whole list to the file at the end, instead of writing line by line. Otherwise you end up doing roughly 76 × 50 individual writes, which could take a long time.
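That suggestion could be sketched like this - collect everything into a list first, then write once at the end. The two tiny HTML snippets below are made-up stand-ins for the paginated requests.get(URL).content responses:

```python
from bs4 import BeautifulSoup

def collect_titles(pages):
    """Gather movie URLs from every page before touching the file."""
    titles = []
    for html in pages:
        soup = BeautifulSoup(html, "html.parser")
        for link in soup.select('[itemprop=itemListElement] [itemprop=url]'):
            titles.append(link.get('content'))
    return titles

# Made-up pages standing in for the real paginated responses.
pages = [
    '<div itemprop="itemListElement"><meta itemprop="url" content="/movie/a"></div>',
    '<div itemprop="itemListElement"><meta itemprop="url" content="/movie/b"></div>',
]
titles = collect_titles(pages)
print(titles)  # ['/movie/a', '/movie/b']

# Single write at the end instead of one write per link:
# with open("movies_netflix.txt", "w") as f:
#     f.write("\n".join(titles) + "\n")
```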


