Clicking link using beautifulsoup in python
BeautifulSoup is an HTML parser; on its own it cannot interact with a page. Beyond that, the right tool depends on the concrete situation you are in and the complexity of the particular web page.
If you need to interact with a web page (submit forms, click buttons, scroll, etc.), you need a tool that drives a real browser, such as Selenium.
In certain situations, for example if there is no JavaScript involved in submitting a form, mechanize would also work for you.
And sometimes you can handle it by simply following the link with urllib2 (urllib.request in Python 3) or requests.
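In the simplest case, "clicking" a link is just an HTTP GET on the URL in its href attribute. A minimal sketch of that idea, using a made-up HTML snippet for illustration:

```python
from bs4 import BeautifulSoup
import requests

# A made-up HTML snippet standing in for a fetched page:
html = '<a href="http://example.com/next">Next page</a>'
soup = BeautifulSoup(html, "html.parser")
href = soup.find("a")["href"]
# "Clicking" the link is just an HTTP GET on the extracted URL:
# response = requests.get(href)
```

This only works when the link is a plain href with no JavaScript behind it.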
Clicking links with Python BeautifulSoup
So with help from the comments, I decided to just use urlopen like this:
from bs4 import BeautifulSoup
import urllib.request
import re

def getLinks(url):
    html_page = urllib.request.urlopen(url)
    soup = BeautifulSoup(html_page, "html.parser")
    links = []
    for link in soup.findAll('a', attrs={'href': re.compile("^http://")}):
        links.append(link.get('href'))
    return links

anchors = getLinks("http://madisonmemorial.org/")
for anchor in anchors:
    happens = urllib.request.urlopen(anchor)
    if happens.getcode() == 404:  # getcode() returns an int, not the string "404"
        pass  # Do stuff

# Click on links and return responses
countMe = len(anchors)
for anchor in anchors:
    i = getLinks(anchor)
    countMe += len(i)
    for sub_link in i:  # urlopen needs a URL string, not the whole list
        happens = urllib.request.urlopen(sub_link)
        if happens.getcode() == 404:
            pass  # Do some stuff
print(countMe)
I've got my own logic in the if statements, hence the placeholder comments. (Note that urlopen raises an HTTPError on a 404 rather than returning normally, so in practice you would wrap those calls in a try/except.)
How to click/use a link parsed from Beautiful Soup in python
You can use Selenium to click the links. Or, after fetching the page with requests (forget urllib) and extracting the URLs with bs4, you can call requests.get() on each extracted URL and fetch the results again.
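One wrinkle worth noting with that approach: href attributes are often relative, so they should be resolved against the page URL before re-fetching. A sketch under that assumption (the HTML snippet and base URL below are made up):

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup
import requests

def extract_links(base_url, html):
    # Return an absolute URL for every anchor on the page.
    soup = BeautifulSoup(html, "html.parser")
    return [urljoin(base_url, a["href"]) for a in soup.find_all("a", href=True)]

links = extract_links("http://example.com/start",
                      '<a href="/page2">Next</a> <a href="http://other.com/x">Other</a>')
# Then "click" each one by fetching it in a second round of requests:
# responses = [requests.get(link) for link in links]
```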
How to Click on a Hidden Link with BeautifulSoup or Selenium
I used F12 and the Network tab and found the underlying request the page makes, so here you go:
from bs4 import BeautifulSoup
import requests
import datetime

BASE_API_URL = 'https://www.theice.com'
# The trailing "_" query parameter is the cache-busting timestamp the page itself sends
r = requests.get(f'https://www.theice.com/marginrates/ClearUSMarginParameterFiles.shtml?getParameterFileTable&category=Current&_={int(datetime.datetime.now().timestamp()*1000)}')
soup = BeautifulSoup(r.content, features='lxml')
margin_scanning_link = BASE_API_URL + soup.find_all("a", string="Margin Scanning")[0].attrs['href']
margin_scanning_file = requests.get(margin_scanning_link)
Click on link using Beautifulsoup/Selenium
A solution with BeautifulSoup that gets all selling prices and seller names from the "View All Offers" tab could look like this:
from bs4 import BeautifulSoup
from requests import get

url = 'https://www.noon.com/uae-en/iphone-11-with-facetime-black-128gb-4g-lte-international-specs/N29884715A/p?o=eaf72ceb0dd3bc9f'
resp = get(url).text
soup = BeautifulSoup(resp, 'lxml')
for offer in soup.find_all("li", class_="item"):
    print(offer.find("span", class_="sellingPrice").find("span", class_="value").text)
    print(offer.find("div", class_="sellerDetails").strong.text)
A solution in Scrapy could look like this:
import scrapy

class noonSpider(scrapy.Spider):
    name = "noon"
    start_urls = ['https://www.noon.com/uae-en/iphone-11-with-facetime-black-128gb-4g-lte-international-specs/N29884715A/p?o=eaf72ceb0dd3bc9f/p?o=b478235d26032e5a']

    def parse(self, response):
        yield {
            'sellingPrice': response.css('.offersList .sellingPrice .value::text').getall(),
            'seller': response.css('.offersList .sellerDetails strong::text').getall(),
        }
Is it possible to click on a link using Beautiful soup?
Yes, it is possible, but the approach is different. You will need to understand the GET or POST request that clicking the link triggers in a browser; you can inspect it in the Network tab of the browser's developer console. You may also need to maintain a session, i.e., receive, store, and send cookies. You can use Requests for all of this.
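A minimal sketch of that approach with requests.Session, which stores received cookies and resends them on later requests automatically. The endpoint and form fields below are hypothetical stand-ins for whatever the Network tab actually shows:

```python
import requests

session = requests.Session()  # cookies received here are resent on later requests

# 1. Load the page that sets the session cookies:
# session.get("https://example.com/listing")

# 2. Replay the request that clicking the link fires. The URL and the
#    data fields are hypothetical; in practice you copy them from the
#    browser's Network tab:
# resp = session.post("https://example.com/ajax/click", data={"item_id": "123"})
```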
Python Beautifulsoup - click load more button
BeautifulSoup doesn't have a click function; it only parses HTML. You could do this through Selenium, which does. But there is another option that lets you stick with BeautifulSoup.
When you click the button the url changes to https://reelgood.com/movies/source/netflix?offset=50.
The offset increments by 50 up to 3750 as far as I can tell.
https://reelgood.com/movies/source/netflix?offset=3750 however doesn't show you the whole table, just the last page. So you could loop through the pages and collect all titles on that page and append it to your list.
something like:
import requests
from bs4 import BeautifulSoup

for i in range(0, 3800, 50):
    URL = "https://reelgood.com/movies/source/netflix?offset=" + str(i)
    page = requests.get(URL)
    soup = BeautifulSoup(page.content, "html.parser")
    # Open in append mode: mode "w" would overwrite the file on every page
    f = open("C:/Downloaders/test/Scrape/movies_netflix.txt", "a")
    for link in soup.select('[itemprop=itemListElement] [itemprop=url]'):
        data = link.get('content')
        f.write(data)
        f.write("\n")
    f.close()
You might also consider dropping the per-item file writes: append all movies on a page to a list and write the whole list to the file once at the end. Otherwise you perform roughly 76 * 50 individual writes, which could take a long time.
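That suggestion might look like the sketch below. Parsing is split into its own function so the titles can be collected in memory and written in one pass; the selector is the one from the loop above, and the network-hitting loop is left commented out:

```python
import requests
from bs4 import BeautifulSoup

def parse_titles(html):
    # Extract the movie URLs from one page of the table.
    soup = BeautifulSoup(html, "html.parser")
    return [link.get("content")
            for link in soup.select("[itemprop=itemListElement] [itemprop=url]")]

# titles = []
# for i in range(0, 3800, 50):
#     page = requests.get("https://reelgood.com/movies/source/netflix?offset=" + str(i))
#     titles.extend(parse_titles(page.text))
# with open("movies_netflix.txt", "w") as f:
#     f.write("\n".join(titles))
```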