How to use CSS selectors to retrieve specific links lying in some class using BeautifulSoup?
The page is not the friendliest in its use of classes and markup, but even so your CSS selector is too specific to be useful here.
If you want Upcoming Events, you want just the first <div class="events-horizontal">, then just grab the <div class="title"><a href="..."></a></div> tags, that is, the links on the titles:
upcoming_events_div = soup.select_one('div.events-horizontal')
for link in upcoming_events_div.select('div.title a[href]'):
    print(link['href'])
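As a minimal sketch of the same approach on a self-contained document, the snippet below parses a small hypothetical HTML fragment (the markup, the URLs and the `upcoming_links` name are invented for illustration, not taken from the page). It shows that `select_one` stops at the first `div.events-horizontal`, so links in later blocks are ignored.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page.
html = """
<div class="events-horizontal">
  <div class="title"><a href="/event/1">First</a></div>
  <div class="title"><a href="/event/2">Second</a></div>
</div>
<div class="events-horizontal">
  <div class="title"><a href="/event/99">Other section</a></div>
</div>
"""

soup = BeautifulSoup(html, "html.parser")

# select_one returns only the first matching element (or None if there is none).
upcoming_events_div = soup.select_one("div.events-horizontal")

# Collect the href of every title link inside that first block.
upcoming_links = [a["href"] for a in upcoming_events_div.select("div.title a[href]")]
print(upcoming_links)
```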
Note that you should not use r.text; use r.content and leave decoding to Unicode to BeautifulSoup. See "Encoding issue of a character in utf-8".
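To illustrate why bytes are preferable, here is a small sketch that needs no network access. The `raw` variable plays the role of `r.content`; the byte string and its markup are made up for the example. Given bytes, BeautifulSoup can honor the encoding declared in the document itself, and it records the encoding it settled on in `original_encoding`.

```python
from bs4 import BeautifulSoup

# Stand-in for r.content: raw bytes as they would arrive over the wire.
raw = '<html><head><meta charset="utf-8"><title>Café</title></head></html>'.encode("utf-8")

# Passing bytes lets BeautifulSoup work out the encoding on its own.
soup = BeautifulSoup(raw, "html.parser")
title = soup.title.string
print(title, soup.original_encoding)
```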
How to use CSS selectors to retrieve specific links using BeautifulSoup?
You will need to make some assumptions about what is most likely to remain constant, and then review them over time. For example, I might assume you want the href of the a tag that is a child of the 3rd column td, in the first table following the div containing the string Catálogo Actualizaciones. One CSS pattern for that would be as follows:
import requests
from bs4 import BeautifulSoup as bs
r = requests.get('https://alfabeta.surge.sh/')
soup = bs(r.text, 'lxml')
print(soup.select_one('div:-soup-contains("Catálogo Actualizaciones") ~ table td:nth-child(3) > a')['href'])
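The same selector can be tried offline. The sketch below runs it against hypothetical markup (the two tables and their URLs are invented) to show that the `~` combinator only considers tables that come after the matching div. Note that `:-soup-contains` is a Soup Sieve extension rather than standard CSS and needs Soup Sieve 2.1 or later.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the table before the heading div must be skipped.
html = """
<html><body>
<div>Other section</div>
<table><tr><td>a</td><td>b</td><td><a href="/wrong.zip">x</a></td></tr></table>
<div>Catálogo Actualizaciones</div>
<table><tr><td>a</td><td>b</td><td><a href="/update.zip">x</a></td></tr></table>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

selector = 'div:-soup-contains("Catálogo Actualizaciones") ~ table td:nth-child(3) > a'
link = soup.select_one(selector)

# Guard against a layout change: select_one returns None when nothing matches.
href = link["href"] if link is not None else None
print(href)
```

Checking for `None` before indexing is one way to turn a silent layout change into something you can detect when you review the assumptions later.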
CSS selectors to be used for scraping specific links
First of all, that page requires a city selection to be made (in a cookie). Use a Session object to handle this:
import requests
s = requests.Session()
s.post('http://kiascenehai.pk/select_city/submit_city', data={'city': 'Lahore'})
response = s.get('http://kiascenehai.pk/')
Now the response contains the actual page content instead of a redirect to the city selection page.
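A Session helps here because cookies stored on it are attached to every later request. The sketch below shows this without touching the network: it sets a cookie by hand and inspects the headers of a prepared, unsent request. The cookie name `city` is a hypothetical stand-in; the real site chooses its own cookie when the form is posted.

```python
import requests

s = requests.Session()

# Hypothetical cookie; in the real flow the server sets it in response to the POST.
s.cookies.set("city", "Lahore")

# Build (but do not send) a request to see what the session would transmit.
prepared = s.prepare_request(requests.Request("GET", "http://kiascenehai.pk/"))
cookie_header = prepared.headers.get("Cookie")
print(cookie_header)
```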
Next, keep your CSS selector no larger than needed. In this page there isn't much to go on as it uses a grid layout, so we first need to zoom in on the right rows:
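from bs4 import BeautifulSoup
# Parse the bytes of the response; the parser choice here is an assumption.
soup = BeautifulSoup(response.content, 'html.parser')
upcoming_events_header = soup.find('div', class_='featured-event')
upcoming_events_row = upcoming_events_header.find_next(class_='row')
for link in upcoming_events_row.select('h2 a[href]'):
    print(link['href'])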
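The zoom-in step can be checked on a small document. This sketch uses hypothetical grid markup (the rows and URLs are invented) to show that `find_next` moves forward from the header in document order and returns only the first matching row, so rows before the header and later rows are left out.

```python
from bs4 import BeautifulSoup

# Hypothetical grid markup: a header block surrounded by rows.
html = """
<div class="row"><h2><a href="/before">Before header</a></h2></div>
<div class="featured-event">Upcoming Events</div>
<div class="row">
  <h2><a href="/event/1">One</a></h2>
  <h2><a href="/event/2">Two</a></h2>
</div>
<div class="row"><h2><a href="/later">Later row</a></h2></div>
"""

soup = BeautifulSoup(html, "html.parser")

header = soup.find("div", class_="featured-event")
# find_next returns the first element after the header that has class "row".
row = header.find_next(class_="row")
hrefs = [a["href"] for a in row.select("h2 a[href]")]
print(hrefs)
```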
Choosing the appropriate tag to pass into the select method of BeautifulSoup
If you select on a href alone, you haven't specified a div class, so you get every link on the page, including links to things like Maps and Drive. In the code you cite, you missed the "r" div class:
<div data-hveid=.....>
  <div class="rc">
    <div class="r">
      <a href="https://www.python.org/".....>
        <h3 class="LC20lb">Welcome to Python.org</h3>
So soup.select('.r a') gets only the a tags inside elements with class "r" (the search results), rather than every a tag that has an href.
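The difference is easy to see side by side. The following sketch uses a hypothetical page with two navigation links and one search result (the navigation markup and its URLs are invented) and compares a selector that matches every link with one scoped to the "r" class.

```python
from bs4 import BeautifulSoup

# Hypothetical page: navigation links plus a single search result.
html = """
<div id="nav">
  <a href="https://maps.example/">Maps</a>
  <a href="https://drive.example/">Drive</a>
</div>
<div class="rc"><div class="r">
  <a href="https://www.python.org/"><h3 class="LC20lb">Welcome to Python.org</h3></a>
</div></div>
"""

soup = BeautifulSoup(html, "html.parser")

# Every anchor that carries an href attribute, anywhere in the document.
all_links = [a["href"] for a in soup.select("a[href]")]

# Only anchors that are descendants of an element with class "r".
result_links = [a["href"] for a in soup.select(".r a")]
print(len(all_links), result_links)
```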
Hope this answers your question!
How to check if a soup contains an element?
You can try select_one instead of find. Something like this:
soup.select_one('details[data-level="2"] summary.section-heading h2#English')
The result will be
<h2 id="English">English</h2>
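Since `select_one` returns `None` when nothing matches, the existence check itself can be written as a comparison against `None`. Here is a minimal sketch on hypothetical markup; the `contains` helper is a name made up for this example, not part of BeautifulSoup.

```python
from bs4 import BeautifulSoup

# Hypothetical markup with a single "English" section.
html = """
<details data-level="2">
  <summary class="section-heading"><h2 id="English">English</h2></summary>
</details>
"""

soup = BeautifulSoup(html, "html.parser")


def contains(soup, selector):
    """Return True if at least one element matches the CSS selector."""
    return soup.select_one(selector) is not None


base = 'details[data-level="2"] summary.section-heading '
has_english = contains(soup, base + "h2#English")
has_french = contains(soup, base + "h2#French")
print(has_english, has_french)
```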
Get value from first span tag in beautifulsoup
Try using a CSS selector:
from bs4 import BeautifulSoup
soup = BeautifulSoup(html, "html.parser")
print(soup.select_one("td > span").text)
$39,465,077,974.88
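Because the snippet above depends on an `html` variable from the question, here is a self-contained variant. The table markup is hypothetical; only the amount is reused from the output above. It shows that `select_one` returns the first span in document order even when several spans match `td > span`.

```python
from bs4 import BeautifulSoup

# Hypothetical table; several spans match, only the first is wanted.
html = """
<table><tr>
  <td><span>$39,465,077,974.88</span><span>ignored</span></td>
  <td><span>also ignored</span></td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

first_span = soup.select_one("td > span")
# get_text(strip=True) trims surrounding whitespace from the span's text.
value = first_span.get_text(strip=True) if first_span is not None else None
print(value)
```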
Extract data-content from span tag in BeautifulSoup
Try using a CSS selector:
soup.select_one("li[class='IDENTIFIER'] > p > span")['data-content']
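Indexing with `['data-content']` raises a `KeyError` when the attribute is absent, so a more defensive sketch might use `.get()` instead. The list markup and attribute value below are hypothetical. Note also that `li[class='IDENTIFIER']` only matches when the class attribute is exactly that string; `li.IDENTIFIER` would also match elements that carry additional classes.

```python
from bs4 import BeautifulSoup

# Hypothetical list: one span has data-content, the other does not.
html = """
<ul>
  <li class="IDENTIFIER"><p><span data-content="hidden value">shown text</span></p></li>
  <li class="OTHER"><p><span>no attribute</span></p></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

span = soup.select_one("li[class='IDENTIFIER'] > p > span")
# .get returns None instead of raising when the attribute is missing.
content = span.get("data-content") if span is not None else None

other = soup.select_one("li[class='OTHER'] > p > span")
missing = other.get("data-content")
print(content, missing)
```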