How to Use CSS Selectors to Retrieve Specific Links Lying in Some Class Using BeautifulSoup

How to use CSS selectors to retrieve specific links lying in some class using BeautifulSoup?

The page is not the most friendly in the use of classes and markup, but even so your CSS selector is too specific to be useful here.

If you want the Upcoming Events, select only the first <div class="events-horizontal">, then grab the <div class="title"><a href="..."></a></div> tags inside it, that is, the links on the titles:

upcoming_events_div = soup.select_one('div.events-horizontal')
for link in upcoming_events_div.select('div.title a[href]'):
    print(link['href'])
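
As a self-contained sketch of the same two-step selection, the snippet below runs against a small inline HTML string instead of the live page. The markup and URLs are invented for illustration and only mimic the structure described above.

```python
from bs4 import BeautifulSoup

# Hypothetical markup that mimics the structure described above.
html = """
<html><body>
<div class="events-horizontal">
  <div class="title"><a href="/event/1">Upcoming one</a></div>
  <div class="title"><a href="/event/2">Upcoming two</a></div>
  <div class="title"><a>No href here</a></div>
</div>
<div class="events-horizontal">
  <div class="title"><a href="/event/99">Past event</a></div>
</div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# select_one() returns only the first match, or None when nothing matches.
upcoming_events_div = soup.select_one("div.events-horizontal")

upcoming_links = []
if upcoming_events_div is not None:
    # a[href] skips anchors that carry no href attribute.
    upcoming_links = [
        a["href"] for a in upcoming_events_div.select("div.title a[href]")
    ]

print(upcoming_links)
```

Checking for None before calling select() avoids an AttributeError if the page layout changes and the first lookup finds nothing.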

Note that you should not use r.text; pass r.content instead and leave the decoding to Unicode to BeautifulSoup, which can take the encoding declared in the document itself into account. See "Encoding issue of a character in utf-8".
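
To illustrate the point about bytes, here is a minimal sketch that does not touch the network. The raw_bytes value is a made-up stand-in for what r.content would hold: Latin-1 encoded HTML that declares its own charset.

```python
from bs4 import BeautifulSoup

# Hypothetical response body: Latin-1 bytes that declare their own encoding.
raw_bytes = (
    '<html><head><meta charset="iso-8859-1"></head>'
    "<body><p>Catálogo</p></body></html>"
).encode("iso-8859-1")

# Passing bytes lets BeautifulSoup work out the encoding on its own.
soup = BeautifulSoup(raw_bytes, "html.parser")
paragraph_text = soup.p.get_text()

print(paragraph_text)
# original_encoding reports which encoding BeautifulSoup settled on.
print(soup.original_encoding)
```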

How to use CSS selectors to retrieve specific links using BeautifulSoup?

You will need to make some assumptions about what is most likely to remain constant, and then review them over time. For example, I might assume you want the href of the a tag that is a child of the third-column td, in the first table following the div that contains the string Catálogo Actualizaciones. One CSS pattern for that would be as follows:

import requests
from bs4 import BeautifulSoup as bs

r = requests.get('https://alfabeta.surge.sh/')
soup = bs(r.text, 'lxml')
print(soup.select_one('div:-soup-contains("Catálogo Actualizaciones") ~ table td:nth-child(3) > a')['href'])
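
The same pattern can be tried offline. The sketch below uses invented markup in place of the real page and assumes a Soup Sieve version that supports :-soup-contains (2.1 or later). It also guards against select_one returning None, which the one-liner above would turn into a TypeError.

```python
from bs4 import BeautifulSoup

# Hypothetical markup standing in for the real page.
html = """
<html><body>
<div>Otros</div>
<table><tr><td>a</td><td>b</td><td><a href="/wrong.zip">no</a></td></tr></table>
<div>Catálogo Actualizaciones</div>
<table><tr><td>1</td><td>2</td><td><a href="/updates.zip">Descargar</a></td></tr></table>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# div with the text, then a following sibling table, then the third cell's link.
selector = (
    'div:-soup-contains("Catálogo Actualizaciones") '
    "~ table td:nth-child(3) > a"
)
link = soup.select_one(selector)

# select_one() returns None when the pattern no longer matches the page.
href = link["href"] if link is not None else None
print(href)
```

The table before the matching div is ignored because ~ only looks at siblings that come after the div.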

CSS selectors to be used for scraping specific links

First of all, that page requires a city selection to be made (in a cookie). Use a Session object to handle this:

import requests

# A Session keeps cookies between requests, so the city choice is remembered.
s = requests.Session()
s.post('http://kiascenehai.pk/select_city/submit_city', data={'city': 'Lahore'})
response = s.get('http://kiascenehai.pk/')

Now the response contains the actual page content instead of a redirect to the city selection page.
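
To see why a Session helps, the following sketch shows a cookie stored on the session being attached to a later request, without sending anything over the network. The cookie name and value are assumptions for illustration; the real cookie set by the site is not known here.

```python
import requests

session = requests.Session()

# Hypothetical cookie: the real name and value used by the site are unknown.
session.cookies.set("city", "Lahore", domain="kiascenehai.pk", path="/")

# prepare_request() merges session state into the request without sending it.
request = requests.Request("GET", "http://kiascenehai.pk/")
prepared = session.prepare_request(request)

cookie_header = prepared.headers.get("Cookie")
print(cookie_header)
```

In the real flow the POST response sets the cookie and the Session stores it automatically, so no manual cookies.set() call is needed.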

Next, keep your CSS selector no larger than needed. In this page there isn't much to go on as it uses a grid layout, so we first need to zoom in on the right rows:

from bs4 import BeautifulSoup

soup = BeautifulSoup(response.content, 'html.parser')
upcoming_events_header = soup.find('div', class_='featured-event')
upcoming_events_row = upcoming_events_header.find_next(class_='row')

for link in upcoming_events_row.select('h2 a[href]'):
    print(link['href'])
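
The combination of find, find_next and select can be checked against a small inline document. The grid markup below is invented and only approximates the layout described above.

```python
from bs4 import BeautifulSoup

# Hypothetical grid layout: a header row followed by the row holding the events.
html = """
<html><body>
<div class="row"><h2><a href="/banner">Banner</a></h2></div>
<div class="row"><div class="featured-event">Upcoming Events</div></div>
<div class="row">
  <div><h2><a href="/events/concert">Concert</a></h2></div>
  <div><h2><a href="/events/play">Play</a></h2></div>
</div>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Locate the header first, then move forward to the next element with class "row".
header = soup.find("div", class_="featured-event")
row = header.find_next(class_="row") if header is not None else None

# Only links inside that row are collected; the banner link is left out.
event_links = [a["href"] for a in row.select("h2 a[href]")] if row is not None else []
print(event_links)
```

find_next walks forward in document order, so the row that encloses the header is not returned; the first row after it is.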

Choosing the appropriate tag to pass into the select method of BeautifulSoup

If you select on 'a href' alone, you have not restricted the search to any div class, so you get every a tag that has an href, including links to things like Maps and Drive. In the code you cite, you missed the "r" div class:

<div data-hveid=.....>
  <div class="rc">
    <div class="r">
      <a href="https://www.python.org/".....>
        <h3 class="LC20lb">Welcome to Python.org</h3>

So soup.select('.r a') returns only the a tags inside elements with the class "r" (the search results), rather than every a tag with an href on the page.
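
The difference between the two selectors can be seen on a tiny example. The markup below is a simplified, invented version of a results page, with two unrelated links added around the result.

```python
from bs4 import BeautifulSoup

# Hypothetical, simplified results page with unrelated links around the result.
html = """
<html><body>
<a href="https://maps.google.com/">Maps</a>
<div class="rc">
  <div class="r">
    <a href="https://www.python.org/"><h3 class="LC20lb">Welcome to Python.org</h3></a>
  </div>
</div>
<a href="https://drive.google.com/">Drive</a>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Every anchor with an href, wherever it sits in the page.
all_links = [a["href"] for a in soup.select("a[href]")]

# Only anchors that are descendants of an element with class "r".
result_links = [a["href"] for a in soup.select(".r a")]

print(all_links)
print(result_links)
```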

Hope this answers your question!

How to check if a soup contains an element?

You can try select_one instead of find. Something like this.

soup.select_one('details[data-level="2"] summary.section-heading h2#English')

The result will be

<h2 id="English">English</h2>
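
Since select_one returns None when nothing matches, the existence check itself can be written as a comparison with None. The sketch below uses invented markup; the h2#French selector is a hypothetical example of an element that is absent.

```python
from bs4 import BeautifulSoup

# Hypothetical markup containing only an English section.
html = """
<html><body>
<details data-level="2">
  <summary class="section-heading"><h2 id="English">English</h2></summary>
</details>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

heading = soup.select_one(
    'details[data-level="2"] summary.section-heading h2#English'
)
# A matched Tag is never None, so this is a reliable presence test.
has_english = heading is not None

# A selector that matches nothing yields None rather than raising.
missing = soup.select_one(
    'details[data-level="2"] summary.section-heading h2#French'
)

print(has_english, missing is None)
```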

Get value from first span tag in beautifulsoup

Try using a CSS selector:

from bs4 import BeautifulSoup

soup = BeautifulSoup(html, "html.parser")

print(soup.select_one("td > span").text)


$39,465,077,974.88
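
Because the original html variable is not shown, here is a runnable sketch with made-up table markup. Only the first value comes from the output above; the other cell contents are placeholders.

```python
from bs4 import BeautifulSoup

# Hypothetical table; only the first span's value is taken from the answer above.
html = """
<table><tr>
  <td><span>$39,465,077,974.88</span><span>placeholder</span></td>
  <td><span>other</span></td>
</tr></table>
"""

soup = BeautifulSoup(html, "html.parser")

# "td > span" matches spans that are direct children of a td;
# select_one() keeps only the first one in document order.
first_span = soup.select_one("td > span")
value = first_span.get_text(strip=True) if first_span is not None else None

print(value)
```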

Extract data-content from span tag in BeautifulSoup

Try using a CSS selector:

soup.select_one("li[class='IDENTIFIER'] > p > span")['data-content']
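
A self-contained version of the same lookup might look like the sketch below. The list markup and the attribute values are invented, and Tag.get is used so that a missing attribute gives None instead of a KeyError.

```python
from bs4 import BeautifulSoup

# Hypothetical list; the data-content values are placeholders.
html = """
<ul>
  <li class="OTHER"><p><span data-content="skip me">x</span></p></li>
  <li class="IDENTIFIER"><p><span data-content="42 units">y</span></p></li>
</ul>
"""

soup = BeautifulSoup(html, "html.parser")

# [class='IDENTIFIER'] requires the whole class attribute to equal IDENTIFIER.
span = soup.select_one("li[class='IDENTIFIER'] > p > span")

# .get() returns None if the attribute is absent, instead of raising.
data_content = span.get("data-content") if span is not None else None

print(data_content)
```

If the li can carry additional classes, li.IDENTIFIER is the more forgiving form, since it matches any element whose class list includes IDENTIFIER.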

