How can I get href links from HTML using Python?
Try BeautifulSoup (this version is for Python 2 with BeautifulSoup 3):
from BeautifulSoup import BeautifulSoup
import urllib2
import re
html_page = urllib2.urlopen("http://www.yourwebsite.com")
soup = BeautifulSoup(html_page)
for link in soup.findAll('a'):
    print link.get('href')
In case you just want links starting with http://, you should use:
soup.findAll('a', attrs={'href': re.compile("^http://")})
In Python 3 with BS4 it should be:
from bs4 import BeautifulSoup
import urllib.request
html_page = urllib.request.urlopen("http://www.yourwebsite.com")
soup = BeautifulSoup(html_page, "html.parser")
for link in soup.find_all('a'):
    print(link.get('href'))
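The http:// filter shown earlier for BeautifulSoup 3 carries over to BS4 in much the same way. The following is a minimal sketch, assuming an inline HTML string (made up for illustration) in place of a fetched page, and widening the pattern to accept https:// as well:

```python
import re
from bs4 import BeautifulSoup

# Inline sample standing in for a downloaded page (hypothetical content).
sample_html = """
<a href="http://example.com/a">absolute http</a>
<a href="https://example.com/b">absolute https</a>
<a href="/relative/c">relative</a>
<a name="no-href">anchor without href</a>
"""

soup = BeautifulSoup(sample_html, "html.parser")

# Passing a compiled regex as href keeps only tags whose href matches it.
absolute_links = [
    a["href"] for a in soup.find_all("a", href=re.compile(r"^https?://"))
]
print(absolute_links)
```

Anchors without an href attribute are skipped automatically, because the filter only matches tags that have one.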
How to get an href link by its text in Python
You might want to try BeautifulSoup.
For example:
from bs4 import BeautifulSoup
sample_html = """
<a href="https://www.cnbeta.com/articles/science/1062069.htm"><strong>阅读全文</strong></a>
<a href="https://www.cnbeta.com/articles/science/1062068.htm"><strong>RANDOM TEXT!</strong></a>
"""
soup = BeautifulSoup(sample_html, "html.parser").find_all(lambda t: t.name == "a" and t.text.startswith("阅"))
print([a["href"] for a in soup])
Output:
['https://www.cnbeta.com/articles/science/1062069.htm']
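If the text may appear anywhere in the link rather than at the start, a substring test on the visible text is one option. This is only a sketch: the helper name hrefs_by_text is hypothetical, and it reuses the sample markup from the answer above.

```python
from bs4 import BeautifulSoup


def hrefs_by_text(html, needle):
    """Return the href of every <a> whose visible text contains `needle`."""
    soup = BeautifulSoup(html, "html.parser")
    return [
        a["href"]
        for a in soup.find_all("a", href=True)  # only anchors that have an href
        if needle in a.get_text(strip=True)     # text of the tag and its children
    ]


sample_html = """
<a href="https://www.cnbeta.com/articles/science/1062069.htm"><strong>阅读全文</strong></a>
<a href="https://www.cnbeta.com/articles/science/1062068.htm"><strong>RANDOM TEXT!</strong></a>
"""

matches = hrefs_by_text(sample_html, "阅读")
print(matches)
```

Using get_text() rather than the tag's direct string means text nested inside child tags such as <strong> is still found.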
Get href links from a tag
Looking at your HTML code, you can use the CSS selector a.product-item. This will select all <a> tags with class="product-item":
from bs4 import BeautifulSoup
html_text = """
<div class="row product-layout-category product-layout-list">
<div class="product-col wow fadeIn animated" style="visibility: visible;">
<a href="the link I want" class="product-item">
<div class="product-item-image">
<img data-src="link to an image" alt="name of the product" title="name of the product" class="img-responsive lazy" src="link to an image">
</div>
<div class="product-item-desc">
<p><span><strong>brand</strong></span></p>
<p><span class="font-size-16">name of the product</span></p>
<p class="product-item-price">
<span>product price</span></p>
</div>
</a>
</div>
"""
soup = BeautifulSoup(html_text, "html.parser")
for link in soup.select("a.product-item"):
    print(link.get("href"))  # or link["href"]
Prints:
the link I want
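When only the first match is needed, or when the selector should be narrowed further, select_one() and compound selectors may help. A small sketch follows, assuming simplified markup; the hrefs and the extra "featured" class are invented for illustration.

```python
from bs4 import BeautifulSoup

# Simplified, hypothetical markup; the "featured" class is not in the original page.
html_text = """
<div class="product-col">
  <a href="/products/1" class="product-item">first</a>
  <a href="/products/2" class="product-item featured">second</a>
  <a href="/about">about</a>
</div>
"""

soup = BeautifulSoup(html_text, "html.parser")

# select_one() returns the first matching tag, or None if nothing matches.
first = soup.select_one("a.product-item")
first_href = first["href"] if first is not None else None

# Chain class and attribute selectors to narrow the match.
featured = [a["href"] for a in soup.select("a.product-item.featured[href]")]

print(first_href)
print(featured)
```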
Python, BeautifulSoup - get href link
You have to pull out the anchor tag <a> that contains the href:
import requests
from bs4 import BeautifulSoup
page = "https://mojmikolow.pl/informacje,0.html"
page = requests.get(page).content
data_entries = BeautifulSoup(page, "html.parser").find_all("section", {"class": "news"})
for data_entry in data_entries:
    link_tag = data_entry.find('a', href=True)
    if link_tag is not None:  # skip sections that contain no link
        print(link_tag.get('href'))
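The extracted href values may be relative to the page they came from. If so, urllib.parse.urljoin can resolve them against the page URL. The sketch below assumes a small inline HTML string in place of the real page; its hrefs are hypothetical and not taken from the site.

```python
from urllib.parse import urljoin
from bs4 import BeautifulSoup

# URL the page was fetched from; relative hrefs are resolved against it.
base_url = "https://mojmikolow.pl/informacje,0.html"

# Hypothetical stand-in for the downloaded page, not the site's real markup.
page = """
<section class="news"><a href="/informacje/1.html">first</a></section>
<section class="news"><a href="https://example.com/2.html">second</a></section>
<section class="news"><p>no link here</p></section>
"""

soup = BeautifulSoup(page, "html.parser")
absolute_links = []
for section in soup.find_all("section", {"class": "news"}):
    link_tag = section.find("a", href=True)
    if link_tag is None:  # section without an anchor
        continue
    # urljoin leaves absolute URLs unchanged and resolves relative ones.
    absolute_links.append(urljoin(base_url, link_tag["href"]))

print(absolute_links)
```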
Scraping using Python Beautifulsoup getting the url of href that is a link
I believe you will need a browser automation tool (Selenium, for example) to simulate the hover and read the data from the rendered page.
Get href link with selenium (python)
In your case i is a web element, so to extract its text you should not just print i; it should be print(i.text).
Moreover, if you want to extract the href from the a tag, you should use .get_attribute('href').
Secondly, I think you should use the CSS_SELECTOR div.search-content-cards instead of CLASS_NAME.
Also, the a tag is a descendant of that element.
So your effective code should look like this:
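from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait
# `driver` is the WebDriver instance you already created
el = WebDriverWait(driver, 10).until(EC.presence_of_element_located((By.CSS_SELECTOR, "div.search-content-cards")))
el_hrefs = el.find_elements(By.XPATH, ".//descendant::a[@href]")  # find_elements_by_xpath was removed in Selenium 4
for i in el_hrefs:
    print(i.get_attribute('href'))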
How can I extract href and title from this HTML
Select your elements more specifically, e.g. with CSS selectors, and iterate over the ResultSet to get the attributes of each element as a list of tuples:
[(a.get('title'),a.get('href')) for a in soup.select('h3 a[href][title]')]
Example
from bs4 import BeautifulSoup
html = '''
<h3 class="foo1">
<a href="someLink" title="someTitle">SomeTitle</a>
</h3>
<h3 class="foo1">
<a href="OtherLink" title="OtherTitle">OtherTitle</a>
</h3>
'''
soup = BeautifulSoup(html, 'html.parser')
print([(a.get('title'), a.get('href')) for a in soup.select('h3 a[href][title]')])
Output
[('someTitle', 'someLink'), ('OtherTitle', 'OtherLink')]
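One detail worth knowing when collecting attributes this way: a.get('title') returns None for a missing attribute, whereas a['title'] raises KeyError. A short sketch, assuming sample markup in which the second link has no title (invented for illustration):

```python
from bs4 import BeautifulSoup

# Hypothetical markup: the second anchor deliberately has no title attribute.
html = """
<h3><a href="someLink" title="someTitle">SomeTitle</a></h3>
<h3><a href="otherLink">OtherTitle</a></h3>
"""

soup = BeautifulSoup(html, "html.parser")

# .get() tolerates missing attributes and yields None for them.
pairs = [(a.get("title"), a.get("href")) for a in soup.select("h3 a[href]")]
print(pairs)

# Subscript access on a missing attribute raises KeyError instead.
second = soup.select("h3 a[href]")[1]
try:
    second["title"]
    raised = False
except KeyError:
    raised = True
print(raised)
```

Requiring the attribute in the selector, as in 'h3 a[href][title]', avoids the None entries by skipping such links entirely.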