Getting Title and Meta Tags from External Website

One way to do this is to fetch the page with cURL and then parse it with DOMDocument:

function file_get_contents_curl($url)
{
    $ch = curl_init();

    curl_setopt($ch, CURLOPT_HEADER, 0);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);

    // curl_exec() returns false on failure
    $data = curl_exec($ch);
    curl_close($ch);

    return $data;
}

$html = file_get_contents_curl("http://example.com/");
if ($html === false || $html === '') {
    exit('Could not fetch the page.');
}

// Parsing begins here. Real-world HTML is rarely valid, so parser warnings are silenced.
$doc = new DOMDocument();
@$doc->loadHTML($html);

// The page may have no <title>, so guard against a missing node
$nodes = $doc->getElementsByTagName('title');
$title = $nodes->length > 0 ? $nodes->item(0)->nodeValue : '';

// Defaults in case the page lacks these meta tags
$description = '';
$keywords = '';

$metas = $doc->getElementsByTagName('meta');
for ($i = 0; $i < $metas->length; $i++) {
    $meta = $metas->item($i);
    if ($meta->getAttribute('name') == 'description') {
        $description = $meta->getAttribute('content');
    }
    if ($meta->getAttribute('name') == 'keywords') {
        $keywords = $meta->getAttribute('content');
    }
}

// Display what was found
echo "Title: $title" . '<br/><br/>';
echo "Description: $description" . '<br/><br/>';
echo "Keywords: $keywords";

Get meta description, title and image from a URL, like Facebook link sharing

Why are you using a regular expression to parse the <meta> tags?

PHP has a built-in function for parsing meta information: get_meta_tags(). It returns an array keyed by each tag's name attribute, with the content attribute as the value.

Illustration:

<?php
$tags = get_meta_tags('http://www.stackoverflow.com/');
echo "<pre>";
print_r($tags);
echo "</pre>";

OUTPUT:

Array
(
    [twitter:card] => summary
    [twitter:domain] => stackoverflow.com
    [og:type] => website
    [og:image] => http://cdn.sstatic.net/stackoverflow/img/apple-touch-icon@2.png?v=fde65a5a78c6
    [og:title] => Stack Overflow
    [og:description] => Q&A for professional and enthusiast programmers
    [og:url] => http://stackoverflow.com/
)

As you can see, the title, image and description are all parsed, which is what you want.
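
To turn an array like the one above into a link preview, you still have to decide which keys to read and what to fall back on when one is missing. Here is a minimal Python sketch of that step. The function name build_preview and the fallback order are assumptions for illustration, not part of any specification, and the sample data is copied from the output above.

```python
def build_preview(tags):
    """Pick title, description and image from a dict of meta tags.

    `tags` maps meta names to content strings, like the array printed above.
    The fallback order below is an assumption, not a standard.
    """
    def first(*keys):
        # Return the first non-empty value among the given keys, else None
        for key in keys:
            value = tags.get(key)
            if value:
                return value
        return None

    return {
        "title": first("og:title", "twitter:title"),
        "description": first("og:description", "twitter:description", "description"),
        "image": first("og:image", "twitter:image"),
    }


# Sample data copied from the output above
tags = {
    "twitter:card": "summary",
    "twitter:domain": "stackoverflow.com",
    "og:type": "website",
    "og:image": "http://cdn.sstatic.net/stackoverflow/img/apple-touch-icon@2.png?v=fde65a5a78c6",
    "og:title": "Stack Overflow",
    "og:description": "Q&A for professional and enthusiast programmers",
    "og:url": "http://stackoverflow.com/",
}

preview = build_preview(tags)
print(preview["title"])
print(preview["description"])
print(preview["image"])
```

Any field that none of the keys supply comes back as None, so the caller can decide whether to hide that part of the preview or substitute something else.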

Get meta description from external website

You can use the find method on the soup object to find tags with specific attributes. Here we look for a meta tag whose name attribute equals og:description or description, or whose property attribute equals description. Open Graph markup normally carries og:description in the property attribute rather than name, so that case is checked first.

# First get the meta description tag (Open Graph normally uses property="og:description")
description = (
    soup.find('meta', attrs={'property': 'og:description'})
    or soup.find('meta', attrs={'name': 'og:description'})
    or soup.find('meta', attrs={'property': 'description'})
    or soup.find('meta', attrs={'name': 'description'})
)

# If a description meta tag was found, get its content attribute and save it to the db entry
if description:
    entry.description = description.get('content')
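
If you want to try the same fallback lookup without installing BeautifulSoup, here is a minimal sketch using only the standard library's html.parser. The names MetaCollector, LOOKUP_ORDER and find_description are hypothetical, and the lookup order (Open Graph property first, plain description last) is an assumption you may want to change.

```python
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect <meta> content values keyed by (attribute, value) pairs."""

    def __init__(self):
        super().__init__()
        self.found = {}

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attrs = dict(attrs)
        content = attrs.get("content")
        if content is None:
            return
        for attr in ("name", "property"):
            value = attrs.get(attr)
            if value:
                # Keep only the first occurrence of each (attribute, value) pair
                self.found.setdefault((attr, value), content)


# Assumed priority: Open Graph property first, plain description last
LOOKUP_ORDER = [
    ("property", "og:description"),
    ("name", "og:description"),
    ("property", "description"),
    ("name", "description"),
]


def find_description(html):
    """Return the first matching description, or None if the page has none."""
    collector = MetaCollector()
    collector.feed(html)
    for key in LOOKUP_ORDER:
        if key in collector.found:
            return collector.found[key]
    return None


sample = """
<html><head>
<title>Example</title>
<meta name="description" content="Plain description">
<meta property="og:description" content="Open Graph description">
</head><body></body></html>
"""
print(find_description(sample))
```

Matching here is exact and case-sensitive, so a tag written as name="Description" would be missed. Normalize the attribute values first if the pages you scrape are inconsistent.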

