How to extract http links from a paragraph and store them in an array in PHP
$text = 'Lorem ipsum http://thesite.com dolor sit amet, consectetur adipisicing elit, sed do eiusmod tempor
incididunt https://www.thesite.com ut labore et dolore magna aliqua. Ut http://www.thesite.com enim ad minim veniam,';
$pattern = '!(https?://[^\s]+)!'; // refine this for better/more specific results
if (preg_match_all($pattern, $text, $matches)) {
    $links = $matches[1]; // group 1 holds each captured URL
    print_r($links);
}
Extract URLs from a string using PHP
A regex is the answer to your problem. Object Manipulator's answer only needs to exclude commas as well, so try this code, which excludes them and returns three separate URLs:
$string = "The text you want to filter goes here. http://google.com, https://www.youtube.com/watch?v=K_m7NEDMrV0,https://instagram.com/hellow/";
preg_match_all('#\bhttps?://[^,\s()<>]+(?:\([\w\d]+\)|([^,[:punct:]\s]|/))#', $string, $match);
echo "<pre>";
print_r($match[0]);
echo "</pre>";
and the output is
Array
(
    [0] => http://google.com
    [1] => https://www.youtube.com/watch?v=K_m7NEDMrV0
    [2] => https://instagram.com/hellow/
)
Find links in page and run it through custom function
You can use preg_replace_callback instead of preg_replace: http://nz.php.net/manual/en/function.preg-replace-callback.php
function link_it($text)
{
    // URLs with an explicit scheme (http, https, ftp, ftps)
    $text = preg_replace_callback("/(^|[\n ])([\w]*?)((ht|f)tp(s)?:\/\/[\w]+[^ \,\"\n\r\t<]*)/is", 'shorturl2full', $text);
    // Bare www. and ftp. hosts
    $text = preg_replace_callback("/(^|[\n ])([\w]*?)((www|ftp)\.[^ \,\"\t\n\r<]*)/is", 'shorturl2full', $text);
    // E-mail addresses
    $text = preg_replace_callback("/(^|[\n ])([a-z0-9&\-_\.]+?)@([\w\-]+\.([\w\-\.]+)+)/i", 'shorturl2full', $text);
    return $text;
}
function shorturl2full($url)
{
    $fullLink = 'FULLLINK';
    // $url[0] is the complete match, including the leading separator captured by group 1
    // ... your code to find the full link
    return '<a href="' . $url[0] . '">' . $fullLink . '</a>';
}
Hope this helps
Finding urls from text string via php and regex?
$pattern = '#(www\.|https?://)?[a-z0-9]+\.[a-z0-9]{2,4}\S*#i';
preg_match_all($pattern, $str, $matches, PREG_PATTERN_ORDER);
Find links in string with PHP. Differ from normal and youtube links
First of all, ditch eregi. It was deprecated in PHP 5.3 and removed in PHP 7.0; use the preg_* functions instead.
Then, doing this in just one pass is maybe a stretch too far. I think you'll be better off splitting this into three phases.
Phase 1 runs a regex search over your input, finding everything that looks like a link, and storing it in a list.
Phase 2 iterates over the list, checking whether a link goes to youtube (parse_url is tremendously useful for this), and putting a suitable replacement into a second list.
Phase 3: you now have two lists, one containing the original matches, one containing the desired replacements. Combine them into an array that maps each match to its replacement and run strtr over your original text with it. (str_replace with two arrays applies the pairs one after another, so a later pair can alter text that an earlier one inserted.)
There are several advantages to this approach:
- The regular expression for extracting links can be kept relatively simple, since it doesn't have to take special hostnames into account
- It is easier to debug; you can dump the search and replace arrays prior to phase 3, and see if they contain what you expect
- Because you perform all replacements in one go, you avoid problems with overlapping matches or replacing a piece of already-replaced text (after all, the replaced text still contains a URL, and you don't want to replace that again)
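The three phases could be sketched roughly as follows. The function names, the list of YouTube hosts, and the generated markup are illustrative assumptions, not part of the original answer. The sketch uses strtr with a match-to-replacement array for the last phase, because strtr makes a single pass and does not rescan text it has already replaced.

```php
<?php
// Phase 1: collect everything that looks like a link (deliberately simple pattern).
function find_links(string $text): array
{
    preg_match_all('~https?://[^\s<>"]+~i', $text, $matches);
    return array_values(array_unique($matches[0]));
}

// Phase 2: choose a replacement for one link.
// The host list and the HTML produced here are assumptions for illustration.
function build_replacement(string $link): string
{
    $host = strtolower((string) parse_url($link, PHP_URL_HOST));
    $safe = htmlspecialchars($link, ENT_QUOTES);
    if (in_array($host, ['youtube.com', 'www.youtube.com', 'youtu.be'], true)) {
        return '<a class="video" href="' . $safe . '">[video]</a>';
    }
    return '<a href="' . $safe . '">' . $safe . '</a>';
}

// Phase 3: apply all replacements in a single pass.
function replace_links(string $text): string
{
    $pairs = [];
    foreach (find_links($text) as $link) {
        $pairs[$link] = build_replacement($link);
    }
    // Dump $pairs here when debugging to check the matches and replacements.
    return strtr($text, $pairs);
}

echo replace_links('See http://example.com/page and https://www.youtube.com/watch?v=abc here'), "\n";
```

Keeping the YouTube check in build_replacement means the extraction pattern never has to know about specific hostnames.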
Get ul li a string values and store them in a variable or array php
Try this
$html = '<div class="coursesListed">
<ul>
<li><a href="#"><h3>Item one</h3></a></li>
<li><a href="#"><h3>item two</h3></a></li>
<li><a href="#"><h3>Item three</h3></a></li>
</ul>
</div>';
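// Parse the markup and collect the text content of every <li>
$doc = new DOMDocument();
$doc->loadHTML($html);
$liList = $doc->getElementsByTagName('li');
$liValues = array();
foreach ($liList as $li) {
    $liValues[] = $li->nodeValue; // text of the <li>, including nested elements
}
var_dump($liValues);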
Web Crawler not following page's links
You should (as usual) first of all make up your mind about what you're actually doing.
As you outline in your question, you're doing a text search for URL patterns of the HTTP protocol. A common regex normally includes the https: URI scheme as well:
~https?://\S*~
That matches everything up to the first whitespace. This normally does the job of detecting a wide range of HTTP URLs within a string. If you need something more advanced, see the Stack Overflow Q&A about making links in text clickable:
- How to match URIs in text?
- How to extract http links from a paragraph and store them in an array in PHP
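As a minimal sketch of the simple pattern above, applied to a made-up sample string: because the match runs up to the next whitespace, trailing punctuation is captured too. The rtrim step and its character list are assumptions about what should not count as part of a URL.

```php
<?php
// Sample text invented for illustration.
$text = 'Docs live at https://example.com/docs, mirrors at http://mirror.example.org/a?b=1.';

preg_match_all('~https?://\S*~', $text, $matches);
$raw = $matches[0]; // each match still carries the trailing "," or "."

// Strip common trailing punctuation from each match.
$urls = array_map(fn($u) => rtrim($u, '.,;:!?)'), $raw);

print_r($urls);
```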
This still will not solve all of your crawler problems. For two reasons:
- Character encoding: If you want to properly do that, you need to know the correct character encoding of the string and make the regular expression fitting for it.
- That is text. Websites not only consist of text but also of HTML which carries its own semantics.
So actually doing text-analysis alone is not enough. You also need to parse HTML. That means you need to take the Base URI and resolve each other URI inside the document against it to obtain the list of all absolute links in that document.
You will find this outlined in the following specification:
- Section 5, Reference Resolution, in RFC 3986: Uniform Resource Identifier (URI): Generic Syntax
For PHP the two most stable components to work with for this are:
- DOMDocument: A PHP extension to parse XML and HTML documents. Here you naturally want to parse HTML documents.
- Net_URL2: A PEAR package to deal with URLs, including RFC 3986 conformant reference resolution (you can safely ignore the differences from the previous version of the standard; the standard is as stable as the PHP library, and two minor bugs in very narrow and specific cases are still open but have patches).
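To illustrate the idea, here is a rough sketch that parses HTML with DOMDocument and resolves each href against a base URI. So that the snippet runs without PEAR, it does not use Net_URL2; resolve_url and extract_links are hypothetical helpers, and resolve_url is a simplified stand-in that covers only common cases, not a complete RFC 3986 implementation.

```php
<?php
// Simplified reference resolution (assumes $base is an absolute http(s) URL).
function resolve_url(string $base, string $ref): string
{
    // Reference with its own scheme (http:, mailto:, ...): already absolute.
    if (preg_match('~^[a-z][a-z0-9+.-]*:~i', $ref)) {
        return $ref;
    }
    $b = parse_url($base);
    $authority = $b['scheme'] . '://' . $b['host'] . (isset($b['port']) ? ':' . $b['port'] : '');
    $basePath = $b['path'] ?? '/';
    // Scheme-relative reference: //host/path
    if (strpos($ref, '//') === 0) {
        return $b['scheme'] . ':' . $ref;
    }
    // Empty or fragment-only reference: same document.
    if ($ref === '' || $ref[0] === '#') {
        return explode('#', $base, 2)[0] . $ref;
    }
    // Query-only reference: keep the base path.
    if ($ref[0] === '?') {
        return $authority . $basePath . $ref;
    }
    // Separate the path from any query/fragment suffix.
    preg_match('~^([^?#]*)(.*)$~', $ref, $m);
    [$path, $suffix] = [$m[1], $m[2]];
    if ($path[0] !== '/') {
        // Relative path: merge with the directory of the base path.
        $path = substr($basePath, 0, strrpos($basePath, '/') + 1) . $path;
    }
    // Remove "." and ".." segments.
    $segments = explode('/', $path);
    $out = [];
    foreach ($segments as $seg) {
        if ($seg === '..') {
            if (count($out) > 1) {
                array_pop($out); // never pop the leading empty segment
            }
        } elseif ($seg !== '.') {
            $out[] = $seg;
        }
    }
    $last = end($segments);
    if ($last === '.' || $last === '..') {
        $out[] = ''; // keep the trailing slash
    }
    return $authority . implode('/', $out) . $suffix;
}

// Collect the absolute form of every <a href> in an HTML document.
function extract_links(string $html, string $baseUrl): array
{
    libxml_use_internal_errors(true); // real-world HTML is rarely valid
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    $links = [];
    foreach ($doc->getElementsByTagName('a') as $a) {
        if ($a->hasAttribute('href')) {
            $links[] = resolve_url($baseUrl, trim($a->getAttribute('href')));
        }
    }
    return array_values(array_unique($links));
}

$html = '<p><a href="page.html">A</a> <a href="../img/a.png">B</a> '
      . '<a href="/about?x=1">C</a> <a href="https://other.example/">D</a></p>';
print_r(extract_links($html, 'http://example.com/docs/guide/index.html'));
```

A real crawler would also need to honor a base element in the document and handle user info, IPv6 hosts, and other edge cases, which is where a tested library is the safer choice.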