Detecting a URL Using preg_match Without http:// in the String

You want something like:

%^((https?://)|(www\.))([a-z0-9-]\.?)+(:[0-9]+)?(/.*)?$%i

This uses | to match either http:// (or https://) or www. at the beginning. I changed the delimiter to % so that it does not clash with the | inside the pattern.
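As a rough sketch of how this pattern might be used, the snippet below wraps it in a small helper. The function name looks_like_url() is made up for illustration, and the host part is written with an escaped, optional dot (\.?).

```php
<?php
// looks_like_url() is a hypothetical helper name, not part of PHP.
function looks_like_url(string $candidate): bool
{
    // % is the delimiter, so neither | nor / needs escaping inside the pattern.
    $pattern = '%^((https?://)|(www\.))([a-z0-9-]\.?)+(:[0-9]+)?(/.*)?$%i';

    // preg_match() returns 1 on a match, 0 on no match, false on error.
    return preg_match($pattern, $candidate) === 1;
}

var_dump(looks_like_url('www.example.com'));            // bool(true)
var_dump(looks_like_url('https://example.com:8080/a')); // bool(true)
var_dump(looks_like_url('example.com'));                // bool(false): no scheme and no www.
```

Because the pattern requires either a scheme or www. at the start, a bare domain such as example.com is rejected.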

Regular expression pattern to match URL with or without http://www

For matching all kinds of URLs, the following code should work:

<?php
// Single-quoted strings are used so that PHP does not try to interpolate $_ inside the pattern.
$regex  = '((https?|ftp)://)?'; // Scheme
$regex .= '([a-z0-9+!*(),;?&=$_.-]+(:[a-z0-9+!*(),;?&=$_.-]+)?@)?'; // User and password
$regex .= '([a-z0-9\-\.]*)\.(([a-z]{2,4})|([0-9]{1,3}\.([0-9]{1,3})\.([0-9]{1,3})))'; // Host or IP address
$regex .= '(:[0-9]{2,5})?'; // Port
$regex .= '(/([a-z0-9+$_%-]\.?)+)*/?'; // Path
$regex .= '(\?[a-z+&$_.-][a-z0-9;:@&%=+/$_.-]*)?'; // GET query
$regex .= '(#[a-z_.-][a-z0-9+$%_.-]*)?'; // Anchor
?>

Then, the correct way to check against the regex is as follows:

<?php
if (preg_match("~^$regex$~i", 'www.example.com/etcetc', $m)) {
    var_dump($m);
}

if (preg_match("~^$regex$~i", 'http://www.example.com/etcetc', $m)) {
    var_dump($m);
}
?>

Courtesy: Comments made by splattermania in the PHP manual: preg_match

PHP preg_match using URL as regex

preg_grep() gives a shorter line of code, but because the substring to be matched is fixed (it contains no variable characters), strpos() is the better fit.

Code:

$urls = [
    'http://www.example.com/eng-gb/products/test-1',
    'http://www.example.com/eng-gb/badproducts/test-2',
    'http://www.example.com/eng-gb/products/test-3',
    'http://www.example.com/eng-gb/badproducts/products/test-4',
    'http://www.example.com/products/test-5',
    'http://www.example.com/eng-gb/about-us',
];

// every literal dot in the domain is escaped
var_export(preg_grep('~^http://www\.example\.com/eng-gb/products/[^/]*$~', $urls));
echo "\n\n";
var_export(array_filter($urls, function ($v) {
    return strpos($v, 'http://www.example.com/eng-gb/products/') === 0;
}));

Output:

array (
  0 => 'http://www.example.com/eng-gb/products/test-1',
  2 => 'http://www.example.com/eng-gb/products/test-3',
)

array (
  0 => 'http://www.example.com/eng-gb/products/test-1',
  2 => 'http://www.example.com/eng-gb/products/test-3',
)

Some notes:

Using preg_grep():

  • Use a non-slash pattern delimiter so that you don't have to escape all of the slashes inside the pattern.
  • Escape the literal dots in the domain (www\.example\.com).
  • Write the full domain and directory path with start and end anchors for tightest validation.
  • Use a negated character class near the end of the pattern to ensure that no additional directories are added (unless of course you wish to include all subdirectories).
  • My pattern will match a url that ends with /products/ but not /products. This is in accordance with the details in your question.

Using strpos():

  • Checking for strpos()===0 means that the substring must be found at the start of the string.
  • This will allow any trailing characters after the prefix, including further subdirectories.
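
On PHP 8.0 or later, the same prefix check can probably be written with str_starts_with(), which reads more directly than strpos() === 0. The snippet below is a minimal sketch under that version assumption; the variable names $prefix and $productUrls are made up for illustration.

```php
<?php
$urls = [
    'http://www.example.com/eng-gb/products/test-1',
    'http://www.example.com/eng-gb/badproducts/test-2',
    'http://www.example.com/eng-gb/products/test-3',
];

$prefix = 'http://www.example.com/eng-gb/products/';

// str_starts_with() requires PHP 8.0+; array_filter() preserves the original keys.
$productUrls = array_filter(
    $urls,
    fn(string $url): bool => str_starts_with($url, $prefix)
);

var_export($productUrls);
```

As with the strpos() version, the entries with keys 0 and 2 are kept, and anything may follow the prefix.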

preg_match_all - regex to find full urls in string

I finally got it. The final regex is:

$regex = "/\b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|]/i";

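It matches URLs that start with http://, https://, ftp:// or www., and the final character class keeps trailing punctuation such as . , ; : ! and ? out of the match.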
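
A short usage sketch with preg_match_all() follows. The sample text is made up for illustration; the pattern is the one above, written as a single-quoted string.

```php
<?php
$regex = '/\b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|]/i';
$text  = 'See https://www.example.com/a?b=1, then www.example.org.';

// preg_match_all() returns the number of full matches and fills $matches;
// $matches[0] holds the full matches in the default PREG_PATTERN_ORDER.
$count = preg_match_all($regex, $text, $matches);

print_r($matches[0]);
// $matches[0] contains 'https://www.example.com/a?b=1' and 'www.example.org'
```

Note that the comma after the first URL and the full stop after the second are left out of the matches.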

preg_match not working when wanting to detect multiple urls

An even better solution here is to split the incoming string on each URL, which yields an array of the non-URL segments, and then insert [$i] between consecutive segments.

# better solution, perform a split.
function process_line2($input) {
    $regex_url = '/\b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|]/i';
    # split the incoming string into an array of non-url segments
    # preg_split does not trim leading or trailing empty segments
    $non_url_segments = preg_split($regex_url, $input, -1);

    # inside the array, combine each successive non-url segment
    # with the next index
    $out = [];
    $count = count($non_url_segments);
    for ($i = 0; $i < $count; $i++) {
        # add the segment
        array_push($out, $non_url_segments[$i]);
        # add its index surrounded by brackets on all segments but the last one
        if ($i < $count - 1) {
            array_push($out, '[' . $i . ']');
        }
    }
    # join strings with no whitespace
    return implode('', $out);
}

The first error is that preg_match stops at the first match, so it cannot tell you how many URLs match your regular expression. Use preg_match_all instead and take the first element of the matches array it fills.

The second error is that you are not using the limit argument of preg_replace, so all of your URLs are replaced in the first pass.

From the documentation for preg_replace: http://php.net/manual/en/function.preg-replace.php

The parameters are

mixed preg_replace ( mixed $pattern , mixed $replacement , mixed $subject [, int $limit = -1 [, int &$count ]] )

In particular, the limit parameter defaults to -1 (no limit):

limit: The maximum possible replacements for each pattern in each subject string. Defaults to -1 (no limit).

You need to set an explicit limit of 1.
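
The effect of the limit argument can be seen in a small sketch. The subject string and the digit pattern are made up for illustration.

```php
<?php
$subject = 'a1 b2 c3';

// Default limit (-1): every match is replaced; $count receives the number of replacements.
$all = preg_replace('/\d/', '#', $subject, -1, $count);

// Limit of 1: only the first match is replaced.
$first = preg_replace('/\d/', '#', $subject, 1);

echo $all, "\n";   // a# b# c#
echo $first, "\n"; // a# b2 c3
```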

Elaborating a bit on replacing preg_match with preg_match_all: preg_match_all fills its matches argument with an array of arrays, so you need to extract the [0] component, which holds the full matches. For example:

array(1) {
  [0]=>
  array(2) {
    [0]=>
    string(23) "https://www.google.com/"
    [1]=>
    string(25) "http://stackoverflow.com/"
  }
}
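
To make the difference concrete, the sketch below compares the return values of the two functions on the same input. The pattern is deliberately simplified and is only an assumption for illustration, not the URL regex used elsewhere in this answer.

```php
<?php
$text    = 'test https://www.google.com/ mmh http://stackoverflow.com/';
$pattern = '~https?://\S+~'; // simplified pattern, for illustration only

$single = preg_match($pattern, $text, $one);     // stops after the first match
$total  = preg_match_all($pattern, $text, $all); // scans the whole subject

var_dump($single);        // int(1)
var_dump($total);         // int(2)
var_dump($one[0]);        // string(23) "https://www.google.com/"
var_dump(count($all[0])); // int(2)
```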

Here is an example with the fixes incorporated.

<?php

# original function
function process_line($input) {
    $reg_exUrl = '/\b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|]/i';
    if (preg_match($reg_exUrl, $input, $url)) {
        for ($i = 0; $i < count($url); $i++) {
            $input = preg_replace($reg_exUrl, "[" . $i . "]", $input);
        }
    }

    return $input;
}

# function with fixes incorporated
function process_line1($input) {
    $reg_exUrl = '/\b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|]/i';
    # preg_match_all collects every match; $url[0] holds the full matches
    if (preg_match_all($reg_exUrl, $input, $url)) {
        $url_matches = $url[0];
        for ($i = 0; $i < count($url_matches); $i++) {
            # add explicit limit of 1 to arguments of preg_replace
            $input = preg_replace($reg_exUrl, "[" . $i . "]", $input, 1);
        }
    }

    return $input;
}

$input = "test https://www.google.com/ mmh http://stackoverflow.com/";

$input = process_line1($input);

echo $input;

?>
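
The numbering could also be done in a single pass with preg_replace_callback(), which is a different technique from the loop above: the callback is invoked once per match, so a counter is all that is needed. This is only a sketch; the function name number_urls() is hypothetical.

```php
<?php
// number_urls() is a hypothetical name; the pattern is the one used above.
function number_urls(string $input): string
{
    $regexUrl = '/\b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|]/i';
    $i = 0;

    // The callback runs once per match, in order, so a counter captured by reference is enough.
    $result = preg_replace_callback(
        $regexUrl,
        function (array $match) use (&$i): string {
            return '[' . $i++ . ']';
        },
        $input
    );

    // preg_replace_callback() returns null on error; fall back to the untouched input.
    return $result ?? $input;
}

echo number_urls('test https://www.google.com/ mmh http://stackoverflow.com/');
// prints: test [0] mmh [1]
```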

Finding urls from text string via php and regex?

$pattern = '#(www\.|https?://)?[a-z0-9]+\.[a-z0-9]{2,4}\S*#i';
preg_match_all($pattern, $str, $matches, PREG_PATTERN_ORDER);
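
A quick usage sketch, with a made-up $str, shows what this pattern picks up. Because the scheme and www. prefix are optional, it is quite permissive and will also match things like file names.

```php
<?php
$pattern = '#(www\.|https?://)?[a-z0-9]+\.[a-z0-9]{2,4}\S*#i';
$str     = 'Docs at https://example.com/docs and www.example.org plus notes.txt';

// PREG_PATTERN_ORDER (the default) puts all full matches in $matches[0].
$count = preg_match_all($pattern, $str, $matches, PREG_PATTERN_ORDER);

print_r($matches[0]);
// $matches[0] contains 'https://example.com/docs', 'www.example.org' and 'notes.txt'
```

If file names and similar tokens should not count as URLs, a stricter pattern that requires the prefix is probably the safer choice.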

How to find URLs in a string that contains special chars using preg_match

Replace

$rule='@(https?://([-\w\.]+[-\w])+(:\d+)?(/([\w/_\.#-]*(\?\S+)?[^\.\s])?)?)@';

with

$rule='@(https?://([-\w\.]+[-\w])+(:\d+)?(/([\w/_\.#!-]*(\?\S+)?[^\.\s])?)?)@';

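The only change is the added ! in the path character class, so that URLs whose path contains ! (for example #! fragments) are matched in full.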
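
The difference between the two rules can be checked with a small sketch. The URL below is made up for illustration.

```php
<?php
$oldRule = '@(https?://([-\w\.]+[-\w])+(:\d+)?(/([\w/_\.#-]*(\?\S+)?[^\.\s])?)?)@';
$newRule = '@(https?://([-\w\.]+[-\w])+(:\d+)?(/([\w/_\.#!-]*(\?\S+)?[^\.\s])?)?)@';

$url = 'http://example.com/#!/page'; // hypothetical URL containing "!"

preg_match($oldRule, $url, $old);
preg_match($newRule, $url, $new);

echo $old[0], "\n"; // http://example.com/#!
echo $new[0], "\n"; // http://example.com/#!/page
```

With the old rule the match stops right after the !, while the new rule captures the whole URL.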

php regex operation for url and # value detection

Try this:

$reg_exUrl1 = "~(http|https|ftp|ftps)://[a-zA-Z0-9.-]+\.[a-zA-Z]{2,3}(/\S*)?~";

$text = "The text you want to filter goes here. http://www.google.com data which in the http://www.apple.com";

if (preg_match($reg_exUrl1, $text, $url)) {
    // wrap every URL in the text in a link
    echo preg_replace($reg_exUrl1, '<a href="$0" rel="nofollow">$0</a>', $text);
} else {
    // no URLs in the text
    echo "IN Else #$" . $text;
}
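
Since preg_replace() returns the subject unchanged when nothing matches, the preg_match() check is only needed if the no-match case should produce different output. A minimal sketch of the simpler form follows; linkify() is a hypothetical helper name, the pattern uses non-capturing groups, and the surrounding text is not HTML-escaped here.

```php
<?php
// linkify() is a hypothetical helper name.
function linkify(string $text): string
{
    $pattern = '~(?:https?|ftps?)://[a-zA-Z0-9.-]+\.[a-zA-Z]{2,3}(?:/\S*)?~';

    // $0 in the replacement refers to the whole match; null (error) falls back to the input.
    return preg_replace($pattern, '<a href="$0" rel="nofollow">$0</a>', $text) ?? $text;
}

echo linkify('go to http://www.google.com now'), "\n";
echo linkify('no links here'), "\n"; // printed unchanged
```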

UPDATE

$reg_exUrl = "/#\w+/";
if (preg_match($reg_exUrl, $text, $url)) {
    echo preg_replace($reg_exUrl, '<a href="$0" rel="nofollow">$0</a>', $text);
} else {
    echo "not Matched" . $text;
}

Forcing a URL field to not have http://www. using regex

You can use this simple negative lookahead based regex in preg_match:

~^(?!https?://)(?!www\.).+$~i
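
A brief sketch of how the check might be wired up is shown below. The helper name has_no_url_prefix() is made up for illustration.

```php
<?php
// has_no_url_prefix() is a hypothetical helper name.
function has_no_url_prefix(string $value): bool
{
    // Both lookaheads are anchored at the start: reject http://, https:// and www.
    return preg_match('~^(?!https?://)(?!www\.).+$~i', $value) === 1;
}

var_dump(has_no_url_prefix('example.com'));        // bool(true)
var_dump(has_no_url_prefix('http://example.com')); // bool(false)
var_dump(has_no_url_prefix('WWW.example.com'));    // bool(false): the i flag ignores case
```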
