Detecting a url using preg_match? without http:// in the string
You want something like:
%^((https?://)|(www\.))([a-z0-9-]\.?)+(:[0-9]+)?(/.*)?$%i
This uses | to match either http:// or www. at the beginning. I changed the delimiter to % so that it doesn't clash with the |. The dot in the host part is escaped so that it only matches a literal dot.
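A minimal sketch of how such a pattern might be used with preg_match. The helper name looks_like_url is hypothetical, and the dot in the host part is escaped here, which is my adjustment rather than something stated in the answer:

```php
<?php
// Hypothetical helper: returns true when the string starts with
// http://, https:// or www. and is followed by a host, an optional
// port and an optional path.
function looks_like_url(string $candidate): bool
{
    $pattern = '%^((https?://)|(www\.))([a-z0-9-]\.?)+(:[0-9]+)?(/.*)?$%i';
    return preg_match($pattern, $candidate) === 1;
}

var_dump(looks_like_url('http://example.com/page')); // bool(true)
var_dump(looks_like_url('www.example.com'));         // bool(true)
var_dump(looks_like_url('example.com'));             // bool(false)
```

Note that a bare domain such as example.com is rejected, because the pattern requires one of the two prefixes.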
Regular expression pattern to match URL with or without http://www
For matching all kinds of URLs, the following code should work:
<?php
// Single-quoted strings stop PHP from interpolating $_ inside the character classes.
$regex = '((https?|ftp)://)?'; // Scheme
$regex .= '([a-z0-9+!*(),;?&=$_.-]+(:[a-z0-9+!*(),;?&=$_.-]+)?@)?'; // User and pass
$regex .= '([a-z0-9\-\.]*)\.(([a-z]{2,4})|([0-9]{1,3}\.([0-9]{1,3})\.([0-9]{1,3})))'; // Host or IP address (TLDs longer than 4 letters will not match)
$regex .= '(:[0-9]{2,5})?'; // Port
$regex .= '(/([a-z0-9+$_%-]\.?)+)*/?'; // Path
$regex .= '(\?[a-z+&$_.-][a-z0-9;:@&%=+/$_.-]*)?'; // GET query
$regex .= '(#[a-z_.-][a-z0-9+$%_.-]*)?'; // Anchor
?>
Then, the correct way to check against the regex is as follows:
<?php
if (preg_match("~^$regex$~i", 'www.example.com/etcetc', $m)) {
    var_dump($m);
}
if (preg_match("~^$regex$~i", 'http://www.example.com/etcetc', $m)) {
    var_dump($m);
}
?>
Courtesy: Comments made by splattermania in the PHP manual: preg_match
Php preg_match using URL as regex
preg_grep() provides a shorter line of code, but because the substring to be matched doesn't appear to have any variable characters in it, best practice would indicate that strpos() is better suited.
Code:
$urls = [
    'http://www.example.com/eng-gb/products/test-1',
    'http://www.example.com/eng-gb/badproducts/test-2',
    'http://www.example.com/eng-gb/products/test-3',
    'http://www.example.com/eng-gb/badproducts/products/test-4',
    'http://www.example.com/products/test-5',
    'http://www.example.com/eng-gb/about-us',
];
// Both dots in the host are escaped so they match only literal dots.
var_export(preg_grep('~^http://www\.example\.com/eng-gb/products/[^/]*$~', $urls));
echo "\n\n";
var_export(array_filter($urls, function ($v) {
    return strpos($v, 'http://www.example.com/eng-gb/products/') === 0;
}));
Output:
array (
0 => 'http://www.example.com/eng-gb/products/test-1',
2 => 'http://www.example.com/eng-gb/products/test-3',
)
array (
0 => 'http://www.example.com/eng-gb/products/test-1',
2 => 'http://www.example.com/eng-gb/products/test-3',
)
Some notes:
Using preg_grep():
- Use a non-slash pattern delimiter so that you don't have to escape all of the slashes inside the pattern.
- Escape the dot at .com.
- Write the full domain and directory path with start and end anchors for tightest validation.
- Use a negated character class near the end of the pattern to ensure that no additional directories are added (unless of course you wish to include all subdirectories).
- My pattern will match a url that ends with /products/ but not /products. This is in accordance with the details in your question.
Using strpos():
- Checking for strpos()===0 means that the substring must be found at the start of the string.
- This will allow any trailing characters at the end of the string.
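To illustrate the last point, here is a small sketch of how the plain prefix check differs from a stricter variant that also rejects deeper subdirectories. The sample URLs and the extra "no further slash" test are my own assumptions, not part of the original answer:

```php
<?php
$prefix = 'http://www.example.com/eng-gb/products/';
$urls = [
    'http://www.example.com/eng-gb/products/test-1',
    'http://www.example.com/eng-gb/products/test-1/reviews',
];

// Prefix check only: both URLs pass, because anything may follow the prefix.
$byPrefix = array_filter($urls, function ($v) use ($prefix) {
    return strpos($v, $prefix) === 0;
});

// Prefix check plus "no further slash after the prefix": only the first URL passes.
$strict = array_filter($urls, function ($v) use ($prefix) {
    return strpos($v, $prefix) === 0
        && strpos($v, '/', strlen($prefix)) === false;
});

var_export(array_keys($byPrefix)); // keys 0 and 1
echo "\n";
var_export(array_keys($strict));   // key 0 only
```

The strict variant behaves roughly like the [^/]*$ tail of the preg_grep() pattern, without invoking the regex engine.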
preg_match_all - regex to find full urls in string
Well, I finally got it. The final regex is:
$regex = "/\b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|]/i";
It works.
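As a quick sanity check, the regex can be run through preg_match_all on a sample sentence. The sample text below is made up for illustration:

```php
<?php
$regex = '/\b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|]/i';

// Hypothetical input with trailing punctuation after each URL.
$text = 'See https://www.example.com/docs, or www.example.org.';

preg_match_all($regex, $text, $matches);
print_r($matches[0]);
// Array
// (
//     [0] => https://www.example.com/docs
//     [1] => www.example.org
// )
```

The final character class excludes punctuation such as , and . so that a comma or full stop directly after a URL is not swallowed into the match.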
preg_match not working when wanting to detect multiple urls
So, an even better solution here would be to split the incoming string into an array of strings between each url segment and then insert [$i] between consecutive non-url components.
# better solution, perform a split.
function process_line2($input) {
    $regex_url = '/\b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|]/i';
    # split the incoming string into an array of non-url segments
    # preg_split does not trim leading or trailing empty segments
    $non_url_segments = preg_split($regex_url, $input, -1);
    # inside the array, combine each successive non-url segment
    # with the next index
    $out = [];
    $count = count($non_url_segments);
    for ($i = 0; $i < $count; $i++) {
        # add the segment
        array_push($out, $non_url_segments[$i]);
        # add its index surrounded by brackets on all segments but the last one
        if ($i < $count - 1) {
            array_push($out, '[' . $i . ']');
        }
    }
    # join strings with no whitespace
    return implode('', $out);
}
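A different technique that produces the same kind of numbering in a single pass is preg_replace_callback with a counter. This is a swapped-in alternative rather than the approach described in the answer, and the function name number_urls is hypothetical:

```php
<?php
// Hypothetical helper: replaces each URL with its zero-based index in brackets.
function number_urls(string $input): string
{
    $regex = '/\b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|]/i';
    $i = 0;
    // The callback runs once per match, in order, so the counter
    // doubles as the index of the URL within the string.
    $result = preg_replace_callback($regex, function (array $m) use (&$i) {
        return '[' . $i++ . ']';
    }, $input);
    // preg_replace_callback returns null on a regex error; fall back to the input.
    return $result ?? $input;
}

echo number_urls('test https://www.google.com/ mmh http://stackoverflow.com/');
// test [0] mmh [1]
```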
The first error is that preg_match only returns the first result, so it doesn't give you the number of urls matching your regular expression. Use preg_match_all instead and extract the first element of the array it fills.
The second error is that you are not using the limit argument of preg_replace, so all of your urls are getting replaced at the same time.
From the documentation for preg_replace (http://php.net/manual/en/function.preg-replace.php), the parameters are:
mixed preg_replace ( mixed $pattern , mixed $replacement , mixed $subject [, int $limit = -1 [, int &$count ]] )
In particular, the limit parameter defaults to -1 (no limit):
limit: The maximum possible replacements for each pattern in each subject string. Defaults to -1 (no limit).
You need to set an explicit limit of 1.
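The effect of the limit and count arguments can be seen on a tiny made-up subject string:

```php
<?php
$subject = 'a-b-c';

// Default limit (-1): every match is replaced.
$all = preg_replace('/-/', '+', $subject);
echo $all, "\n";   // a+b+c

// Explicit limit of 1: only the first match is replaced.
$first = preg_replace('/-/', '+', $subject, 1);
echo $first, "\n"; // a+b-c

// The fifth argument receives the number of replacements performed.
preg_replace('/-/', '+', $subject, -1, $count);
echo $count, "\n"; // 2
```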
Elaborating a bit on replacing preg_match with preg_match_all: you need to extract the [0] component from its matches array, since preg_match_all fills an array of arrays. For example:
array(1) {
[0]=>
array(2) {
[0]=>
string(23) "https://www.google.com/"
[1]=>
string(25) "http://stackoverflow.com/"
}
}
Here is an example with the fixes incorporated.
<?php
# original function (buggy: preg_match finds one URL only, and
# preg_replace without a limit replaces every URL with "[0]")
function process_line($input) {
    $reg_exUrl = '/\b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|]/i';
    if (preg_match($reg_exUrl, $input, $url)) {
        for ($i = 0; $i < count($url); $i++) {
            $input = preg_replace($reg_exUrl, "[" . $i . "]", $input);
        }
    }
    return $input;
}

# function with fixes incorporated
function process_line1($input) {
    $reg_exUrl = '/\b(?:(?:https?|ftp):\/\/|www\.)[-a-z0-9+&@#\/%?=~_|!:,.;]*[-a-z0-9+&@#\/%=~_|]/i';
    if (preg_match_all($reg_exUrl, $input, $url)) {
        # $url[0] holds all full-pattern matches
        $url_matches = $url[0];
        $count = count($url_matches);
        for ($i = 0; $i < $count; $i++) {
            # add explicit limit of 1 to arguments of preg_replace
            $input = preg_replace($reg_exUrl, "[" . $i . "]", $input, 1);
        }
    }
    return $input;
}

$input = "test https://www.google.com/ mmh http://stackoverflow.com/";
$input = process_line1($input);
echo $input;
?>
Finding urls from text string via php and regex?
$pattern = '#(www\.|https?://)?[a-z0-9]+\.[a-z0-9]{2,4}\S*#i';
preg_match_all($pattern, $str, $matches, PREG_PATTERN_ORDER);
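A short usage sketch for this pattern, with a made-up input string. It also shows one caveat: because the scheme is optional, anything shaped like name.ext is treated as a URL:

```php
<?php
$pattern = '#(www\.|https?://)?[a-z0-9]+\.[a-z0-9]{2,4}\S*#i';

// Hypothetical input text.
$str = 'Visit http://example.com/a?b=1 and www.test.org today.';

preg_match_all($pattern, $str, $matches, PREG_PATTERN_ORDER);
print_r($matches[0]);
// Array
// (
//     [0] => http://example.com/a?b=1
//     [1] => www.test.org
// )

// Caveat: plain file names match too, since the prefix group is optional.
$fileNameMatches = preg_match($pattern, 'report.txt');
var_dump($fileNameMatches); // int(1)
```

With PREG_PATTERN_ORDER, $matches[0] holds the full matches and $matches[1] holds the captured prefix of each one.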
How to find URLs in a string that contains special chars using preg_match
Replace
$rule='@(https?://([-\w\.]+[-\w])+(:\d+)?(/([\w/_\.#-]*(\?\S+)?[^\.\s])?)?)@';
with
$rule='@(https?://([-\w\.]+[-\w])+(:\d+)?(/([\w/_\.#!-]*(\?\S+)?[^\.\s])?)?)@';
The only change is the added ! in the path character class, which lets URLs containing ! match in full.
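The difference between the two rules can be checked on a URL that contains #! in its path. The sample URL is an assumption chosen for illustration:

```php
<?php
$oldRule = '@(https?://([-\w\.]+[-\w])+(:\d+)?(/([\w/_\.#-]*(\?\S+)?[^\.\s])?)?)@';
$newRule = '@(https?://([-\w\.]+[-\w])+(:\d+)?(/([\w/_\.#!-]*(\?\S+)?[^\.\s])?)?)@';

// Hypothetical URL with a "hashbang" path.
$text = 'http://example.com/#!/page';

preg_match($oldRule, $text, $old);
preg_match($newRule, $text, $new);

echo $old[0], "\n"; // http://example.com/#!
echo $new[0], "\n"; // http://example.com/#!/page
```

With the old rule the match stops right after the !, because ! is not in the path character class and is only consumed by the single trailing [^\.\s].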
php regex operation for url and # value detection
Try this:
// \S* (not /S*) is needed so that the path after the slash is matched.
$reg_exUrl1 = '~(http|https|ftp|ftps)://[a-zA-Z0-9-.]+\.[a-zA-Z]{2,3}(/\S*)?~';
$text = "The text you want to filter goes here. http://www.google.com data which in the http://www.apple.com";
if (preg_match($reg_exUrl1, $text, $url)) {
    echo preg_replace($reg_exUrl1, '<a href="$0" rel="nofollow">$0</a>', $text);
} else {
    // if no URLs in the text just return the text
    echo $text;
}
UPDATE
// Match # values such as #example
$reg_exUrl = "/#\w+/";
if (preg_match($reg_exUrl, $text, $url)) {
    echo preg_replace($reg_exUrl, '<a href="$0" rel="nofollow">$0</a>', $text);
} else {
    echo "not Matched" . $text;
}
Forcing a URL field to not have http://www. using regex
You can use this simple negative lookahead based regex in preg_match:
~^(?!https?://)(?!www\.).+$~i
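A minimal sketch of how the lookahead pattern behaves on a few made-up inputs:

```php
<?php
$pattern = '~^(?!https?://)(?!www\.).+$~i';

// Hypothetical field values to validate.
$inputs = ['example.com', 'www.example.com', 'http://example.com', 'HTTPS://example.com'];

$results = [];
foreach ($inputs as $value) {
    // preg_match returns 1 when the value does NOT start with a forbidden prefix.
    $results[$value] = preg_match($pattern, $value) === 1;
    printf("%-22s %s\n", $value, $results[$value] ? 'accepted' : 'rejected');
}
// example.com            accepted
// www.example.com        rejected
// http://example.com     rejected
// HTTPS://example.com    rejected
```

The i flag makes the prefix check case-insensitive, so HTTPS:// is rejected as well.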