Regular Expression Pattern to Match URL With or Without http://www

Regular expression pattern to match URL with or without http://www

To match most common URL forms (optional scheme, credentials, host or IPv4 address, port, path, query string, and anchor), the following code should work:

<?php
// Single-quoted strings keep "$" and "\" literal, so PHP does not try to interpolate "$_".
$regex  = '((https?|ftp)://)?'; // Scheme
$regex .= '([a-z0-9+!*(),;?&=$_.-]+(:[a-z0-9+!*(),;?&=$_.-]+)?@)?'; // User and password
$regex .= '([a-z0-9\-\.]*)\.(([a-z]{2,4})|([0-9]{1,3}\.([0-9]{1,3})\.([0-9]{1,3})))'; // Host or IPv4 address
$regex .= '(:[0-9]{2,5})?'; // Port
$regex .= '(/([a-z0-9+$_%-]\.?)+)*/?'; // Path
$regex .= '(\?[a-z+&$_.-][a-z0-9;:@&%=+/$_.-]*)?'; // GET query
$regex .= '(#[a-z_.-][a-z0-9+$%_.-]*)?'; // Anchor
?>

Then, to check a string against the regex, anchor it with ^ and $ and add the i (case-insensitive) modifier:

<?php
if (preg_match("~^$regex$~i", 'www.example.com/etcetc', $m)) {
    var_dump($m);
}

if (preg_match("~^$regex$~i", 'http://www.example.com/etcetc', $m)) {
    var_dump($m);
}
?>
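
As a rough cross-check outside PHP, the same building-block approach can be sketched in JavaScript. The names `parts` and `urlRe` are illustrative assumptions, and the pattern text is simply carried over from the PHP snippet above, not hardened in any way.

```javascript
// Hypothetical JavaScript port of the PHP building blocks (names are illustrative).
// String.raw keeps every backslash literal, much like a single-quoted PHP string.
const parts = [
  String.raw`((https?|ftp)://)?`,                                      // scheme
  String.raw`([a-z0-9+!*(),;?&=$_.-]+(:[a-z0-9+!*(),;?&=$_.-]+)?@)?`,  // user and password
  String.raw`([a-z0-9\-\.]*)\.(([a-z]{2,4})|([0-9]{1,3}\.([0-9]{1,3})\.([0-9]{1,3})))`, // host or IPv4
  String.raw`(:[0-9]{2,5})?`,                                          // port
  String.raw`(/([a-z0-9+$_%-]\.?)+)*/?`,                               // path
  String.raw`(\?[a-z+&$_.-][a-z0-9;:@&%=+/$_.-]*)?`,                   // query
  String.raw`(#[a-z_.-][a-z0-9+$%_.-]*)?`,                             // anchor
];

// Anchor the joined pattern and match case-insensitively, as the PHP check does.
const urlRe = new RegExp("^" + parts.join("") + "$", "i");

console.log(urlRe.test("www.example.com/etcetc"));
console.log(urlRe.test("http://www.example.com/etcetc"));
console.log(urlRe.test("not a url"));
```

Keeping each component on its own line makes it easier to tighten one piece, for example the top-level domain length, without touching the rest.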

Courtesy: Comments made by splattermania in the PHP manual: http://php.net/manual/en/function.preg-match.php


url regex without http://www

It is probably a good idea to keep the www intact in order to preserve the subdomain. A regex pattern like this:

^([a-zA-Z0-9]+(\.[a-zA-Z0-9]+)+.*)$

matches a URL that is not prefixed by a protocol (http://, https://, ftp://, etc.), because the string has to start with alphanumeric labels separated by dots.
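
A quick way to see which inputs this pattern accepts is to run it over a few strings. The sketch below uses JavaScript; `noProtocolRe` and the sample strings are assumptions chosen for illustration.

```javascript
// Pattern from the answer above: dot-separated alphanumeric labels, then anything.
const noProtocolRe = /^([a-zA-Z0-9]+(\.[a-zA-Z0-9]+)+.*)$/;

// Illustrative inputs (not from the original answer).
const samples = [
  "www.example.com",
  "example.com/path?q=1",
  "http://www.example.com", // rejected: "http" is followed by ":" instead of "."
  "localhost",              // rejected: contains no dot
];

const results = samples.map((s) => [s, noProtocolRe.test(s)]);
results.forEach(([s, ok]) => console.log(s + " => " + ok));
```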

Regular expression pattern to match url with or without http(s) and without tags

Never mind, I already found a solution. With the following pattern, everything works fine:

$pattern = '(?xi)\b((?:https?://|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`\!()\[\]{};:\'".,<>?«»“”‘’]))';     
return preg_replace("!$pattern!i", "<a href=\"\\0\" rel=\"nofollow\" target=\"_blank\">\\0</a>", $str);
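
The PHP call above wraps every match in an anchor tag. A much simplified JavaScript sketch of the same idea follows. `linkify` and `linkRe` are hypothetical names, the pattern is a deliberately reduced stand-in for the one above (it has no balanced-parenthesis handling), and the input is assumed to be plain text without existing tags.

```javascript
// Reduced stand-in pattern: a scheme or "www." prefix, then non-space characters,
// ending on a character that is not trailing punctuation.
const linkRe = /\b(?:https?:\/\/|www\.)[^\s<>()]*[^\s<>().,;:!?'"]/gi;

// Wrap each match in an anchor tag, mirroring the attributes used in the PHP call.
function linkify(text) {
  return text.replace(
    linkRe,
    (url) => `<a href="${url}" rel="nofollow" target="_blank">${url}</a>`
  );
}

console.log(linkify("See www.example.com, or http://example.org/page."));
```

As in the PHP version, a match that starts with www. is used as the href unchanged, so a real page would probably want to prepend a scheme before emitting the link.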

regex for url without protocol

Try this pattern:

^(?!https?).*$

with the i modifier for case-insensitive matching.



Per the comment below, to require a leading www., use this pattern:

^(?!https?)www\..*$  

or simply

^www\..*$
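
The difference between the two approaches shows up on a few inputs. In this JavaScript sketch, `notHttpRe` and `wwwOnlyRe` are illustrative names and the test strings are assumptions. Note that the negative lookahead rejects anything that merely begins with the letters "http", including a host such as httpbin.org.

```javascript
// First pattern: accept any line that does not start with "http" or "https".
const notHttpRe = /^(?!https?).*$/i;
// Simplified pattern: accept only lines that start with "www.".
const wwwOnlyRe = /^www\..*$/i;

// Illustrative inputs (not from the original answer).
const inputs = ["www.example.com", "HTTP://example.com", "httpbin.org", "ftp://example.com", "example.com"];

const table = inputs.map((s) => ({
  input: s,
  notHttp: notHttpRe.test(s),
  wwwOnly: wwwOnlyRe.test(s),
}));
table.forEach((row) => console.log(row.input, row.notHttp, row.wwwOnly));
```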

Regex for website or url validation

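Use the regex ^((https?|ftp|smtp):\/\/)?(www\.)?[a-z0-9]+\.[a-z]+(\/[a-zA-Z0-9#]+\/?)*$

This is a basic one I built just now; a Google search will turn up more thorough ones.

Here:

  • ^ anchors the match at the start of the string
  • ((https?|ftp|smtp):\/\/)? optionally matches one of these protocols
  • (www\.)? optionally matches a leading www. (the dot is escaped so that it matches only a literal dot)
  • [a-z0-9]+\.[a-z]+ matches the domain name followed by a dot and an alphabetic top-level domain; the snippet below widens this to one to three dot-separated suffix labels with (\.[a-z]{2,}){1,3}
  • (\/[a-zA-Z0-9#]+\/?)* optionally matches path segments, each of which may end with a /
  • $ anchors the match at the end of the string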





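// Extended version: one to three suffix labels, "#" in path segments, and a single query parameter.
var urls = ["http://www.sample.com", "https://www.sample.com/", "https://www.sample.com#", "http://www.sample.com/xyz", "http://www.sample.com/#xyz", "www.sample.com", "www.sample.com/xyz/#/xyz", "sample.com", "sample.com?name=foo", "http://www.sample.com#xyz", "http://www.sample.c"];

// The dot in "www\." is escaped so that it matches only a literal dot.
var re = /^((https?|ftp|smtp):\/\/)?(www\.)?[a-z0-9]+(\.[a-z]{2,}){1,3}(#?\/?[a-zA-Z0-9#]+)*\/?(\?[a-zA-Z0-9-_]+=[a-zA-Z0-9-%]+&?)?$/;

// Print each candidate with its result; only "http://www.sample.c" should fail (suffix too short).
urls.forEach(function (x) { console.log(x + " => " + re.test(x)); });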

URL validation with or without http and www

The ^ anchor in ^(www | www\.) excludes lines that have http:// or https:// at the front. You can make the scheme optional with a non-capturing group followed by ?:

^(?:https?://|s?ftps?://)?(?!www | www\.)[A-Za-z0-9_-]+\.+[A-Za-z0-9.\/%&=\?_:;-]+$

That will match all but the last two lines of:

yahoo.com
www.yahoo.com
http://www.yahoo.com
yahoo.org
somesite.org
www.somesite.org
http://somesite.org
https://www.somesite.org
somesite.org?case=1

www somesite.org // Space after 'www'
 www.example.com // Leading space
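
The claim about the sample lines can be checked by running the pattern over them. The sketch below uses JavaScript, so the slashes are escaped inside the regex literal; `schemeOptionalRe` is an illustrative name, and the trailing // comments from the list are left out of the test strings.

```javascript
// Pattern from the answer above, written as a JavaScript regex literal.
const schemeOptionalRe =
  /^(?:https?:\/\/|s?ftps?:\/\/)?(?!www | www\.)[A-Za-z0-9_-]+\.+[A-Za-z0-9.\/%&=\?_:;-]+$/;

// Lines expected to match.
const shouldMatch = [
  "yahoo.com", "www.yahoo.com", "http://www.yahoo.com", "yahoo.org",
  "somesite.org", "www.somesite.org", "http://somesite.org",
  "https://www.somesite.org", "somesite.org?case=1",
];
// Lines expected to be rejected: a space after "www" and a leading space.
const shouldFail = ["www somesite.org", " www.example.com"];

const matched = shouldMatch.filter((s) => schemeOptionalRe.test(s));
const rejected = shouldFail.filter((s) => !schemeOptionalRe.test(s));
console.log(matched.length + " matched, " + rejected.length + " rejected");
```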


Only match URL beginning with 'www' or 'http(s)://' and nothing else

This regexp is definitely not perfect, but it will do what you want:

(http[s]?:\/\/|www\.|ftp:\/\/){1,2}([-a-zA-Z0-9_]{2,256}\.[a-z]{2,4}\b(\/?[-a-zA-Z0-9@:%_\+.~#?&\/=]+)?)

It can be tricked into matching some non-URLs, but those false positives are not something that can be abused. Making the pattern smarter greatly increases its complexity.
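
A short JavaScript sketch shows both the intended behaviour and the kind of false positive mentioned above. `prefixedRe` is an illustrative name, the dot after www is escaped here, and the inputs are assumptions. The pattern is not anchored, so it finds a match anywhere in the string.

```javascript
// Pattern from the answer above, with the dot after "www" escaped.
const prefixedRe =
  /(http[s]?:\/\/|www\.|ftp:\/\/){1,2}([-a-zA-Z0-9_]{2,256}\.[a-z]{2,4}\b(\/?[-a-zA-Z0-9@:%_\+.~#?&\/=]+)?)/;

// Illustrative inputs paired with the result each one should give.
const cases = [
  ["http://www.example.com/path", true],
  ["www.example.com", true],
  ["ftp://example.org/file.txt", true],
  ["example.com", false],            // no "www." or scheme prefix
  ["www.not-a-real-site.zzzz", true], // accepted although the suffix is made up
];

const outcomes = cases.map(([s]) => prefixedRe.test(s));
cases.forEach(([s], i) => console.log(s + " => " + outcomes[i]));
```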


