Regex for dropping http:// and www. from URLs

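preg_replace also accepts an array, so you don't even need the loop. You can do this with a one-liner. The pattern is anchored with ^ and uses a lazy (.*?) so that the optional trailing slash is actually stripped rather than swallowed by the capture group:

$urls = preg_replace('/^(?:https?:\/\/)?(?:www\.)?(.*?)\/?$/i', '$1', $urls);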
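
For comparison, here is a minimal Python sketch of the same idea. Python is not the language of the answer above, and the helper name strip_prefixes and the sample URLs are assumptions made purely for illustration. The pattern is anchored at the start and captures lazily so that an optional trailing slash is dropped.

```python
import re

# Optional scheme, optional "www.", lazy capture of the rest, optional trailing slash.
PATTERN = re.compile(r'^(?:https?://)?(?:www\.)?(.*?)/?$', re.IGNORECASE)

def strip_prefixes(urls):
    """Return each URL without a leading http(s):// and www. (hypothetical helper)."""
    return [PATTERN.sub(r'\1', url) for url in urls]

if __name__ == "__main__":
    samples = ["http://www.example.com/", "https://example.org", "www.example.net/page"]
    print(strip_prefixes(samples))
```

A list comprehension plays the role that passing an array to preg_replace plays in PHP.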

Regex to remove scheme (http:) for a URL

You could do this with basic string manipulation, and I would recommend it over regex.

However, if you insist on using regex, here is a pattern that does this when combined with the regex-replace function of whichever language you are using:

^http:
^\   /
| \ /
|  `- Match this string literally
|
`- Match at start of string

If you're also going to remove https: it would look like this:

^https?:
^\  /^^^
| \/ |||
| |  ||`- Literally match `:`
| |  |`- Previous is optional (literal s)
| |  `- Literally match s
| `- Match this string literally
|
`- Match at start of string

These all assume you are matching exact URLs. If you'd like to match anywhere in a string, you can replace the ^ anchor (beginning of string) with \b, which is a word boundary:

\bhttps?:
\/\  /^^^
|  \/ |||
|  |  ||`- Literally match `:`
|  |  |`- Previous is optional (literal s)
|  |  `- Literally match s
|  `- Match this string literally
|
`- Word boundary (typically next to whitespace, but also next to `]`, `[` and so on)

Make the regex replace everything that matches that pattern with '' (empty string). I recommend adding an i flag for case-insensitive matching.
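
As a rough illustration, the replacement described above could look like this in Python. The function names are hypothetical, and the last function shows one possible form of the plain string manipulation recommended earlier; it is a sketch, not the only way to do it. Note that removing only the scheme leaves the leading // in place.

```python
import re

def strip_scheme(url):
    # ^https?: anchored at the start of the string; IGNORECASE plays the role of the i flag
    return re.sub(r'^https?:', '', url, flags=re.IGNORECASE)

def strip_scheme_anywhere(text):
    # Same pattern with a word boundary instead of ^, so it applies anywhere in the text
    return re.sub(r'\bhttps?:', '', text, flags=re.IGNORECASE)

def strip_scheme_plain(url):
    # String-manipulation alternative: compare a lowercased copy, slice the original
    lowered = url.lower()
    for scheme in ("https:", "http:"):
        if lowered.startswith(scheme):
            return url[len(scheme):]
    return url

if __name__ == "__main__":
    print(strip_scheme("HTTP://example.com"))
    print(strip_scheme_anywhere("see https://example.com and http://example.org"))
    print(strip_scheme_plain("https://example.com"))
```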

Here's a good tutorial site on regular expressions: http://www.regular-expressions.info/

Regex remove www from URL

You can add the (?!www\.) and (?!http:\/\/www\.) negative lookaheads right after the first \b so that a match cannot start at www. or http://www.:

\b(?!www\.)(?!http:\/\/www\.)(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(?:\.?|\b)
  ^^^^^^^^^^^^^^^^^^^^^^^^^^^

You may add more negative lookaheads to exclude https:// or ftp/ftps links.

ALTERNATIVE:

\b(?!(?:https?|ftps?):\/\/)(?!www\.)(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*(?:\.?|\b)

The (?!(?:https?|ftps?):\/\/) and (?!www\.) lookaheads will just let you skip the protocol and www parts of the URLs.
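
To see the effect of the lookaheads, here is a small Python sketch that applies the alternative pattern with re.search. The helper name bare_host and the sample URLs are assumptions for illustration. Because the lookaheads reject a match starting at the scheme or at www., the first successful match starts at the bare host name.

```python
import re

# The ALTERNATIVE pattern from above, split across lines for readability.
HOST = re.compile(
    r'\b(?!(?:https?|ftps?)://)(?!www\.)'
    r'(?:[0-9A-Za-z][0-9A-Za-z-]{0,62})'
    r'(?:\.(?:[0-9A-Za-z][0-9A-Za-z-]{0,62}))*'
    r'(?:\.?|\b)'
)

def bare_host(url):
    """Return the first host-like match, or None (hypothetical helper)."""
    match = HOST.search(url)
    return match.group(0) if match else None

if __name__ == "__main__":
    for sample in ("http://www.example.com/path", "https://example.org", "www.example.net"):
        print(sample, "->", bare_host(sample))
```

Keep in mind that the pattern matches any host-like token, so on free text it will also match ordinary words; the sketch only feeds it single URLs.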

How to remove any URL within a string in Python

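Python script (with re.MULTILINE, ^ matches at the start of every line, so this removes each line that begins with http:// or https://):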

import re
text = re.sub(r'^https?:\/\/.*[\r\n]*', '', text, flags=re.MULTILINE)

Output:

text1
text2
text3
text4
text5
text6
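
The pattern above only removes URLs that start a line. If the URLs can appear in the middle of a sentence, a sketch along the following lines may be closer to what is wanted. The function name remove_urls and the sample sentence are assumptions, and \S+ is a deliberately simple approximation of where a URL ends.

```python
import re

def remove_urls(text):
    """Remove http(s) URLs anywhere in the text (hypothetical helper)."""
    # \S+ consumes everything up to the next whitespace character
    without_urls = re.sub(r'https?://\S+', '', text)
    # Collapse the runs of spaces left behind by the removal
    return re.sub(r' {2,}', ' ', without_urls).strip()

if __name__ == "__main__":
    print(remove_urls("see http://example.com/a?b=1 and https://example.org for details"))
```

Trailing punctuation that directly follows a URL (a comma or a closing parenthesis, for example) is removed along with it, which may or may not be acceptable for your data.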

Add regex rule to remove www. on existing expression

You can match optional www. after http://:

var matches = url.match(/^https?\:\/\/(?:www\.)?([^\/?#]+)(?:[\/?#]|$)/i);
//=> ["http://www.stackoverflow.com/", "stackoverflow.com"]
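
The same pattern carries over to Python almost unchanged. In this sketch the function name host_without_www is an assumption; group 1 corresponds to the second element of the JavaScript match array shown above.

```python
import re

# Scheme, optional "www.", then the host up to the first /, ? or # (or end of string).
HOST_RE = re.compile(r'^https?://(?:www\.)?([^/?#]+)(?:[/?#]|$)', re.IGNORECASE)

def host_without_www(url):
    """Return the host without a leading www., or None if the URL does not match."""
    match = HOST_RE.match(url)
    return match.group(1) if match else None

if __name__ == "__main__":
    print(host_without_www("http://www.stackoverflow.com/"))
```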

Remove part of a URL string in R

Assuming the movie ID is the only part of each link that contains digits, you can remove every character that is not a digit, which leaves just the IDs:

> gsub("[^[:digit:]]", "", movie.link)
[1] "0451279" "2345759" "1790809" "1469304" "0974015" "3896198" "3371366" "3890160" "3315342" "4425200"
[11] "2250912" "2406566" "1972591" "1825683" "2091256" "3501632" "4630562" "1386697" "4154756" "4116284"
[21] "2975590" "5884234" "5013056" "1211837" "0120616" "2527336" "1082807" "0325980" "1293847" "2034800"
[31] "2015381" "2911666" "1648190" "4912910" "1298650" "1477834" "2334871" "3748528" "2239822" "3469046"
[41] "2461150" "3731562" "1431045" "0449088" "3385516" "2226597" "0468569" "1219827" "0383574" "3498820"
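
The same digit-only filter can be sketched in Python. The links below are hypothetical placeholders, since the contents of movie.link are not shown in the answer; only the two IDs are taken from the output above.

```python
import re

# Hypothetical links; the real movie.link vector is not shown in the answer.
movie_links = [
    "http://www.example.com/title/tt0451279/",
    "http://www.example.com/title/tt2345759/",
]

# [^0-9] mirrors [^[:digit:]]: drop everything that is not an ASCII digit
movie_ids = [re.sub(r'[^0-9]', '', link) for link in movie_links]

if __name__ == "__main__":
    print(movie_ids)
```

As with the R version, this only works if no other part of the link (the host name, for instance) contains digits.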

Remove http:// and www from form field while customer is typing, and show alert?

Per your code, the user cannot type special characters like ://, but can still paste them. To handle such cases, you can also validate on the blur event. The snippet below demonstrates this. I have also added a simple check for "http" that shows an error if it is entered. You can adjust it to your requirements.

Code

(function() {
  var regex = new RegExp("^[a-zA-Z0-9\-_.]+$");
  $('#domain').keypress(function(e) {
    var str = String.fromCharCode(!e.charCode ? e.which : e.charCode);
    if (regex.test(str)) {
      return true;
    }
    $("#lblError").text("Please use only a-z, 0-9 and dots.").fadeIn();
    e.preventDefault();
    return false;
  });
  $("#domain").on("blur", function(e) {
    var str = $(this).val();
    if (regex.test(str)) {
      if (str.indexOf("http") >= 0) {
        $("#lblError").text("Domain name cannot have HTTP in it.").fadeIn();
        return false;
      }
      $("#lblError").fadeOut();
    } else {
      $("#lblError").text("Please use only a-z, 0-9 and dots.").fadeIn();
      return false
    }
  });
})()

.error {
  color: red;
  display: none;
}

<script src="https://ajax.googleapis.com/ajax/libs/jquery/1.11.1/jquery.min.js"></script>
<form>
  <input placeholder="input your domain (WITHOUT http:// and www.)" class="form-control" name="domain" type="text" autocomplete="off" id="domain" style="max-width:320px">
  <p class="error" id="lblError"></p>
</form>

Remove http:// and https:// from a string

server = server.(/^https?\:\/\/(www.)?/,'')

This didn't work because you aren't calling any method on the string server. Make sure you call the sub method (the dot in www\. is also escaped here so that it matches only a literal dot):

server = server.sub(/^https?\:\/\/(www\.)?/,'')

Example

> server = "http://www.stackoverflow.com"
> server = server.sub(/^https?\:\/\/(www\.)?/,'')
stackoverflow.com

If, as per the requirement, it should also work with the illegal format http:\\, use the following regex:

server.sub(/https?\:(\\\\|\/\/)(www\.)?/,'')
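
For readers more at home in Python, a rough equivalent of the call above might look like the sketch below. The function name strip_server_prefix is an assumption, count=1 mirrors the fact that Ruby's sub replaces only the first match, and the pattern is anchored with ^ here as in the first Ruby example.

```python
import re

def strip_server_prefix(server):
    """Drop a leading http(s):// or http(s):\\\\ plus an optional www. (hypothetical helper)."""
    # In the raw pattern, \\\\ matches two literal backslashes
    return re.sub(r'^https?:(?:\\\\|//)(?:www\.)?', '', server, count=1)

if __name__ == "__main__":
    print(strip_server_prefix("http://www.stackoverflow.com"))
    print(strip_server_prefix("https:\\\\www.example.com"))
```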

Regex in R: remove multiple URLs from string

This one will also work. Instead of (.*), we can use [^\\.]* to match up to the dot of the domain and \\S* to match to the end of the URL (until whitespace is found):

gsub("\\s?(f|ht)(tp)(s?)(://)([^\\.]*)[\\.|/](\\S*)", "", string)
# [1] "this is a URL and another one and this one"

