What is a good regex to match URLs in DataWeave?

I updated the regex you shared and replaced match with matches, since you want to validate the URL against the regex.

%dw 2.0
var myString = "https://www.mycompany.com"
output application/json
---
{
    "match" : myString matches (/https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()\/@:%_\+.~#?&=]*)/)
}

Regex for website or url validation

Use the regex ^((https?|ftp|smtp):\/\/)?(www\.)?[a-z0-9]+\.[a-z]+(\/[a-zA-Z0-9#]+\/?)*$

This is a basic one I built just now. A Google search will turn up more.


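  • ^ the match must begin at the start of the string
  • ((https?|ftp|smtp):\/\/)? may or may not contain one of these protocols
  • (www\.)? may or may not have www.
  • [a-z0-9]+\.[a-z]+ the domain name followed by the top-level domain; the test snippet below widens this to (\.[a-z]{2,}){1,3} to allow subdomains up to 2 levels
  • (\/[a-zA-Z0-9#]+\/?)* may contain a path to a file, but it is not required; each segment may end with a /
  • $ the match must end at the end of the string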

var a=["http://www.sample.com","https://www.sample.com/","https://www.sample.com#","http://www.sample.com/xyz","http://www.sample.com/#xyz","www.sample.com","www.sample.com/xyz/#/xyz","sample.com","sample.com?name=foo","http://www.sample.com#xyz","http://www.sample.c"];

// Protocol, www., path and a single key=value query pair are all optional; the dot after www is escaped so it only matches a literal dot.
var re=/^((https?|ftp|smtp):\/\/)?(www\.)?[a-z0-9]+(\.[a-z]{2,}){1,3}(#?\/?[a-zA-Z0-9#]+)*\/?(\?[a-zA-Z0-9-_]+=[a-zA-Z0-9-%]+&?)?$/;

// Print each sample together with the test result.
a.forEach(x=>console.log(x+" => "+re.test(x)));
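
To make the breakdown above easier to follow, the longer pattern can also be assembled from named pieces. This is only a sketch: the variable names and the helper looksLikeUrl are invented here, and the dot after www is escaped so that it matches a literal dot.

```javascript
// Each piece mirrors one item of the breakdown; the names are illustrative only.
const protocol = "((https?|ftp|smtp):\\/\\/)?";            // optional protocol
const www      = "(www\\.)?";                              // optional www.
const domain   = "[a-z0-9]+(\\.[a-z]{2,}){1,3}";           // name plus 1 to 3 dot-separated labels
const path     = "(#?\\/?[a-zA-Z0-9#]+)*\\/?";             // optional path segments, optional trailing /
const query    = "(\\?[a-zA-Z0-9-_]+=[a-zA-Z0-9-%]+&?)?";  // optional single key=value pair

// ^ and $ force the whole string to match.
const composed = new RegExp("^" + protocol + www + domain + path + query + "$");

function looksLikeUrl(candidate) {
  return composed.test(candidate);
}

console.log(looksLikeUrl("http://www.sample.com/xyz")); // true
console.log(looksLikeUrl("sample.com?name=foo"));       // true
console.log(looksLikeUrl("http://www.sample.c"));       // false, last label is too short
```

Splitting the pattern this way makes it easier to loosen or tighten one part, for example the query, without re-reading the whole expression.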

Regular expression to match URLs in Java

Try the following regex string instead. Your test was probably done in a case-sensitive manner. I have added the lowercase letters as well as a proper start-of-string anchor.

String regex = "^(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]";

This works too:

String regex = "\\b(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]";


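Note that \b matches at any word boundary, while ^ matches only at the very start of the input, so the two behave differently once the URL is wrapped in other characters:

String regex = "<\\b(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]>"; // matches <http://google.com>

String regex = "<^(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]>"; // does not match <http://google.com>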
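
The difference between the last two patterns can be reproduced outside Java as well. The sketch below is a JavaScript port built with the RegExp constructor, so, as in the Java string literals, only the backslash of \b has to be doubled. The variable names are illustrative.

```javascript
// Shared URL body, taken from the Java patterns above.
const urlBody = "(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]";

// \b matches at any word boundary; ^ matches only at the start of the input.
const withBoundary = new RegExp("<\\b" + urlBody + ">");
const withAnchor   = new RegExp("<^" + urlBody + ">");

const text = "<http://google.com>";
console.log(withBoundary.test(text)); // true
console.log(withAnchor.test(text));   // false, ^ cannot match after the "<"
```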

Regex to find a valid URL in an email body regardless of newlines dividing it, where the URL needs to contain a '?' character

Components of a URI

\_/ \______________/\_________/\__________/ \__/
 |         |             |          |        |
scheme authority       path       query  fragment


The scheme of a URL is the first item, such as http, which indicates that this URI uses the Hypertext Transfer Protocol. Examples of other schemes are https, ftp, file and mailto.


In a URL the authority is also called the domain and may include a port number at the end separated by a colon.

In the following example, the authority is www.cambiaresearch.com


In the following example, the authority is www.cambiaresearch.com:81


In the following example, the authority is info@cambiaresearch.com



The path component of the URL identifies the specific file (or page) at a particular domain. The path is terminated by the end of the URL, by a question mark (?), which signifies the beginning of the query string, or by the number sign (#), which signifies the beginning of the fragment.

The path of the following URL is "/default.htm"


The path of the following URL is "/snippets/csharp/regex/uri_regex.aspx"



The query part of the URL is a way to send some information to the path or webpage that will handle the web request. The query begins with a question mark (?) and is terminated by the end of the URL or by a number sign (#), which signifies the beginning of the fragment.

The query of the following URL is "?id=241"


The query of the following URL is "?sourceid=navclient&ie=UTF-8&rls=GGLC,GGLC:1969-53,GGLC:en&q=uri+query"



In a URL the fragment is used to specify a location within the current page. This is often used in a FAQ with a list of links at the top of the page linking to longer descriptions farther down in the page.

The fragment of the following URL is "contact"


The fragment of the following URL is "scheme"


Example: Regular Expressions for Parsing URIs and URLs
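
As a sketch of what such a parsing expression can look like, the generic pattern given in RFC 3986, Appendix B splits a URI into the five components described above. The helper name splitUri and the sample URL are made up for illustration. Note that this pattern returns the query and fragment without their leading ? and #.

```javascript
// Generic URI-splitting pattern from RFC 3986, Appendix B.
const uriPattern = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

// Hypothetical helper that names the capture groups of interest.
// Every part of the pattern is optional, so exec never returns null here.
function splitUri(uri) {
  const m = uriPattern.exec(uri);
  return {
    scheme: m[2],
    authority: m[4],
    path: m[5],
    query: m[7],
    fragment: m[9],
  };
}

console.log(splitUri("http://www.example.com:81/docs/page.aspx?id=241#contact"));
```

For this input the scheme is "http", the authority is "www.example.com:81", the path is "/docs/page.aspx", the query is "id=241" and the fragment is "contact". Components that are absent come back as undefined.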

Simple way using [?] regex pattern:

// Requires: using System.Text.RegularExpressions;
public bool RegexUrlWithQuestionChar(string url)
{
    string pattern = @"(http(s)?://)?([\w-]+\.)+[\w-]+(/[\w- ;,./?%&=]*)?"; // URL pattern

    var regex = new Regex(pattern);
    var match = regex.Match(url); // first URL found in the input

    return new Regex("[?]").IsMatch(match.Value); // does that URL contain a '?'
}

MessageBox.Show("Found"); // This show
MessageBox.Show("Not found");

MessageBox.Show("Not found"); // This show
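
The same two-step idea (find the URL first, then look for a ? inside it) can be sketched in JavaScript. This is a port under assumptions: the function name urlHasQuestionMark and the sample inputs are made up, and the hyphen inside the path character class is escaped to keep it literal.

```javascript
// JavaScript port of the URL pattern used in the C# method above.
const urlInText = /(https?:\/\/)?([\w-]+\.)+[\w-]+(\/[\w\- ;,.\/?%&=]*)?/;

// Returns true when the first URL found in the text contains a '?'.
function urlHasQuestionMark(text) {
  const match = urlInText.exec(text);
  return match !== null && match[0].includes("?");
}

console.log(urlHasQuestionMark("http://example.com/page?id=1")); // true
console.log(urlHasQuestionMark("http://example.com/page"));      // false
```

Like the C# version, this checks only the first URL in the input and does not rejoin a URL that has been split across lines.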





How to write a custom regular expression for a URL (custom URL)

Your pattern has no anchors, and its initial subpattern [http://a-zA-Z0-9]{1,20} is a character class: it matches 1 to 20 characters, each of which may be h, t, p, :, /, a-z, A-Z or 0-9, whereas you need to match http:// as a literal sequence.

I suggest

^(https?:\/\/)?[a-zA-Z][a-zA-Z0-9]{0,19}\.constant\.[a-zA-Z]{1,5}$


  • ^ - start of string
  • (https?:\/\/)? - an optional sequence of http:// or https://
  • [a-zA-Z] - an ASCII letter
  • [a-zA-Z0-9]{0,19} - 0 to 19 alphanumeric characters (the length restriction can be adjusted by you)
  • \.constant\. - a constant substring .constant.
  • [a-zA-Z]{1,5} - 1 to 5 ASCII letters
  • $ - end of string.
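
The breakdown above can be checked with a short JavaScript sketch. The pattern is assembled from the listed parts in the order given, and the sample inputs are made up for illustration.

```javascript
// Pattern assembled from the parts listed above.
const customUrl = /^(https?:\/\/)?[a-zA-Z][a-zA-Z0-9]{0,19}\.constant\.[a-zA-Z]{1,5}$/;

// Made-up sample inputs.
const samples = [
  "http://abc.constant.com",   // true
  "abc.constant.com",          // true, the scheme is optional
  "http://1abc.constant.com",  // false, must start with a letter
  "http://abc.other.com",      // false, the middle part must be .constant.
];

for (const s of samples) {
  console.log(s + " => " + customUrl.test(s));
}
```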

Regex to match URL

$search  = "#^((?#
the scheme:
second level domains and beyond:
top level domains:
the path, can be there or not:

Just cleaned up a bit. This will match only HTTP(S) addresses that declare the http:// or https:// scheme and, as long as you copied all top-level domains correctly from IANA, only those with a standardized top-level domain (it will not match http://localhost).

Finally, the pattern ends with the path part, which, if present, always starts with a /.
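
The structure described here (required scheme, domain labels, a fixed list of top-level domains, optional path) can be sketched in JavaScript as follows. This is not the answer's PHP pattern: the three-entry TLD list is a deliberately tiny placeholder, and the names tlds, httpUrl and isHttpUrl are invented for the example.

```javascript
// Deliberately tiny, hypothetical TLD list; a real one would come from IANA.
const tlds = ["com", "org", "net"];

const httpUrl = new RegExp(
  "^https?://" +                  // the scheme, required
  "(?:[^\\s./]+\\.)+" +           // second-level domains and beyond
  "(?:" + tlds.join("|") + ")" +  // top-level domains
  "(/\\S*)?$",                    // the path, optional, starts with /
  "i"                             // case-insensitive
);

function isHttpUrl(candidate) {
  return httpUrl.test(candidate);
}

console.log(isHttpUrl("http://www.example.com/a/b")); // true
console.log(isHttpUrl("http://localhost"));           // false, no listed TLD
console.log(isHttpUrl("www.example.com"));            // false, scheme is required
```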

However, I'd suggest following Cerebrus's advice: if you're not sure about this, learn regexps in a gentler way and use proven patterns for complicated tasks.


By the way: Your regexp will also match something.r and something.h (between |TO| and |TR| in your example). I left them out in my version, as I guess it was a typo.

On re-reading the question: add a ? after the scheme group, which makes it optional, to match 'URLs' without the scheme.

