What is a good regex to match for URLs in dataweave?
I updated the regex you shared and replaced match with matches, since you want to validate the URL against the regex.
%dw 2.0
var myString = "https://www.mycompany.com"
output application/json
---
{
"match" : myString matches (/https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()\/@:%_\+.~#?&=]*)/)
}
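For comparison outside DataWeave, the difference between finding a URL somewhere in a string and validating the whole string can be sketched in JavaScript. This is only an illustration: it assumes, as the answer above implies, that matches succeeds only when the entire input fits the pattern, and the names URL_PATTERN, WHOLE_URL, and isUrl are made up for the example.

```javascript
// Same pattern as in the DataWeave snippet above, written as a JavaScript literal.
const URL_PATTERN = /https?:\/\/(www\.)?[-a-zA-Z0-9@:%._\+~#=]{1,256}\.[a-zA-Z0-9()]{1,6}\b([-a-zA-Z0-9()\/@:%_\+.~#?&=]*)/;

// Anchored copy: succeeds only when the entire input is a URL.
const WHOLE_URL = new RegExp("^(?:" + URL_PATTERN.source + ")$");

// Hypothetical helper name, not part of any library.
function isUrl(text) {
  return WHOLE_URL.test(text);
}

console.log(URL_PATTERN.test("see https://www.mycompany.com now")); // unanchored search
console.log(isUrl("see https://www.mycompany.com now"));            // whole-string validation
console.log(isUrl("https://www.mycompany.com"));
```

The unanchored pattern finds the URL inside the sentence, while the anchored one accepts only the bare URL.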
Regex for website or url validation
Use the regex ^((https?|ftp|smtp):\/\/)?(www\.)?[a-z0-9]+\.[a-z]+(\/[a-zA-Z0-9#]+\/?)*$
This is a basic one I built just now. A Google search will turn up more.
Here:
- ^ must start here
- ((https?|ftp|smtp):\/\/)? may or may not contain one of these protocols
- (www\.)? may or may not have www.
- [a-z0-9]+(\.[a-z]+) the host name and domain, plus subdomains if any, up to 2 levels
- (\/[a-zA-Z0-9#]+\/?)*\/? may contain a path to a file, but it is not required; the path may end with a /
- $ must end here
// Sample URLs to test; the last one has a one-letter TLD and should fail.
var a=["http://www.sample.com","https://www.sample.com/","https://www.sample.com#","http://www.sample.com/xyz","http://www.sample.com/#xyz","www.sample.com","www.sample.com/xyz/#/xyz","sample.com","sample.com?name=foo","http://www.sample.com#xyz","http://www.sample.c"];
var re=/^((https?|ftp|smtp):\/\/)?(www\.)?[a-z0-9]+(\.[a-z]{2,}){1,3}(#?\/?[a-zA-Z0-9#]+)*\/?(\?[a-zA-Z0-9-_]+=[a-zA-Z0-9-%]+&?)?$/;
// Print each URL with the result of the test.
a.forEach(x=>console.log(x+" => "+re.test(x)));
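To see how the pieces described in the list fit together, here is a small sketch that assembles the basic regex from its parts. The pieces are taken from the regex quoted at the start of this answer, with the dot after www escaped; the names parts, basicUrl, and samples are made up for the example.

```javascript
// Pieces of the basic pattern, in the order they are described above.
const parts = [
  "^",                            // start of string
  "((https?|ftp|smtp):\\/\\/)?",  // optional protocol
  "(www\\.)?",                    // optional "www."
  "[a-z0-9]+\\.[a-z]+",           // host name, a dot, and a top-level domain
  "(\\/[a-zA-Z0-9#]+\\/?)*",      // optional path segments
  "$",                            // end of string
];
const basicUrl = new RegExp(parts.join(""));

const samples = ["http://www.sample.com", "www.sample.com/xyz", "sample.com", "sample.com?name=foo"];
for (const s of samples) {
  console.log(s + " => " + basicUrl.test(s));
}
```

The basic version has no piece for a query string, so the last sample is rejected; that is what the longer regex in the snippet above adds.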
Regular expression to match URLs in Java
Try the following regex string instead. Your test was probably done in a case-sensitive manner. I have added the lowercase letters as well as a proper start-of-string anchor.
String regex = "^(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]";
This works too:
String regex = "\\b(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]";
Note:
String regex = "<\\b(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]>"; // matches <http://google.com>
String regex = "<^(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]>"; // does not match <http://google.com>
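The difference between the two anchors can be checked quickly. The sketch below ports the patterns to JavaScript (an assumption made for brevity; the answer itself is Java), and the variable names are made up for the example.

```javascript
// Shared part of both patterns: scheme, "://", then the allowed URL characters.
const body = "(https?|ftp|file)://[-a-zA-Z0-9+&@#/%?=~_|!:,.;]*[-a-zA-Z0-9+&@#/%=~_|]";

const fromStart = new RegExp("^" + body);     // URL must begin the input
const atBoundary = new RegExp("\\b" + body);  // URL may begin at any word boundary

const bare = "http://google.com";
const wrapped = "<http://google.com>";

console.log(fromStart.test(bare));     // a bare URL starts at the beginning of the input
console.log(atBoundary.test(bare));
console.log(fromStart.test(wrapped));  // "<" comes first, so "^" cannot match here
console.log(atBoundary.test(wrapped)); // the boundary between "<" and "h" lets the match start
```

So \b is the better choice when the URL may be wrapped in other characters, and ^ when the URL must be the first thing in the input.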
Regex to find a valid URL in an email body regardless of newlines dividing it, which also needs to contain a '?' character
Components of a URI
foo://example.com:8042/over/there?name=ferret#nose
\_/   \______________/\_________/ \_________/ \__/
 |           |            |            |        |
scheme   authority       path        query   fragment
Scheme
The scheme of a URL is the first item, such as http, which indicates that this URI uses the Hypertext Transfer Protocol. Other schemes that appear in the examples on this page include https, ftp, file, and mailto.
Authority
In a URL the authority is also called the domain and may include a port number at the end separated by a colon.
In the following example, the authority is www.cambiaresearch.com
http://www.cambiaresearch.com
In the following example, the authority is www.cambiaresearch.com:81
https://www.cambiaresearch.com:81
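In the following example there is no // after the scheme, so strictly speaking (per RFC 3986) the URI has no authority component; info@cambiaresearch.com is the path, although it plays a similar addressing role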
mailto:info@cambiaresearch.com
Path
The path component of the URL identifies a specific file (or page) at a particular domain. The path is terminated by the end of the URL, by a question mark (?), which marks the beginning of the query string, or by a number sign (#), which marks the beginning of the fragment.
The path of the following URL is "/default.htm"
http://www.cambiaresearch.com/default.htm
The path of the following URL is "/snippets/csharp/regex/uri_regex.aspx"
http://www.cambiaresearch.com/snippets/csharp/regex/uri_regex.aspx
Query
The query part of the URL is a way to send some information to the path or webpage that will handle the web request. The query begins with a question mark (?) and is terminated by the end of the URL or a number sign (#) which signifies the beginning of the fragment.
The query of the following URL is "?id=241"
http://www.cambiaresearch.com/default.htm?id=241
The query of the following URL is "?sourceid=navclient&ie=UTF-8&rls=GGLC,GGLC:1969-53,GGLC:en&q=uri+query"
http://www.google.com/search?sourceid=navclient&ie=UTF-8&rls=GGLC,GGLC:1969-53,GGLC:en&q=uri+query
Fragment
In a URL the fragment is used to specify a location within the current page. This is often used in a FAQ with a list of links at the top of the page linking to longer descriptions farther down in the page.
The fragment of the following URL is "contact"
http://www.cambiaresearch.com/default.htm#contact
The fragment of the following URL is "scheme"
http://www.cambiaresearch.com/snippets/csharp/regex/uri_regex.aspx#scheme
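The five components can also be pulled apart with a single regular expression. The sketch below uses the expression given in Appendix B of RFC 3986; the function name splitUri and the shape of the returned object are assumptions for illustration. Note that it returns the query and fragment without their leading ? and #, and that it only splits a string into parts without validating them.

```javascript
// Regular expression from RFC 3986, Appendix B.
// Groups: 2 = scheme, 4 = authority, 5 = path, 7 = query, 9 = fragment.
const URI_PARTS = /^(([^:\/?#]+):)?(\/\/([^\/?#]*))?([^?#]*)(\?([^#]*))?(#(.*))?/;

// Hypothetical helper: splits a URI into its components without validating it.
function splitUri(uri) {
  const m = URI_PARTS.exec(uri); // every group is optional, so this always matches
  return { scheme: m[2], authority: m[4], path: m[5], query: m[7], fragment: m[9] };
}

console.log(splitUri("foo://example.com:8042/over/there?name=ferret#nose"));
console.log(splitUri("http://www.cambiaresearch.com/default.htm?id=241"));
```

A component that is absent from the input comes back as undefined, except the path, which is an empty string when missing.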
Example: Regular Expressions for Parsing URIs and URLs
Simple way using [?]
regex pattern:
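// Requires: using System.Text.RegularExpressions;
public bool RegexUrlWithQuestionChar(string url)
{
string pattern = @"(http(s)?://)?([\w-]+\.)+[\w-]+(/[\w- ;,./?%&=]*)?"; // URL pattern
var regex = new Regex(pattern);
var match = regex.Match(url); // first URL-like substring, or an empty match
return new Regex("[?]").IsMatch(match.Value); // true if that substring contains '?'
}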
if(RegexUrlWithQuestionChar("www.example.com.br/area?key=235fksf&rec=fsjgsg"))
{
MessageBox.Show("Found"); // This is shown
}
else
{
MessageBox.Show("Not found");
}
if(RegexUrlWithQuestionChar("www.example.com.br/area"))
{
MessageBox.Show("Found");
}
else
{
MessageBox.Show("Not found"); // This is shown
}
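The same idea can be sketched in JavaScript. This is an adaptation under assumptions rather than a line-for-line port: the function name hasUrlWithQuery is made up, the hyphen in the path character class is moved to the end so it is read literally, and every URL-like match in the text is checked rather than only the first. Like the C# version, it does not rejoin URLs that are split across lines.

```javascript
// URL-like substring: optional scheme, dotted host, optional path and query.
const URL_LIKE = /(https?:\/\/)?([\w-]+\.)+[\w-]+(\/[\w ;,.\/?%&=-]*)?/g;

// Hypothetical helper: true if any URL-like substring of the text contains "?".
function hasUrlWithQuery(text) {
  return Array.from(text.matchAll(URL_LIKE)).some((m) => m[0].includes("?"));
}

console.log(hasUrlWithQuery("www.example.com.br/area?key=235fksf&rec=fsjgsg"));
console.log(hasUrlWithQuery("www.example.com.br/area"));
```

Checking every match matters when the text contains other dotted words before the URL, which is likely in an email body.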
Credits:
urlregex.com
parsing-urls-with-regular-expressions-and-the-regex-object
www.dotnetperls.com/regex
How to write custom regular expression for url(custom url)
Your pattern has no anchors, and the initial subpattern is a character class [http://a-zA-Z0-9]{1,20} that matches 1 to 20 chars from the class, either h or t, p, :, /, a-z, A-Z, 0-9, while you need to match http:// as a sequence.
I suggest
^(https?:\/\/)?[a-zA-Z][a-zA-Z0-9]{0,19}\.constant\.[a-zA-Z]{1,5}$
Explanation:
- ^ - start of string
- (https?:\/\/)? - an optional sequence of http:// or https://
- [a-zA-Z] - an ASCII letter
- [a-zA-Z0-9]{0,19} - 0 to 19 alphanumeric characters (the length restriction can be adjusted by you)
- \.constant\. - a constant substring .constant.
- [a-zA-Z]{1,5} - 1 to 5 ASCII letters
- $ - end of string.
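A quick way to see the difference between the character class and the sequence is to run both. The sketch below is JavaScript with made-up variable names; the anchors on the first pattern are added only to isolate the behavior of the class.

```javascript
// A character class matches single characters in any order, not the sequence "http://".
const charClass = /^[http:\/\/a-zA-Z0-9]{1,20}$/;
console.log(charClass.test("ptth")); // letters from the class in any order still match

// Suggested pattern: optional scheme as a sequence, then name.constant.tld.
const customUrl = /^(https?:\/\/)?[a-zA-Z][a-zA-Z0-9]{0,19}\.constant\.[a-zA-Z]{1,5}$/;

const samples = ["http://abc.constant.com", "abc.constant.org", "http://1abc.constant.com", "ptth://abc.constant.com"];
for (const s of samples) {
  console.log(s + " => " + customUrl.test(s));
}
```

The first two samples pass; the third fails because the name starts with a digit, and the fourth because ptth:// is not the http:// sequence.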
Regex to match URL
$search = "#^((?#
the scheme:
)(?:https?://)(?#
second level domains and beyond:
)(?:[\S]+\.)+((?#
top level domains:
)MUSEUM|TRAVEL|AERO|ARPA|ASIA|EDU|GOV|MIL|MOBI|(?#
)COOP|INFO|NAME|BIZ|CAT|COM|INT|JOBS|NET|ORG|PRO|TEL|(?#
)A[CDEFGILMNOQRSTUWXZ]|B[ABDEFGHIJLMNORSTVWYZ]|(?#
)C[ACDFGHIKLMNORUVXYZ]|D[EJKMOZ]|(?#
)E[CEGHRSTU]|F[IJKMOR]|G[ABDEFGHILMNPQRSTUWY]|(?#
)H[KMNRTU]|I[DELMNOQRST]|J[EMOP]|(?#
)K[EGHIMNPRWYZ]|L[ABCIKRSTUVY]|M[ACDEFGHKLMNOPQRSTUVWXYZ]|(?#
)N[ACEFGILOPRUZ]|OM|P[AEFGHKLMNRSTWY]|QA|R[EOSUW]|(?#
)S[ABCDEGHIJKLMNORTUVYZ]|T[CDFGHJKLMNOPRTVWZ]|(?#
)U[AGKMSYZ]|V[ACEGINU]|W[FS]|Y[ETU]|Z[AMW])(?#
the path, can be there or not:
)(/[a-z0-9\._/~%\-\+&\#\?!=\(\)@]*)?)$#i";
Just cleaned up a bit. This will match only HTTP(s) addresses and, as long as you copied all top-level domains correctly from IANA, only standardized ones (it will not match http://localhost), and only with the http:// declared.
Finally you should end with the path part, that will always start with a /, if it is there.
However, I'd suggest to follow Cerebrus: If you're not sure about this, learn regexps in a more gentle way and use proven patterns for complicated tasks.
Cheers,
By the way: Your regexp will also match something.r and something.h (between |TO| and |TR| in your example). I left them out in my version, as I guess it was a typo.
On re-reading the question: Change
)(?:https?://)(?#
to
)(?:https?://)?(?#
(there is an extra ?) to match 'URLs' without the scheme.
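The effect of that extra ? can be tried out with a cut-down version of the pattern. The sketch below is JavaScript, not PHP, so the (?# ... ) comments are dropped, and the three-entry TLD list is a hypothetical stand-in for the full list above; all variable names are made up.

```javascript
// Hypothetical three-entry TLD list; the pattern above enumerates the full set.
const TLDS = "COM|NET|ORG";
const rest = "(?:\\S+\\.)+(?:" + TLDS + ")(/[a-z0-9._/~%\\-+&#?!=()@]*)?$";

const schemeRequired = new RegExp("^(?:https?://)" + rest, "i");
const schemeOptional = new RegExp("^(?:https?://)?" + rest, "i");

for (const s of ["http://example.com/path", "example.com", "http://localhost"]) {
  console.log(s, schemeRequired.test(s), schemeOptional.test(s));
}
```

Only the variant with the optional scheme accepts example.com, and neither accepts http://localhost, since it has no dot followed by a listed top-level domain.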