Regular expression for finding 'href' value of a link
I'd recommend using an HTML parser over a regex, but here's a regex that creates a capturing group over the value of the href
attribute of each link. It matches whether double or single quotes are used. The first group captures the quote character and the second group captures the href value.
<a\s+(?:[^>]*?\s+)?href=(["'])(.*?)\1
You can view a full explanation of this regex here.
Snippet playground:
const linkRx = /<a\s+(?:[^>]*?\s+)?href=(["'])(.*?)\1/;
const textToMatchInput = document.querySelector('[name=textToMatch]');
document.querySelector('button').addEventListener('click', () => {
  console.log(textToMatchInput.value.match(linkRx));
});
<label> Text to match: <input type="text" name="textToMatch" value='<a href="google.com"'> <button>Match</button> </label>
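To collect every href in a larger string rather than only the first, the same pattern can be used with the global flag and String.prototype.matchAll. This is a minimal sketch that runs outside the browser; the sample markup and the names sampleHtml and hrefs are made up for illustration.

```javascript
// Same pattern as above, with the global flag added so matchAll can iterate.
const linkRx = /<a\s+(?:[^>]*?\s+)?href=(["'])(.*?)\1/g;

// Hypothetical sample markup mixing both quote styles.
const sampleHtml = `
  <a href="https://example.com/one">One</a>
  <a class="nav" href='two.html'>Two</a>
  <a href="it's-three.html" target="_blank">Three</a>
`;

// Group 1 is the opening quote; group 2 is the href value.
const hrefs = Array.from(sampleHtml.matchAll(linkRx), (m) => m[2]);
console.log(hrefs);
// [ 'https://example.com/one', 'two.html', "it's-three.html" ]
```

The backreference \1 is what lets the third link keep its apostrophe: the value only ends at the same kind of quote that opened it.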
Regular expression to extract href url
I'm not a regular Swift developer, but have you tried the withTemplate
option of stringByReplacingMatches,
like this?
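let regex = try! NSRegularExpression(pattern: "<a[^>]+href=\"(.*?)\"[^>]*>(.*?)</a>")
// NSRange counts UTF-16 code units, so measure the string the same way.
let range = NSRange(location: 0, length: text.utf16.count)
// In the template, $1 is the href value and $2 is the link text.
let htmlLessString: String = regex.stringByReplacingMatches(
    in: text,
    options: [],
    range: range,
    withTemplate: "$2 ($1)")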
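For comparison, the same template-style substitution can be sketched in JavaScript with String.prototype.replace, which also understands $1 and $2 in the replacement string. This is only an illustration of the idea, not the Swift API; the input text is hypothetical, and the link text is matched lazily with (.*?) so that each match stays within one link.

```javascript
// Lazy groups keep each match confined to a single <a>...</a> pair.
const anchorRx = /<a[^>]+href="(.*?)"[^>]*>(.*?)<\/a>/g;

// Hypothetical input text.
const text =
  'Read <a href="https://example.com/docs">the docs</a> or ' +
  '<a class="ext" href="faq.html">the FAQ</a>.';

// $1 is the href value, $2 is the link text.
const htmlLessString = text.replace(anchorRx, '$2 ($1)');
console.log(htmlLessString);
// Read the docs (https://example.com/docs) or the FAQ (faq.html).
```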
Regex - extract href of html tag a
Consider using an HTML parser instead. Regex often isn't powerful enough to parse HTML. For the example you posted, and fairly limited variations of it, the following should work:
<a[\s\S]*?href="([^"]+)"[\s\S]*?>
Demo
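A short sketch of what "fairly limited variations" means in practice, using made-up inputs. The pattern handles a simple double-quoted link, but [\s\S]*? is free to run past the end of the <a> tag, and single-quoted values are not matched at all.

```javascript
const looseHrefRx = /<a[\s\S]*?href="([^"]+)"[\s\S]*?>/;

// Works for a double-quoted href, even with other attributes in front of it.
const simple = '<a class="nav" href="page.html" id="home">Home</a>';
const simpleHref = simple.match(looseHrefRx)[1];
console.log(simpleHref); // page.html

// False positive: this <a> has no href, so the lazy [\s\S]*? keeps going
// and picks up the href of the following <link> tag.
const tricky = '<a name="top"></a><link href="style.css">';
const trickyHref = tricky.match(looseHrefRx)[1];
console.log(trickyHref); // style.css

// Single-quoted values never match this pattern.
const singleQuoted = "<a href='page.html'>Home</a>";
const singleQuotedMatch = singleQuoted.match(looseHrefRx);
console.log(singleQuotedMatch); // null
```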
Regex to get href value of links that do not have rel='nofollow'
@revo is correct, this is not a job for regular expressions. Use a proper HTML parser to deconstruct the HTML, and then an XPath query to find the information you need.
$html = <<<HTML
<html>
<head>
<title>Example</title>
</head>
<body>
<a href='link1.php'>Link</a>
<a href="link's 2.php" class="link">Link2</a>
<a class="link" href='link3.php' rel='nofollow'>Link3</a>
<a href='link4.php'><span>Link4</span></a>
</body>
</html>
HTML;
$doc = new DOMDocument();
$valid = $doc->loadHTML($html);
$result = [];
if ($valid) {
    $xpath = new DOMXPath($doc);
    // find any <a> elements that do not have a rel="nofollow" attribute,
    // then pick up their href attribute
    $elements = $xpath->query("//a[not(@rel='nofollow')]/@href");
    // query() returns false (not null) when the expression is invalid
    if ($elements !== false) {
        foreach ($elements as $element) {
            $result[] = $element->nodeValue;
        }
    }
}
print_r($result);
# => Array
# (
# [0] => link1.php
# [1] => link's 2.php
# [2] => link4.php
# )
JS Regex to find href of several a tags
You can try this regex:
/href="([^\'\"]+)/g
Example at: http://regexr.com?333d1
Update: or, more simply, with a non-greedy match:
/href="(.*?)"/g
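A minimal usage sketch for the non-greedy version, assuming every href is double-quoted. The sample markup and variable names are hypothetical.

```javascript
const hrefAttrRx = /href="(.*?)"/g;

// Hypothetical markup containing several links.
const html =
  '<a href="/home">Home</a> ' +
  '<a href="/about" class="nav">About</a> ' +
  '<a href="/contact">Contact</a>';

// matchAll yields one match per link; index 1 is the captured href value.
const allHrefs = Array.from(html.matchAll(hrefAttrRx), (m) => m[1]);
console.log(allHrefs); // [ '/home', '/about', '/contact' ]
```

Note that this pattern does not check the tag name, so it would also pick up href attributes on other elements such as <link>.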
Regex, extract a href attribute from HTML with special name
Don't use regex here, as it is not the proper tool for handling nested structures like HTML/XML (at least not the regex flavor used in Java, which doesn't support recursion)
(more info: Can you provide some examples of why it is hard to parse XML and HTML with a regex?).
The proper tool is an HTML/XML parser. I would probably choose jsoup because of its simplicity and CSS query support.
So your code could look like:
String html = "<a href=\"LINK_1\" class=\"am\"> Some Text</a>.. ANYTHING ..<a href=\"LINK_2\" class=\"am\"> Some Text</a><a href=\"SEARCHED_HREF_TO_EXTRACT\" class=\"am\"> SEARCHED_TEXT</a>";
Document doc = Jsoup.parse(html);
Elements links = doc.select("a:contains(SEARCHED_TEXT)"); //contains is case-insensitive
System.out.println(links.attr("href"));
Or, if you expect to find many links, iterate over the found Elements and get the href
attribute from each of them:
for(Element link : links){
System.out.println(link.attr("href"));
}
Regex to find Href value
If you use <a [^>]*href=(?:'(?<href>.*?)'|"(?<href>.*?)")
then the result will be stored in the named group href. Both alternatives sit inside a single non-capturing group, so the <a [^>]*href= prefix applies to either quote style.
Example:
var inputString = "This is Test page <a href='test.aspx'>test page</a>";
var regex = new Regex("<a [^>]*href=(?:'(?<href>.*?)'|\"(?<href>.*?)\")", RegexOptions.IgnoreCase);
var urls = regex.Matches(inputString).OfType<Match>().Select(m => m.Groups["href"].Value);
urls will be a collection of strings containing the hrefs.
How get links from href property using regex
RegExp#exec
returns an array holding the full match followed by the contents of each capturing group defined in your pattern. You can access Group 1 at index [1].
Use
var token = matchArray[1];
Also, I believe you can shorten the regex to just
/\bhref="((?:http|ftp)[^"]+)"/g
if you are sure the values are always inside double quotes. See this demo.
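Putting the two points together, a minimal sketch of the usual exec loop with the shortened regex. The input string and the tokens array are hypothetical; matchArray is the name used above.

```javascript
const urlRx = /\bhref="((?:http|ftp)[^"]+)"/g;

// Hypothetical input: two absolute links and one relative link.
const page =
  '<a href="http://example.com/a">A</a> ' +
  '<a href="/local">B</a> ' +
  '<a href="ftp://example.com/file.zip">C</a>';

const tokens = [];
let matchArray;
// With the g flag, each exec call resumes after the previous match
// and returns null once there are no more matches.
while ((matchArray = urlRx.exec(page)) !== null) {
  tokens.push(matchArray[1]); // Group 1: the href value
}
console.log(tokens);
// [ 'http://example.com/a', 'ftp://example.com/file.zip' ]
```

The relative link is skipped because the pattern requires the value to start with http or ftp.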