Regular Expression for Finding 'Href' Value of a <A> Link

Regular expression for finding 'href' value of a <a> link

I'd recommend using an HTML parser over a regex, but here's a regex that creates a capturing group over the value of the href attribute of each link. It matches whether double or single quotes are used. The quote character is captured in group 1, so the href value ends up in group 2.

<a\s+(?:[^>]*?\s+)?href=(["'])(.*?)\1

You can view a full explanation of this regex here.

Snippet playground:

const linkRx = /<a\s+(?:[^>]*?\s+)?href=(["'])(.*?)\1/;
const textToMatchInput = document.querySelector('[name=textToMatch]');
document.querySelector('button').addEventListener('click', () => {
  console.log(textToMatchInput.value.match(linkRx));
});

<label>
  Text to match:
  <input type="text" name="textToMatch" value='<a href="google.com"'>
  <button>Match</button>
</label>
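
Outside the browser, the same pattern can be used to collect every href in a string. This is only a rough sketch: the helper name extractHrefs and the sample markup are made up for illustration, and it assumes a runtime with String.prototype.matchAll (which requires the g flag on the regex).

```javascript
// Hypothetical helper: collect every href value found in an HTML string.
function extractHrefs(html) {
  // Same pattern as above, with the g flag so matchAll can find every link.
  const linkRx = /<a\s+(?:[^>]*?\s+)?href=(["'])(.*?)\1/g;
  // Group 1 is the quote character; group 2 is the href value.
  return Array.from(html.matchAll(linkRx), (m) => m[2]);
}

// Made-up sample: one double-quoted and one single-quoted href.
const sampleHtml = `<a href="google.com">G</a> <a class='x' href='it"s.html'>I</a>`;
console.log(extractHrefs(sampleHtml)); // [ 'google.com', 'it"s.html' ]
```

Because the closing quote is matched with the \1 backreference, a double quote inside a single-quoted value does not end the match early.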

Regular expression to extract href url

I'm not a regular Swift developer, but did you try the withTemplate option of stringByReplacingMatches, like this?

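import Foundation

// Group 1 captures the href value, group 2 the link text.
let regex = try! NSRegularExpression(pattern: "<a[^>]+href=\"(.*?)\"[^>]*>(.*?)</a>")
// NSRange counts UTF-16 code units, so derive it from the string itself.
let range = NSRange(text.startIndex..., in: text)
let htmlLessString: String = regex.stringByReplacingMatches(
    in: text,
    options: [],
    range: range,
    withTemplate: "$2 ($1)")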

Regex - extract href of html tag a

Consider using an HTML parser instead. Regex often isn't powerful enough to parse HTML. For the example you posted, and fairly limited variations of it, the following should work:

<a[\s\S]*?href="([^"]+)"[\s\S]*?>
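
To make "fairly limited variations" concrete, here is a small sketch of where this pattern works and where it goes wrong. The helper name hrefsOf and both sample strings are invented for illustration, and the snippet assumes a JavaScript runtime with String.prototype.matchAll.

```javascript
const tagRx = /<a[\s\S]*?href="([^"]+)"[\s\S]*?>/g;

// Hypothetical helper: return the first capture group of every match.
function hrefsOf(html) {
  return Array.from(html.matchAll(tagRx), (m) => m[1]);
}

// Works for a double-quoted href, even when the tag spans several lines.
console.log(hrefsOf('<a\n  class="nav"\n  href="/home">Home</a>')); // [ '/home' ]

// Limitation: [\s\S]*? may run past the end of the <a> tag, so an anchor
// without an href picks up the href of a later element instead.
console.log(hrefsOf('<a name="top"></a><link href="style.css">')); // [ 'style.css' ]
```

The second result is the kind of false positive an HTML parser avoids, since a parser knows where each tag ends.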


Regex to get href value of links that do not have rel='nofollow'

@revo is correct, this is not a job for regular expressions. Use a proper HTML parser to deconstruct the HTML, and then an XPath query to find the information you need.

$html = <<<HTML
<html>
<head>
<title>Example</title>
</head>
<body>
<a href='link1.php'>Link</a>
<a href="link's 2.php" class="link">Link2</a>
<a class="link" href='link3.php' rel='nofollow'>Link3</a>
<a href='link4.php'><span>Link4</span></a>
</body>
</html>
HTML;

$doc = new DOMDocument();
$valid = $doc->loadHTML($html);
$result = [];
if ($valid) {
    $xpath = new DOMXPath($doc);
    // find any <a> elements that do not have a rel="nofollow" attribute,
    // then pick up their href attribute
    $elements = $xpath->query("//a[not(@rel='nofollow')]/@href");
    // query() returns false when the expression is invalid
    if ($elements !== false) {
        foreach ($elements as $element) {
            $result[] = $element->nodeValue;
        }
    }
}
print_r($result);
# => Array
# (
# [0] => link1.php
# [1] => link's 2.php
# [2] => link4.php
# )

JS Regex to find href of several a tags

You can try this regex:

/href="([^\'\"]+)/g

Example at: http://regexr.com?333d1

Update: or, more simply, with a non-greedy quantifier:

/href="(.*?)"/g
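
The two patterns are not quite interchangeable, which a quick comparison shows. This is just a sketch with made-up sample markup, assuming String.prototype.matchAll is available: the character-class version stops at the first quote of either kind, while the non-greedy version runs up to the closing double quote.

```javascript
const charClassRx = /href="([^\'\"]+)/g;
const lazyRx = /href="(.*?)"/g;

// Made-up sample: the second href contains an apostrophe.
const linksHtml = `<a href="one.html">1</a> <a href="link's 2.php">2</a>`;

const withCharClass = Array.from(linksHtml.matchAll(charClassRx), (m) => m[1]);
const withLazy = Array.from(linksHtml.matchAll(lazyRx), (m) => m[1]);

console.log(withCharClass); // [ 'one.html', 'link' ]
console.log(withLazy);      // [ 'one.html', "link's 2.php" ]
```

Neither pattern matches single-quoted attributes, so both assume the markup always uses double quotes.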

Regex, extract a href attribute from HTML with special name

Don't use regex here: it is not the proper tool for handling nested structures like HTML/XML (at least not the regex flavor used in Java, since it doesn't support recursion).

(More info: Can you provide some examples of why it is hard to parse XML and HTML with a regex?)

The proper tool is an HTML/XML parser. I would probably choose jsoup because of its simplicity and CSS query support.

So your code could look like:

String html = "<a href=\"LINK_1\" class=\"am\"> Some Text</a>.. ANYTHING ..<a href=\"LINK_2\" class=\"am\"> Some Text</a><a href=\"SEARCHED_HREF_TO_EXTRACT\" class=\"am\"> SEARCHED_TEXT</a>";
Document doc = Jsoup.parse(html);
Elements links = doc.select("a:contains(SEARCHED_TEXT)"); //contains is case-insensitive
System.out.println(links.attr("href"));

Or, if you expect to find many links, iterate over the found Elements and get the href attribute from each of them:

for (Element link : links) {
    System.out.println(link.attr("href"));
}

Regex to find Href value

If you use <a [^>]*href=(?:'(?<href>.*?)'|"(?<href>.*?)"), the result will be stored in the named group href.

Example:

using System.Linq;
using System.Text.RegularExpressions;
// The alternation sits inside one group, so both branches require the <a ... href= prefix.
var inputString = "This is Test page <a href='test.aspx'>test page</a>";
var regex = new Regex("<a [^>]*href=(?:'(?<href>.*?)'|\"(?<href>.*?)\")", RegexOptions.IgnoreCase);
var urls = regex.Matches(inputString).OfType<Match>().Select(m => m.Groups["href"].Value);

urls will be a collection of strings containing the hrefs.

How to get links from href property using regex

RegExp#exec will store all contents captured by the capturing groups that are defined in your pattern. You may access Group 1 with [1] index.

Use

var token = matchArray[1];

Also, I believe you can shorten the regex to just

/\bhref="((?:http|ftp)[^"]+)"/g

if you are sure the values are always inside double quotes.
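
Put together, the exec loop might look roughly like this. The function name collectLinks and the sample text in the usage line are assumptions for illustration; only matchArray[1] and the shortened regex come from the answer above.

```javascript
// Hypothetical helper: gather Group 1 from every match of the shortened regex.
function collectLinks(text) {
  const rx = /\bhref="((?:http|ftp)[^"]+)"/g;
  const links = [];
  let matchArray;
  // With the g flag, each exec call resumes after the previous match
  // and returns null once there are no more matches.
  while ((matchArray = rx.exec(text)) !== null) {
    links.push(matchArray[1]); // Group 1: the URL between the double quotes
  }
  return links;
}

console.log(collectLinks('<a href="http://example.com/a">a</a> <a href="/relative">r</a>'));
// [ 'http://example.com/a' ]
```

Relative URLs such as /relative are skipped because the pattern requires the value to start with http or ftp.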


