Regular Expression to Extract Url from an HTML Link

Regular expression to extract href url

I'm not a regular Swift developer, but have you tried the withTemplate option of stringByReplacingMatches, like this?

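import Foundation

// Group 1 captures the href value; group 2 captures the link text (lazy, so it stops at the first </a>).
let regex = try! NSRegularExpression(pattern: "<a[^>]+href=\"(.*?)\"[^>]*>(.*?)</a>")
// NSRange counts UTF-16 code units, so measure the string the same way.
let range = NSRange(location: 0, length: text.utf16.count)
let htmlLessString: String = regex.stringByReplacingMatches(in: text,
    options: [],
    range: range,
    withTemplate: "$2 ($1)")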
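For comparison, the same rewrite can be sketched in Python with re.sub, where \2 and \1 play the role of $2 and $1 in the template. The sample string and variable names below are assumptions for illustration, not taken from the original question.

```python
import re

# Hypothetical sample input; the original question's text is not shown.
text = 'See <a class="ext" href="http://example.com">the docs</a> for details.'

# Group 1 captures the href value, group 2 the link text.
link_re = re.compile(r'<a[^>]+href="(.*?)"[^>]*>(.*?)</a>')

# The template \2 (\1) rewrites each link as "text (url)".
html_less = link_re.sub(r'\2 (\1)', text)
print(html_less)  # See the docs (http://example.com) for details.
```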

Regex to extract URLs from href attribute in HTML with Python

import re

url = '<p>Hello World</p><a href="http://example.com">More Examples</a><a href="http://2.example">Even More Examples</a>'

urls = re.findall(r'https?://(?:[-\w.]|(?:%[\da-fA-F]{2}))+', url)

>>> print(urls)
['http://example.com', 'http://2.example']

Extracting for URL from string using regex

Your regex is incorrect.

A correct regex for extracting the URL: /(https?:\/\/[^ ]*)/

Here is the snippet.

var urlRegex = /(https?:\/\/[^ ]*)/;
var input = "https://medium.com/aspen-ideas/there-s-no-blueprint-26f6a2fbb99c random stuff sd";
var url = input.match(urlRegex)[1];
alert(url);

Extracting URL link using regular expression re - string matching - Python

re.findall(r'https?://[^\s<>"]+|www\.[^\s<>"]+', str(STRING))

The [^\s<>"]+ part matches any run of characters that are not whitespace, double quotes, or angle brackets, so the match stops before the trailing markup in strings like:

<a href="http://www.example.com/stuff">
http://www.example.com/stuff</br>
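A minimal sketch of the difference, using the two strings above as sample input (the variable names are assumptions): a plain \S+ keeps the trailing markup, while the negated character class stops before it.

```python
import re

# Sample input built from the two strings above.
html = '<a href="http://www.example.com/stuff">\nhttp://www.example.com/stuff</br>'

# \S+ keeps going until whitespace, so quotes and tags end up in the match.
greedy = re.findall(r'https?://\S+', html)

# Excluding whitespace, <, > and " stops the match at the end of the URL.
urls = re.findall(r'https?://[^\s<>"]+|www\.[^\s<>"]+', html)

print(greedy)  # ['http://www.example.com/stuff">', 'http://www.example.com/stuff</br>']
print(urls)    # ['http://www.example.com/stuff', 'http://www.example.com/stuff']
```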

Get all links from html page using regex

You need to use a global modifier /g to get multiple matches with RegExp#exec.

Besides, since your input is HTML code, you need to make sure you do not grab < with \S:

/(?:ht|f)tps?:\/\/[-a-zA-Z0-9.]+\.[a-zA-Z]{2,3}(\/[^"<]*)?/g

See the regex demo.

If for some reason this pattern does not match equal signs, add it as an alternative:

/(?:ht|f)tps?:\/\/[-a-zA-Z0-9.]+\.[a-zA-Z]{2,3}(?:\/(?:[^"<=]|=)*)?/g

See another demo (however, the first one should do).
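As a rough Python counterpart, re.findall already returns every match, so no global flag is needed. The optional path group is made non-capturing here so that findall returns whole matches; the sample HTML is an assumption. Note that [^"<] already accepts an equal sign.

```python
import re

# Hypothetical sample input with two links, one containing "=".
html = '<a href="http://example.com/a?x=1">A</a> <a href="ftp://files.example.org/pub">B</a>'

# Same pattern as above, with the path group made non-capturing.
url_re = re.compile(r'(?:ht|f)tps?://[-a-zA-Z0-9.]+\.[a-zA-Z]{2,3}(?:/[^"<]*)?')

urls = url_re.findall(html)
print(urls)  # ['http://example.com/a?x=1', 'ftp://files.example.org/pub']
```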

Using a regular expression to extract URLs from links in an HTML document

I would suggest using DOMDocument for this purpose rather than a regex. Consider the following simple code:

$content = '
<div class="infobar">
<a href="/link/some-text">link 1</a>
<a href="/link/another-text">link 2</a>
<a href="/link/blabla">link 3</a>
<a href="/link/whassup">link 4</a>
</div>';
$dom = new DOMDocument();
$dom->loadHTML($content);

// To hold all your links...
$links = array();

// Get all divs
$divs = $dom->getElementsByTagName("div");
foreach ($divs as $div) {
    // Check the class attr of each div
    $cl = $div->getAttribute("class");
    if ($cl == "infobar") {
        // Find all <a> elements and append each href to our $links array
        $hrefs = $div->getElementsByTagName("a");
        foreach ($hrefs as $href) {
            $links[] = $href->getAttribute("href");
        }
    }
}
var_dump($links);

OUTPUT

array(4) {
  [0]=>
  string(15) "/link/some-text"
  [1]=>
  string(18) "/link/another-text"
  [2]=>
  string(12) "/link/blabla"
  [3]=>
  string(13) "/link/whassup"
}
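The same parser-based approach can be sketched in Python with the standard library's html.parser. This is only an illustration under stated assumptions: the class name InfobarLinks and its depth counter are hypothetical helpers, and the class check mirrors the exact-match comparison in the PHP code.

```python
from html.parser import HTMLParser

class InfobarLinks(HTMLParser):
    """Collect href values of <a> tags nested inside <div class="infobar">."""

    def __init__(self):
        super().__init__()
        self.depth = 0   # <div> nesting depth inside the infobar; 0 means outside
        self.links = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "div":
            # Enter the infobar, or track a nested div once inside it.
            if self.depth or attrs.get("class") == "infobar":
                self.depth += 1
        elif tag == "a" and self.depth and "href" in attrs:
            self.links.append(attrs["href"])

    def handle_endtag(self, tag):
        if tag == "div" and self.depth:
            self.depth -= 1

content = '''
<div class="infobar">
<a href="/link/some-text">link 1</a>
<a href="/link/another-text">link 2</a>
<a href="/link/blabla">link 3</a>
<a href="/link/whassup">link 4</a>
</div>'''

parser = InfobarLinks()
parser.feed(content)
print(parser.links)
# ['/link/some-text', '/link/another-text', '/link/blabla', '/link/whassup']
```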

Regex - extract href of html tag a

Consider using an HTML parser instead. Regex often isn't powerful enough to parse HTML. For the example you posted, and fairly limited variations of it, the following should work:

<a[\s\S]*?href="([^"]+)"[\s\S]*?>
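A short Python sketch of this pattern, with made-up sample strings, shows both the intended match and one way it can go wrong: because [\s\S]*? may cross tag boundaries, an href from a later, unrelated tag can be picked up.

```python
import re

href_re = re.compile(r'<a[\s\S]*?href="([^"]+)"[\s\S]*?>')

# Intended case: attributes spread over several lines are still handled.
good = '<a class="x"\n   href="http://example.com/page" target="_blank">link</a>'
print(href_re.findall(good))  # ['http://example.com/page']

# Limitation: the first <a> has no href, so the lazy match runs on into <link>.
bad = '<a name="top">anchor</a> <link href="style.css">'
print(href_re.findall(bad))   # ['style.css']
```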


Regular expression for finding the 'href' value of an <a> link

I'd recommend using an HTML parser over a regex, but here is a regex that creates a capturing group over the value of the href attribute of each link. It matches whether double or single quotes are used.

<a\s+(?:[^>]*?\s+)?href=(["'])(.*?)\1

You can view a full explanation of this regex here.

Snippet playground:

const linkRx = /<a\s+(?:[^>]*?\s+)?href=(["'])(.*?)\1/;
const textToMatchInput = document.querySelector('[name=textToMatch]');
document.querySelector('button').addEventListener('click', () => {
  console.log(textToMatchInput.value.match(linkRx));
});

<label>
  Text to match:
  <input type="text" name="textToMatch" value='<a href="google.com"'>
  <button>Match</button>
</label>
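The same pattern can be tried in Python. Since it has two capturing groups (the quote character and the value), this sketch uses re.finditer and reads group 2; the sample HTML is an assumption for illustration.

```python
import re

link_re = re.compile(r'''<a\s+(?:[^>]*?\s+)?href=(["'])(.*?)\1''')

# Hypothetical input mixing double- and single-quoted href values.
html = '''<a href="google.com">G</a> <a class='x' href='/local?q="hi"'>L</a>'''

# Group 1 is the opening quote; \1 requires the same quote to close the value.
hrefs = [m.group(2) for m in link_re.finditer(html)]
print(hrefs)  # ['google.com', '/local?q="hi"']
```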

