How to Replace Text URLs and Exclude URLs in HTML Tags

Replacing link with additional information, how to exclude img src= http:// tags

Try the code below; it should give you the desired output:

require 'uri'
text = '<p>This is a link: http://www.url1.com/</p>
<p>http://www.url2.com/</p>
<p><img src="http://www.url3.com/image.jpg"> something</p>'
links = URI.extract(text)

links => ["link:", "http://www.url1.com/", "http://www.url2.com/", "http://www.url3.com/image.jpg"]

Then replace every link with 'REPLACED' using gsub.

links.shift # => "link:" (drops the stray non-URL match)
links.each do |link|
  text = text.gsub(link, "REPLACED")
end

The resulting text is shown below. Note that the URL inside the img src attribute is replaced as well:

"<p>This is a link: REPLACED</p>\n<p>REPLACED</p>\n<p><img src=\"REPLACED\"> something</p>"

Hope that helps.
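If the URL inside the img tag should be left alone, one possible approach is to split the markup into tags and text and only touch the text parts. The following is a minimal JavaScript sketch of that idea, not the answer's own method; the function name replaceTextUrls and the simple URL pattern are assumptions made for illustration, and a real HTML parser would be more robust than a tag regex.

```javascript
// Hypothetical helper: replace URLs in text nodes only, leaving tags untouched.
function replaceTextUrls(html, replacement) {
  return html
    // The capture group keeps the tags themselves in the resulting array.
    .split(/(<[^>]*>)/)
    .map(function (part) {
      // A complete tag such as <img src="..."> is returned unchanged.
      if (/^<[^>]*>$/.test(part)) {
        return part;
      }
      // Assumed, deliberately simple URL pattern for plain text.
      return part.replace(/https?:\/\/[^\s<"]+/g, replacement);
    })
    .join('');
}

var sample = '<p>This is a link: http://www.url1.com/</p>\n' +
  '<p>http://www.url2.com/</p>\n' +
  '<p><img src="http://www.url3.com/image.jpg"> something</p>';

console.log(replaceTextUrls(sample, 'REPLACED'));
```

With this sketch the two text URLs become REPLACED while the img src value stays as it was.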

Regex replace hyphens in text excluding urls, tags and mails

There are a few issues with your regex:

  • You can't use | as an OR operator in your character class
  • Your regex is greedy
  • You can't use multiple not operators in a character class
  • You don't need to have more than one space matched at the start and end
  • Your character class swallows spaces

It seems to me that you're overthinking it; you could rephrase your task as "replace hyphens in words":

(\s\w+)-(\w+\s)
(\s\w+) : Capture group matching a white space and then 1 or more of the characters [a-zA-Z0-9_]
- : Match a hyphen
(\w+\s) : Capture group matching 1 or more of the characters [a-zA-Z0-9_] followed by a white space

However, you can also use your more wide ranging character class like:

(\s[^@\/\s]+)-([^@\/\s]+\s)
(\s[^@\/\s]+) : Capture group matching a space followed by 1 or more characters which aren't @, /, or a space
- : Matches a hyphen
([^@\/\s]+\s) : Capture group matching 1 or more characters which aren't @, /, or a space, followed by a space

$string = "Some text with a link but also plain URL like http://another-domain.com and an e-mail info@some-domain.com and e-shop and some relative URL like /test-url/on-this-website.";

echo preg_replace("/(\s\w+)-(\w+\s)/", "$1‑$2", $string);

echo preg_replace("/(\s[^@\/\s]+)-([^@\/\s]+\s)/", "$1‑$2", $string);

Note: You may need to change the leading and trailing space so that the pattern also matches at the start/end of the string.
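One way to handle the start and end of the string is sketched below in JavaScript. It is an assumed variation on the first pattern, not the answer's own code: the leading space becomes (^|\s) and the trailing space becomes a lookahead (?=\s|$), which also leaves the space unconsumed so that two hyphenated words in a row are both matched. The function name replaceWordHyphens is hypothetical.

```javascript
// Hypothetical helper: swap the hyphen in simple hyphenated words
// for a non-breaking hyphen (U+2011), skipping URLs and e-mail addresses.
function replaceWordHyphens(input) {
  // (^|\s)    : start of string or a whitespace character
  // (\w+)-(\w+) : word characters around a single hyphen
  // (?=\s|$)  : followed by whitespace or end of string (not consumed)
  return input.replace(/(^|\s)(\w+)-(\w+)(?=\s|$)/g, '$1$2\u2011$3');
}

var text = 'e-shop e-mail info@some-domain.com http://another-domain.com /test-url/on-this-website';
console.log(replaceWordHyphens(text));
```

Like the original pattern, this sketch does not match a word that is directly followed by punctuation or that contains more than one hyphen.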

Javascript regex: Find all URLs outside a tags - Nested Tags

It turned out that probably the best solution is the following:

((https?|ftps?):\/\/[^"<\s]+)(?![^<>]*>|[^"]*?<\/a)

It looks like the negative lookahead only works properly when its alternatives start with quantified character classes rather than literal strings. In practice, this means we have to rely on backtracking.

Again, we just want to make sure that URLs inside HTML tag attributes are left alone. We then backtrack from </a up to the first " symbol (it is not a valid URL character, whereas the < and > symbols do occur with nested tags).

Nested tags inside <a> tags are now handled properly as well. Of course, the code is not perfect, but it should work with almost any simple HTML markup. You just need to be a bit careful about the following:

  • placing quotes within <a> tags;
  • using this algorithm on <a> tags without any attributes (placeholders), which should be avoided;
  • using multiple nested tags/lines, which should be avoided unless the URL inside the <a> tag comes after a double quote.



Here is a very good and messy example (the last match should not be found but it is):

https://regex101.com/r/pC0jR7/2

It is a pity that this lookahead does not work: (?!<a.*?<\/a>)
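To see how the pattern behaves, here is a small JavaScript sketch that runs it over a made-up sample. The sample markup and the variable names are assumptions for illustration only; the regex itself is the one given above with the g flag added.

```javascript
// The pattern from the answer, with the global flag so match() returns every hit.
var urlOutsideTags = /((https?|ftps?):\/\/[^"<\s]+)(?![^<>]*>|[^"]*?<\/a)/g;

// Made-up sample: one plain URL, one link whose text is a URL wrapped
// in a nested <b> tag, and one plain ftp URL at the end.
var html = 'Plain http://example.com here, ' +
  '<a href="http://inside.example/page"><b>http://inside.example/page</b></a>' +
  ' and ftp://files.example.org/readme';

var found = html.match(urlOutsideTags);
console.log(found); // [ 'http://example.com', 'ftp://files.example.org/readme' ]
```

The href value is rejected by the first lookahead alternative (a > follows before any other tag bracket), and the link text is rejected by the second one (a closing </a follows before any double quote).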

jQuery using Regex to find links within text but exclude if the link is in quotes

What about adding [^"'] to the exp variable?

var exp = /(\b[^"'](https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig;

Snippet:

// Get the content
var str = jQuery("#text2replace").html();

// Set the regex; [^"'] rejects URLs directly preceded by a quote (e.g. inside href="...")
var exp = /(\b[^"'](https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig;

var replaced_text = str.replace(exp, function(match) {
  // The match starts with the non-quote character before the URL; keep it outside the link
  var prefix = match.charAt(0);
  var url = match.slice(1);
  var clean_url = url.replace(/https?:\/\//gi, '');
  return prefix + '<a href="' + url + '">' + clean_url + '</a>';
});

jQuery("#text2replace").html(replaced_text);

<script src="https://ajax.googleapis.com/ajax/libs/jquery/2.1.1/jquery.min.js"></script>

<div id="text2replace">
The School of Computer Science and Informatics. She blogs at http://www.wordpress.com and can be found on Twitter <a href="https://twitter.com/abcdef">@Abcdef</a>.
</div>
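The same idea can be tried without jQuery or a browser. The sketch below is an assumed variation, not the answer's exact code: it replaces \b[^"'] with a (^|[^"']) capture group so that a URL at the very start of the string is also matched, and the preceding character is passed to the callback separately. The function name linkify is hypothetical. A URL that already appears as the visible text of a link would still be wrapped a second time.

```javascript
// Hypothetical helper: wrap bare URLs in <a> tags unless a quote precedes them.
function linkify(html) {
  var exp = /(^|[^"'])((?:https?|ftp|file):\/\/[-A-Z0-9+&@#\/%?=~_|!:,.;]*[-A-Z0-9+&@#\/%=~_|])/ig;
  return html.replace(exp, function (match, prefix, url) {
    // Drop the scheme for the visible label, as in the snippet above.
    var label = url.replace(/https?:\/\//gi, '');
    // Put the preceding character back in front of the generated link.
    return prefix + '<a href="' + url + '">' + label + '</a>';
  });
}

var input = 'She blogs at http://www.wordpress.com and can be found on Twitter ' +
  '<a href="https://twitter.com/abcdef">@Abcdef</a>.';
console.log(linkify(input));
```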

