How to Mimic Stack Overflow Auto-Link Behavior

Try this out. The URL-matching regex pattern is from Daring Fireball.

/**
 * Replace links in text with html links
 *
 * @param string $text
 * @return string
 */
function auto_link_text($text)
{
    // a more readably-formatted version of the pattern is on http://daringfireball.net/2010/07/improved_regex_for_matching_urls
    // preg_* functions need delimiters (~ here); the u modifier treats the non-ASCII quote characters as whole characters and assumes UTF-8 input
    $pattern = '~(?i)\b((?:[a-z][\w-]+:(?:/{1,3}|[a-z0-9%])|www\d{0,3}[.]|[a-z0-9.\-]+[.][a-z]{2,4}/)(?:[^\s()<>]+|\(([^\s()<>]+|(\([^\s()<>]+\)))*\))+(?:\(([^\s()<>]+|(\([^\s()<>]+\)))*\)|[^\s`!()\[\]{};:\'".,<>?«»“”‘’]))~u';

    // create_function() was removed in PHP 8; a closure does the same job
    $callback = function ($matches) {
        $url = $matches[0];

        // link text: host plus path, without a leading "www."
        $text = parse_url($url, PHP_URL_HOST) . parse_url($url, PHP_URL_PATH);
        $text = preg_replace('/^www\./', '', $text);

        // keep everything up to the last slash and elide what follows it
        $last = -(strlen(strrchr($text, "/"))) + 1;
        if ($last < 0) {
            $text = substr($text, 0, $last) . "…";
        }

        // escape both values so characters such as & or " cannot break the markup
        return sprintf('<a rel="nofollow" href="%s">%s</a>', htmlspecialchars($url), htmlspecialchars($text));
    };

    return preg_replace_callback($pattern, $callback, $text);
}

Input Text:

This is my text.  I wonder if you know about asking questions on StackOverflow:
Check This out http://www.stackoverflow.com/questions/1925455/how-to-mimic-stackoverflow-auto-link-behavior

Also, base_convert php function?
http://pt.php.net/manual/en/function.base-convert.php#52450

http://pt.php.net/manual/en/function.base-convert.php?wtf=hehe#52450

Output Text:

This is my text.  I wonder if you know about asking questions on StackOverflow:
Check This out <a rel="nofollow" href="http://www.stackoverflow.com/questions/1925455/how-to-mimic-stackoverflow-auto-link-behavior">stackoverflow.com/questions/1925455/…</a>

Also, base_convert php function?
<a rel="nofollow" href="http://pt.php.net/manual/en/function.base-convert.php#52450">pt.php.net/manual/en/…</a>

<a rel="nofollow" href="http://pt.php.net/manual/en/function.base-convert.php?wtf=hehe#52450">pt.php.net/manual/en/…</a>
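
The shortening of the link text is the least obvious part of the callback above, so here it is in isolation as a minimal sketch. The helper name shorten_link_label() is hypothetical and does not appear in the original answer; it uses strrpos() instead of the strrchr() arithmetic but is meant to produce the same labels.

```php
<?php
// Hypothetical helper: builds the visible label for a URL the same way the
// callback does (host + path, no leading "www.", last path segment elided).
function shorten_link_label(string $url): string
{
    $host = (string) parse_url($url, PHP_URL_HOST);
    $path = (string) parse_url($url, PHP_URL_PATH);
    $label = preg_replace('/^www\./', '', $host . $path);

    $slash = strrpos($label, '/');
    // Truncate only when something follows the last slash.
    if ($slash !== false && $slash < strlen($label) - 1) {
        $label = substr($label, 0, $slash + 1) . '…';
    }

    return $label;
}

echo shorten_link_label('http://pt.php.net/manual/en/function.base-convert.php#52450'), "\n";
echo shorten_link_label('http://example.com/'), "\n";
```

The first call prints pt.php.net/manual/en/… and the second prints example.com/ unchanged, because nothing follows the final slash.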

PHP autolink if not already linked

Load up the string as HTML in a DOM parser, iterate over the text nodes, and check each one for a URL. Make sure the text node's parent isn't an <a> tag, so you know that the text you're getting is not already in a link. Then find all of the URLs, convert them to <a> tags, and replace them in the DOM:

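$doc = new DOMDocument();
$doc->loadHTML($str);

$xpath = new DOMXPath($doc);
foreach ($xpath->query('//text()') as $text) {
    // Skip text that is already the content of a link
    if ($text->parentNode->nodeName !== 'a') {
        $frag = $doc->createDocumentFragment();
        // appendXML() needs well-formed XML, and $text->data holds decoded characters, so escape it first
        $escaped = htmlspecialchars($text->data);
        $frag->appendXML(preg_replace('#(http://stackoverflow.com/)#', '<a href="$1">$1</a>', $escaped));
        $text->parentNode->replaceChild($frag, $text);
    }
}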

Note that this relies on a regex to identify URLs, which is a difficult task. I suggest finding one that suits your needs; the example currently uses this hard-coded placeholder:

#(http://stackoverflow.com/)#

With that pattern, given this input:

http://stackoverflow.com/ is a wonderful URL.

<a href="http://stackoverflow.com/">Has already been linked.</a>

<a href="http://stackoverflow.com/">http://stackoverflow.com/</a>

It produces this output:

<p><a href="http://stackoverflow.com/">http://stackoverflow.com/</a> is a wonderful URL. 

<a href="http://stackoverflow.com/">Has already been linked.</a>

<a href="http://stackoverflow.com/">http://stackoverflow.com/</a></p>
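
As a rough sketch of the same idea with a more general pattern, the function below builds the new <a> elements through DOM calls rather than appendXML(), so the text never has to be well-formed XML. The function name autolink_unlinked(), the wrapper id autolink-root and the simple https?:// pattern are all assumptions made for illustration; the sketch also assumes ASCII or entity-encoded input, since loadHTML() needs an encoding hint for anything else.

```php
<?php
// Sketch only: names and the URL pattern are illustrative, not from the answer.
function autolink_unlinked(string $html): string
{
    $previous = libxml_use_internal_errors(true); // keep parser warnings quiet
    $doc = new DOMDocument();
    // Wrap the fragment in a known container so it can be read back afterwards.
    $doc->loadHTML('<div id="autolink-root">' . $html . '</div>');
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    // Text nodes with no <a> ancestor; copy the list because the loop replaces nodes.
    $nodes = iterator_to_array($xpath->query('//text()[not(ancestor::a)]'));

    foreach ($nodes as $node) {
        // With PREG_SPLIT_DELIM_CAPTURE, odd indexes hold the URLs, even indexes the text between them.
        $parts = preg_split('~(https?://[^\s<>"]+)~i', $node->data, -1, PREG_SPLIT_DELIM_CAPTURE);
        if (count($parts) < 2) {
            continue; // no URL in this text node
        }
        $frag = $doc->createDocumentFragment();
        foreach ($parts as $i => $part) {
            if ($part === '') {
                continue;
            }
            if ($i % 2 === 1) {
                $a = $doc->createElement('a');
                $a->setAttribute('href', $part);
                $a->appendChild($doc->createTextNode($part));
                $frag->appendChild($a);
            } else {
                $frag->appendChild($doc->createTextNode($part));
            }
        }
        $node->parentNode->replaceChild($frag, $node);
    }

    // Serialize only the children of the wrapper, not the <html>/<body> that loadHTML() adds.
    $root = $xpath->query('//div[@id="autolink-root"]')->item(0);
    $out = '';
    foreach ($root->childNodes as $child) {
        $out .= $doc->saveHTML($child);
    }

    return $out;
}

echo autolink_unlinked('http://stackoverflow.com/ is a wonderful URL. <a href="http://stackoverflow.com/">Has already been linked.</a>'), "\n";
```

Checking for an <a> ancestor rather than only the direct parent also covers text nested deeper inside a link, for example inside a <b> within the <a>.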

Is There an Alternative Method to Mimic the Behavior of an Anchor Link/Target Pseudo-Class?

Looking at your sample, it seems you are using the CSS :target selector to handle displaying and hiding the lightbox. The :target selector is applied to the element that the fragment of the current URL points to, so the changes don't take effect if you don't modify the URL.

Instead of modifying the URL, change all the :target selectors in your CSS to be .target selectors.

Then, in your event handlers:

$('.pic > img').click(function() {
    var srcToCopy = $(this).attr('src');
    $('body').find('.imgsrc').attr('src', srcToCopy);
    $('body').addClass('no-scroll');
    $('#view').addClass("target");
});

$('#customlightbox-controls').on('click', function() {
    $('body').removeClass('no-scroll');
    $('#view').removeClass("target");
});

Now, when you click an image, the CSS class target is added to the #view element, which causes the lightbox to appear, and when you click the Close box, the target class is removed and the lightbox disappears.

You no longer need to change the URL or href, so you can remove the anchor tags that link to #view and the close control's onclick handler that sets the URL back to #!.

Sample new Lightbox instance:

<!-- Lightbox Instance 1 -->
<div class="container">
<div class="pic">
<img src="https://syedimranrocks.files.wordpress.com/2012/09/flower01low1.png">
</div>
</div>

Change to close lightbox control:

<div id="customlightbox-controls" class="lb-animate">
<a id="close-customlightbox" class="lb-animate"></a>
</div>

Update function to recognize links

I won't go down the rabbit hole about constructing a world-conquering regex pattern to extract all valid urls the world can dream up including unicode while denying urls with valid characters but illogical structures. (I'll go with Gumbo and move on.)

For a regex demo see: https://regex101.com/r/HFCP1Z/1/

Things to note:

  • If a url is matched, no capture group participates, so $m[1] isn't generated. If a user/hash tag is matched, the fullstring match and capture group 1 are generated. If an emoji is matched, the fullstring match is populated, the capture group 1 element is empty (but declared because php generates $m as an indexed array -- no gaps), and capture group 2 holds the emoji's parenthetical substring.

  • You need to be sure that you don't accidentally replace part of a url which contains a qualifying hashtag/usertag substring. (Currently, the other answers don't consider this vulnerability.) I am going to prevent that scenario by performing a single pass over the input and consuming whole url substrings before the other patterns get a chance at it.
    (notice: http://example.com/@dave and http://example.com?asdf=1234#anchor)

  • There are two reasons that I am declaring your hashtag/usertag lookup array as a constant.

    1. It does not vary, so it needn't be a variable.
    2. It enjoys global scope, so the use() syntax is not necessary inside of preg_replace_callback().
  • You should avoid adding inline styling to your tags. I recommend assigning a class so that you can simply update a single portion of the stylesheet when you decide to amend/extend the styling at a later time.
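
To see the array shapes described in the first bullet, here is a small sketch that runs the same pattern through preg_match() on one sample of each kind. The helper name sample_match() and the three sample strings are assumptions chosen for illustration.

```php
<?php
// Hypothetical helper: returns the match array for a single sample string.
function sample_match(string $sample): array
{
    $pattern = "~(?i)\bhttps?[-\w.\~:/?#[\]@!$&'()*+,;=]+|[@#](\w+)|U\+([A-F\d]{5})~";
    preg_match($pattern, $sample, $m);

    return $m;
}

foreach (['http://example.com/@dave', '#hash', 'U+23232'] as $sample) {
    $m = sample_match($sample);
    // Trailing groups that did not participate are dropped, so the element count identifies the branch.
    echo $sample, ' => ', count($m), ' element(s): ', json_encode($m, JSON_UNESCAPED_SLASHES), "\n";
}
```

The url sample yields one element, the hashtag two, and the emoji code three (with an empty string at index 1), which is what the isset() checks in the callback rely on.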

Code:

define('PINGTAGS', [
    '#' => 'hashtag.php?hashtag',
    '@' => 'user.php?user'
]);

function convert_text($str) {
    return preg_replace_callback(
        "~(?i)\bhttps?[-\w.\~:/?#[\]@!$&'()*+,;=]+|[@#](\w+)|U\+([A-F\d]{5})~",
        function($m) {
            // var_export($m); // see for yourself
            if (!isset($m[1])) { // url
                return sprintf('<a href="%s">%s</a>', $m[0], $m[0]);
            }
            if (!isset($m[2])) { // pingtag
                return sprintf('<a href="%s=%s">%s</a>', PINGTAGS[$m[0][0]], $m[1], $m[0]);
            }
            return "<span class=\"emoji\">&#x{$m[2]};</span>"; // emoji
        },
        $str);
}

echo convert_text(
<<<STRING
This is a @ping and a #hash.
This is a www.example.com, this is http://example.com?asdf=1234#anchor
https://www.example.net/a/b/c/?g=5&awesome=foobar# U+23232 http://www5.example.com
https://sub.sub.www.example.org/ @pong@pug#tagged
http://example.com/@dave
more http://example.com/more_(than)_one_(parens)
andU+98765more http://example.com/blah_(wikipedia)#cite-1
and more http://example.com/blah_(wikipedia)_blah#cite-1
and more http://example.com/(something)?after=parens
STRING
);

Raw Output:

This is a <a href="user.php?user=ping">@ping</a> and a <a href="hashtag.php?hashtag=hash">#hash</a>.
This is a www.example.com, this is <a href="http://example.com?asdf=1234#anchor">http://example.com?asdf=1234#anchor</a>
<a href="https://www.example.net/a/b/c/?g=5&awesome=foobar#">https://www.example.net/a/b/c/?g=5&awesome=foobar#</a> <span class="emoji">&#x23232;</span> <a href="http://www5.example.com">http://www5.example.com</a>
<a href="https://sub.sub.www.example.org/">https://sub.sub.www.example.org/</a> <a href="user.php?user=pong">@pong</a><a href="user.php?user=pug">@pug</a><a href="hashtag.php?hashtag=tagged">#tagged</a>
<a href="http://example.com/@dave">http://example.com/@dave</a>
more <a href="http://example.com/more_(than)_one_(parens)">http://example.com/more_(than)_one_(parens)</a>
and<span class="emoji">&#x98765;</span>more <a href="http://example.com/blah_(wikipedia)#cite-1">http://example.com/blah_(wikipedia)#cite-1</a>
and more <a href="http://example.com/blah_(wikipedia)_blah#cite-1">http://example.com/blah_(wikipedia)_blah#cite-1</a>
and more <a href="http://example.com/(something)?after=parens">http://example.com/(something)?after=parens</a>

Stackoverflow-Rendered Output:

This is a @ping and a #hash.
This is a www.example.com, this is http://example.com?asdf=1234#anchor
https://www.example.net/a/b/c/?g=5&awesome=foobar# 𣈲 http://www5.example.com
https://sub.sub.www.example.org/ @pong@pug#tagged
http://example.com/@dave
more http://example.com/more_(than)one(parens)
and򘝥more http://example.com/blah_(wikipedia)#cite-1
and more http://example.com/blah_(wikipedia)_blah#cite-1
and more http://example.com/(something)?after=parens

p.s. The hash and user tags aren't highlighted here, but they are the local links that you asked for.

StackOverflow Style A Href Auto Linking in Regex

You should also look at the answers to this question: How to mimic StackOverflow Auto-Link Behavior


I ended up combining the answers I got here on Stack Overflow with suggestions from colleagues. The code below is the best we could come up with.

/**
 * Search for and create links from urls
 */
static public function autoLink($text) {
    // The /e modifier was removed in PHP 7, so the replacement is built in a callback instead
    $pattern = "/\b((?P<protocol>(https?)|(ftp)):\/\/)?(?P<domain>[-A-Z0-9\\.]+)[.][A-Z]{2,7}(([:])?([0-9]+)?)(?P<file>\/[-A-Z0-9+&@#\/%=~_|!:,\\.;]*)?(?P<parameters>\?[A-Z0-9+&@#\/%=~_|!:,\\.;]*)?/is";

    return preg_replace_callback($pattern, function ($m) {
        $url = $m[0];

        // fix URLs without protocols
        $href = empty($m['protocol']) ? 'http://' . $url : $url;

        return sprintf('<a href="%s">%s</a>', htmlspecialchars($href), htmlspecialchars($url));
    }, $text);
}

php regex to match outside of html tags

You can use an assertion for that: you just have to ensure that the searched words occur somewhere after a >, or before a <. The latter test is easier to accomplish, because lookahead assertions can be variable length, whereas lookbehind assertions in PCRE are restricted in length:

/(asf|foo|barr)(?=[^>]*(<|$))/

See also http://www.regular-expressions.info/lookaround.html for a nice explanation of that assertion syntax.
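
A minimal sketch of that pattern in use, wrapping each matched word in <b> tags. The function name highlight_outside_tags() and the sample markup are assumptions for illustration.

```php
<?php
// Hypothetical helper: emphasizes the listed words only where they appear as text, not inside a tag.
function highlight_outside_tags(string $html): string
{
    // The lookahead succeeds only if a "<" (or the end of the string) can be
    // reached without crossing a ">", i.e. the match is not inside a tag.
    return preg_replace('/(asf|foo|barr)(?=[^>]*(<|$))/', '<b>$1</b>', $html);
}

echo highlight_outside_tags('<a href="foo.html" class="foo">foo and barr</a> foo'), "\n";
```

This prints <a href="foo.html" class="foo"><b>foo</b> and <b>barr</b></a> <b>foo</b>: the two occurrences inside the tag's attributes are left alone, because a > is reached before any <.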


