Get all hrefs from string but then replace via another method
To call a function for each regex match, you can use preg_replace_callback (http://php.net/manual/en/function.preg-replace-callback.php). Something like:
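// Called once per match; $matches[1] holds the full URL captured by the pattern
function modify_href( $matches ) {
    return $matches[1] . '/modified';
}
// '~' is the delimiter so the slashes inside the URL pattern need no escaping
$result = preg_replace_callback('~(https?://([-\w\.]+)(:\d+)?(/([\w/_\.]*(\?\S+)?)?)?)~', 'modify_href', $string);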
I haven't tested this, but I think it should work. I got the regex from here: https://rushi.wordpress.com/2008/04/14/simple-regex-for-matching-urls/
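For comparison, JavaScript's String.prototype.replace also accepts a callback that runs once per match and whose return value replaces that match. This is a sketch only: modifyHrefs is a hypothetical name, and the pattern is the same URL regex with the slashes escaped for a JS regex literal. It appends /modified to each URL, like modify_href() does in PHP.

```javascript
// Hypothetical helper: appends "/modified" to every http(s) URL in the input,
// mirroring what modify_href() does in the PHP callback.
function modifyHrefs(input) {
  // Same URL pattern as the PHP version; "/" is escaped for a JS regex literal
  const urlPattern = /https?:\/\/[-\w.]+(:\d+)?(\/([\w\/.]*(\?\S+)?)?)?/g;
  // The callback receives the whole match; its return value replaces the match
  return input.replace(urlPattern, (url) => url + "/modified");
}

const html = '<a href="http://example.com/page">x</a> and https://example.org';
console.log(modifyHrefs(html));
// <a href="http://example.com/page/modified">x</a> and https://example.org/modified
```

Note that, like the PHP version, this matches any URL in the text, not only values inside href attributes.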
Find all hrefs in page and replace with link maintaining previous link - PHP
Use PHP's DOMDocument to parse the page:
$doc = new DOMDocument();
// load the string into the DOM (this is your page's HTML), see below for more info
$doc->loadHTML('<a href="http://www.google.com">Google</a>');
//Loop through each <a> tag in the dom and change the href property
foreach($doc->getElementsByTagName('a') as $anchor) {
    $link = $anchor->getAttribute('href');
    $link = 'http://www.example.com/?loadpage='.urlencode($link);
    $anchor->setAttribute('href', $link);
}
echo $doc->saveHTML();
Check it out here: http://codepad.org/9enqx3Rv
If you don't have the HTML as a string, you can fetch it with cURL, or use the loadHTMLFile() method of DOMDocument.
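The href rewrite performed inside the foreach loop above can also be sketched in JavaScript. proxiedHref is a hypothetical name; it builds the same ?loadpage= URL with the standard URL and URLSearchParams APIs, which apply form-style percent-encoding similar to PHP's urlencode() (for example, spaces become +). It only transforms a single href string and does not parse HTML.

```javascript
// Builds the proxied URL that the PHP loop assigns to each <a>.
// "loadpage" and www.example.com come from the PHP example.
function proxiedHref(originalHref) {
  const url = new URL("http://www.example.com/");
  // URLSearchParams uses form-style encoding: ":" -> %3A, "/" -> %2F, " " -> +
  url.searchParams.set("loadpage", originalHref);
  return url.toString();
}

console.log(proxiedHref("http://www.google.com"));
// http://www.example.com/?loadpage=http%3A%2F%2Fwww.google.com
```

Using URLSearchParams rather than string concatenation keeps the encoding rules in one place if more query parameters are added later.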
Documentation
- DOMDocument - http://php.net/manual/en/class.domdocument.php
- DOMElement - http://www.php.net/manual/en/class.domelement.php
- DOMElement::getAttribute - http://www.php.net/manual/en/domelement.getattribute.php
- DOMElement::setAttribute - http://www.php.net/manual/en/domelement.setattribute.php
- urlencode - http://php.net/manual/en/function.urlencode.php
- DOMDocument::loadHTMLFile - http://www.php.net/manual/en/domdocument.loadhtmlfile.php
- cURL - http://php.net/manual/en/book.curl.php
How can I replace all internal urls in a string of html with their relative external url?
Use
.replace(/\b((?:href|src)=)(?!\/\/example\.com)(["']?)([^"']+)\2/gi,
(_,x,y,z) => z.charAt(0) == '/' ?
`${x}${y}//example.com${z}${y}` : `${x}${y}//example.com/${z}${y}`)
Explanation
\b                 the boundary between a word char (\w) and something that is not a word char
(                  group and capture to \1:
  (?:              group, but do not capture:
    href           'href'
  |                OR
    src            'src'
  )                end of grouping
  =                '='
)                  end of \1
(?!                look ahead to see if there is not:
  \/               '/'
  \/               '/'
  example          'example'
  \.               '.'
  com              'com'
)                  end of look-ahead
(                  group and capture to \2:
  ["']?            any character of: '"', ''' (optional, matching the most amount possible)
)                  end of \2
(                  group and capture to \3:
  [^"']+           any character except: '"', ''' (1 or more times, matching the most amount possible)
)                  end of \3
\2                 what was matched by capture \2
const string = ' href="nowhere" src="/nothing.js"';
const rx = /\b((?:href|src)=)(?!\/\/example\.com)(["']?)([^"']+)\2/gi;
console.log(string.replace(rx, (_,x,y,z) => z.charAt(0) == '/' ?
`${x}${y}//example.com${z}${y}` : `${x}${y}//example.com/${z}${y}`));
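One thing to watch in the regex above: the negative lookahead sits before the optional quote group, so for a quoted value such as href="//example.com/a" it is tested against the quote character, never matches, and the value gets prefixed a second time. The sketch below moves the lookahead after the quote group. externalize is a hypothetical name, and like the original this is a regex sketch rather than an HTML parser.

```javascript
// Variant with the negative lookahead moved AFTER the optional quote, so quoted
// values that already start with //example.com are left alone.
// example.com is the placeholder host from the answer.
const rx = /\b((?:href|src)=)(["']?)(?!\/\/example\.com)([^"']+)\2/gi;

function externalize(html) {
  // attr = "href=" or "src=", quote = '"', "'" or "", value = the URL itself
  return html.replace(rx, (_, attr, quote, value) =>
    value.charAt(0) === "/"
      ? `${attr}${quote}//example.com${value}${quote}`
      : `${attr}${quote}//example.com/${value}${quote}`
  );
}

console.log(externalize(' href="nowhere" src="/nothing.js" href="//example.com/kept"'));
// href="//example.com/nowhere" src="//example.com/nothing.js" href="//example.com/kept"
```

When the quote group captures '"' the lookahead fails for //example.com values, and backtracking to an empty quote leaves [^"']+ facing the quote character, so the whole match fails as intended.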
jQuery: Change all href values dynamically
You could instead bind the handler to the click event. When a link is clicked, the function runs, rewrites the link and navigates to it. Because the handler is delegated from a parent (here, document), it also covers links added to the page later, for example:
// a[href] skips anchors without an href, where .attr('href') would be undefined
$(document).on('click', 'a[href]', function(e){
    e.preventDefault();
    var link = $(this).attr('href');
    link = link.replace("/link", "https://sub2.domain1.com/link");
    window.location.href = link;
});
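A caveat with link.replace("/link", ...): a string pattern replaces the first "/link" found anywhere in the value, so /other/link becomes /otherhttps://sub2.domain1.com/link. The sketch below uses a hypothetical helper, rewriteHref, anchored with ^ so only values that start with /link are rewritten. It would still match a prefix such as /links, so tighten the pattern if that matters.

```javascript
// Original approach: the first "/link" anywhere in the string is replaced
console.log("/other/link".replace("/link", "https://sub2.domain1.com/link"));
// /otherhttps://sub2.domain1.com/link

// Hypothetical helper: only rewrites hrefs that START with /link
function rewriteHref(href) {
  return href.replace(/^\/link/, "https://sub2.domain1.com/link");
}

console.log(rewriteHref("/link/page"));  // https://sub2.domain1.com/link/page
console.log(rewriteHref("/other/link")); // /other/link (unchanged)
```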
jQuery: replace If this HREF contains
There are a couple of issues in your code. Firstly, $= in the attribute selector is for 'ends with' matches. To match at the start of the attribute value, use ^=.
Secondly, you need to replace the existing value, not overwrite the entire thing with only the new URL.
Lastly, you can simplify the logic by providing a function to attr(), which is executed against all selected a elements, instead of an explicit each() loop.
With all that said, try this:
jQuery($ => {
$('.post-body-inner a').attr('href', (i, h) => h.replace('https://redirect.affiliatelink1.com', 'https://api.affiliatelink2.com'));
});
<script src="https://cdnjs.cloudflare.com/ajax/libs/jquery/3.3.1/jquery.min.js"></script>
<div class="post-body-inner">
<a href="https://redirect.affiliatelink1.com/foo">Foo</a>
</div>
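The ^= (starts-with) idea can be combined with replacing only the prefix. This plain JavaScript sketch uses a hypothetical helper, swapAffiliate, showing logic that a function passed to attr() could apply. Unlike h.replace(...), it leaves a value alone when the old host appears later in the URL, for example inside a query string.

```javascript
// Affiliate hosts taken from the question
const OLD_PREFIX = "https://redirect.affiliatelink1.com";
const NEW_PREFIX = "https://api.affiliatelink2.com";

// Equivalent of a[href^="..."]: only touch values that start with the old host,
// then swap just that prefix and keep the rest of the URL
function swapAffiliate(href) {
  return href.startsWith(OLD_PREFIX)
    ? NEW_PREFIX + href.slice(OLD_PREFIX.length)
    : href;
}

console.log(swapAffiliate("https://redirect.affiliatelink1.com/foo"));
// https://api.affiliatelink2.com/foo
```

In jQuery this could be plugged in as $('.post-body-inner a').attr('href', (i, h) => swapAffiliate(h)).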
How to replace specific text with hyperlinks without modifying pre-existing img and a tags?
I think @Jiwoks' answer was on the right path in using DOM parsing calls to isolate the qualifying text nodes.
While his answer works on the OP's sample data, I was unsatisfied to find that his solution failed when there was more than one string to be replaced in a single text node.
I've crafted my own solution with the goal of accommodating case-insensitive matching, word-boundary matching, multiple replacements in a text node, and fully qualified nodes being inserted (not merely new strings that look like child nodes).
Code:
$html = <<<HTML
Meet God's General Kathryn Kuhlman. <br>
<img class="lazy_responsive" title="Kathryn Kuhlman - iUseFaith.com" src="https://www.iusefaith.com/ojm_thumbnail/1000/32f808f79011a7c0bd1ffefc1365c856.jpg" alt="Kathryn Kuhlman - iUseFaith.com" width="1600" height="517" />
<br>
Follow <a href="https://www.iusefaith.com/en-354" title="Kathryn Kuhlman">Kathryn Kuhlman</a>
<br>
Max KANTCHEDE & Kathryn Kuhlman
HTML;
$keywords = [
'Kathryn Kuhlman' => 'https://www.example.com/en-354',
'Max KANTCHEDE' => 'https://www.example.com/MaxKANTCHEDE',
'eneral' => 'https://www.example.com/this-is-not-used',
];
libxml_use_internal_errors(true);
$dom = new DOMDocument();
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
$lookup = [];
$regexNeedles = [];
foreach ($keywords as $name => $link) {
    $lookup[strtolower($name)] = $link;
    $regexNeedles[] = preg_quote($name, '~');
}
$pattern = '~\b(' . implode('|', $regexNeedles) . ')\b~i';
foreach ($xpath->query('//*[not(self::img or self::a)]/text()') as $textNode) {
    $newNodes = [];
    $hasReplacement = false;
    foreach (preg_split($pattern, $textNode->nodeValue, 0, PREG_SPLIT_NO_EMPTY | PREG_SPLIT_DELIM_CAPTURE) as $fragment) {
        $fragmentLower = strtolower($fragment);
        if (isset($lookup[$fragmentLower])) {
            $hasReplacement = true;
            $a = $dom->createElement('a');
            $a->setAttribute('href', $lookup[$fragmentLower]);
            $a->setAttribute('title', $fragment);
            $a->nodeValue = $fragment;
            $newNodes[] = $a;
        } else {
            $newNodes[] = $dom->createTextNode($fragment);
        }
    }
    if ($hasReplacement) {
        $newFragment = $dom->createDocumentFragment();
        foreach ($newNodes as $newNode) {
            $newFragment->appendChild($newNode);
        }
        $textNode->parentNode->replaceChild($newFragment, $textNode);
    }
}
echo substr(trim($dom->saveHTML()), 3, -4);
Output:
Meet God's General <a href="https://www.example.com/en-354" title="Kathryn Kuhlman">Kathryn Kuhlman</a>. <br>
<img class="lazy_responsive" title="Kathryn Kuhlman - iUseFaith.com" src="https://www.iusefaith.com/ojm_thumbnail/1000/32f808f79011a7c0bd1ffefc1365c856.jpg" alt="Kathryn Kuhlman - iUseFaith.com" width="1600" height="517">
<br>
Follow <a href="https://www.iusefaith.com/en-354" title="Kathryn Kuhlman">Kathryn Kuhlman</a>
<br>
<a href="https://www.example.com/MaxKANTCHEDE" title="Max KANTCHEDE">Max KANTCHEDE</a> & <a href="https://www.example.com/en-354" title="Kathryn Kuhlman">Kathryn Kuhlman</a>
Some explanatory points:
- I am using some DomDocument silencing and flags because the sample input is missing a parent tag to contain all of the text. (There is nothing wrong with @Jiwoks' technique, this is just a different one -- choose whatever you like.)
- A lookup array with lowercased keys is declared to allow case-insensitive translations on qualifying text.
- A regex pattern is dynamically constructed, so each keyword is passed through preg_quote() to ensure that the pattern logic is upheld. \b is a word boundary metacharacter that prevents matching a substring inside a longer word; notice that eneral is not replaced in General in the output. The case-insensitive flag i allows greater flexibility for this application and future applications.
- My XPath query is identical to @Jiwoks'; I see no reason to change it. It seeks text nodes that are not children of <img> or <a> tags.
- ...now it gets a little fiddly... Now that we are dealing with isolated text nodes, regex can be used to differentiate qualifying strings from non-qualifying strings. preg_split() creates a flat, indexed array of non-empty substrings: each qualifying substring becomes its own element, and any non-qualifying text between them becomes a separate element too. The final text node in my sample generates 4 elements:
    0 => "\n",              // non-qualifying newline
    1 => 'Max KANTCHEDE',   // translatable string
    2 => ' & ',             // non-qualifying text
    3 => 'Kathryn Kuhlman'  // translatable string
- For translatable strings, new <a> nodes are created and filled with the appropriate attributes and text, then pushed into a temporary array. For non-translatable strings, text nodes are created, then pushed into the same temporary array.
- If any translations/replacements have been made, the DOM is updated; otherwise, no mutation of the document is necessary.
- In the end, the finalized HTML document is echoed. Because your sample input has text that is not inside any tag, the temporary leading <p> and trailing </p> tags that DOMDocument applied for stability must be removed to restore the structure to its original form. If all text is enclosed in tags, you can just use saveHTML() without any hacking at the string.
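The splitting step described above can be sketched in plain JavaScript without a DOM. escapeRegex and tokenize are hypothetical helpers: escapeRegex stands in for preg_quote(), and String.prototype.split with a capturing group keeps the matched keywords, much like PREG_SPLIT_DELIM_CAPTURE (empty strings are filtered out to mimic PREG_SPLIT_NO_EMPTY). Building the actual <a> and text nodes is left to the DOM code.

```javascript
// Sample keywords and URLs from the PHP answer
const keywords = {
  "Kathryn Kuhlman": "https://www.example.com/en-354",
  "Max KANTCHEDE": "https://www.example.com/MaxKANTCHEDE",
  "eneral": "https://www.example.com/this-is-not-used",
};

// Hypothetical stand-in for preg_quote(): escape regex metacharacters
function escapeRegex(s) {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Lowercased lookup for case-insensitive translation, like $lookup in PHP
const lookup = new Map(
  Object.entries(keywords).map(([name, link]) => [name.toLowerCase(), link])
);

// \b on both sides stops "eneral" matching inside "General"; "i" = case-insensitive
const pattern = new RegExp(
  "\\b(" + Object.keys(keywords).map(escapeRegex).join("|") + ")\\b",
  "i"
);

// split() with a capturing group keeps the matched keywords in the result
function tokenize(text) {
  return text
    .split(pattern)
    .filter((part) => part !== "")
    .map((part) => {
      const href = lookup.get(part.toLowerCase());
      return href
        ? { type: "link", text: part, href }
        : { type: "text", text: part };
    });
}

console.log(tokenize("\nMax KANTCHEDE & Kathryn Kuhlman"));
```

For the final text node this yields four tokens (newline, Max KANTCHEDE, " & ", Kathryn Kuhlman), matching the preg_split() result listed above.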
change all href of external page after insert it
You could do a simple string replace like:
str_replace('href="', 'href="http://www.example.com/', $string);
Or with DOMDocument:
$page = '<html><head></head><body><a href="simple"></a><h1>Hi</h1><a href="simple2"></a></body></html>';
$doc = new DOMDocument();
$doc->loadHTML($page);
$as = $doc->getElementsByTagName('a');
foreach($as as $a){
    $a->setAttribute('href', 'http://www.example.com/' . $a->getAttribute('href'));
}
print_r($doc->saveHTML());
output:
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
<html><head></head><body><a href="http://www.example.com/simple"></a><h1>Hi</h1><a href="http://www.example.com/simple2"></a></body></html>
This doesn't take absolute paths into account; you'll need a regex approach for that.
If the quote types vary, you will also need a regex instead of the str_replace example. You can use something like ('|") to capture the quote, then use $1 in the replacement to keep the same quote type.
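A sketch of that regex approach in JavaScript, under assumptions not taken from the answer (the function name prefixRelativeHrefs and the exact pattern are hypothetical): ('|") captures the quote, \1 requires the same closing quote, and $1 reuses it in the replacement. A negative lookahead skips values that already begin with a scheme (https:, mailto:) or //, and root-relative values such as /abs have their leading slash dropped so the result has no double slash.

```javascript
// Base URL taken from the str_replace example
const BASE = "http://www.example.com/";

// $1 = the quote character, $2 = the relative path (leading "/" stripped);
// the lookahead skips "scheme:" and "//" values, which are already absolute
const RELATIVE_HREF = /href=('|")(?![a-z][a-z0-9+.-]*:|\/\/)\/?([^'"]*)\1/gi;

function prefixRelativeHrefs(html) {
  return html.replace(RELATIVE_HREF, "href=$1" + BASE + "$2$1");
}

const page =
  '<a href="simple"></a><a href=\'simple2\'></a>' +
  '<a href="/abs"></a><a href="https://other.org/x"></a>';
console.log(prefixRelativeHrefs(page));
```

This prints the first three links prefixed with http://www.example.com/ (keeping their original quote style) and leaves https://other.org/x unchanged. As with any regex over HTML, it will also match attributes such as data-href; the DOMDocument version avoids that.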