Transform Relative Path into Absolute URL Using PHP

function rel2abs($rel, $base)
{
    /* return if already absolute URL */
    if (parse_url($rel, PHP_URL_SCHEME) != '') return $rel;

    /* queries and anchors */
    if ($rel[0]=='#' || $rel[0]=='?') return $base.$rel;

    /* parse base URL and convert to local variables:
       $scheme, $host, $path */
    extract(parse_url($base));

    /* remove non-directory element from path
       (a base URL without a path leaves $path unset) */
    $path = preg_replace('#/[^/]*$#', '', $path ?? '');

    /* destroy path if relative url points to root */
    if ($rel[0] == '/') $path = '';

    /* dirty absolute URL */
    $abs = "$host$path/$rel";

    /* replace '//' or '/./' or '/foo/../' with '/' */
    $re = array('#(/\.?/)#', '#/(?!\.\.)[^/]+/\.\./#');
    for($n=1; $n>0; $abs=preg_replace($re, '/', $abs, -1, $n)) {}

    /* absolute URL is ready! */
    return $scheme.'://'.$abs;
}

PHP: convert an absolute URL containing relative path segments into an absolute URL without them

Perhaps a starting point:

<?php
function unrelatify($url)
{
    $parts = parse_url($url);
    $path = $parts['path'] ?? '';
    $hierarchy = explode('/', $path);

    while (($key = array_search('..', $hierarchy)) !== false) {
        // never remove index 0: it is the empty segment before the leading slash
        if ($key - 1 > 0) {
            unset($hierarchy[$key - 1]);
        }
        unset($hierarchy[$key]);
        $hierarchy = array_values($hierarchy);
    }
    $new_path = implode('/', $hierarchy);

    return str_replace($path, $new_path, $url);
}

echo unrelatify('http://example.com/../folder/../folder2/../image/test.jpg#foo?bar=baz');

Output:

http://example.com/image/test.jpg#foo?bar=baz

You may want to look at how browsers and other web clients resolve relative URLs; RFC 3986, section 5.2.4 ("Remove Dot Segments") describes the standard algorithm.
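
As a rough illustration of that kind of dot-segment removal, here is a minimal sketch. It is not the exact RFC procedure: remove_dot_segments() is a hypothetical helper name (not a PHP built-in), and the function assumes the path is absolute, that is, it starts with "/".

```php
<?php
// Sketch only: remove_dot_segments() is a hypothetical helper, and the
// input is assumed to be an absolute path (leading "/").
function remove_dot_segments(string $path): string
{
    $segments = explode('/', $path);
    $output = [];

    foreach ($segments as $segment) {
        if ($segment === '.') {
            continue; // "." means "current directory": drop it
        }
        if ($segment === '..') {
            // ".." drops the previous segment, but never the empty
            // first element that represents the leading slash
            if (count($output) > 1) {
                array_pop($output);
            }
            continue;
        }
        $output[] = $segment;
    }

    // A trailing "." or ".." refers to a directory, so keep a final slash.
    $last = end($segments);
    if ($last === '.' || $last === '..') {
        $output[] = '';
    }

    return implode('/', $output);
}

$url  = 'http://example.com/../folder/../folder2/../image/test.jpg';
$path = parse_url($url, PHP_URL_PATH);
echo remove_dot_segments($path), "\n"; // /image/test.jpg
```

Working on the parsed path only, rather than running str_replace() over the whole URL, avoids accidentally touching the host, query, or fragment.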

Converting remote relative paths to absolute paths

I ended up writing my own function, after a push in the right direction from @bozdoz.

The function takes two arguments: the first is $resource, the relative file path,
and the second is $base_url, the base URL (which is used to construct the absolute URL).

It was designed for my own project, so I'm not sure it will fit everyone looking
for a similar solution. Feel free to use it and to suggest any efficiency improvements.

Updated version (thanks to Tim Cooper):

function rel2abs_v2($resource, $base_url)
{
    $base_url = parse_url($base_url);

    // No http:// or https:// prefix means {$resource} is a relative path.
    if (!preg_match('#^https?://#i', $resource))
    {
        # There is a "../" in the string
        if (strpos($resource, "../") !== false)
        {
            $dir_count = substr_count($resource, "../");

            $path_array = explode("/", $base_url["path"]);
            $path_count = count($path_array); // 4
            $path_index = ($path_count - $dir_count) - 2;

            $resource = trim(str_replace("../", "", $resource));

            // Always define $fs, even when $path_index <= 0.
            $fs = ($path_index > 0) ? "/" : "";

            if ($dir_count > 0)
            {
                $base_url_path = implode("/", array_slice($path_array, $dir_count, $path_index - $dir_count + 1));
                return $base_url['scheme'] . '://' . $base_url['host'] . $fs . $base_url_path . "/" . $resource;
            }
        }

        # Latest addition - remove if unexplained behaviour is in place.
        # str_starts_with() needs PHP 8; the original relied on a starts_with() helper.
        if (str_starts_with($resource, "//"))
        {
            return trim(str_replace("//", "", $resource));
        }

        if (str_starts_with($resource, "/"))
        {
            return $base_url["scheme"] . "://" . $base_url["host"] . $resource;
        }
        else
        {
            $path_array = explode("/", $base_url["path"]);

            end($path_array);
            $last_id = key($path_array);

            return $base_url["scheme"] . "://" . $base_url["host"] . "/" . $path_array[--$last_id] . "/" . $resource;
        }
    }
    else
    {
        return $resource;
    }
}

Convert given relative URLs to absolute URLs

If all the URLs start with a forward slash, you might use:

(?<!\S)(?:/[^/\s]+)+/\S+\.html\S*

Explanation

  • (?<!\S) Assert that the character directly to the left is not a non-whitespace char (a whitespace boundary)
  • (?:/[^/\s]+)+ Repeat 1+ times: match /, then 1+ chars other than / or whitespace, using a negated character class
  • /\S+ Match / and 1+ non-whitespace chars
  • \.html\S* Match .html as in the example data, then 0+ non-whitespace chars

If you also want to match /1.html, you could change the quantifier from )+ to )*.

To match more extensions than .html, you could list the ones you want to allow, like \.(?:html|jpg|png), or use a character class such as \.[\w()-]+ and add whatever characters you would allow.
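
A minimal sketch of how the pattern above could be applied with preg_replace(). The base URL and the sample text are assumptions made up for illustration; $0 in the replacement stands for the whole match.

```php
<?php
// Hypothetical base URL and input text, for illustration only.
$base = 'https://example.com';
$text = 'See /docs/guide/intro.html and /a/b/c.html?x=1 for details.';

// The pattern from the answer, wrapped in ~ delimiters.
$pattern = '~(?<!\S)(?:/[^/\s]+)+/\S+\.html\S*~';

// Prefix every matched root-relative URL with the base.
$result = preg_replace($pattern, $base . '$0', $text);

echo $result, "\n";
// See https://example.com/docs/guide/intro.html and https://example.com/a/b/c.html?x=1 for details.
```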

Replace all relative URLs with absolute URLs

New Answer

If your real html document is valid (and has a parent/containing tag), then the most appropriate and reliable technique will be to use a proper DOM parser.

Here is how DOMDocument and Xpath can be used to elegantly target and replace your designated tag attributes:

Code1 - Nested Xpath Queries:

$domain = '//example.com';
$tagsAndAttributes = [
    'img' => 'src',
    'form' => 'action',
    'a' => 'href'
];

$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
foreach ($tagsAndAttributes as $tag => $attr) {
    // only root-relative values ("/..."), not protocol-relative ("//...") or absolute ones
    foreach ($xpath->query("//{$tag}[starts-with(@{$attr}, '/') and not(starts-with(@{$attr}, '//'))]") as $node) {
        $node->setAttribute($attr, $domain . $node->getAttribute($attr));
    }
}
echo $dom->saveHTML();

Code2 - Single Xpath Query w/ Condition Block:

$domain = '//example.com';
// only root-relative values ("/..."), not protocol-relative ("//...") or absolute ones
$targets = [
    "//img[starts-with(@src, '/') and not(starts-with(@src, '//'))]",
    "//form[starts-with(@action, '/') and not(starts-with(@action, '//'))]",
    "//a[starts-with(@href, '/') and not(starts-with(@href, '//'))]"
];

$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
foreach ($xpath->query(implode('|', $targets)) as $node) {
    if ($src = $node->getAttribute('src')) {
        $node->setAttribute('src', $domain . $src);
    } elseif ($action = $node->getAttribute('action')) {
        $node->setAttribute('action', $domain . $action);
    } else {
        $node->setAttribute('href', $domain . $node->getAttribute('href'));
    }
}
echo $dom->saveHTML();

Old Answer: (...regex is not "DOM-aware" and is vulnerable to unexpected breakage)

If I understand you properly, you have a base value in mind, and you only want to apply it to relative paths.

Code:

$html=<<<HTML
<img src="/relative/url/img.jpg" />
<form action="/">
<a href='/relative/url/'>Note the Single Quote</a>
<img src="//site.com/protocol-relative-img.jpg" />
HTML;

$base='https://example.com';

echo preg_replace('~(?:src|action|href)=[\'"]\K/(?!/)[^\'"]*~',"$base$0",$html);

Output:

<img src="https://example.com/relative/url/img.jpg" />
<form action="https://example.com/">
<a href='https://example.com/relative/url/'>Note the Single Quote</a>
<img src="//site.com/protocol-relative-img.jpg" />

Pattern Breakdown:

~                    #Pattern delimiter
(?:src|action|href)  #Match: src or action or href
=                    #Match equal sign
[\'"]                #Match single or double quote
\K                   #Restart fullstring match (discard previously matched characters)
/                    #Match slash
(?!/)                #Negative lookahead (zero-length assertion): must not be a slash immediately after first matched slash
[^\'"]*              #Match zero or more non-single/double quote characters
~                    #Pattern delimiter

PHP Resolving URL format with Base URL Relative Path into Absolute

Transform Relative Path Into Absolute URL Using PHP

function rel2abs($rel, $base) {
    /* an empty reference resolves to the base itself */
    if ($rel === '') return $base;

    /* return if already absolute URL */
    if (parse_url($rel, PHP_URL_SCHEME) != '') return $rel;

    /* queries */
    if ($rel[0] == '?') return explode("?", $base)[0] . $rel;

    /* anchors */
    if ($rel[0] == '#') return explode("#", $base)[0] . $rel;

    /* parse base URL and convert to local variables: $scheme, $host, $path */
    extract(parse_url($base));

    /* URL begins with // (substr avoids reading $rel[1] on a one-character string) */
    if (substr($rel, 0, 2) == '//') {
        return "$scheme:$rel";
    }

    /* remove non-directory element from path
       (a base URL without a path leaves $path unset) */
    $path = preg_replace('#/[^/]*$#', '', $path ?? '');

    /* destroy path if relative url points to root */
    if ($rel[0] == '/') $path = '';

    /* dirty absolute URL */
    $abs = "$host$path/$rel";

    /* replace '//' or '/./' or '/foo/../' with '/' */
    $re = array('#(/\.?/)#', '#/(?!\.\.)[^/]+/\.\./#');
    for ($n = 1; $n > 0; $abs = preg_replace($re, '/', $abs, -1, $n)) {}

    /* absolute URL is ready! */
    return "$scheme://$abs";
}

Testing ...

echo '<h4>Queries</h4>';
echo rel2abs("?query=1", "http://something.net/path/test.php");
echo '<br>';
echo rel2abs("?query=1", "http://something.net/path/test.php?old_query=1");

echo '<h4>Anchors</h4>';
echo rel2abs("#newAnchores", "http://something.net/path/test.php?a=1");
echo '<br>';
echo rel2abs("#newAnchores", "http://something.net/path/test.php?a=1#something");

echo '<h4>Path</h4>';
echo rel2abs("/testother.php", "http://something.net/folder1/folder2/folder3/test.php");
echo '<br>';
echo rel2abs("./../../testother.php", "http://something.net/folder1/folder2/folder3/test.php");
echo '<br>';
echo rel2abs("./../testother.php", "http://something.net/folder1/folder2/folder3/test.php");
echo '<br>';
echo rel2abs("./testother.php", "http://something.net/folder1/folder2/folder3/test.php");
echo '<br>';
echo rel2abs("testother.php", "http://something.net/folder1/folder2/folder3/test.php");

echo '<h4>Url begins with //</h4>';
echo rel2abs("//google.com/path/", "https://something.net/path/test.php");
echo '<br>';
echo rel2abs("//google.com/path/", "http://something.net/path/test.php");

Test Output ...

Queries

http://something.net/path/test.php?query=1
http://something.net/path/test.php?query=1

Anchors

http://something.net/path/test.php?a=1#newAnchores
http://something.net/path/test.php?a=1#newAnchores

Path

http://something.net/testother.php
http://something.net/folder1/testother.php
http://something.net/folder1/folder2/testother.php
http://something.net/folder1/folder2/folder3/testother.php
http://something.net/folder1/folder2/folder3/testother.php

Url begins with //

https://google.com/path/
http://google.com/path/

Change relative URLs to absolute URLs after Curl

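I'm not exactly sure why it replaces it just one time (it probably has to do with the backreference: inside a character class, \2 is read as an octal escape rather than a backreference, so [^\2]* greedily runs on to the last matching quote), but when you wrap it in a while loop, it should work.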

$pattern = '~(href|src)=(["\'])(?!#|//|http)([^\2]*)\2~i';
while (preg_match($pattern, $result)) {
    $result = preg_replace($pattern,'$1="http://www.example.com$3"', $result);
}

(I also changed the pattern slightly.)
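
One possible way to avoid the loop, sketched under assumptions: swap [^\2]* for the lazy .*? so each match stops at the first closing quote, and build the replacement in a callback. The sample markup and base URL are made up, and treating every non-absolute value as relative to the site root is a simplification.

```php
<?php
// Hypothetical base URL and markup, for illustration only.
$base = 'http://www.example.com';
$html = '<a href="/about.html">About</a> <img src=\'img/logo.png\'> '
      . '<a href="http://other.test/">x</a> <a href="#top">top</a>';

// (.*?)\2 is lazy: it stops at the first quote that matches the opening one.
$pattern = '~(href|src)=(["\'])(?!#|//|http)(.*?)\2~i';

$result = preg_replace_callback($pattern, function (array $m) use ($base) {
    // Simplification: resolve every relative value against the site root.
    $path = '/' . ltrim($m[3], '/');
    // Reuse the original quote character ($m[2]) on both sides.
    return $m[1] . '=' . $m[2] . $base . $path . $m[2];
}, $html);

echo $result, "\n";
```

Anchors, protocol-relative URLs, and values starting with http are left untouched by the negative lookahead, just as in the original pattern.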

To convert an absolute path to a relative path in php

The problem is that your question, although it seems very specific, is missing some crucial details.

If the script you posted is always being executed, and you always want it to go to delapo.com instead of temiremi.com, then all you would have to do is replace

$site_url    = "http://".$_SERVER["HTTP_HOST"]."$sitefolder";

with

$site_url    = "http://www.delapo.com/$sitefolder";

The $_SERVER["HTTP_HOST"] variable will return the domain for whatever site was requested. Therefore, if the user goes to www.temiremi.com/myscript.php (assuming that the script you posted is saved in a file called myscript.php) then $_SERVER["HTTP_HOST"] just returns www.temiremi.com.
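
To make that concrete, here is a small sketch of deriving the base URL from the request data. current_base_url() and its $fallbackHost parameter are hypothetical names introduced for this example; the function takes the server array as an argument so it can also be tried outside a web request.

```php
<?php
// Sketch only: current_base_url() and $fallbackHost are hypothetical.
// HTTP_HOST comes from the client's Host header, so treat it as untrusted input.
function current_base_url(array $server, string $fallbackHost = 'localhost'): string
{
    // HTTPS is typically non-empty and not "off" when the request used TLS.
    $https  = !empty($server['HTTPS']) && strtolower($server['HTTPS']) !== 'off';
    $scheme = $https ? 'https' : 'http';

    // No Host header (for example on the command line): use the fallback.
    $host = $server['HTTP_HOST'] ?? $fallbackHost;

    return $scheme . '://' . $host;
}

// In a real request you would pass $_SERVER instead of a literal array.
echo current_base_url(['HTTP_HOST' => 'www.temiremi.com']), "\n"; // http://www.temiremi.com
```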

On the other hand, you may not always be redirecting to the same domain, or you may want the script to adapt easily to different domains without having to dig through layers of code in the future. If this is the case, then you will need a way of figuring out which domain you wish to link to.

If you have a website hosted on temiremi.com but you want it to look like you are accessing from delapo.com, this is not an issue that can be resolved by PHP. You would have to have delapo.com redirect to temiremi.com or simply host on delapo.com in the first place.

If the situation is the other way around and you want a website hosted on delapo.com but you want users to access temiremi.com, then simply re-writing links isn't a sophisticated enough answer. This strategy would redirect the user to the other domain when they clicked the link. Instead you would need to have a proxy set up to forward the information. Proxy scripts vary in complexity, but the simplest one would be something like:

<?php
$site = file_get_contents("http://www.delapo.com/$sitefolder");
echo $site;
?>

So you see, we really need a little more information on why you need this script and its intended purpose in order to assist you.


