PHP: How to Resolve a Relative URL

file_get_contents() - Fix relative URLs

Rather than trying to change every path reference in the source code, why don't you simply inject a <base> tag into your header to indicate the base URL against which all relative URLs should be resolved?

https://developer.mozilla.org/en-US/docs/Web/HTML/Element/base

This can be achieved using your DOM manipulation tool of choice. The example below shows how to do this using DOMDocument and related classes.

$target_domain = 'http://stackoverflow.com/';
$url = $target_domain . 'pagecalledjohn.php';
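// Download page
$site = file_get_contents($url);
if($site === false) {
// the download failed
// provide error messaging and exit
}

// loadHTML() is an instance method; calling it statically throws an Error on PHP 8
$dom = new DOMDocument();
// @ suppresses the warnings libxml raises for imperfect real-world markup
if(@$dom->loadHTML($site) === false) {
// something went wrong in loading HTML to DOM Document
// provide error messaging and exit
}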

// find <head> tag
$head_tag_list = $dom->getElementsByTagName('head');
// there should only be one <head> tag
if($head_tag_list->length !== 1) {
throw new Exception('Wow! The HTML is malformed without single head tag.');
}
$head_tag = $head_tag_list->item(0);

// find first child of head tag to later use in insertion
$head_has_children = $head_tag->hasChildNodes();
if($head_has_children) {
$head_tag_first_child = $head_tag->firstChild;
}

// create new <base> tag
$base_element = $dom->createElement('base');
$base_element->setAttribute('href', $target_domain);

// insert new base tag as first child to head tag
if($head_has_children) {
$base_node = $head_tag->insertBefore($base_element, $head_tag_first_child);
} else {
$base_node = $head_tag->appendChild($base_element);
}

echo $dom->saveHTML();

At the very minimum, if you truly want to modify all path references in the source code, I would HIGHLY recommend doing so with DOM manipulation tools (DOMDocument, DOMXPath, etc.) rather than regex. I think you will find it a much more stable solution.

Transform relative path into absolute URL using PHP

function rel2abs($rel, $base)
{
/* return if already absolute URL */
if (parse_url($rel, PHP_URL_SCHEME) != '') return $rel;

/* queries and anchors */
if ($rel[0]=='#' || $rel[0]=='?') return $base.$rel;

/* parse base URL and convert to local variables:
$scheme, $host, $path */
extract(parse_url($base));

/* remove non-directory element from path */
$path = preg_replace('#/[^/]*$#', '', $path);

/* destroy path if relative url points to root */
if ($rel[0] == '/') $path = '';

/* dirty absolute URL */
$abs = "$host$path/$rel";

/* replace '//' or '/./' or '/foo/../' with '/' */
$re = array('#(/\.?/)#', '#/(?!\.\.)[^/]+/\.\./#');
for($n=1; $n>0; $abs=preg_replace($re, '/', $abs, -1, $n)) {}

/* absolute URL is ready! */
return $scheme.'://'.$abs;
}

Resolve a relative path in a URL with PHP

This is a simpler problem than you think. All you need to do is explode() on the / character and process the individual segments using a stack. As you traverse the array from left to right, if you see ., do nothing. If you see .., pop an element from the stack. Otherwise, push the segment onto the stack.

$str = 'domain.com/dir_1/dir_2/dir_3/./../../../';
$array = explode( '/', $str);
$domain = array_shift( $array);

$parents = array();
foreach( $array as $dir) {
switch( $dir) {
case '.':
// Don't need to do anything here
break;
case '..':
array_pop( $parents);
break;
default:
$parents[] = $dir;
break;
}
}

echo $domain . '/' . implode( '/', $parents);

This will properly resolve the URLs in all of your test cases.

Note that error checking is left as an exercise to the user (i.e. when the $parents stack is empty and you try to pop something off of it).
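The error checking mentioned above could look roughly like the following. This is only a sketch: the function name resolveDotSegments is made up for illustration, and it assumes the input has no scheme and that everything before the first / is the host. The guard makes the "pop from an empty stack" case explicit, so extra .. segments are ignored.

```php
<?php
// Hypothetical helper (not from the original answer): resolves "." and ".."
// segments in a "host/path" string using a stack.
function resolveDotSegments(string $str): string
{
    $segments = explode('/', $str);
    $domain = array_shift($segments);   // first segment is treated as the host

    $parents = [];
    foreach ($segments as $dir) {
        if ($dir === '.') {
            continue;                    // current directory: nothing to do
        }
        if ($dir === '..') {
            if ($parents !== []) {       // guard: never climb above the root
                array_pop($parents);
            }
            continue;
        }
        $parents[] = $dir;               // ordinary segment: push onto the stack
    }

    return $domain . '/' . implode('/', $parents);
}

echo resolveDotSegments('domain.com/dir_1/dir_2/dir_3/./../../../'), "\n"; // domain.com/
echo resolveDotSegments('domain.com/a/../../b'), "\n";                     // domain.com/b
```

Wrapping the loop in a function makes it easy to run against each test case in turn.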

Replace all relative URLs with absolute URLs

New Answer

If your real html document is valid (and has a parent/containing tag), then the most appropriate and reliable technique will be to use a proper DOM parser.

Here is how DOMDocument and Xpath can be used to elegantly target and replace your designated tag attributes:

Code 1 - Nested XPath Queries:

$domain = '//example.com';
$tagsAndAttributes = [
'img' => 'src',
'form' => 'action',
'a' => 'href'
];

$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
foreach ($tagsAndAttributes as $tag => $attr) {
foreach ($xpath->query("//{$tag}[not(starts-with(@{$attr}, '//'))]") as $node) {
$node->setAttribute($attr, $domain . $node->getAttribute($attr));
}
}
echo $dom->saveHTML();

Code 2 - Single XPath Query w/ Condition Block:

$domain = '//example.com';
$targets = [
"//img[not(starts-with(@src, '//'))]",
"//form[not(starts-with(@action, '//'))]",
"//a[not(starts-with(@href, '//'))]"
];

$dom = new DOMDocument;
$dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
$xpath = new DOMXPath($dom);
foreach ($xpath->query(implode('|', $targets)) as $node) {
if ($src = $node->getAttribute('src')) {
$node->setAttribute('src', $domain . $src);
} elseif ($action = $node->getAttribute('action')) {
$node->setAttribute('action', $domain . $action);
} else {
$node->setAttribute('href', $domain . $node->getAttribute('href'));
}
}
echo $dom->saveHTML();
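
Note that both queries above also match elements whose attribute is missing, as well as attributes that already hold a full URL such as https://..., because neither starts with //. If your input may contain those, a stricter predicate is one option. The following is a minimal sketch under that assumption; the function name absolutizeRootRelative and the sample markup are invented for illustration.

```php
<?php
// Hypothetical helper: prefixes $base onto root-relative URLs only
// (values that start with "/" but not with "//").
function absolutizeRootRelative(string $html, string $base): string
{
    $attributes = ['img' => 'src', 'form' => 'action', 'a' => 'href'];

    $dom = new DOMDocument();
    // Collect libxml warnings internally instead of emitting them
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html, LIBXML_HTML_NOIMPLIED | LIBXML_HTML_NODEFDTD);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    foreach ($attributes as $tag => $attr) {
        // The attribute must start with "/" and must not start with "//";
        // a missing attribute evaluates to '' and therefore never matches
        $query = "//{$tag}[starts-with(@{$attr}, '/') and not(starts-with(@{$attr}, '//'))]";
        foreach ($xpath->query($query) as $node) {
            $node->setAttribute($attr, $base . $node->getAttribute($attr));
        }
    }

    return $dom->saveHTML();
}

$html = '<div><img src="/img/a.jpg"><a href="https://other.example/x">x</a>'
      . '<a href="//cdn.example/y">y</a><a>no href</a></div>';
echo absolutizeRootRelative($html, 'https://example.com');
```

Only the img src is rewritten here; the absolute link, the protocol-relative link and the anchor without an href are left alone.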

Old Answer: (...regex is not "DOM-aware" and is vulnerable to unexpected breakage)

If I understand you properly, you have a base value in mind, and you only want to apply it to relative paths.

Code:

$html=<<<HTML
<img src="/relative/url/img.jpg" />
<form action="/">
<a href='/relative/url/'>Note the Single Quote</a>
<img src="//site.com/protocol-relative-img.jpg" />
HTML;

$base='https://example.com';

echo preg_replace('~(?:src|action|href)=[\'"]\K/(?!/)[^\'"]*~',"$base$0",$html);

Output:

<img src="https://example.com/relative/url/img.jpg" />
<form action="https://example.com/">
<a href='https://example.com/relative/url/'>Note the Single Quote</a>
<img src="//site.com/protocol-relative-img.jpg" />

Pattern Breakdown:

~                     #Pattern delimiter
(?:src|action|href)   #Match: src or action or href
=                     #Match equal sign
[\'"]                 #Match single or double quote
\K                    #Restart fullstring match (discard previously matched characters)
/                     #Match slash
(?!/)                 #Negative lookahead (zero-length assertion): must not be a slash immediately after first matched slash
[^\'"]*               #Match zero or more non-single/double quote characters
~                     #Pattern delimiter

PHP Relative URL not working

If your server document root is /usr/share/nginx/html, you can use $_SERVER['DOCUMENT_ROOT']:

require $_SERVER['DOCUMENT_ROOT'].'/includes/header.php';

As for the database file that is located outside of the server root: it is good practice to keep the config file outside the document root, but you may have taken it a bit too far, since it makes no difference whether it is one level outside the root or several. To access the database file, I think you have two options:

  1. Add a constant with the real location of the database file. This is the easiest option, but it affects migrating the site between servers: you will need to remember to change it when you run the site on a different system.

  2. Move the database file closer to the document root, say one level up, and then manipulate the string you get from $_SERVER['DOCUMENT_ROOT'].

Here is an example that shows how to get the location when the database file is one level up from the root:

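// strip any trailing slash or backslash from the document root
$x = rtrim($_SERVER['DOCUMENT_ROOT'], '\/');
// keep everything up to and including the last slash (the parent directory)
$x = substr($x, 0, (strrpos($x, '/')+1));
// $x already ends with a slash, so do not add another one
$x = $x.'folder/database.php';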
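
The same "one level up" lookup can also be written with dirname(). The sketch below is only an illustration: the function name pathAboveDocumentRoot is invented, and the folder/database.php default simply mirrors the example above. It takes the document root as a parameter so it can be tried outside a web server.

```php
<?php
// Hypothetical helper: builds a path one level above the given document root.
function pathAboveDocumentRoot(string $documentRoot, string $relative = 'folder/database.php'): string
{
    // dirname() drops the last path component, i.e. moves one level up
    $parent = dirname(rtrim($documentRoot, '/\\'));

    // Join with exactly one slash between the two parts
    return rtrim($parent, '/\\') . '/' . ltrim($relative, '/');
}

echo pathAboveDocumentRoot('/usr/share/nginx/html/'), "\n"; // /usr/share/nginx/folder/database.php
```

In a real request you would pass $_SERVER['DOCUMENT_ROOT'] as the first argument.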

How can you put all relative URL to absolute URL using HTML DOM?

Set the $scheme parameter to true to return an absolute URL if you want to link to a controller action.

Url::to(['my_controller/action'], true);

http://www.yiiframework.com/doc-2.0/yii-helpers-baseurl.html#to()-detail

For files, you can use Html::a( $text, $url = null, $options = [] ) with the options below:

// A normal string. This will use the exact string as the href attribute
$url = 'images/logo.png'; // This returns href='images/logo.png'

// A string starting with a Yii2 alias
// Html::a() resolves the alias against the base URL, e.g. href='/images/logo.png';
// Url::to($url, true) returns http://www.example.com/images/logo.png
$url = '@web/images/logo.png';

PHP Resolving URL format with Base URL Relative Path into Absolute

Transform Relative Path Into Absolute URL Using PHP

function rel2abs($rel, $base) {
/* return if already absolute URL */
if (parse_url($rel, PHP_URL_SCHEME) != '') return $rel;

/* queries */
if ($rel[0] == '?') return explode("?", $base)[0] . $rel;

/* anchors */
if ($rel[0] == '#') return explode("#", $base)[0] . $rel;

/* parse base URL and convert to local variables: $scheme, $host, $path */
extract(parse_url($base));

/* Url begins with // */
if ($rel[0] == '/' && $rel[1] == '/') {
return "$scheme:$rel";
}

/* remove non-directory element from path */
$path = preg_replace('#/[^/]*$#', '', $path);

/* destroy path if relative url points to root */
if ($rel[0] == '/') $path = '';

/* dirty absolute URL */
$abs = "$host$path/$rel";

/* replace '//' or '/./' or '/foo/../' with '/' */
$re = array('#(/\.?/)#', '#/(?!\.\.)[^/]+/\.\./#');
for ($n = 1; $n > 0; $abs = preg_replace($re, '/', $abs, -1, $n)) {}

/* absolute URL is ready! */
return "$scheme://$abs";
}

Testing ...

echo '<h4>Queries</h4>';
echo rel2abs("?query=1", "http://something.net/path/test.php");
echo '<br>';
echo rel2abs("?query=1", "http://something.net/path/test.php?old_query=1");

echo '<h4>Anchors</h4>';
echo rel2abs("#newAnchores", "http://something.net/path/test.php?a=1");
echo '<br>';
echo rel2abs("#newAnchores", "http://something.net/path/test.php?a=1#something");

echo '<h4>Path</h4>';
echo rel2abs("/testother.php", "http://something.net/folder1/folder2/folder3/test.php");
echo '<br>';
echo rel2abs("./../../testother.php", "http://something.net/folder1/folder2/folder3/test.php");
echo '<br>';
echo rel2abs("./../testother.php", "http://something.net/folder1/folder2/folder3/test.php");
echo '<br>';
echo rel2abs("./testother.php", "http://something.net/folder1/folder2/folder3/test.php");
echo '<br>';
echo rel2abs("testother.php", "http://something.net/folder1/folder2/folder3/test.php");

echo '<h4>Url begins with //</h4>';
echo rel2abs("//google.com/path/", "https://something.net/path/test.php");
echo '<br>';
echo rel2abs("//google.com/path/", "http://something.net/path/test.php");

Test Output ...

Queries

http://something.net/path/test.php?query=1
http://something.net/path/test.php?query=1

Anchors

http://something.net/path/test.php?a=1#newAnchores
http://something.net/path/test.php?a=1#newAnchores

Path

http://something.net/testother.php
http://something.net/folder1/testother.php
http://something.net/folder1/folder2/testother.php
http://something.net/folder1/folder2/folder3/testother.php
http://something.net/folder1/folder2/folder3/testother.php

Url begins with //

https://google.com/path/
http://google.com/path/
