How to Safely Join Relative Url Segments

Join two relative urls

The Addressable gem solves the problem:

require "addressable/uri"

fragment1 = '/api/v1/'

fragment2 = 'status'

Addressable::URI.join(fragment1, fragment2).to_s
# => "/api/v1/status"

How To Join Relative URLs in JavaScript

The following function decomposes the URL then resolves it.

function concatAndResolveUrl(url, concat) {
var url1 = url.split('/');
var url2 = concat.split('/');
var url3 = [ ];
for (var i = 0, l = url1.length; i < l; i ++) {
if (url1[i] == '..') {
url3.pop();
} else if (url1[i] == '.') {
continue;
} else {
url3.push(url1[i]);
}
}
for (var i = 0, l = url2.length; i < l; i ++) {
if (url2[i] == '..') {
url3.pop();
} else if (url2[i] == '.') {
continue;
} else {
url3.push(url2[i]);
}
}
return url3.join('/');
}

How to join components of a path when you are constructing a URL in Python

Since, from the comments the OP posted, it seems he doesn't want to preserve "absolute URLs" in the join (which is one of the key jobs of urlparse.urljoin;-), I'd recommend avoiding that. os.path.join would also be bad, for exactly the same reason.

So, I'd use something like '/'.join(s.strip('/') for s in pieces) (if the leading / must also be ignored -- if the leading piece must be special-cased, that's also feasible of course;-).

Combine URL paths with path.Join()

The function path.Join expects a path, not a URL. Parse the URL to get a path and join with that path:

u, err := url.Parse("http://foo")
if err != nil { log.Fatal(err) }
u.Path = path.Join(u.Path, "bar.html")
s := u.String()
fmt.Println(s) // prints http://foo/bar.html

Use the url.JoinPath function in Go 1.19 or later:

s, err := url.JoinPath("http://foo", "bar.html")
if err != nil { log.Fatal(err) }
fmt.Println(s) // prints http://foo/bar.html

Use ResolveReference if you are resolving a URI reference from a base URL. This operation is different from a simple path join: an absolute path in the reference replaces the entire base path; the base path is trimmed back to the last slash before the join operation.

base, err := url.Parse("http://foo/quux.html")
if err != nil {
log.Fatal(err)
}
ref, err := url.Parse("bar.html")
if err != nil {
log.Fatal(err)
}
u := base.ResolveReference(ref)
fmt.Println(u.String()) // prints http://foo/bar.html

Notice how quux.html in the base URL does not appear in the resolved URL.

Can I use require( path ).join to safely concatenate urls?

No. path.join() will return incorrect values when used with URLs.

It sounds like you want new URL(). From the WHATWG URL Standard:

new URL('/one', 'http://example.com/').href    // 'http://example.com/one'
new URL('/two', 'http://example.com/one').href // 'http://example.com/two'

Note that url.resolve is now marked as deprecated in the Node docs.

As Andreas correctly points out in a comment, url.resolve (also deprecated) would only help if the problem is as simple as the example. url.parse also applies to this question because it returns consistently and predictably formatted fields via the URL object that reduces the need for "code full of ifs". However, new URL() is also the replacement for url.parse.

Path.Combine for URLs?

There is a Todd Menier's comment above that Flurl includes a Url.Combine.

More details:

Url.Combine is basically a Path.Combine for URLs, ensuring one
and only one separator character between parts:

var url = Url.Combine(
"http://MyUrl.com/",
"/too/", "/many/", "/slashes/",
"too", "few?",
"x=1", "y=2"
// result: "http://www.MyUrl.com/too/many/slashes/too/few?x=1&y=2"

Get Flurl.Http on NuGet:

PM> Install-Package Flurl.Http

Or get the stand-alone URL builder without the HTTP features:

PM> Install-Package Flurl

Append relative URL to java.net.URL

URL has a constructor that takes a base URL and a String spec.

Alternatively, java.net.URI adheres more closely to the standards, and has a resolve method to do the same thing. Create a URI from your URL using URL.toURI.

How to join with relative paths only?

os.path.join() is not suitable for unsafe input, no. It is entirely deliberate that an absolute path ignores arguments before it; this allows for supporting both absolute and relative paths in a configuration file, say, without having to test the entered path. Just use os.path.join(standard_location, config_path) and it'll do the right thing for you.

Take a look at Flask's safe_join() to handle untrusted filenames:

import posixpath
import os.path

_os_alt_seps = list(sep for sep in [os.path.sep, os.path.altsep]
if sep not in (None, '/'))

def safe_join(directory, filename):
# docstring omitted for brevity
filename = posixpath.normpath(filename)
for sep in _os_alt_seps:
if sep in filename:
raise NotFound()
if os.path.isabs(filename) or \
filename == '..' or \
filename.startswith('../'):
raise NotFound()
return os.path.join(directory, filename)

This uses the posixpath (the POSIX implementation for the platform-agnostic os.path module) to normalise the URL path first; this removes any embedded ../ or ./ path segments, making it a fully normalised relative or absolute path.

Then any alternative separators other than / are excluded; you are not allowed to use /\index.html for example. Last but not least, absolute filenames, or relative filenames are specifically prohibited as well.



Related Topics



Leave a reply



Submit