Extract Part of Url With Regex

I'll list both a regex and a non-regex way. Surprisingly, the regex way turns out to be shorter.

Regex Way

The regex to capture bar and boo is /.*\/(.*)\/(.*)$/, which is short, precise and exactly what you need.

Let's put it into practice:

const params = "http://www.sub.domain.tld/foo/bar/boo".match(/.*\/(.*)\/(.*)$/)

This results in:

params;
["http://www.sub.domain.tld/foo/bar/boo","bar","boo"]

Access the captures as params[1] (bar) and params[2] (boo); params[0] is the full match.

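Regex explanation:

  • .* - greedily matches as much of the string as possible while still leaving two slashes for the rest of the pattern
  • \/ - a literal slash
  • (.*) - Group 1: the second-to-last segment (bar)
  • \/ - a literal slash
  • (.*) - Group 2: everything after the last slash (boo)
  • $ - end of string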
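A minimal runnable sketch of the regex way, assuming a plain string input. The helper name lastTwoSegments is hypothetical; it uses array destructuring to skip index 0 (the full match) and returns null when the string has fewer than two slashes, since match() then returns null.

```javascript
// Hypothetical helper: returns the last two slash-separated pieces of a URL,
// or null if the pattern does not match (fewer than two slashes).
function lastTwoSegments(url) {
  const match = url.match(/.*\/(.*)\/(.*)$/);
  if (match === null) return null;
  // match[0] is the whole string; groups 1 and 2 hold the two captures.
  const [, first, second] = match;
  return [first, second];
}

console.log(lastTwoSegments("http://www.sub.domain.tld/foo/bar/boo")); // ["bar", "boo"]
console.log(lastTwoSegments("no-slashes-here")); // null
```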

Extended Version:

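The regex can be extended to also accept a trailing slash, as in /bar/boo/, like this:

.*\/\b(.*)\/\b(.*?)(\/?)$

The second group is lazy, (.*?), so that the optional (\/?) gets to match the trailing slash; with a greedy (.*), group 2 would capture boo/ including the slash.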

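Here, each \b asserts a word boundary right after a slash, so each captured segment must start with a word character, and (\/?) allows one optional slash right before the end of the string ($).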

It can be extended further, but let's keep it simple for now.
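A quick runnable check of the extended pattern, written as a sketch with illustrative variable names. It compares the pattern with a greedy second group against a variant whose second group is lazy: in JavaScript's backtracking engine the greedy (.*) swallows the trailing slash and leaves (\/?) empty, while the lazy (.*?) stops early enough for (\/?) to take it.

```javascript
// Extended pattern with a greedy second group, and a variant where that group is lazy.
const greedy = /.*\/\b(.*)\/\b(.*)(\/?)$/;
const lazy = /.*\/\b(.*)\/\b(.*?)(\/?)$/;

const withSlash = "http://domain.tld/foo/bar/boo/";
const withoutSlash = "http://domain.tld/foo/bar/boo";

// Greedy: group 2 keeps the trailing slash and group 3 is empty.
console.log(withSlash.match(greedy).slice(1)); // ["bar", "boo/", ""]
// Lazy: group 2 stops before the optional slash, which group 3 then matches.
console.log(withSlash.match(lazy).slice(1)); // ["bar", "boo", "/"]
console.log(withoutSlash.match(lazy).slice(1)); // ["bar", "boo", ""]
```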

Non Regex Way

Use native methods like .split():

function getLastParam(str, targetIndex = 1) {
  const arr = str
    .split("/") // split by slash
    .filter((e) => e); // remove empty array elements
  return arr[arr.length - targetIndex];
}

Let's quickly test it on different cases:

[
  "http://domain.tld/foo/bar/boo",
  "http://www.domain.tld/foo/bar/boo",
  "http://sub.domain.tld/foo/bar/boo",
  "http://www.sub.domain.tld/foo/bar/boo",
  "http://domain.tld/foo/bar/boo/",
  ".../bar/boo"
].forEach((e) => {
  // forEach rather than map, since only the logging side effect is needed
  console.log({ input: e, output: getLastParam(e, 1) });
});

This yields the following:

{input: "http://domain.tld/foo/bar/boo", output: "boo"}
{input: "http://www.domain.tld/foo/bar/boo", output: "boo"}
{input: "http://sub.domain.tld/foo/bar/boo", output: "boo"}
{input: "http://www.sub.domain.tld/foo/bar/boo", output: "boo"}
{input: "http://domain.tld/foo/bar/boo/", output: "boo"}
{input: ".../bar/boo", output: "boo"}

If you want bar instead, pass 2 as targetIndex to get the second-to-last segment: getLastParam(str, 2) returns bar.
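One variation, not part of the original answer and assuming the input is always an absolute URL, is to let the standard URL class isolate the path first, so a query string or fragment never ends up in the last segment. The helper name getLastPathSegment is hypothetical, and a relative input such as .../bar/boo makes new URL throw, so the split-only version above remains the more forgiving choice for arbitrary strings.

```javascript
// Hypothetical helper built on the standard URL class (browsers and Node.js).
// Only the pathname is split, so "?query" and "#hash" parts are ignored.
function getLastPathSegment(str, targetIndex = 1) {
  const segments = new URL(str).pathname
    .split("/")
    .filter(Boolean); // drop empty strings from leading/trailing slashes
  return segments[segments.length - targetIndex];
}

console.log(getLastPathSegment("http://domain.tld/foo/bar/boo?x=1#top")); // "boo"
console.log(getLastPathSegment("http://domain.tld/foo/bar/boo/", 2)); // "bar"
```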

Speed stuff

Here is a small benchmark: http://jsbench.github.io/#a6bcecaa60b7d668636f8f760db34483

getLastParamNormal: 5,203,853 ops/sec
getLastParamRegex: 6,619,590 ops/sec

Well, at these speeds the difference hardly matters, but it's interesting nonetheless.
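The benchmark's source isn't reproduced here, so the following is only a rough sketch of how one might time the two approaches locally. getLastParamRegex is a hypothetical regex counterpart (its real implementation in the benchmark is unknown), the code assumes performance.now() is available as a global (modern browsers, Node.js 16 or later), and absolute timings will vary by engine and machine.

```javascript
// Split-based version from above.
function getLastParam(str, targetIndex = 1) {
  const arr = str.split("/").filter((e) => e);
  return arr[arr.length - targetIndex];
}

// Hypothetical regex counterpart: captures the text after the final slash,
// ignoring one optional trailing slash.
function getLastParamRegex(str) {
  const match = str.match(/([^\/]+)\/?$/);
  return match ? match[1] : undefined;
}

// Runs fn on input `iterations` times and returns the elapsed milliseconds.
function time(fn, input, iterations = 200000) {
  const start = performance.now();
  for (let i = 0; i < iterations; i++) fn(input);
  return performance.now() - start;
}

const input = "http://www.sub.domain.tld/foo/bar/boo";
console.log("split:", time(getLastParam, input).toFixed(1), "ms");
console.log("regex:", time(getLastParamRegex, input).toFixed(1), "ms");
```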

Regex to extract a part of a URL

You can use

^(?:https?://(?:www\.)?)*(.*)

Details:

  • ^ - start of string
  • (?:https?://(?:www\.)?)* - zero or more occurrences of
    • https?:// - http:// or https://
    • (?:www\.)? - an optional sequence of www.
  • (.*) - Group 1: the rest of the string.

With REGEXEXTRACT, the output value is the text captured by Group 1.
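Outside of a spreadsheet, the same pattern can be tried in JavaScript. This sketch assumes plain string inputs and simply reads group 1, mirroring what REGEXEXTRACT returns; the helper name stripSchemeAndWww is hypothetical. Note that www. is only removed when it directly follows http:// or https://.

```javascript
// Same pattern as above; the slashes are escaped because of the regex literal syntax.
const pattern = /^(?:https?:\/\/(?:www\.)?)*(.*)/;

// Hypothetical helper mirroring REGEXEXTRACT: return the text captured by group 1.
// The pattern always matches (group 1 may be empty), so match() never returns null here.
function stripSchemeAndWww(url) {
  return url.match(pattern)[1];
}

console.log(stripSchemeAndWww("https://www.example.com/page")); // "example.com/page"
console.log(stripSchemeAndWww("http://example.com")); // "example.com"
console.log(stripSchemeAndWww("www.example.com")); // "www.example.com"
```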

Need to extract part of a url using Regex

I eventually arrived at a solution closest to the format my question required by combining the solutions of @sudhir-bastakoti and @wiktor-stribiżew, since neither answer alone fully addressed my question.

I am grateful to everyone who answered, including @kooiinc. I tried the options in his last answer and they worked; however, I wanted the result in a specific format.

const s3bucket = 's3bucket';
const url = 's3://s3bucket/dynamodbtablename/05abd315-2e0b-4717-919d-1cc6576ebe19';
const migrationDataFileS3Key = url.match(new RegExp(String.raw`s3://${s3bucket}/(.*)`))[1];
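A slightly more defensive sketch of the same idea, under two assumptions not in the original: the bucket name may contain regex metacharacters (S3 bucket names can contain dots, and an unescaped "." matches any character), and a non-matching URL should yield null instead of throwing on [1]. The helpers escapeRegExp and getS3Key are hypothetical; the pattern is also anchored with ^ and $, unlike the original.

```javascript
// Hypothetical helper: escapes regex metacharacters so that a bucket name
// such as "my.bucket" is matched literally.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Hypothetical helper: returns the key after "s3://<bucket>/", or null when the URL does not match.
function getS3Key(url, bucket) {
  const match = url.match(new RegExp(`^s3://${escapeRegExp(bucket)}/(.*)$`));
  return match ? match[1] : null;
}

const key = getS3Key(
  "s3://s3bucket/dynamodbtablename/05abd315-2e0b-4717-919d-1cc6576ebe19",
  "s3bucket"
);
console.log(key); // "dynamodbtablename/05abd315-2e0b-4717-919d-1cc6576ebe19"
```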

How to extract a specific URL segment with Regex & C#

You should preferably use URL-related classes to parse a URL, as explained in another answer, because built-in functions are proven and well tested even on corner cases. But since you mentioned a limitation that allows only a regex solution, you can try the following.

Finding the sixth (or, in general, the Nth) segment can be done with this regex:

(?:([^/]+)/){7}

which matches 6 + 1 segments (N + 1 in general for the Nth segment, where the extra 1 matches the domain part of the URL). The repeated group retains only its last captured value, which is accessible as group 1.

Here, ([^/]+) matches one or more characters other than / and captures them in group 1; it must be followed by a /, and the whole group is matched exactly 7 times.

C# code demo

using System;
using System.Text.RegularExpressions;

// Top-level statements (C# 9 or later).
var pattern = "(?:([^/]+)/){7}";
var match = Regex.Match("/domain.com/segment1/segment2/segment3/segment4/segment5/segment6/segment7/filename.ext", pattern);
Console.WriteLine("Segment: " + match.Groups[1].Value);
match = Regex.Match("http://someother.com/segment1/segment2/segment3/segment4/segment5/segment6/segment7/filename.ext", pattern);
Console.WriteLine("Segment: " + match.Groups[1].Value);

This prints the sixth segment:

Segment: segment6
Segment: segment6
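The same counting idea carries over to JavaScript, whose regex engine likewise keeps only the last iteration of a repeated capturing group. This sketch builds the quantifier from a 1-based segment number n; the helper name nthSegment is hypothetical and, like the C# version, it assumes the domain part is the first slash-terminated run of non-slash characters it can match.

```javascript
// Hypothetical helper: returns the nth path segment (1-based) using the
// (?:([^/]+)/){n+1} pattern, where the extra repetition consumes the domain.
function nthSegment(url, n) {
  const pattern = new RegExp(`(?:([^/]+)/){${n + 1}}`);
  const match = url.match(pattern);
  // Group 1 holds only the value captured by the last repetition.
  return match ? match[1] : null;
}

const path = "/segment1/segment2/segment3/segment4/segment5/segment6/segment7/filename.ext";
console.log(nthSegment("/domain.com" + path, 6)); // "segment6"
console.log(nthSegment("http://someother.com" + path, 6)); // "segment6"
console.log(nthSegment("http://someother.com" + path, 2)); // "segment2"
```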

Extract first /part/ of url with Regex

There is a way to extract the required part by using a negative lookbehind and a lazy quantifier: