What Regex to Use to Get the Domain Name from a URL in Ruby

What regex can I use to get the domain name from a url in Ruby?

URI.parse('http://www.abc.google.com/').host
#=> "www.abc.google.com"

Not a regex, but probably more robust than anything we come up with here.

URI.parse('http://www.abc.google.com/').host.gsub(/^www\./, '')

If you also want to remove the leading www., this works without raising an error when the www. is not there.
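The two calls above can be wrapped into one helper. This is only a sketch: the method name bare_host is made up for illustration, and returning nil for unparseable input is an assumption about what you want.

```ruby
require 'uri'

# Hypothetical helper: returns the host without a leading "www.",
# or nil when the string has no host or cannot be parsed.
def bare_host(url)
  host = URI.parse(url).host
  host && host.sub(/\Awww\./, '')
rescue URI::InvalidURIError
  nil
end

bare_host('http://www.abc.google.com/') # => "abc.google.com"
bare_host('http://abc.google.com/')     # => "abc.google.com"
```

Anchoring with \A instead of ^ keeps the match at the very start of the string, and sub is enough because the prefix can occur only once there.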

Getting the domain of a URL with Regular Expressions

Why don't you just use the URI class to do this?

URI.parse( your_uri ).host

And you're done.

Just one thing: if there's no "http://" or "https://" at the beginning of the URL, you'll have to add one, or the parse method is not going to give you a host (it's going to be nil).
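A minimal sketch of that workaround, assuming "http://" is an acceptable default scheme. The method name host_of is hypothetical.

```ruby
require 'uri'

# Hypothetical helper: prepends "http://" when the string has no scheme,
# so that URI.parse can find a host.
def host_of(url)
  url = "http://#{url}" unless url =~ %r{\A[a-z][a-z0-9+.\-]*://}i
  URI.parse(url).host
end

URI.parse('www.example.com/path').host # => nil
host_of('www.example.com/path')        # => "www.example.com"
host_of('https://example.com')         # => "example.com"
```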

How to use regex to extract domain parts from a string

In Ruby I have used something like this:

user:~/workspace $ irb
2.3.4 :018 > url = "https://example.com"
=> "https://example.com"
2.3.4 :019 > u = url.match(/(?<protocol>[\w]+):\/\/(?<domain>[\w-]+)\.(?<extension>\w+)/)
=> #<MatchData "https://example.com" protocol:"https" domain:"example" extension:"com">
2.3.4 :020 > u[:protocol]
=> "https"
2.3.4 :021 > u[:domain]
=> "example"
2.3.4 :022 > u[:extension]
=> "com"

If the URL also has a subdomain, use a regular expression like the one below:

2.3.4 :034 > url = "https://sub.example.com"    
2.3.4 :035 > u = url.match(/(?<protocol>[\w]+):\/\/(?<domain>[[a-zA-Z0-9]\.-]+)\.(?<extension>\w+)/)
=> #<MatchData "https://sub.example.com" protocol:"https" domain:"sub.example" extension:"com">
2.3.4 :036 > u[:protocol]
=> "https"
2.3.4 :037 > u[:domain]
=> "sub.example"
2.3.4 :038 > u[:extension]
=> "com"

I created a snippet on http://rubular.com/ for testing this regular expression; it does not fail with a subdomain.
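If you want the parts as a Hash rather than a MatchData, something like the following should work. It is a sketch: URL_PARTS and url_parts are hypothetical names, and the character class is a flattened equivalent of the one in the irb session above.

```ruby
# Hypothetical constant: same named groups as above, written with %r{}
# so the slashes need no escaping.
URL_PARTS = %r{(?<protocol>\w+)://(?<domain>[a-zA-Z0-9.-]+)\.(?<extension>\w+)}

# Returns a Hash of the named captures, or nil when nothing matches.
def url_parts(url)
  m = URL_PARTS.match(url)
  m && m.names.zip(m.captures).to_h
end

parts = url_parts('https://sub.example.com')
parts['protocol']  # => "https"
parts['domain']    # => "sub.example"
parts['extension'] # => "com"
```

names.zip(captures) is used instead of MatchData#named_captures because the latter is not available on the Ruby 2.3 shown in the session.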

How to get the domain from a URL without using a URI parser? I want to do it using regex.

I'd go with @Arup Rakshit's solution. However, if you really want a regexp, why not use:

/^http:\/\/(.+)\.[a-z]{2,3}/
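One caveat: the greedy (.+) in that pattern can run past the host when the path contains a dot. A stricter variant is sketched below. It is not the original answer's pattern: HOST_RE and host_by_regex are hypothetical names, it assumes the host ends at the first "/", ":", "?" or "#", and it does not handle user:password@ prefixes.

```ruby
# Hypothetical pattern: capture everything after the scheme up to the
# first "/", ":", "?" or "#".
HOST_RE = %r{\Ahttps?://([^/:?#]+)}i

# Returns the captured host, or nil when the string does not match.
def host_by_regex(url)
  url[HOST_RE, 1]
end

# The greedy pattern swallows part of the path:
'http://example.com/index.html'[/^http:\/\/(.+)\.[a-z]{2,3}/, 1]
# => "example.com/index"

host_by_regex('http://example.com/index.html') # => "example.com"
host_by_regex('https://example.com:8080/')     # => "example.com"
```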

Validation for URL/Domain using Regex? (Rails)

Stumbled on this:

validates_format_of :domain_name, :with => /\A(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?\z/ix

FYI: Rubular is a fantastic resource for testing your Ruby regular expressions
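The choice of anchors matters for a validation pattern like this one. In Ruby, ^ and $ match at line boundaries, so a multi-line string can slip through, and Rails 4 and later raise an ArgumentError for them in a format validator unless multiline: true is passed. A plain-Ruby sketch of the difference, with hypothetical constant and method names:

```ruby
# Same pattern body, once with line anchors and once with string anchors.
LINE_ANCHORED   = /^(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?$/ix
STRING_ANCHORED = /\A(http|https):\/\/[a-z0-9]+([\-\.]{1}[a-z0-9]+)*\.[a-z]{2,5}(:[0-9]{1,5})?(\/.*)?\z/ix

# Hypothetical helper: true when the pattern matches anywhere it is allowed to.
def matches?(pattern, string)
  !(pattern =~ string).nil?
end

tricky = "javascript:alert(1)\nhttp://example.com"

matches?(LINE_ANCHORED, tricky)                 # => true  (second line matches)
matches?(STRING_ANCHORED, tricky)               # => false
matches?(STRING_ANCHORED, 'http://example.com') # => true
```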

Given a URL, how can I get just the domain?

Use Addressable::URI.parse and the #host instance method:

Addressable::URI.parse("http://techcrunch.com/foo/bar").host #=> "techcrunch.com" 

How do I get just the sitename from url in ruby?

Using a gem for this might be overkill, but anyway: there's a handy gem called domainatrix that can extract the sitename for you while dealing with things like two-element top-level domains and more.

url = Domainatrix.parse("http://www.pauldix.net")
url.url # => "http://www.pauldix.net" (the original url)
url.public_suffix # => "net"
url.domain # => "pauldix"
url.canonical # => "net.pauldix"

url = Domainatrix.parse("http://foo.bar.pauldix.co.uk/asdf.html?q=arg")
url.public_suffix # => "co.uk"
url.domain # => "pauldix"
url.subdomain # => "foo.bar"
url.path # => "/asdf.html?q=arg"
url.canonical # => "uk.co.pauldix.bar.foo/asdf.html?q=arg"

How can I extract just the domain from a url?

> require 'uri'
=> true
> uri = URI.parse "http://twitter.com/"
=> #<URI::HTTP:0x00000100994f98 URL:http://twitter.com/>
> uri.host
=> "twitter.com"

