How Would You Parse a URL in Ruby to Get the Main Domain

How would you parse a URL in Ruby to get the main domain?

This should work with pretty much any URL:

# URL always gets parsed twice
def get_host_without_www(url)
  url = "http://#{url}" if URI.parse(url).scheme.nil?
  host = URI.parse(url).host.downcase
  host.start_with?('www.') ? host[4..-1] : host
end

Or:

# Only parses twice if url doesn't start with a scheme
def get_host_without_www(url)
  uri = URI.parse(url)
  uri = URI.parse("http://#{url}") if uri.scheme.nil?
  host = uri.host.downcase
  host.start_with?('www.') ? host[4..-1] : host
end

You may have to require 'uri'.
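
Both versions above raise if the string cannot be parsed or if no host is found. A minimal sketch of a more forgiving variant follows; the name safe_host_without_www is hypothetical, and returning nil on failure is an assumption about what the caller wants rather than part of the original answer.

```ruby
require 'uri'

# Hypothetical helper: returns the lowercased host without a leading "www.",
# or nil when the input cannot be parsed or has no host.
def safe_host_without_www(url)
  text = url.to_s.strip
  uri = URI.parse(text)
  # Retry with a default scheme so bare hosts like "www.example.com" parse.
  uri = URI.parse("http://#{text}") if uri.scheme.nil?
  host = uri.host
  return nil if host.nil?
  host.downcase.sub(/\Awww\./, '')
rescue URI::InvalidURIError
  nil
end
```

Note that an input such as "example.com:8080" is parsed with "example.com" as the scheme, so this sketch returns nil for it; add the scheme yourself if you expect host:port strings.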

How do I get just the sitename from url in ruby?

Using a gem for this might be overkill, but anyway: there's a handy gem called domainatrix that can extract the site name for you while dealing with things like two-element top-level domains and more.

url = Domainatrix.parse("http://www.pauldix.net")
url.url # => "http://www.pauldix.net" (the original url)
url.public_suffix # => "net"
url.domain # => "pauldix"
url.canonical # => "net.pauldix"

url = Domainatrix.parse("http://foo.bar.pauldix.co.uk/asdf.html?q=arg")
url.public_suffix # => "co.uk"
url.domain # => "pauldix"
url.subdomain # => "foo.bar"
url.path # => "/asdf.html?q=arg"
url.canonical # => "uk.co.pauldix.bar.foo/asdf.html?q=arg"

Given a URL, how can I get just the domain?

Use Addressable::URI.parse and the #host instance method:

Addressable::URI.parse("http://techcrunch.com/foo/bar").host #=> "techcrunch.com" 

Parse URL in Ruby to get subdomain or main domain without www?

Use Ruby's URI module:

require 'uri'
URI.parse('http://www.example.com/page').host
=> "www.example.com"
URI.parse('http://blog.example.com/page').host
=> "blog.example.com"

In both cases, URI returns the whole host name. It does not treat "www" differently from any other subdomain, so it strips nothing.

You'll have to implement that logic separately, using something like:

%w[http://www.example.com/page http://blog.example.com/page].each do |u|
  puts URI.parse(u).host.sub(/^www\./, '')
end

Which outputs:

example.com
blog.example.com

How to get the domain from a URL without using a URI parser? I want to do it using a regex

I'd go with @Arup Rakshit's solution. However, if you really want a regexp, why not use:

/^http:\/\/(.+)\.[a-z]{2,3}/
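
The pattern above only matches http URLs and assumes a two- or three-letter ending. As a rough sketch of a slightly more tolerant regex-only approach, the hypothetical helper below also accepts https, a missing scheme, and a port, and stops at the first path, query, or fragment character. It returns the whole host, not the registrable domain, and it is not a substitute for a real parser.

```ruby
# Hypothetical helper: extracts the host with a regex only.
# Pattern: optional http/https scheme, optional "user@" part,
# then everything up to the first ":", "/", "?", "#" or whitespace.
def host_from(url)
  pattern = %r{\A(?:https?://)?(?:[^@/\s]+@)?([^:/?#\s]+)}i
  match = pattern.match(url)
  match && match[1].downcase
end
```

For example, host_from("https://user@example.org:8080/x") gives "example.org".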

Get domain name for any type of URL format? -- PHP to Ruby

Using Addressable and taking advantage of Ruby's String#slice:

def domain_name(uri)
  Addressable::URI.heuristic_parse(uri, :scheme => "http") \
    .host[/\w+\.\w+(\.\w{2})?\Z/]
end

domain_name("stackoverflow.com") # => stackoverflow.com
domain_name("www.stackoverflow.com") # => stackoverflow.com
domain_name("http://stackoverflow.com") # => stackoverflow.com
domain_name("thing.com.au") # => thing.com.au
domain_name("some.thing.com.au") # => thing.com.au
domain_name("police.gov.uk") # => police.gov.uk

What regex can I use to get the domain name from a url in Ruby?

URI.parse('http://www.abc.google.com/').host
#=> "www.abc.google.com"

Not a regex, but probably more robust than anything we come up with here.

URI.parse('http://www.abc.google.com/').host.gsub(/^www\./, '')

If you want to remove the www. as well, this will work without raising any errors when the www. is not there.

Get Base Domain Name from Host / URL in Rails

Two options that I'm aware of:

  1. Create a regex that matches all possible domain extensions, then extract the preceding hostname. You can find a public list of all possible domain extensions here: https://publicsuffix.org/list/.

  2. Use the domainatrix gem (https://github.com/pauldix/domainatrix) which has a registry of all domain extensions collected from the above list:

    url = Domainatrix.parse("http://www.example.co.uk")
    => #
    url.domain
    => "example"
    url.public_suffix
    => "co.uk"
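
The first option can be sketched without a gem, using plain string matching in place of one large regex. The helper name base_domain and the tiny suffix list below are hypothetical and for illustration only; a real implementation would load the full list from publicsuffix.org, which also contains wildcard and exception rules that this sketch ignores.

```ruby
require 'uri'

# Hypothetical helper: returns "name.suffix" for a URL, or nil when the host
# is missing or ends in a suffix that is not in the list.
# The default list is a deliberately tiny, illustrative subset.
def base_domain(url, suffixes = %w[co.uk com.au gov.uk com net org])
  host = URI.parse(url).host
  return nil if host.nil?
  host = host.downcase
  # Try the longest suffixes first so multi-part suffixes are preferred.
  suffix = suffixes.sort_by { |s| -s.length }.find { |s| host.end_with?(".#{s}") }
  return nil if suffix.nil?
  # The label immediately before the suffix is the site name.
  name = host[0...-(suffix.length + 1)].split('.').last
  "#{name}.#{suffix}"
end
```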

How would I parse this url to get the id?

With a regular expression like /\d+\z/. You can build one using Rubular, for example. This expression extracts the id provided that no other characters follow the id and the id itself contains only digits.
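
A minimal sketch of that idea follows. The helper name trailing_id and the example URLs are hypothetical, since the original URL is not shown here. Applying the pattern to the parsed path instead of the whole string is an assumption, made so that a query string does not hide the id.

```ruby
require 'uri'

# Hypothetical helper: returns the trailing numeric id of the URL path as an
# Integer, or nil when the path does not end in digits.
def trailing_id(url)
  path = URI.parse(url).path
  digits = path[/\d+\z/]
  digits && digits.to_i
end
```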

How to parse url to get base url? -- Rails 3.1

Try using the 'uri' library:

require 'uri'
address = 'http://www.1800contacts.com/productlist.aspx?dl=P&source=cj&ac=8.2.0007'
uri = URI.parse(address)
puts "#{uri.scheme}://#{uri.host}" # => http://www.1800contacts.com
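
The snippet above drops the port. If the site runs on a non-default port, you may want to keep it; a possible sketch (the helper name base_url is hypothetical) compares URI#port with URI#default_port:

```ruby
require 'uri'

# Hypothetical helper: builds "scheme://host", appending the port only when
# it differs from the scheme's default (80 for http, 443 for https).
def base_url(address)
  uri = URI.parse(address)
  base = "#{uri.scheme}://#{uri.host}"
  base += ":#{uri.port}" if uri.port && uri.port != uri.default_port
  base
end
```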

