Ruby Code to Extract Host from URL String

You could try something like this:

require 'uri'

my_uri = URI.parse('http://www.mglenn.com/directory')
puts my_uri.host
# => www.mglenn.com
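URI.parse only fills in host when the string has a scheme. A bare "www.mglenn.com/directory" is parsed as a relative path, so host is nil, and malformed input raises URI::InvalidURIError. A minimal sketch that handles both cases, assuming you want nil back instead of an exception (the helper name extract_host is hypothetical):

```ruby
require 'uri'

# Hypothetical helper: returns the host, or nil when the string has no
# scheme (URI treats "www.example.com/x" as a relative path) or when
# URI cannot parse it at all.
def extract_host(url)
  URI.parse(url).host
rescue URI::InvalidURIError
  nil
end

p extract_host('http://www.mglenn.com/directory') # => "www.mglenn.com"
p extract_host('www.mglenn.com/directory')        # => nil
```

If scheme-less input is common for you, prepend "http://" before parsing, as the get_host_without_www answer further down does.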

How do I get just the site name from a URL in Ruby?

Using a gem for this might be overkill, but there's a handy gem called domainatrix that can extract the site name for you while handling things like two-element top-level domains (e.g. co.uk).

require 'domainatrix'

url = Domainatrix.parse("http://www.pauldix.net")
url.url # => "http://www.pauldix.net" (the original url)
url.public_suffix # => "net"
url.domain # => "pauldix"
url.canonical # => "net.pauldix"

url = Domainatrix.parse("http://foo.bar.pauldix.co.uk/asdf.html?q=arg")
url.public_suffix # => "co.uk"
url.domain # => "pauldix"
url.subdomain # => "foo.bar"
url.path # => "/asdf.html?q=arg"
url.canonical # => "uk.co.pauldix.bar.foo/asdf.html?q=arg"
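domainatrix gets "co.uk" right because it consults a list of public suffixes rather than assuming the TLD is one label. The toy sketch here only illustrates that longest-suffix matching idea with the standard library; it is not domainatrix's implementation, and the SUFFIXES set and split_host helper are hypothetical (the real Public Suffix List has thousands of entries):

```ruby
require 'uri'

# Hypothetical, tiny stand-in for the Public Suffix List.
SUFFIXES = %w[com net org uk co.uk].freeze

# Splits a host into [subdomain, domain, public_suffix], or returns nil
# when no known suffix leaves at least one label for the domain.
def split_host(host)
  labels = host.downcase.split('.')
  # i = 1 yields the longest candidate suffix, so "co.uk" is tried before "uk".
  (1...labels.length).each do |i|
    suffix = labels[i..-1].join('.')
    next unless SUFFIXES.include?(suffix)
    domain = labels[i - 1]
    subdomain = labels[0...(i - 1)].join('.')
    return [subdomain, domain, suffix]
  end
  nil
end

host = URI.parse('http://foo.bar.pauldix.co.uk/asdf.html?q=arg').host
p split_host(host) # => ["foo.bar", "pauldix", "co.uk"]
```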

How would you parse a URL in Ruby to get the main domain?

This should work with most web URLs, with or without a scheme:

# URL always gets parsed twice
def get_host_without_www(url)
  url = "http://#{url}" if URI.parse(url).scheme.nil?
  host = URI.parse(url).host.downcase
  host.start_with?('www.') ? host[4..-1] : host
end

Or:

# Only parses twice if url doesn't start with a scheme
def get_host_without_www(url)
  uri = URI.parse(url)
  uri = URI.parse("http://#{url}") if uri.scheme.nil?
  host = uri.host.downcase
  host.start_with?('www.') ? host[4..-1] : host
end

Both versions use Ruby's standard URI library, so you may need to require 'uri' first.
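Both versions call downcase on uri.host, which raises NoMethodError when there is no host (for example a "mailto:" URL). A variant sketch, assuming nil is an acceptable result for such input and Ruby 2.5+ for String#delete_prefix (the name host_without_www is hypothetical):

```ruby
require 'uri'

# Like get_host_without_www, but returns nil instead of raising when the
# URL has no host or cannot be parsed.
def host_without_www(url)
  uri = URI.parse(url)
  uri = URI.parse("http://#{url}") if uri.scheme.nil?
  # Safe navigation (&.) short-circuits to nil when host is nil.
  uri.host&.downcase&.delete_prefix('www.')
rescue URI::InvalidURIError
  nil
end

p host_without_www('http://WWW.Example.com/foo') # => "example.com"
p host_without_www('www.example.com/foo')        # => "example.com"
p host_without_www('mailto:someone@example.com') # => nil
```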

How to get the domain from a URL without using the URI parser? I want to do it using a regex

I'd go with @Arup Rakshit's solution. However, if you really want a regexp, why not use:

/^http:\/\/(.+)\.[a-z]{2,3}/
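Because .+ is greedy, that pattern captures everything up to the last dot followed by two or three letters, which can run into the path: for "http://www.example.com/foo.html" the capture is "www.example.com/foo". A sketch of a stricter pattern, assuming only http/https URLs and that you want the full host (the HOST_RE and host_via_regex names are hypothetical):

```ruby
# Original pattern, shown for comparison: the greedy .+ overshoots the host.
p 'http://www.example.com/foo.html'[/^http:\/\/(.+)\.[a-z]{2,3}/, 1] # => "www.example.com/foo"

# Captures the host of an http(s) URL: skips optional "user:pass@" and
# stops at the first "/", ":", "?", or "#". The scheme match is case-insensitive.
HOST_RE = %r{\Ahttps?://(?:[^@/?#]*@)?([^/:?#]+)}i

# String#[](regexp, capture) returns the capture group, or nil if no match.
def host_via_regex(url)
  url[HOST_RE, 1]
end

p host_via_regex('http://www.example.com/foo.html')     # => "www.example.com"
p host_via_regex('https://user:pw@Example.com:8080/x')  # => "Example.com"
p host_via_regex('ftp://example.com')                   # => nil
```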

Given a URL, how can I get just the domain?

Use Addressable::URI.parse and the #host instance method:

require 'addressable/uri'

Addressable::URI.parse("http://techcrunch.com/foo/bar").host #=> "techcrunch.com"

How to parse a URL and extract the required substring

I'd do it this way:

require 'uri'

uri = URI.parse('http://something.example.com/directory/')
uri.host.split('.').first # => "something"

URI is built into Ruby. It's not the most full-featured, but it's plenty capable of doing this task for most URLs. If you need to handle IRIs (internationalized URLs with non-ASCII characters), look at Addressable::URI.
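To see the limitation that makes Addressable worth considering, a quick check shows the built-in parser rejecting a host with non-ASCII characters (the helper name stdlib_parsable? is hypothetical):

```ruby
require 'uri'

# Returns true if Ruby's built-in URI parser accepts the string.
def stdlib_parsable?(str)
  URI.parse(str)
  true
rescue URI::InvalidURIError
  false
end

p stdlib_parsable?('http://example.com/')     # => true
p stdlib_parsable?('http://bücher.example/')  # => false (URI must be ASCII only)
```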

What's the best way to parse URLs to extract the domain?

You can use the domainatrix gem to get what you want with "#{url.domain}.#{url.public_suffix}", or just do some string manipulation, such as host[4..-1] to drop a leading "www." from the host.
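The plain string-manipulation route usually means keeping the last two labels of the host. A minimal sketch (naive_domain is a hypothetical name) that also shows where it breaks, which is the case domainatrix's public-suffix handling covers:

```ruby
require 'uri'

# Naive "main domain": keeps the last two labels of the host.
# Wrong for multi-label suffixes such as "co.uk".
def naive_domain(url)
  host = URI.parse(url).host
  host && host.split('.').last(2).join('.')
end

p naive_domain('http://www.pauldix.net/')       # => "pauldix.net"
p naive_domain('http://foo.bar.pauldix.co.uk/') # => "co.uk" (not "pauldix.co.uk")
```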

Extract all URLs inside a string in Ruby

A different approach, from the perfect-is-the-enemy-of-the-good school of thought:

urls = content.split(/\s+/).find_all { |u| u =~ /^https?:/ }
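The same whitespace-splitting idea can be written with String#split (no argument splits on whitespace) and Enumerable#grep. Since tokens keep any trailing punctuation, this sketch also trims a few common trailing characters; that trimming is an added heuristic, not part of the answer, and can strip characters that legitimately end a URL (the extract_urls name is hypothetical):

```ruby
# Split on whitespace, keep tokens starting with http: or https:, then
# trim trailing punctuation such as "," or ")".
def extract_urls(content)
  content.split.grep(/\Ahttps?:/).map { |u| u.sub(/[.,;:!?)]+\z/, '') }
end

text = 'See http://a.example/x and https://b.example, not ftp://c.example'
p extract_urls(text) # => ["http://a.example/x", "https://b.example"]
```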