How would you parse a url in Ruby to get the main domain?
This should work with pretty much any URL:
# URL always gets parsed twice
def get_host_without_www(url)
  url = "http://#{url}" if URI.parse(url).scheme.nil?
  host = URI.parse(url).host.downcase
  host.start_with?('www.') ? host[4..-1] : host
end
Or:
# Only parses twice if url doesn't start with a scheme
def get_host_without_www(url)
  uri = URI.parse(url)
  uri = URI.parse("http://#{url}") if uri.scheme.nil?
  host = uri.host.downcase
  host.start_with?('www.') ? host[4..-1] : host
end
You may have to require 'uri'.
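Both versions above call .downcase on uri.host, which raises NoMethodError when the host is nil (for example with a mailto: address), and URI.parse itself raises URI::InvalidURIError on input such as a string containing spaces. A minimal sketch of a more defensive variant follows; the name safe_host_without_www is hypothetical, and returning nil on failure is an assumption about what the caller wants:

```ruby
require 'uri'

# Hypothetical helper: returns the host without a leading "www.",
# or nil when the input cannot be parsed or has no host.
def safe_host_without_www(url)
  uri = URI.parse(url)
  # Retry with a scheme so that "www.example.com/page" gets a host.
  uri = URI.parse("http://#{url}") if uri.scheme.nil?
  host = uri.host
  return nil if host.nil?

  host.downcase.sub(/\Awww\./, '')
rescue URI::InvalidURIError
  nil
end

p safe_host_without_www('www.Example.com/page')
p safe_host_without_www('http://blog.example.com')
p safe_host_without_www('not a url')
```

Whether nil or an exception is the better failure signal depends on the calling code.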
How do I get just the sitename from url in ruby?
Using a gem for this might be overkill, but anyway: there's a handy gem called domainatrix that can extract the site name for you while dealing with things like two-element top-level domains and more.
url = Domainatrix.parse("http://www.pauldix.net")
url.url # => "http://www.pauldix.net" (the original url)
url.public_suffix # => "net"
url.domain # => "pauldix"
url.canonical # => "net.pauldix"
url = Domainatrix.parse("http://foo.bar.pauldix.co.uk/asdf.html?q=arg")
url.public_suffix # => "co.uk"
url.domain # => "pauldix"
url.subdomain # => "foo.bar"
url.path # => "/asdf.html?q=arg"
url.canonical # => "uk.co.pauldix.bar.foo/asdf.html?q=arg"
Given a URL, how can I get just the domain?
Use Addressable::URI.parse and the #host instance method:
Addressable::URI.parse("http://techcrunch.com/foo/bar").host #=> "techcrunch.com"
Parse URL in Ruby to get subdomain or main domain without www?
Use Ruby's URI module:
require 'uri'
URI.parse('http://www.example.com/page').host
=> "www.example.com"
URI.parse('http://blog.example.com/page').host
=> "blog.example.com"
In both cases, URI extracts the whole host name, because it has no way of knowing which part of the host you want stripped.
You'll have to implement that logic separately, using something like:
%w[http://www.example.com/page http://blog.example.com/page].each do |u|
  puts URI.parse(u).host.sub(/^www\./, '')
end
Which outputs:
example.com
blog.example.com
How to get domain from URL without using URI Parser. I want to do it using regex
I'd go with @Arup Rakshit's solution. However, if you really want a regexp, why not use:
/^http:\/\/(.+)\.[a-z]{2,3}/
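Note that the pattern above only matches http:// URLs, and its capture group stops before the final extension. A rough sketch of a slightly broader pattern is shown here; HOST_PATTERN and host_via_regex are hypothetical names, and the pattern deliberately ignores userinfo (user@host) and IPv6 literals, so it is not a substitute for a real parser:

```ruby
# Hypothetical pattern: optional http/https scheme, optional "www.",
# then capture everything up to the first "/", ":", "?" or "#".
HOST_PATTERN = %r{\A(?:https?://)?(?:www\.)?([^/:?#]+)}i

# Returns the captured host in lowercase, or nil when nothing matches.
def host_via_regex(url)
  match = HOST_PATTERN.match(url)
  match && match[1].downcase
end

p host_via_regex('http://www.example.com/page')
p host_via_regex('https://blog.example.co.uk:8080/x?y=1')
p host_via_regex('example.com')
```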
Get domain name for any type of URL format? -- PHP to Ruby
Using Addressable and taking advantage of Ruby's String#slice:
def domain_name(uri)
  Addressable::URI.heuristic_parse(uri, :scheme => "http") \
    .host[/\w+\.\w+(\.\w{2})?\Z/]
end
domain_name("stackoverflow.com") # => stackoverflow.com
domain_name("www.stackoverflow.com") # => stackoverflow.com
domain_name("http://stackoverflow.com") # => stackoverflow.com
domain_name("thing.com.au") # => thing.com.au
domain_name("some.thing.com.au") # => thing.com.au
domain_name("police.gov.uk") # => police.gov.uk
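The regex in domain_name is a heuristic: it assumes that a trailing two-letter label is always the second half of a two-part suffix, and \w does not match hyphens. The sketch below applies the same pattern to plain host strings, without Addressable, to show where that assumption holds and where it does not. The constant name DOMAIN_TAIL and the sample hosts are illustrative only:

```ruby
# Same pattern as in domain_name above, applied to bare host strings.
DOMAIN_TAIL = /\w+\.\w+(\.\w{2})?\Z/

hosts = %w[www.stackoverflow.com some.thing.com.au blog.example.io my-site.com]
hosts.each { |host| puts "#{host} -> #{host[DOMAIN_TAIL]}" }
# www.stackoverflow.com -> stackoverflow.com
# some.thing.com.au -> thing.com.au
# blog.example.io -> blog.example.io   (subdomain kept: ".io" is read as a second-level suffix)
# my-site.com -> site.com              (hyphenated label cut: \w excludes "-")
```

If hosts like the last two matter, a lookup against the public suffix list is likely a safer route than tuning the regex.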
What regex can I use to get the domain name from a url in Ruby?
URI.parse('http://www.abc.google.com/').host
#=> "www.abc.google.com"
Not a regex, but probably more robust than anything we come up with here.
URI.parse('http://www.abc.google.com/').host.gsub(/^www\./, '')
If you want to remove the www. as well, this will work without raising any errors if the www. is not there.
Get Base Domain Name from Host / URL in Rails
Two options that I'm aware of:
Create a regex that matches all possible domain extensions, then extract the preceding hostname. You can find a public list of all possible domain extensions here: https://publicsuffix.org/list/.
Use the domainatrix gem (https://github.com/pauldix/domainatrix) which has a registry of all domain extensions collected from the above list:
url = Domainatrix.parse("http://www.example.co.uk")
=> #
url.domain
=> "example"
url.public_suffix
=> "co.uk"
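The first option can be sketched without a gem by matching the host's trailing labels against a set of known suffixes. This is only an illustration: SAMPLE_SUFFIXES is a tiny hand-picked sample rather than the real list, split_host is a hypothetical name, and the wildcard and exception rules of the public suffix list are not handled:

```ruby
require 'uri'

# Assumption: a tiny hand-picked sample; real code would load the full list.
SAMPLE_SUFFIXES = %w[co.uk com.au com net org uk au].freeze

# Hypothetical helper: returns [subdomain, domain, public_suffix],
# or nil when no sample suffix matches.
def split_host(url)
  labels = URI.parse(url).host.downcase.split('.')
  # Try the longest candidate suffix first so "co.uk" wins over "uk".
  # Start at index 1 so at least one label is left for the domain.
  (1...labels.size).each do |i|
    suffix = labels[i..-1].join('.')
    next unless SAMPLE_SUFFIXES.include?(suffix)

    return [labels[0...(i - 1)].join('.'), labels[i - 1], suffix]
  end
  nil
end

p split_host('http://www.example.co.uk')
p split_host('http://foo.bar.pauldix.co.uk/asdf.html?q=arg')
p split_host('http://pauldix.net')
```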
How would I parse this url to get the id?
With a regular expression like /\d+\z/. You can build one using Rubular, for example. This expression extracts the id provided that nothing follows the id and that the id itself consists only of digits.
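As a small illustration, assuming a URL shape that ends in a numeric id (the URLs and the name trailing_id below are hypothetical, since the question's URL is not shown here):

```ruby
# Hypothetical helper: returns the trailing run of digits, or nil if the
# string does not end in digits.
def trailing_id(url)
  url[/\d+\z/]
end

p trailing_id('http://example.com/articles/12345')      # => "12345"
p trailing_id('http://example.com/articles/12345/edit') # => nil
```

The result is a String, so call to_i on it if an Integer is needed.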
How to parse url to get base url? -- Rails 3.1
Try to use 'uri' library:
require 'uri'
address = 'http://www.1800contacts.com/productlist.aspx?dl=P&source=cj&ac=8.2.0007'
uri = URI.parse(address)
puts "#{uri.scheme}://#{uri.host}" # => http://www.1800contacts.com
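The snippet above drops the port, which matters for addresses such as a local development server. A possible variant that keeps a non-default port is sketched here; base_url is a hypothetical name and the sample addresses are made up:

```ruby
require 'uri'

# Hypothetical helper: scheme and host, plus the port when it differs
# from the scheme's default (80 for http, 443 for https).
def base_url(address)
  uri = URI.parse(address)
  base = "#{uri.scheme}://#{uri.host}"
  base += ":#{uri.port}" unless uri.port == uri.default_port
  base
end

puts base_url('http://localhost:3000/users?page=2') # => http://localhost:3000
puts base_url('https://www.example.com/a/b')        # => https://www.example.com
```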