Rails: What's a good way to validate links (URLs)?

Validating a URL is a tricky job. It's also a very broad request.

What do you want to do, exactly? Do you want to validate the format of the URL, the existence, or what? There are several possibilities, depending on what you want to do.

A regular expression can validate the format of the URL. But even a complex regular expression cannot ensure you are dealing with a valid URL.

For instance, if you take a simple regular expression, it will probably reject the following host

http://invalid##host.com

but it will allow

http://invalid-host.foo

which is a valid hostname, but not a valid domain if you consider the existing TLDs. Indeed, a regular expression works if you want to validate the hostname rather than the domain, because the following is a valid hostname

http://host.foo

as well as the following one

http://localhost

Now, let me give you some solutions.

If you want to validate a domain, then you need to forget about regular expressions. The best solution available at the moment is the Public Suffix List, a list maintained by Mozilla. I created a Ruby library to parse and validate domains against the Public Suffix List, and it's called PublicSuffix.

If you want to validate the format of a URI/URL, then you might want to use regular expressions. Instead of searching for one, use the built-in Ruby URI.parse method.

require 'uri'

def valid_url?(url)
  uri = URI.parse(url)
  # A URI without a host (for example "foo" or "/path") is not a usable URL.
  !uri.host.nil?
rescue URI::InvalidURIError
  false
end

You can even decide to make it more restrictive. For instance, if you want the URL to be an HTTP/HTTPS URL, then you can make the validation more accurate.

require 'uri'

def valid_url?(url)
  uri = URI.parse(url)
  uri.is_a?(URI::HTTP) && !uri.host.nil?
rescue URI::InvalidURIError
  false
end

Of course, there are tons of improvements you can apply to this method, including checking for a path or a scheme.
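
As a rough sketch of such improvements, the method below also rejects an empty host and can optionally insist on a non-empty path. The name strict_http_url? and the require_path option are made up for this illustration; only URI.parse and the URI::HTTP class come from Ruby's standard library.

```ruby
require 'uri'

# Hypothetical helper: accepts only absolute http(s) URLs that have a host.
# Pass require_path: true to also insist on a non-empty path.
def strict_http_url?(value, require_path: false)
  uri = URI.parse(value.to_s)
  # URI::HTTPS is a subclass of URI::HTTP, so this covers both schemes.
  return false unless uri.is_a?(URI::HTTP)
  return false if uri.host.nil? || uri.host.empty?
  return false if require_path && uri.path.empty?
  true
rescue URI::InvalidURIError
  false
end

strict_http_url?("https://example.com/docs")                # accepted
strict_http_url?("ftp://example.com")                       # rejected: wrong scheme
strict_http_url?("http://example.com", require_path: true)  # rejected: empty path
```

Whether an empty path should count as invalid depends on your application, which is why it is left as an option here.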

Last but not least, you can also package this code into a validator:

class HttpUrlValidator < ActiveModel::EachValidator

  def self.compliant?(value)
    uri = URI.parse(value)
    uri.is_a?(URI::HTTP) && !uri.host.nil?
  rescue URI::InvalidURIError
    false
  end

  def validate_each(record, attribute, value)
    unless value.present? && self.class.compliant?(value)
      record.errors.add(attribute, "is not a valid HTTP URL")
    end
  end

end

# in the model
validates :example_attribute, http_url: true

How to validate url in rails model?

I solved this problem with this gem https://github.com/ralovets/valid_url

Url validation rails

URI.regexp returns a regular expression that matches all valid URIs, but a) it doesn't validate that the string is only that URI, and b) you want to validate URLs, not URIs (URI is the broader term).

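For a), you can modify the regex to match only if the string is just the URI by wrapping it in the \A (start-of-string) and \z (end-of-string) anchors. Avoid ^ and $ here: they match at line boundaries, and Rails raises an ArgumentError for them in a format validator unless multiline: true is set.

validates :website, format: { with: /\A#{URI.regexp}\z/ }, if: -> { website.present? }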

For b), you could change your call to URI.regexp(['http', 'https']) to limit the allowed schemes, which gets you closer. There are also gems for this problem, like valid_url. Or, just accept that your validation will never be perfect.

How to validate a url in Rails

I found a good way to validate a URL in Rails 4 with a method of the URI class.

class User < ActiveRecord::Base
  validates_format_of :url, :with => URI::regexp(%w(http https))
end

ActiveRecord validate url if it is present

Separate your two validators.

validates :url, presence: true
validates :url, format: { with: URI.regexp }, if: Proc.new { |a| a.url.present? }

(almost) 2 year anniversary edit

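As vrybas and Barry state, the Proc.new wrapper is unnecessary. Before Rails 5.2 you could pass the condition as a string ('url.present?'); string conditions were deprecated in Rails 5.1 and removed in 5.2, so on current versions use a lambda:

validates :url, presence: true
validates :url, format: { with: URI.regexp }, if: -> { url.present? }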

How to validate unique url with Rails and validates_url?

So the two validations, uniqueness and URL, happen separately, and there is nothing in the uniqueness check to handle the fact that those two URLs are essentially the same - instead, the string values are technically different, and thus it doesn't trip the uniqueness validation.

What you could do is look to tidy up your URL data before validation, with a before_validation callback in your model:

before_validation :process_url

def process_url
  # Drop a single trailing slash so both spellings of the URL compare equal.
  self.homepage = homepage.chomp("/") if homepage.present?
end

This callback runs before the validations kick in and removes any trailing / when the homepage attribute is present. The presence check matters because the callback runs before every validation, including a presence validation you might add later if the attribute becomes non-optional, so it has to cope with a blank value.

Those two URL strings will then be the same after tidying up, and thus the second time around the validation will kick in and stop it from being saved.
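
If a trailing slash is not the only difference you care about, the same idea can be pushed a bit further. The sketch below is one possible normalization using only the standard library; normalize_homepage is a hypothetical helper name, and treating the host as case-insensitive is an assumption about what "essentially the same" means for your data.

```ruby
require 'uri'

# Hypothetical helper: returns a canonical string form of an http(s) URL,
# or the stripped input unchanged when it is not a parseable http(s) URL.
def normalize_homepage(raw)
  text = raw.to_s.strip
  uri = URI.parse(text)
  return text unless uri.is_a?(URI::HTTP) && uri.host

  uri.host = uri.host.downcase      # host names are case-insensitive
  uri.path = uri.path.chomp("/")    # "/" becomes "", "/blog/" becomes "/blog"
  uri.to_s
rescue URI::InvalidURIError
  text
end

normalize_homepage("http://Example.com/")       # => "http://example.com"
normalize_homepage("http://example.com/blog/")  # => "http://example.com/blog"
```

In a model you would call something like this from the before_validation callback instead of trimming the slash by hand.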

Hope that helps!

How to check if a URL is valid

Notice:

As pointed out by @CGuess, there is a bug report about this, and it has been documented for over 9 years now that validation is not the purpose of this regular expression (see https://bugs.ruby-lang.org/issues/6520).


Use the URI module distributed with Ruby:

require 'uri'

if url =~ URI::regexp
  # Correct URL
end

Like Alexander Günther said in the comments, it checks if a string contains a URL.

To check if the string is a URL, use:

url =~ /\A#{URI::regexp}\z/

If you only want to check for web URLs (http or https), use this:

url =~ /\A#{URI::regexp(['http', 'https'])}\z/
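
To make the difference between "contains a URL" and "is a URL" concrete, here is a small sketch. The constant and method names are made up for the example; URI::RFC2396_Parser#make_regexp is used because it builds the same kind of pattern as URI.regexp without relying on the obsolete shortcut. The last method shows why \A and \z are preferable to ^ and $, which only anchor to a single line.

```ruby
require 'uri'

# Pattern for http/https URLs, built by the RFC 2396 parser from the stdlib.
HTTP_URL_PATTERN = URI::RFC2396_Parser.new.make_regexp(%w[http https])

# True when the text contains an http(s) URL somewhere.
def contains_http_url?(text)
  text.match?(HTTP_URL_PATTERN)
end

# True only when the whole string is an http(s) URL.
def http_url?(text)
  text.match?(/\A#{HTTP_URL_PATTERN}\z/)
end

# Line anchors: true as soon as any single line is an http(s) URL.
def some_line_is_http_url?(text)
  text.match?(/^#{HTTP_URL_PATTERN}$/)
end

contains_http_url?("see http://example.com for details")  # => true
http_url?("see http://example.com for details")           # => false
http_url?("http://example.com")                           # => true

tricky = "javascript:alert(1)\nhttp://example.com"
some_line_is_http_url?(tricky)                            # => true
http_url?(tricky)                                         # => false
```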

how to check if a url/link is safe in ruby on rails

Here is what I found so far; I hope this helps someone with the same requirement:

As pointed out by @debugger, there are multiple services that provide this functionality. The best fits in my case are the ones below:

Google Safe Browsing API, for non-commercial purposes

Web Risk, for commercial purposes

Both are Google APIs that can be used to check whether a URL is safe.

In the latter case you will be charged after a certain number of requests, so it may be a good idea to check that the URL/link is valid first: valid URL gem

Custom URL validation method in Rails 3.2

If you just want to get it working, then how about:

errors.add(:url, 'not valid') if (url =~ URI::regexp).nil?

But if parsing URLs is important to you, you might want to consider an alternative to Ruby's standard URI implementation such as Addressable, which handles UTF-8 characters, normalization and other edge cases that might be important depending on the context.

See also this: check if url is valid ruby

rails validating url with URI.parse throwing unexpected results

Your issue is actually a Ruby gotcha, on this line:

uri = URI.parse(uri)

What you wanted was for uri to be reassigned to the parsed version of the attribute. What actually happens is that Ruby sees "oh, you're defining a new local variable uri"; that new local variable now takes precedence over the method uri which you defined with your attr_accessor. Ruby creates the local as soon as the parser reaches the assignment and initializes it to nil, so the uri on the right-hand side already refers to the new, still-nil local rather than to the method. This is an example of shadowing.

So, the statement above causes URI.parse to always execute on a value of nil, which is why your test is failing no matter what you set that URI to. To fix it, just use a different variable name:

parsed_uri = URI.parse(uri)

Appendix: short examples that prove I'm not making this up

irb(main):016:0> z
NameError: undefined local variable or method `z' for main:Object
from (irb):16
from /Users/rnubel/.rubies/ruby-2.3.3/bin/irb:11:in `<main>'
irb(main):017:0> z = z
=> nil

The first statement fails, because it just references a local var that doesn't exist. The second statement succeeds, because Ruby evaluates the z variable in the assignment of z as nil.

irb(main):011:0> class Foo; def bar; "http://www.google.com"; end; end
=> :bar
irb(main):012:0> Foo.new.instance_exec { bar }
=> "http://www.google.com"
irb(main):013:0> Foo.new.instance_exec { bar = (puts bar.inspect) }
nil
=> nil

This is the same issue you're having; just with a puts to inspect the value in-line. It prints nil.
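
The same behaviour can be reproduced outside irb with a small class. Link and its method names are hypothetical stand-ins for the model in the question; the buggy method mirrors the shadowing assignment and the fixed one uses a different local name, as suggested above.

```ruby
require 'uri'

# Hypothetical stand-in for the model in the question.
class Link
  attr_accessor :uri

  def initialize(uri)
    @uri = uri
  end

  # Buggy: the assignment creates a local `uri`, so the `uri` on the
  # right-hand side is that new local (nil), not the accessor method.
  def shadowed_value
    uri = uri
    uri
  end

  # Fixed: a different local name leaves the accessor call intact.
  def host
    parsed_uri = URI.parse(uri)
    parsed_uri.host
  end
end

link = Link.new("http://www.google.com")
link.shadowed_value  # => nil
link.host            # => "www.google.com"
```

Calling the accessor explicitly as self.uri on the right-hand side would also avoid the problem, but a distinct local name is easier to read.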


