5

I'm trying to use a regular expression to validate the format of a URL in my Rails model. I've tested the regex in Rubular with the URL http://trentscott.com and it matched.

Any idea why it fails validation when I test it in my Rails app? (It says "name is invalid".)

Code:

  url_regex = /^((http|https):\/\/)?[a-z0-9]+([-.]{1}[a-z0-9]+).[a-z]{2,5}(:[0-9]{1,5})?(\/.)?$/ix

  validates :serial, :presence => true
  validates :name, :presence => true,
                   :format    => {  :with => url_regex  }
2
  • A pedantic note, the question actually asks to check a URL not a domain name, the domain name is trentscott.com. Commented Jan 9, 2019 at 7:26
  • Edited and fixed. Hopefully google will reindex it. Commented Oct 11, 2019 at 4:19

5 Answers

14

You don't need to use a regexp here. Ruby has a much more reliable way to do that:

# Use the URI module distributed with Ruby:

require 'uri'

if url =~ URI::regexp
    # Correct URL
end

(this answer comes from this post:)
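One caveat with the snippet above is that the pattern returned by URI::regexp is not anchored, so it also matches when a URL merely appears somewhere inside a longer string. The following is a minimal sketch of an anchored variant, not a complete validator. The names HTTP_URL and whole_string_url? are made up for illustration, and the sketch builds the pattern through URI::RFC2396_Parser#make_regexp because URI::regexp is marked obsolete in current Ruby.

```ruby
require 'uri'

# Pattern restricted to http/https and anchored to the whole string.
# HTTP_URL and whole_string_url? are hypothetical names for this sketch.
HTTP_URL = /\A#{URI::RFC2396_Parser.new.make_regexp(%w[http https])}\z/

def whole_string_url?(value)
  HTTP_URL.match?(value.to_s)
end

puts whole_string_url?("http://trentscott.com")
puts whole_string_url?("visit http://trentscott.com now")
```

Anchoring only rules out surrounding text. The RFC 2396 grammar is permissive, so an odd-looking authority part can still pass.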


1 Comment

"https://foo;bar.com" =~ URI::regexp yields a successful match. So this isn't very useful.
10

(I like Thomas Hupkens' answer, but for other people viewing, I'll recommend Addressable)

It's not recommended to use regex to validate URLs.

Use Ruby's URI library or a replacement like Addressable, both of which make URL validation trivial. Unlike URI, Addressable can also handle international characters and TLDs.

Example Usage:

require 'addressable/uri'

Addressable::URI.parse("кц.рф") # Works

uri = Addressable::URI.parse("http://example.com/path/to/resource/")
uri.scheme
#=> "http"
uri.host
#=> "example.com"
uri.path
#=> "/path/to/resource/"

And you could build a custom validation like:

class Example
  include ActiveModel::Validations

  ##
  # Validates a URL
  #
  # If the URI library can parse the value, and the scheme is valid
  # then we assume the url is valid
  #
  class UrlValidator < ActiveModel::EachValidator
    def validate_each(record, attribute, value)
      begin
        uri = Addressable::URI.parse(value)

        # parse returns nil for a nil value, so use safe navigation here
        if !["http","https","ftp"].include?(uri&.scheme)
          raise Addressable::URI::InvalidURIError
        end
      rescue Addressable::URI::InvalidURIError
        # errors.add is the supported API; appending with << is deprecated in newer Rails
        record.errors.add(attribute, "Invalid URL")
      end
    end
  end

  validates :field, :url => true
end
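Since a parser can accept odd input without raising, it may be safer to check the parsed parts explicitly rather than rely on an exception. Here is a minimal sketch of that idea. It uses Ruby's standard URI library instead of Addressable so that it runs without any gems, and the names ALLOWED_SCHEMES and valid_url? are hypothetical. The same scheme and host checks should carry over to an Addressable::URI object, which exposes scheme and host as shown earlier.

```ruby
require 'uri'

# Hypothetical names for this sketch; adjust the scheme list as needed.
ALLOWED_SCHEMES = %w[http https ftp].freeze

def valid_url?(value)
  uri = URI.parse(value.to_s)
  # Require both an allowed scheme and a non-empty host.
  ALLOWED_SCHEMES.include?(uri.scheme) && !uri.host.to_s.empty?
rescue URI::InvalidURIError
  false
end

puts valid_url?("http://example.com/path/to/resource/")
puts valid_url?("abcd")
```

Inside validate_each, the result of such a check could decide whether to call record.errors.add.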

Code Source

2 Comments

after looking at addressable, I think it wins hands down, thanks
+1 for addressable BUT don't assume that it will raise any exceptions because it won't. Addressable::URI.parse will fail silently trying its best to figure out the URI. For example, say you want to validate an incorrect URI such as: http://thing.com. Addressable will call the scheme http and the domain http as well since it views the colon as a port delimiter. No error will be raised
7

Your input (http://trentscott.com) does not have a subdomain but the regex is checking for one.

domain_regex = /^((http|https):\/\/)[a-z0-9]*(\.?[a-z0-9]+)\.[a-z]{2,5}(:[0-9]{1,5})?(\/.)?$/ix

Update

You also don't need the ? after ((http|https):\/\/) unless the protocol is sometimes missing. I've also escaped the . because an unescaped . matches any character. I'm not sure what the grouping above is for, but here is a better version that supports dashes and groups by section:

domain_regex = /^((http|https):\/\/) 
(([a-z0-9-\.]*)\.)?                  
([a-z0-9-]+)\.                        
([a-z]{2,5})
(:[0-9]{1,5})?
(\/)?$/ix
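A related detail for Rails: in Ruby, ^ and $ match at line boundaries, not at the start and end of the string, and Rails 4 and later raise an ArgumentError for a :format validation that uses them unless :multiline => true is passed. The sketch below shows the difference with \A and \z. The two patterns are deliberately simplified stand-ins for illustration, not the regex from this answer.

```ruby
# Simplified illustrative patterns; only the anchors differ.
LINE_ANCHORED   = /^https?:\/\/[a-z0-9-]+(\.[a-z0-9-]+)*\.[a-z]{2,}$/i
STRING_ANCHORED = /\Ahttps?:\/\/[a-z0-9-]+(\.[a-z0-9-]+)*\.[a-z]{2,}\z/i

# A valid-looking first line followed by unwanted content.
input = "http://trentscott.com\n<script>"

puts LINE_ANCHORED.match?(input)    # true: only the first line has to match
puts STRING_ANCHORED.match?(input)  # false: the whole string has to match
```

Swapping ^ for \A and $ for \z in the validation regex avoids both the Rails error and the multi-line loophole.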

4 Comments

Thanks. That fixed the error but now an entry like "abcd" is valid. Any idea on how to fix that?
The update should work. One more thing I removed was the [-.] and replaced it with \.
This does not handle international domain names, which can be represented in ASCII like: www.xn--b1akcweg3a.xn--p1ai. Yes, this gives you double dashes in your domain, which is legal, as well as top-level domains (the right-most component) that are longer than 3 characters.
@cordsen: what if I want to write a regex in Ruby for a URL which includes any non-ASCII characters or Chinese characters as well? For example, http://www.詹姆斯.com/ Can you please let me know how to figure this out?
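For the international cases raised in these comments, one possible approach in plain Ruby is to describe host labels with Unicode property classes instead of [a-z0-9]. This is only a rough sketch: LABEL and IDN_URL are hypothetical names, the pattern checks shape only, and it makes no attempt to confirm that a TLD exists.

```ruby
# A label is made of letters or digits in any script, with dashes allowed
# inside but not at either end. Names here are hypothetical.
LABEL   = /[\p{L}\p{N}](?:[\p{L}\p{N}-]*[\p{L}\p{N}])?/
IDN_URL = /\Ahttps?:\/\/#{LABEL}(?:\.#{LABEL})+(?::[0-9]{1,5})?\/?\z/i

puts IDN_URL.match?("http://www.xn--b1akcweg3a.xn--p1ai")
puts IDN_URL.match?("http://www.詹姆斯.com/")
puts IDN_URL.match?("abcd")
```

This accepts Punycode labels with double dashes as well as non-ASCII labels, but it allows only an optional trailing slash after the host, so paths would need an extra group.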
1

Try this.

It's working for me.

/(ftp|http|https):\/\/(\w+:{0,1}\w*@)?(\S+)(:[0-9]+)?(\/|\/([\w#!:.?+=&%@!\-\/]))?/


0

This also handles hosts with a country-code suffix, such as abc.com.it, where the .it part is optional:

match '/:site', to: 'controller#action' , constraints: { site: /[a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9]\.[a-zA-Z]{2,}(\.[a-zA-Z]{2,63})?/}, via: :get, :format => false
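To see what this constraint accepts, the pattern can be tried outside the router. The sketch below is an assumption-laden illustration: SITE is a hypothetical constant, the dot before the optional suffix is escaped so that it matches only a literal dot, and \A and \z are added by hand so the whole string has to match.

```ruby
# Hypothetical constant holding the host pattern with explicit anchors.
SITE = /\A[a-zA-Z0-9][a-zA-Z0-9-]{1,61}[a-zA-Z0-9]\.[a-zA-Z]{2,}(\.[a-zA-Z]{2,63})?\z/

puts SITE.match?("abc.com.it")  # true
puts SITE.match?("abc.com")     # true
puts SITE.match?("ab.com")      # false: the first label needs at least 3 characters
puts SITE.match?("abcd")        # false: no dot-separated suffix
```

Note that the first label must be at least three characters long, so short names such as ab.com are rejected.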

