I'm currently having some issues with a regex to extract a URL.

I want my regex to match URLs such as:

http://stackoverflow.com/questions/ask
https://stackoverflow.com
http://local:1000
https://local:1000

Through some tutorials, I've learned that this regex will match all of the above: ^(http|https)\://.*$. However, it also matches http://local:1000;http://invalid http://khttp:// as a single string, when it shouldn't match it at all.

I understand that my expression isn't written to exclude this, but I cannot think of how to write it so that it checks for this scenario.

Any help is greatly appreciated!

Edit:

Looking at my issue, it seems I could solve it by checking that '//' doesn't occur in the string after the initial http:// or https://. Any ideas on how to implement that?

Sorry, this will be done in Java.

I also need to add the following constraint: a string such as http://local:80/test:90 should fail because of the duplicate port. In other words, I need a constraint that allows only two : symbols in total in a valid string: one after http/https and one before the port.

  • You want to extract the URL without the protocol? Commented Jan 28, 2013 at 18:58
  • Hi, if the string contains multiple URLs, such as http://k.http://blah, my regex shouldn't treat it as valid. Commented Jan 28, 2013 at 19:10
  • Yes, as long as it's not another URL it's fine. Commented Jan 28, 2013 at 19:19
  • Looking at my issue, it seems that I could eliminate my issue as long as I can implement a check to make sure '//' doesn't occur in my string after the initial http:// or https://, any ideas on how to implement? Commented Jan 28, 2013 at 19:23
  • Please read the [regex] tag's description: "Please also include a tag specifying the programming language or tool you are using." Commented Jan 28, 2013 at 19:25

4 Answers

This will only produce a match if there is no :// after its first appearance in the string.

^https?:\/\/(?!.*:\/\/)\S+

Note that trying to parse a valid URL from within a string is very complex (see "In search of the perfect URL validation regex"), so the above does not attempt to do that.
It will just match the protocol and the following non-space characters.

In Java

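import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Match only when no second "://" follows the leading protocol.
Pattern reg = Pattern.compile("^https?:\\/\\/(?!.*:\\/\\/)\\S+");
Matcher m = reg.matcher("http://somesite.com");
if (m.find()) {
    System.out.println(m.group()); // the matched URL
} else {
    System.out.println("No match");
}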
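
To see how the pattern behaves on the strings from the question, here is a minimal sketch. The class name UrlRegexDemo and the helper matches are made up for illustration; only the pattern itself comes from the answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; only the pattern is taken from the answer.
class UrlRegexDemo {
    // Protocol, then a negative lookahead rejecting any later "://".
    static final Pattern URL = Pattern.compile("^https?:\\/\\/(?!.*:\\/\\/)\\S+");

    // True when the pattern finds a match starting at the beginning of the input.
    static boolean matches(String input) {
        Matcher m = URL.matcher(input);
        return m.find();
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://stackoverflow.com/questions/ask",
            "https://stackoverflow.com",
            "http://local:1000",
            "https://local:1000",
            "http://local:1000;http://invalid http://khttp://",
            "http://k.http://blah",
            "http://local:80/test:90"
        };
        for (String s : samples) {
            System.out.println(matches(s) + "  " + s);
        }
    }
}
```

The four valid samples match and the two multi-URL strings do not. The last sample, http://local:80/test:90, still matches, because this pattern only looks for a repeated "://" and does not count colons.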

4 Comments

Seems like this is what I need. Any idea how to do this in Java?
@Greg. Yes, that's great, but it assumes that you have already got the url.
Mike- Thank you, this works great. One question: if I wanted to add to the constraints that a second colon in the string also makes it invalid (e.g. "local:800/test:5"), how would I go about doing that?
@user2019260. If you mean a third colon, you could use ^https?:\\/\\/(?!.*:(.*:|\\/\\/))\\S+ This will disallow :// or two : in the string after http://.
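
A small sketch of the stricter pattern from the last comment, run against the question's examples. The class and method names are hypothetical, and it uses matches() rather than find() so that the whole input has to be a single URL; that swap is an assumption about what the asker wants, not part of the original answer.

```java
import java.util.regex.Pattern;

// Hypothetical demo class; the pattern is the one suggested in the comment.
class StrictUrlRegexDemo {
    // After the protocol, reject a later "://" or two further ':' characters.
    static final Pattern STRICT =
        Pattern.compile("^https?:\\/\\/(?!.*:(.*:|\\/\\/))\\S+");

    // matches() requires the entire input to match, not just a prefix.
    static boolean isValid(String input) {
        return STRICT.matcher(input).matches();
    }

    public static void main(String[] args) {
        String[] samples = {
            "http://stackoverflow.com/questions/ask",
            "http://local:1000",
            "http://local:80/test:90",
            "http://k.http://blah"
        };
        for (String s : samples) {
            System.out.println(isValid(s) + "  " + s);
        }
    }
}
```

With this pattern the first two samples are accepted, while http://local:80/test:90 and http://k.http://blah are rejected.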

Check whether your programming language already has a URL parser. E.g., PHP has parse_url().
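
Since the question is about Java, here is a rough sketch of the same idea using the JDK's java.net.URI. The class UriCheck and the method isSingleHttpUrl are hypothetical names, and the extra checks on the path (no ':' and no "//") are an assumption added to mirror the asker's constraints; the parser itself does not enforce them.

```java
import java.net.URI;
import java.net.URISyntaxException;

// Hypothetical helper, not a complete URL validator.
class UriCheck {
    static boolean isSingleHttpUrl(String input) {
        try {
            URI uri = new URI(input); // throws on illegal characters such as spaces
            String scheme = uri.getScheme();
            if (scheme == null) {
                return false;
            }
            boolean http = scheme.equalsIgnoreCase("http")
                        || scheme.equalsIgnoreCase("https");
            if (!http || uri.getHost() == null) {
                return false;
            }
            // Assumed extra rules from the question: no second "//" and
            // no further ':' once the host and optional port are consumed.
            String path = uri.getPath();
            return !path.contains(":") && !path.contains("//");
        } catch (URISyntaxException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(isSingleHttpUrl("http://local:1000"));
        System.out.println(isSingleHttpUrl("http://local:80/test:90"));
        System.out.println(isSingleHttpUrl("http://k.http://blah"));
    }
}
```

The parser handles the scheme, host, and port, so the only hand-written checks left are the ones specific to this question. Query strings and fragments are not inspected here.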

From http://net.tutsplus.com/tutorials/other/8-regular-expressions-you-should-know/

/^(https?:\/\/)?([\da-z\.-]+)\.([a-z\.]{2,6})([\/\w \.-]*)*\/?$/

This may change based on the programming language/tool

/[A-Za-z]+:\/\/[A-Za-z0-9-]+.[A-Za-z0-9-:%&;?#\/.=]+/g
