I want to extract the domain name from a URI.

For example, the input to the regular expression may be any of the following:

  1. test.net
  2. https://www.test.net
  3. https://test.net
  4. http://www.test.net
  5. http://test.net

In all cases, the result should be test.net.

Below is the code I implemented for this purpose:

    val re = "([http[s]?://[w{3}\\.]?]+)(.*)".r

But I didn't get the expected result. Below is my output:

    val re(prefix, domain) = "https://www.test.net"
    // prefix: String = https://www.t
    // domain: String = est.net

What is the problem with my regular expression, and how can I fix it?

  • The dot after 'www' should be escaped. Also, you have square brackets around the whole thing before the plus sign. Commented Dec 13, 2019 at 22:17
  • Okay, I've updated it; still the same error. Commented Dec 13, 2019 at 22:21
  • And you're still using square brackets where you should use parentheses. The square brackets only match 1 of those chars, while parens match the entire group. I don't understand your regex but this should at least get you a bit further: "(http(s)?://(w{3}\\.)+?)([^.]*)" Commented Dec 13, 2019 at 22:32
  • still same error for your regular expression above val re(prefix, domain) = "https://www.test.net" prefix: String = https://www.t domain: String = est.net Commented Dec 13, 2019 at 22:36
  • So your domain name is just everything after "www." right? yes Commented Dec 13, 2019 at 22:40

1 Answer

What is the problem with my regular expression, and how can I fix it?

You are using a character class:

    [http.?://(www.)?]

This means:

  • either an h
  • or a t
  • or a t
  • or a p
  • or a .
  • or a ?
  • or a :
  • or a /
  • or a /
  • or a (
  • or a w
  • or a w
  • or a w
  • or a .
  • or a )
  • or a ?

It does not include an s, so it will not match https://.

It is not clear to me why you are using a character class here, nor why you are using duplicate characters in the class.
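To make the difference concrete, here is a minimal sketch comparing a character class with a literal sequence that has optional parts. The names charClass and grouped are illustrative only, and the character class is a simplified stand-in for the one in the question, chosen to contain the same relevant characters:

```scala
// A character class is a set of single characters; `+` repeats "any one of them".
val charClass = "[htps:/w.]+".r

// A literal sequence with optional parts: "http", optional "s", "://", optional "www."
val grouped = "https?://(?:www\\.)?".r

val input = "https://www.test.net"

// The class keeps consuming as long as each character is in the set,
// so it also swallows the leading "t" of "test".
val classPrefix = charClass.findPrefixOf(input)

// The grouped pattern stops right after "www.".
val groupPrefix = grouped.findPrefixOf(input)

println(classPrefix) // Some(https://www.t)
println(groupPrefix) // Some(https://www.)
```

The first result reproduces the prefix reported in the question: every character up to and including the first "t" of "test" happens to be a member of the class.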

Ideally, you shouldn't try to parse URIs yourself; someone else has already done the hard work. You could, for example, use the java.net.URI class:

    import java.net.URI

    val u1 = new URI("test.net")
    u1.getHost
    // res: String = null

    val u2 = new URI("https://www.test.net")
    u2.getHost
    // res: String = www.test.net

    val u3 = new URI("https://test.net")
    u3.getHost
    // res: String = test.net

    val u4 = new URI("http://www.test.net")
    u4.getHost
    // res: String = www.test.net

    val u5 = new URI("http://test.net")
    u5.getHost
    // res: String = test.net

Unfortunately, as you can see, what you want to achieve does not actually comply with the official URI syntax.
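As a small illustration of why the first input yields no host: without a scheme and "//", java.net.URI treats the whole string as a relative path. The value name bare is just for this sketch:

```scala
import java.net.URI

// No scheme and no "//" authority marker, so this is parsed as a relative reference.
val bare = new URI("test.net")

val bareScheme = bare.getScheme // null: there is no scheme
val bareHost   = bare.getHost   // null: there is no authority, hence no host
val barePath   = bare.getPath   // the entire input ends up in the path component

println(barePath) // test.net
```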

If you can fix that, then you can use java.net.URI. Otherwise, you will need to go back to your old solution and parse the URI yourself:

    // Optional scheme, optional "www." (dot escaped so it only matches a literal dot),
    // then capture everything up to the first "/", "?" or "#".
    val re = "(?>https?://)?(?>www\\.)?([^/?#]*)".r

    val re(domain1) = "test.net"
    //=> domain1: String = test.net

    val re(domain2) = "https://www.test.net"
    //=> domain2: String = test.net

    val re(domain3) = "https://test.net"
    //=> domain3: String = test.net

    val re(domain4) = "http://www.test.net"
    //=> domain4: String = test.net

    val re(domain5) = "http://test.net"
    //=> domain5: String = test.net
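If prepending a default scheme to scheme-less input is acceptable, java.net.URI can still do the parsing, with the leading "www." stripped afterwards. The following is only a sketch under that assumption: extractDomain is a hypothetical helper name, and treating bare input as "http://<input>" is a rule invented here, not something stated in the question.

```scala
import java.net.URI
import scala.util.Try

// Hypothetical helper: returns the host without a leading "www.", if one can be parsed.
def extractDomain(input: String): Option[String] = {
  // Assumption: input without "://" is treated as if it were "http://<input>".
  val withScheme = if (input.contains("://")) input else s"http://$input"

  Try(new URI(withScheme)).toOption       // None if the string is not a valid URI
    .flatMap(u => Option(u.getHost))      // getHost may return null
    .map(_.stripPrefix("www."))           // drop a leading "www." if present
}

val inputs = List(
  "test.net",
  "https://www.test.net",
  "https://test.net",
  "http://www.test.net",
  "http://test.net"
)

inputs.foreach(s => println(s"$s -> ${extractDomain(s)}"))
```

Unlike the regular expression, this rejects input such as "hello. Good morning", because the space makes the URI constructor throw and the helper then returns None.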

3 Comments

Except for the first case (which is just two strings with a . between them), all the others can be handled using URI plus a check to remove the leading www.. This regex will match "hello. Good morning", while URI will not allow that.
The problem is that the OP expects in all cases the domain part of the host part of the URI to be test.net. However, that is actually only true for cases #2 and #4, where the host is www and the domain is indeed test.net. In cases #3 and #5, the host is test and the domain is just net, and in case #1, the URI doesn't even have a host part at all; it only has a path. So, trying to parse this with a URI parser doesn't work, because the OP's parsing rules are different from RFC 2396.
Since the OP's parsing rules do not follow any official specification, and the OP hasn't given their parsing rules, who is to say that "hello. Good morning" isn't a valid URI according to their rules?
