As the title suggests, I'm trying to retrieve the domain from a string using a JavaScript regular expression.

Take the following strings:

String                                  ==>     Return
"google"                                ==>     null
"google.com"                            ==>     "google.com"
"www.google.com"                        ==>     "www.google.com"
"ftp://ftp.google.com"                  ==>     "ftp.google.com"
"http://www.google.com"                 ==>     "www.google.com"
"http://www.google.com/"                ==>     "www.google.com"
"https://www.google.com/"               ==>     "www.google.com"
"https://www.google.com.sg/"            ==>     "www.google.com.sg"
"https://www.google.com.sg/search/"     ==>     "www.google.com.sg"
"*://www.google.com.sg/search/"         ==>     "www.google.com.sg"

I've already read "Regex to find domain name without www - Stack Overflow" and "Extract root domain name from string - Stack Overflow", but they were too complicated, so I tried writing my own regular expression:

// Equivalent to the regex literal /[\w]+[\.\w]+/
var re = new RegExp("[\\w]+[\\.\\w]+");
re.exec(document.URL);

which works fine with "google.com", "www.google.com" and "www.google.com.sg", but returns "http" for "http://google.com/", "http://www.google.com/", etc.

As I am new to regular expressions, I can't seem to figure out what's wrong... any ideas?

Thanks in advance!

2 Answers

11

Use this regex:

/(?:[\w-]+\.)+[\w-]+/

Here is a regex demo!

Sampling:

>>> var regex = /(?:[\w-]+\.)+[\w-]+/
>>> regex.exec("google.com")
... ["google.com"]
>>> regex.exec("www.google.com")
... ["www.google.com"]
>>> regex.exec("ftp://ftp.google.com")
... ["ftp.google.com"]
>>> regex.exec("http://www.google.com")
... ["www.google.com"]
>>> regex.exec("http://www.google.com/")
... ["www.google.com"]
>>> regex.exec("https://www.google.com/")
... ["www.google.com"]
>>> regex.exec("https://www.google.com.sg/")
... ["www.google.com.sg"]
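
To cover the remaining rows of the question's table, here is a minimal sketch that wraps the regex in a helper. The name extractDomain is hypothetical (it is not part of the answer); it simply returns the first match, or null when nothing matches.

```javascript
// Hypothetical helper (not from the answer): returns the matched host or null.
function extractDomain(input) {
  var match = /(?:[\w-]+\.)+[\w-]+/.exec(input);
  return match ? match[0] : null;
}

console.log(extractDomain("google")); // null
console.log(extractDomain("https://www.google.com.sg/search/")); // "www.google.com.sg"
console.log(extractDomain("*://www.google.com.sg/search/")); // "www.google.com.sg"
```

One limitation worth keeping in mind: the regex returns the first dotted sequence it finds, so a URL whose host has no dot, such as "http://localhost/index.html", would yield "index.html" rather than the host.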

8 Comments

Omgawd thanks! Love that regex, short n' sweet~ Although I'm still trying to figure out how it works lol... Also, >>> regex.exec("ftp://www.google.com") ... ["ftp.google.com"], how'd you get that? haha :)
Just to add a bit: a domain name may also contain a hyphen (-), so you may need to adjust the regex to allow for that.
@pushpraj and how would I add it to the regex? I'm not really that good at regex so yeah... lol
@Unihedron I still can't figure out how you did it. Care to explain how it works?
@Cheejyg What we want to match is a sequence of the form aaa.bbb(.ccc.ddd.eee...). I did this by matching each label with [\w-]+ (one or more word characters or hyphens), wrapping a label followed by a dot in a non-capturing group, (?:[\w-]+\.), and quantifying that group with + so it can match more than once. The final [\w-]+ matches the last label.
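
For anyone else trying to follow the explanation in the comment above, here is a rough sketch that assembles the same pattern from its parts. The variable names (label, labelWithDot, assembled) are made up for illustration only.

```javascript
// Illustrative names only; the resulting pattern is the one from the answer.
var label = "[\\w-]+";                      // one label: word characters or hyphens
var labelWithDot = "(?:" + label + "\\.)";  // a label followed by a literal dot
var assembled = new RegExp(labelWithDot + "+" + label); // one or more "label." then a final label

console.log(assembled.source); // (?:[\w-]+\.)+[\w-]+
console.log(assembled.exec("http://www.google.com/")[0]); // "www.google.com"
console.log(assembled.exec("http://my-site.example.org/")[0]); // "my-site.example.org"
```

Because at least one "label." group is required, "http" on its own can no longer match: it is followed by ":" rather than ".", so the engine moves on until it reaches the host.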
2

You can use this regex in JavaScript:

\b(?:(?:https?|ftp):\/\/)?([^\/\n]+)\/?

RegEx Demo
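
A short usage sketch, assuming the regex is applied with exec: the whole match includes the scheme and trailing slash, so the domain has to be read from capture group 1. The variable names here are illustrative.

```javascript
var schemeRe = /\b(?:(?:https?|ftp):\/\/)?([^\/\n]+)\/?/;

var result = schemeRe.exec("https://www.google.com.sg/search/");
console.log(result[0]); // "https://www.google.com.sg/" (the whole match)
console.log(result[1]); // "www.google.com.sg" (capture group 1)
```

Note that this pattern does not require a dot, so for the input "google" group 1 is "google" rather than null.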

2 Comments

As a note for users, the regex captures the target String. See this.
This regex doesn't really work because hardcoding http, https, ftp, etc. makes it very tedious and complicated to add new schemes. For example, supporting "file://www.example.com/" requires \b(?:(?:https?|ftp|file):\/\/)?([^\/\n]+)\/? and so on.
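
To illustrate the point about hardcoded schemes, here is a sketch comparing the answer's regex with a hypothetical scheme-agnostic variant. The anyScheme pattern is an assumption of mine, not something proposed in this thread; it treats any run of characters before "://" as the scheme.

```javascript
// Hard-coded schemes, as in the answer above.
var hardcoded = /\b(?:(?:https?|ftp):\/\/)?([^\/\n]+)\/?/;
// Hypothetical variant: accept any run of characters before "://" as a scheme.
var anyScheme = /^(?:[^\/:]+:\/\/)?([^\/\n]+)/;

console.log(hardcoded.exec("file://www.example.com/")[1]); // "file:"
console.log(anyScheme.exec("file://www.example.com/")[1]); // "www.example.com"
console.log(anyScheme.exec("*://www.google.com.sg/search/")[1]); // "www.google.com.sg"
console.log(anyScheme.exec("google.com")[1]); // "google.com"
```

Like the original, this variant still returns "google" for the input "google", and anything before the first slash (for example a port) ends up in the capture, so it may need tightening depending on the inputs.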
