
I want to extract all the domain names from a given string using Python. I have tried the code below, but I am not getting the expected output:

str = "ctcO6OgnWRAxLtu+akRCFwM asu.edu zOiV6Wo6nDnUhQkZO4XTySrTRwLMgozM9R/LyQs2r+Pb tarantino.cs.ucsb.edu,128.111.48.123 ssh-rsa 9SMF4U+qJW03Bh1"
list = re.findall(r'([a-zA-Z0-9]+(-[a-zA-Z0-9]+)*\.)+[a-z]{2,10}', str)
print list

I want the output as:

asu.edu , tarantino.cs.ucsb.edu  

but what I get is:

[('asu.', ''), ('ucsb.', '')]

What am I missing?

  • Please don't overwrite built-in types. Use names like my_str and my_list instead if you don't have more meaningful names for them. Commented Feb 6, 2016 at 21:29

2 Answers

This should work:

import re
my_str = "ctcO6OgnWRAxLtu+akRCFwM asu.edu zOiV6Wo6nDnUhQkZO4XTySrTRwLMgozM9R/LyQs2r+Pb tarantino.cs.ucsb.edu,128.111.48.123 ssh-rsa 9SMF4U+qJW03Bh1"
my_list = re.findall(r'(([a-zA-Z0-9]+(-[a-zA-Z0-9]+)*\.)+[a-z]{2,10})', my_str)
print([i[0] for i in my_list])

As Gavin pointed out, you shouldn't use str and list as variable names because they are built-in types in Python.
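If you would rather keep the capturing groups from the question, one alternative is re.finditer, which yields match objects so the whole match is available through group(0) regardless of how many groups the pattern has. This is only a sketch of that idea, not part of the answer above; the names text, pattern and domains are chosen here for illustration.

```python
import re

text = ("ctcO6OgnWRAxLtu+akRCFwM asu.edu "
        "zOiV6Wo6nDnUhQkZO4XTySrTRwLMgozM9R/LyQs2r+Pb "
        "tarantino.cs.ucsb.edu,128.111.48.123 ssh-rsa 9SMF4U+qJW03Bh1")

# Same pattern as in the question, capturing groups left in place.
pattern = re.compile(r'([a-zA-Z0-9]+(-[a-zA-Z0-9]+)*\.)+[a-z]{2,10}')

# group(0) is always the entire match, independent of the groups inside it.
domains = [m.group(0) for m in pattern.finditer(text)]
print(domains)  # ['asu.edu', 'tarantino.cs.ucsb.edu']
```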


1 Comment

You can just use non-capturing groups, r'(?:[a-zA-Z0-9]+(?:-[a-zA-Z0-9]+)*\.)+[a-z]{2,10}', and print my_list directly.
In [63]: text = "ctcO6OgnWRAxLtu+akRCFwM asu.edu zOiV6Wo6nDnUhQkZO4XTySrTRwLMgozM9R/LyQs2r+Pb tarantino.cs.ucsb.edu,128.111.48.123 ssh-rsa 9SMF4U+qJW03Bh1"

In [64]: re.findall(r'(?:[a-zA-Z0-9]+\.)+[a-z]{2,10}', text)
Out[64]: ['asu.edu', 'tarantino.cs.ucsb.edu']
  • Use (?:...) to create a non-capturing group. When the pattern contains capturing groups (subpatterns surrounded by plain parentheses), re.findall returns the text of the groups rather than the whole match: one string per match if there is a single group, and one tuple per match if there is more than one. To get the full matches back as a plain list of strings, use non-capturing groups.

  • For the text you posted, the (-[a-zA-Z0-9]+)* part of the pattern is unnecessary. The only literal - in text is in ssh-rsa, which is not part of a domain name, so (-[a-zA-Z0-9]+)* never contributes to a match. Of course, you could add (?:-[a-zA-Z0-9]+)* to the pattern if you wish (note the use of the non-capturing group (?:...)), but that part of the pattern is not exercised by the text you posted. It would allow you to match names with hyphens, however:

    In [73]: re.findall(r'(?:[a-zA-Z0-9]+(?:-[a-zA-Z0-9]+)*\.)+[a-z]{2,10}', 'asu-psu.edu but not initial hyphens like -psu-asu.edu')
    Out[73]: ['asu-psu.edu', 'psu-asu.edu']
    

    And as Aprillion noted:

    In [74]: re.findall(r'(?:[a-zA-Z0-9]+(?:-[a-zA-Z0-9]+)*\.)+[a-z]{2,10}', text)
    Out[74]: ['asu.edu', 'tarantino.cs.ucsb.edu']
    
  • See regex101 for an explanation of the pattern (?:[a-zA-Z0-9]+\.)+[a-z]{2,10}
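To make the point about groups concrete, here is a small sketch comparing what re.findall returns for zero, one, and two capturing groups. The shortened input sample and the variable names are assumptions made for this illustration.

```python
import re

sample = "asu.edu tarantino.cs.ucsb.edu"

# No capturing groups: findall returns the full matches.
no_groups = re.findall(r'(?:[a-zA-Z0-9]+\.)+[a-z]{2,10}', sample)
print(no_groups)   # ['asu.edu', 'tarantino.cs.ucsb.edu']

# One capturing group: findall returns only that group's text.
# A repeated group keeps its last repetition, hence 'ucsb.'.
one_group = re.findall(r'([a-zA-Z0-9]+\.)+[a-z]{2,10}', sample)
print(one_group)   # ['asu.', 'ucsb.']

# Two capturing groups: findall returns one tuple per match.
two_groups = re.findall(r'(([a-zA-Z0-9]+\.)+[a-z]{2,10})', sample)
print(two_groups)  # [('asu.edu', 'asu.'), ('tarantino.cs.ucsb.edu', 'ucsb.')]
```

The one-group case also shows why the output in the question contains 'ucsb.' rather than 'tarantino.': a group inside a repetition only remembers the last piece it matched.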
