Python regex to get everything until the first dot in a string

Question

import re

find = re.compile(r"^(.*)\..*")
for l in lines:
    m = find.match(l)
    print(m.group(1))

I want to match everything in a string up to the first dot.

in [email protected], I want a@b
in [email protected], I want a@b
in [email protected], I want a@b

What my code gives me:

[email protected] prints a@b
[email protected] prints [email protected]
[email protected] prints [email protected]

What should find be so that it only gets a@b?

What should happen if there is no dot? Do you want the whole string or the empty string as the match (or something else)? (Bakuriu, commented Oct 2, 2013 at 16:41)

Rohit Jain · Accepted Answer · 2013-10-02 16:54:43Z

63

By default, all quantifiers are greedy: they try to consume as much of the string as they can. You can make them reluctant (non-greedy) by appending a ? after them:

find = re.compile(r"^(.*?)\..*")
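To see the difference, here is a minimal sketch that runs the question's greedy pattern and the reluctant one on the same input. The sample string "a@b.c.d" is a made-up stand-in for the question's addresses.

```python
import re

greedy = re.compile(r"^(.*)\..*")   # the question's pattern: .* grabs as much as possible
lazy = re.compile(r"^(.*?)\..*")    # reluctant version: .*? grabs as little as possible

sample = "a@b.c.d"  # hypothetical input with several dots

# The greedy group only gives back characters until \. can match the LAST dot
print(greedy.match(sample).group(1))  # a@b.c
# The lazy group stops at the FIRST dot
print(lazy.match(sample).group(1))    # a@b
```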

As noted in the comments, this approach fails (match() returns None) if there is no period in the string, so the right pattern depends on how you want that case to behave. If you want the complete string in that case, use a negated character class:

find = re.compile(r"^([^.]*).*")

It stops automatically at the first period, or at the end of the string if there is none.
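The no-dot case raised in the comments can be checked directly. A small sketch, assuming the hypothetical inputs "a@b.c.d" (with dots) and "abc" (without):

```python
import re

lazy = re.compile(r"^(.*?)\..*")      # needs a dot somewhere, otherwise no match
negated = re.compile(r"^([^.]*).*")   # [^.]* stops at the first dot or at end of string

for text in ["a@b.c.d", "abc"]:       # hypothetical inputs, with and without a dot
    m_lazy = lazy.match(text)
    m_neg = negated.match(text)
    print(text, m_lazy.group(1) if m_lazy else None, m_neg.group(1))
# a@b.c.d a@b a@b
# abc None abc
```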

Also, re.match() isn't needed there; because the pattern is anchored with ^, re.search() works just as well. You can modify your code to:

import re

find = re.compile(r"^[^.]*")  # everything before the first dot (may be empty)

for l in lines:
    print(find.search(l).group(0))
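A quick check of that loop, as a sketch with a hypothetical lines list standing in for the question's input. Because [^.]* can match zero characters, a match object is always returned, so calling .group(0) is safe; a string that starts with a dot simply yields an empty string.

```python
import re

find = re.compile(r"^[^.]*")

# Hypothetical sample lines standing in for the question's input
lines = ["a@b.c", "a@b.c.d.e", "no-dot-here", ".leading"]

for l in lines:
    # With the ^ anchor, search() and match() find the same span here
    assert find.search(l).group(0) == find.match(l).group(0)
    print(repr(find.search(l).group(0)))
# 'a@b'
# 'a@b'
# 'no-dot-here'
# ''
```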


edited Oct 2, 2013 at 16:54

answered Oct 2, 2013 at 16:33

Rohit Jain



2 Comments

Bakuriu Over a year ago

Note that the two regexes do not match the same thing. The first one fails if there is no dot, while the latter always matches.

Rohit Jain Over a year ago

@Bakuriu Good catch. Missed it somehow :)

Jerry · 2014-03-18 09:23:24Z

45

You can use .find() instead of a regex in this situation:

>>> s = "[email protected]"
>>> print(s[0:s.find('.')])
a@b
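One pitfall, raised in the comments: str.find() returns -1 when the substring is absent, and slicing up to -1 silently drops the last character. A tiny sketch with the hypothetical dot-free input "abc":

```python
s = "abc"           # hypothetical input with no dot
idx = s.find(".")
print(idx)          # -1
print(s[0:idx])     # ab  (s[0:-1] drops the last character)
```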

Considering the comments, here's a modification using .index(), which is similar to .find() except that it raises a ValueError instead of returning -1 when the substring isn't found:

>>> s = "[email protected]"
>>> try:
...     index = s.index('.')
... except ValueError:
...     index = len(s)
...
>>> print(s[:index])
a@b
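An exception-free variant is to check find()'s -1 sentinel explicitly. This is a sketch; the function name before_first_dot is hypothetical, and the behavior matches the try/except version above.

```python
def before_first_dot(s):
    """Return s up to (not including) the first '.', or all of s if there is none."""
    idx = s.find(".")
    # find() returns -1 when there is no dot; treat that as "take the whole string"
    return s if idx == -1 else s[:idx]

for s in ["a@b.c.d", "abc", ".x"]:  # hypothetical inputs
    print(repr(before_first_dot(s)))
# 'a@b'
# 'abc'
# ''
```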

edited Mar 18, 2014 at 9:23

answered Oct 2, 2013 at 16:39

Jerry


2 Comments

user2555451 Over a year ago

+1 - This is an excellent and efficient solution that doesn't require importing. Nice!

Bakuriu Over a year ago

This gives an odd result if there is no dot: it returns s without the last character. If this matters, it's probably simpler to do: try: index = s.index('.') except ValueError: index = len(s)

Escualo · 2013-10-02 16:51:53Z

You can use the split method: split the string at the . character once, and you get a list of [before the first period, after the first period]. The notation is:

mystring.split(".", 1)
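What split returns with and without maxsplit, as a small sketch on hypothetical inputs. Note that a string with no dot yields a one-element list.

```python
# Hypothetical inputs; maxsplit=1 means at most one split, so at most two pieces
with_dots = "a@b.c.d".split(".", 1)
no_dot = "abc".split(".", 1)
print(with_dots)               # ['a@b', 'c.d']
print(no_dot)                  # ['abc']
print("a@b.c.d".split("."))    # ['a@b', 'c', 'd'] (no maxsplit: every dot splits)
```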

Then you can create a generator that yields the part you are interested in and ignores the one you are not (the _ notation). It works as follows:

entries = [
    "[email protected]",
    "[email protected]",
    "[email protected]",
    ]

for token, _ in (entry.split(".", 1) for entry in entries):
    print(token)

Output:

a@b
a@b
a@b
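One caveat with the two-name unpacking: an entry with no dot splits into a one-element list, so unpacking it into token, _ raises ValueError. Taking element [0] avoids that. A sketch with hypothetical entries, the last of which has no dot:

```python
entries = ["a@b.c", "a@b.c.d", "abc"]  # hypothetical; the last has no dot

try:
    for token, _ in (entry.split(".", 1) for entry in entries):
        print(token)
except ValueError as exc:
    # "abc".split(".", 1) == ["abc"], which cannot be unpacked into two names
    print("unpacking failed:", exc)

# Taking element [0] works whether or not a dot is present
tokens = [entry.split(".", 1)[0] for entry in entries]
print(tokens)  # ['a@b', 'a@b', 'abc']
```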

The documentation for the split method can be found online:

str.split([sep[, maxsplit]])

Return a list of the words in the string, using sep as the delimiter string. If maxsplit is given, at most maxsplit splits are done (thus, the list will have at most maxsplit+1 elements). If maxsplit is not specified or -1, then there is no limit on the number of splits (all possible splits are made).

kindall · 2013-10-02 16:51:34Z

2

I recommend partition or split in this case; they also work well when there is no dot (the first element is then the whole string).

text = "[email protected]"

print(text.partition(".")[0])
print(text.split(".", 1)[0])
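A short sketch of the no-dot behavior on hypothetical inputs. partition always returns a 3-tuple, so it can be unpacked safely, and an empty separator tells you no dot was found.

```python
for text in ["a@b.c.d", "abc"]:  # hypothetical inputs
    # partition always returns (head, separator, tail)
    head, sep, tail = text.partition(".")
    print((head, sep, tail), text.split(".", 1)[0])
# ('a@b', '.', 'c.d') a@b
# ('abc', '', '') abc
```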

answered Oct 2, 2013 at 16:51

kindall



Srinivasreddy Jakkireddy · 2013-10-02 16:59:12Z

1

import re
data='[email protected]'
print(re.sub(r'\..*', '', data))
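A sketch of how this substitution behaves on hypothetical inputs. A dot-free string comes back unchanged. Since . does not match a newline by default, multi-line input keeps text after the first line unless re.DOTALL is passed.

```python
import re

pattern = r"\..*"  # a literal dot followed by everything after it on the same line

print(re.sub(pattern, "", "a@b.c.d"))                            # a@b
print(re.sub(pattern, "", "abc"))                                 # abc (no dot: unchanged)
print(repr(re.sub(pattern, "", "a@b.c\nx.y")))                    # 'a@b\nx'
print(repr(re.sub(pattern, "", "a@b.c\nx.y", flags=re.DOTALL)))   # 'a@b'
```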

answered Oct 2, 2013 at 16:59

Srinivasreddy Jakkireddy

