
I have a string "A.B.C one two three."

I need to tokenize this string into ["A.B.C", one, two, three], dropping the period at the end of the sentence. I'm having trouble removing that final period without also affecting the periods inside the acronym A.B.C.

Is there a way to remove only the periods at the end of a sentence, without affecting acronyms, using Python regexes?

  • s.rstrip('.') is a quick answer, though perhaps not the best depending on exactly what you need to do. Commented Feb 5, 2014 at 2:41
  • Is there always a period at the end of the string? Commented Feb 5, 2014 at 2:42
  • Do you mean ["A.B.C", "one", "two", "three"] (note the quotes) Commented Feb 5, 2014 at 2:43
  • Yes, that was what I meant. Commented Feb 5, 2014 at 3:13
  • There isn't always a period at the end of the string. I'm tokenizing a large file with many sentences. Commented Feb 5, 2014 at 3:13

2 Answers

import re
word = re.compile(r'[A-Za-z.]*[A-Za-z]')
word.findall("A.B.C one two three.")    # => ['A.B.C', 'one', 'two', 'three']
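To see how this pattern behaves on other inputs, here is a minimal sketch. The sample strings are made up for illustration; only the pattern comes from the answer. Because every match must end in a letter, a trailing period is never captured, whether it ends a sentence or an acronym written like "U.S.A.".

```python
import re

# Same pattern as in the answer: a run of letters and periods
# that is required to end in a letter.
word = re.compile(r'[A-Za-z.]*[A-Za-z]')

# Hypothetical sample inputs, not taken from the question.
samples = [
    "A.B.C one two three.",   # trailing period
    "A.B.C one two three",    # no trailing period
    "A.B.C one two. Four.",   # several sentences in one string
    "U.S.A. is big.",         # acronym that itself ends in a period
]

for s in samples:
    print(word.findall(s))
```

Note that the last sample yields 'U.S.A' without its final period, and that digits and apostrophes are not part of any token, so this may need adjusting for real text.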

2 Comments

You don't need to escape . inside [...] because . loses its special meaning inside [...] and matches . literally.
@falsetru: (wince) yeah, I knew that ;-)
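A quick way to check the point about escaping is a small sketch like the one below. The escaped variant is an assumption about what an earlier version of the pattern might have looked like; it is not shown in the thread.

```python
import re

# Hypothetical escaped variant next to the unescaped pattern from the answer.
escaped = re.compile(r'[A-Za-z\.]*[A-Za-z]')
unescaped = re.compile(r'[A-Za-z.]*[A-Za-z]')

text = "A.B.C one two three."
# Both patterns produce the same tokens.
print(escaped.findall(text) == unescaped.findall(text))

# Outside a character class, . matches any character except a newline.
print(re.findall(r'.', 'a.b'))
# Inside a character class, . matches only a literal period.
print(re.findall(r'[.]', 'a.b'))
```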
line = "A.B.C one two three."
print(line[:-1].split(' '))  # assumes the line always ends with a period

Maybe this way works as well.

4 Comments

How about if I just want to replace the period at the end of the sentence with ''? As in, I want it to remain a string. Thanks!
The OP said there isn't always a period at the end. This assumes there is.
Is this what you wanted? line = line[:-1]
If we don't want to assume that, we should use rstrip, as you suggested.
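For the case where the result should stay a string and the line may or may not end with a period, a minimal sketch of the two options mentioned in this thread could look like this. The function names are hypothetical, chosen only for illustration.

```python
import re

def strip_periods_rstrip(line):
    # rstrip('.') removes every trailing period (so "..." is removed too)
    # and leaves the line unchanged if it does not end with one.
    return line.rstrip('.')

def strip_period_regex(line):
    # Remove at most one period, and only at the very end of the string.
    return re.sub(r'\.$', '', line)

for line in ["A.B.C one two three.", "A.B.C one two three"]:
    print(strip_periods_rstrip(line))
    print(strip_period_regex(line))
    # Tokenize afterwards; the periods inside A.B.C are untouched.
    print(strip_period_regex(line).split())
```

Both versions only look at the end of the string, so for a file with many sentences they would need to be applied per sentence or per line.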
