I want to split a column in a pandas DataFrame, and I am using this code:

df['entry'] = df['entry'].str.split('.')

Now the problem is that I want to split bigger text elements such as:

I am content. I am another content.

But in the data there is also stuff like this:

I am 10.2 content.

I don't want to split the numbers. So I would need some conditional such as:

If dot between numbers, don't split.

How can I do this with pandas?

1 Answer

Use negative lookarounds. Updated to also handle "I am St. Content.":

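import re

# Split on a dot unless a digit or a two-character word precedes it, or a digit follows it
rx = re.compile(r'(?<!\d)(?<!\b\w\w)\.(?!\d)')
text = 'I am content. I am another content. I am 10.2 content. I am St. Content.'
parts = rx.split(text)  # avoid naming a variable str, which shadows the built-in
print(parts)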

Output:

['I am content', ' I am another content', ' I am 10.2 content', ' I am St. Content', '']
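
One caveat with the pattern above: `(?<!\b\w\w)` blocks the split after any two-character word, so a sentence ending in "is." or "am." would not be split either. If only specific abbreviations such as "St" should be protected, a narrower variant might look like the sketch below. The names `ABBREVIATIONS`, `SENTENCE_DOT`, and `split_sentences` are made up for illustration, and the abbreviation list is an assumption you would adapt to your data.

```python
import re

# Hypothetical list of abbreviations whose trailing dot should not split.
ABBREVIATIONS = ("St",)

# Build one fixed-width negative lookbehind per abbreviation, e.g. (?<!\bSt)
_abbrev_guards = "".join(rf"(?<!\b{re.escape(a)})" for a in ABBREVIATIONS)

# Split on a dot unless a digit precedes or follows it,
# or one of the listed abbreviations precedes it.
SENTENCE_DOT = re.compile(rf"(?<!\d){_abbrev_guards}\.(?!\d)")


def split_sentences(text):
    """Split text on sentence-ending dots, trimming and dropping empty pieces."""
    return [part.strip() for part in SENTENCE_DOT.split(text) if part.strip()]


print(split_sentences("It is. I am 10.2 content. I am St. Content."))
# ['It is', 'I am 10.2 content', 'I am St. Content']
```

On a DataFrame, this could be applied with `df['entry'] = df['entry'].map(split_sentences)`, assuming every value in the column is a string.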
2 Comments

What if I have another exception, such as "I am St. Content."? It is always the two letters "St" where I don't want to split. Can I add a second lookaround?
Unfortunately, your solution looked promising but doesn't work on my data.
