0

I have a long text like the one below. I need to split it on certain words, say "In", "On" and "These".

Below is sample data:

On the other hand, we denounce with righteous indignation and dislike men who are so beguiled and demoralized by the charms of pleasure of the moment, so blinded by desire, that they cannot foresee the pain and trouble that are bound to ensue; and equal blame belongs to those who fail in their duty through weakness of will, which is the same as saying through shrinking from toil and pain. These cases are perfectly simple and easy to distinguish. In a free hour, when our power of choice is untrammelled and when nothing prevents our being able to do what we like best, every pleasure is to be welcomed and every pain avoided. But in certain circumstances and owing to the claims of duty or the obligations of business it will frequently occur that pleasures have to be repudiated and annoyances accepted. The wise man therefore always holds in these matters to this principle of selection: he rejects pleasures to secure other greater pleasures, or else he endures pains to avoid worse pains.

Can this problem be solved with code? I have 1000 rows like this in a CSV file.

6
  • 6
    Yes, this problem can certainly be solved with Python code. Commented Mar 30, 2020 at 12:27
  • 1
    Something like: re.split(r'(?<!^)\b(?=(?:On|In|These)\b)', YourStringVariable) would work in your case. Commented Mar 30, 2020 at 12:33
  • You can start with str.split Commented Mar 30, 2020 at 12:34
  • Curiously enough, your sample data does not look anything like CSV. Commented Mar 30, 2020 at 12:48
  • Thanks @JVD, the code worked well. Commented Mar 30, 2020 at 18:52

3 Answers

1

As per my comment, I think a good option would be to use a regular expression that splits just before each keyword:

 import re
 re.split(r'(?<!^)\b(?=(?:On|In|These)\b)', YourStringVariable)
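As a rough illustration of how that pattern behaves, here is a minimal sketch. The pattern matches an empty position, not text: `(?<!^)` skips the very start of the string, `\b` requires a word boundary, and the lookahead `(?=(?:On|In|These)\b)` requires a whole keyword to follow, so each keyword stays at the front of its chunk. Splitting on an empty match needs Python 3.7 or later. The helper name `split_before_keywords`, the in-memory CSV stand-in, and the assumption that each row holds the text in its first column are all made up for the example.

```python
import csv
import io
import re

# Zero-width split point: not at the start of the string, at a word boundary,
# and immediately before one of the keywords as a whole word.
PATTERN = re.compile(r'(?<!^)\b(?=(?:On|In|These)\b)')


def split_before_keywords(text):
    """Split text in front of each keyword, keeping the keyword in its chunk."""
    return [chunk.strip() for chunk in PATTERN.split(text)]


# Stand-in for the real CSV file (hypothetical layout: one text field per row).
fake_csv = io.StringIO(
    '"On the other hand, we denounce. These cases are simple."\n'
    '"In a free hour, pain is avoided. On reflection, it is not."\n'
)

# Apply the split to the first column of every non-empty row.
rows = [split_before_keywords(row[0]) for row in csv.reader(fake_csv) if row]

for chunks in rows:
    print(chunks)
```

For a real file, the `io.StringIO` object would be replaced by a file opened with `open(path, newline='')`, and the column index adjusted to wherever the text lives.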

0

Yes, this can be done in Python. You can load the text into a variable and use the built-in str.split method. For example:

with open(filename, 'r') as file:
    text = file.read()

# parts is a list of the substrings found between occurrences of 'These';
# the separator itself is not included in the pieces
parts = text.split('These')
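Two things are worth knowing about this approach: `str.split` drops the separator from the result, and it matches the separator anywhere, including inside longer words. A small sketch of putting the separator back, using a made-up sample string:

```python
text = "On the other hand. These cases are simple. These are easy."

# str.split removes the separator, so 'These' disappears from the pieces.
parts = text.split("These")

# Re-attach the separator to every piece except the first one.
restored = [parts[0]] + ["These" + piece for piece in parts[1:]]

print(parts)
print(restored)
```

If several different keywords are needed, or matches inside longer words must be avoided, a regular expression is probably the easier route.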


0

To find whole words that are not part of larger words, I like using the regular expression [^\w]word[^\w], where word is the word to look for.

Sample Python code, assuming the text is in a variable named text:

import re
exp = re.compile(r'[^\w]in[^\w]', flags=re.IGNORECASE)
all_occurrences = list(exp.finditer(text))
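One caveat, shown in the sketch below: `[^\w]` has to consume a real character on each side, so this pattern misses the word at the very start or end of the text, and each match includes the surrounding characters. The zero-width word boundary `\b` is a common alternative that avoids both. The sample sentence and variable names are made up for the comparison.

```python
import re

text = "In a free hour, nothing prevents our being in control."

# Needs a non-word character on both sides, so the leading "In" is missed
# and the match includes the surrounding spaces.
surrounded = re.compile(r'[^\w]in[^\w]', flags=re.IGNORECASE)

# \b is zero-width, so it also matches at the edges of the string
# and the match is the word alone.
bounded = re.compile(r'\bin\b', flags=re.IGNORECASE)

surrounded_hits = [m.group() for m in surrounded.finditer(text)]
bounded_hits = [m.group() for m in bounded.finditer(text)]

print(surrounded_hits)
print(bounded_hits)
```

Neither pattern matches the "in" inside "nothing" or "being", which is the point of both approaches.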
