Pythonic way to identify and remove substrings from strings

Question

I have a large numpy array of strings. Some elements are good strings, some have special characters (typically at the start of the string), and some contain substrings enclosed in quotes. I want to identify the elements that contain a quoted substring, store that substring, and remove it from the original string.

example:


my_array = ['# this is the "Sharpest" hashtag ever', 'life as we know it', '" what would you do?', 'this was an "arbitrary" result',  'what do you mean']

corrected_array = ['# this is the hashtag ever', 'life as we know it', '" what would you do?',
                   'this was an result', 'what do you mean']

As you can see, the words "Sharpest" and "arbitrary" were removed in the corrected array. Is there a way to identify the substrings and remove them from the original strings efficiently?

Some of the strings inside my_array are invalid, causing a syntax error. You're going to have to fix that while you build that list. Show the code for how my_array is created.
– GAEfan, Commented Aug 14, 2020 at 15:38
So you want to drop every string enclosed in quotes? As @GAEfan says, '# this is the 'Sharpest' hashtag ever' is an invalid string, so you probably have to change the enclosing quotes for the string or the substring.
– Juan C, Commented Aug 14, 2020 at 15:39
I just noticed that. The strings are valid in each element; it was a syntax error on my end when asking the question, but the overall question stands.
– user14037529, Commented Aug 14, 2020 at 15:50
Does this answer your question? How to delete the words between two delimiters?
– Juan C, Commented Aug 14, 2020 at 15:51

Kuldip Chaudhari · Accepted Answer · 2020-08-17 01:02:51Z

2

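Try this:

import re
# Convert single quotes to double quotes, then drop every double-quoted substring.
# Note: the replace() call also converts apostrophes, so strings with two or more
# apostrophes (e.g. contractions) will lose the text between them.
corrected_array = [re.sub(r'"[^"]*"', '', s.replace("'", '"')) for s in my_array]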
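Removing a quoted word with re.sub leaves the spaces on both sides of it behind, so the result has a double space where the question's expected output has a single one. A minimal sketch of one way to tidy that up, assuming only double quotes delimit the substrings; the names QUOTED and strip_quoted are hypothetical helpers, not part of the answer above:

```python
import re

my_array = [
    '# this is the "Sharpest" hashtag ever',
    'life as we know it',
    '" what would you do?',
    'this was an "arbitrary" result',
    'what do you mean',
]

# Hypothetical helper pattern: a double quote, any run of non-quote
# characters, and a closing double quote.
QUOTED = re.compile(r'"[^"]*"')

def strip_quoted(s):
    """Remove double-quoted substrings, then collapse runs of whitespace."""
    without_quotes = QUOTED.sub('', s)
    # split() with no arguments splits on any whitespace run and drops
    # leading/trailing whitespace, so join() yields single spaces only.
    return ' '.join(without_quotes.split())

corrected_array = [strip_quoted(s) for s in my_array]
print(corrected_array)
```

A lone, unpaired quote (as in the third element) is left untouched because the pattern needs both an opening and a closing quote. The whitespace collapse also changes any intentional runs of spaces elsewhere in the string, which may or may not be acceptable for your data.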

edited Aug 17, 2020 at 1:02

answered Aug 14, 2020 at 15:40


v_coder12 · Accepted Answer · 2020-08-16 23:45:26Z

0

You can try a brute-force approach: find the index of the first " and, similarly, the index of the last quote, then exclude everything in the string between those two positions.
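One possible reading of this brute-force idea, sketched with str.find and str.rfind. The function name remove_between_quotes and its return shape are assumptions made for illustration; the answer itself gives no code:

```python
def remove_between_quotes(s, quote='"'):
    """Return (remainder, inner) for the span between the first and last quote.

    If the string has fewer than two quote characters, it is returned
    unchanged and inner is None.
    """
    first = s.find(quote)    # index of the first quote, or -1
    last = s.rfind(quote)    # index of the last quote, or -1
    if first == -1 or first == last:
        return s, None
    inner = s[first + 1:last]                # text inside the quotes
    remainder = s[:first] + s[last + 1:]     # text outside, quotes dropped
    return remainder, inner


print(remove_between_quotes('this was an "arbitrary" result'))
print(remove_between_quotes('" what would you do?'))
```

Because this uses only the first and last quote, a string with several quoted substrings loses everything between the outermost quotes, including the unquoted text in the middle. The regex-based answers handle that case pair by pair.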

answered Aug 16, 2020 at 23:45


Dishin Goyani · Accepted Answer · 2020-12-07 03:38:07Z

0

You can use re.sub

import re

[re.sub('["\']([^"]*)["\']', "", s) for s in my_array]
['# this is the  hashtag ever', 'life as we know it', '" what would you do?', 'this was an  result', 'what do you mean']
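The question also asks to store the substring that was removed, which re.sub alone does not do. A minimal sketch, assuming double quotes only and using a capturing group with re.findall to collect the inner text before removing it; QUOTED and split_quoted are hypothetical names introduced here:

```python
import re

my_array = [
    '# this is the "Sharpest" hashtag ever',
    'life as we know it',
    '" what would you do?',
    'this was an "arbitrary" result',
    'what do you mean',
]

# The capturing group holds the text between a pair of double quotes.
QUOTED = re.compile(r'"([^"]*)"')

def split_quoted(s):
    """Return (cleaned string, list of quoted substrings found in it)."""
    found = QUOTED.findall(s)     # with one group, findall returns the group text
    cleaned = QUOTED.sub('', s)   # same pattern removes the quoted spans
    return cleaned, found

results = [split_quoted(s) for s in my_array]
cleaned_array = [cleaned for cleaned, _ in results]
removed = [found for _, found in results]

print(cleaned_array)
print(removed)
```

Keeping removed as a list of lists aligned with my_array makes it easy to tell which element each substring came from; elements with no quoted substring get an empty list.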

edited Dec 7, 2020 at 3:38

answered Aug 14, 2020 at 15:52
