Python Regular Expressions - Removing Specific Patterns

Question

I'm trying to use regular expressions to remove specific key codes that are tied to the name of a genre in my dataset. What I have so far gets rid of most of the key codes but leaves behind some letters, and I am not sure why. On inspection, it mostly seems to have trouble where a 0 is followed by letters; for example, "/m/0lxr" leaves behind lxr.

If anyone knows how I could fix this, please let me know!

This is the code I have so far.

def prepare(self, word): 
    word = re.sub(r'//', "", word)
    word = re.sub(r'/\u[0-9][a-z]', "", word)

    word = re.sub(r'/.', "", word) 
    word = re.sub(r'/,', "", word) 
    word = re.sub(r'/!', "", word) 
    word = re.sub(r'/?', "", word) 
    word = re.sub(r'/{', "", word)

    word = re.sub(r"'", "", word)
    word = re.sub(r"//m//[0-9][a-z]+", "", word) 
    word = re.sub(r'[0-9][a-z]+', "", word)
    word = re.sub(r'[a-z][0-9]+', "", word)

    return word

For my input, it would take in something like {"/m/0lsxr":"Crime Fiction"}, and the desired output is "Crime Fiction", but at the moment the output is "lsxr Crime Fiction". I'm just looking for a way to remove that lsxr bit.
– Rachel Solomon, Commented Nov 1, 2017 at 14:24
@ctwheels thanks for your response. Can I just ask, for clarity, how I would implement that in my code? Would it be something like word = re.sub((?<=:")[^"]*(?="),word) ? I am a bit confused :)
– Rachel Solomon, Commented Nov 1, 2017 at 14:29
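
The leftover letters can likely be traced to the r'/.' substitution in the question's code: in a regex the forward slash is an ordinary character (the escape character is the backslash), and an unescaped dot matches any character, so r'/.' removes each slash together with the single character after it. A minimal sketch of that effect on the sample input follows; the replacement pattern at the end assumes key codes are "/m/" followed by lowercase letters, digits, or underscores, which the question does not confirm.

```python
import re

raw = '{"/m/0lsxr":"Crime Fiction"}'

# "/" is literal and "." is a wildcard, so this strips "/m" and "/0"
# but leaves the rest of the key code behind.
step = re.sub(r'/.', "", raw)
print(step)  # {"lsxr":"Crime Fiction"}

# Matching the whole key code in one pattern removes it in one go.
# Assumption: codes look like /m/ plus lowercase letters, digits, or "_".
cleaned = re.sub(r'/m/[0-9a-z_]+', "", raw)
print(cleaned)  # {"":"Crime Fiction"}
```

The braces and quotes still remain after this step, so the key code removal would have to run before the other clean-up substitutions.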

Ajax1234 · Accepted Answer · 2017-11-01 14:58:21Z

1

You can use ast.literal_eval:

import ast
s = '{"/m/0lsxr":"Crime Fiction"}'
final_output = list(ast.literal_eval(s).values())  # list() so Python 3 prints a plain list
print(final_output)

Output:

['Crime Fiction']
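
A minimal sketch of how this could be folded into a helper, assuming every record is a string holding a dict literal like the sample in the question. The name genre_names and the second key code below are hypothetical, used only for illustration.

```python
import ast

def genre_names(raw):
    """Return the genre names from a string such as '{"/m/0lsxr":"Crime Fiction"}'."""
    mapping = ast.literal_eval(raw)  # safely parses the dict literal, no regex needed
    return list(mapping.values())

print(genre_names('{"/m/0lsxr":"Crime Fiction"}'))  # ['Crime Fiction']

# With several genres in one record, the names can be joined into one string.
two = genre_names('{"/m/0lsxr":"Crime Fiction", "/m/xxxx":"Mystery"}')
print(" ".join(two))
```

ast.literal_eval raises an error on strings that are not valid Python literals, so malformed records would need to be handled separately.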


Sandeep Lade · Accepted Answer · 2017-11-01 14:31:53Z

0

Try this:

import re
word = "/m/0lsxr:Crime Fiction"
print(re.sub(r'.*:(\w*)', r'\1', word))  # Crime Fiction
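
Note that this pattern relies on the input having no quotes or braces. On the raw string from the question, {"/m/0lsxr":"Crime Fiction"}, the \w* group matches nothing and the result still carries the quotes and closing brace. A sketch for the quoted form, using the lookaround pattern mentioned in the comments on the question with re.findall, and assuming there is no whitespace between the colon and the opening quote:

```python
import re

raw = '{"/m/0lsxr":"Crime Fiction"}'

# (?<=:") requires the match to start right after :"
# [^"]*   takes everything up to the next double quote
# (?=")   requires a closing quote without consuming it
genres = re.findall(r'(?<=:")[^"]*(?=")', raw)
print(genres)  # ['Crime Fiction']
```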
