I'm trying to use regular expressions to remove the key codes that are attached to each genre name in my dataset. What I have so far removes most of each key code but leaves some letters behind, and I am not sure why. On inspection, the trouble seems to occur mostly where a 0 is followed by letters; for example, "/m/0lxr" leaves behind lxr.

If anyone knows how to fix this, please let me know!

This is the code I have so far.

def prepare(self, word): 
    word = re.sub(r'//', "", word)
    word = re.sub(r'/\u[0-9][a-z]', "", word)

    word = re.sub(r'/.', "", word) 
    word = re.sub(r'/,', "", word) 
    word = re.sub(r'/!', "", word) 
    word = re.sub(r'/?', "", word) 
    word = re.sub(r'/{', "", word)

    word = re.sub(r"'", "", word)
    word = re.sub(r"//m//[0-9][a-z]+", "", word) 
    word = re.sub(r'[0-9][a-z]+', "", word)
    word = re.sub(r'[a-z][0-9]+', "", word)

    return word
  • What is your input and desired output? Commented Nov 1, 2017 at 14:21
  • OK, my input would be something like "{"/m/0lsxr":"Crime Fiction"}" and the desired output is "Crime Fiction", but at the moment the output is "lsxr Crime Fiction". I'm just looking for a way to remove that lsxr bit. Commented Nov 1, 2017 at 14:24
  • Why not use (?<=:")[^"]*(?=")? Commented Nov 1, 2017 at 14:24
  • @ctwheels thanks for your response. Could you clarify how I would use that in my code? Would it be something like word = re.sub((?<=:")[^"]*(?="),word) ? I am a bit confused :) Commented Nov 1, 2017 at 14:29
  • Something like re.sub(r'(?<=:")[^"]*(?=")', "", word) Commented Nov 1, 2017 at 14:31

2 Answers


You can use ast.literal_eval:

import ast
s = '{"/m/0lsxr":"Crime Fiction"}'
final_output = list(ast.literal_eval(s).values())  # list() so Python 3 prints a plain list
print(final_output)

Output:

['Crime Fiction']
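
If the strings in the dataset are valid JSON objects (an assumption based on the single sample given in the comments), the same idea can be sketched with json.loads. The helper name extract_genres is hypothetical and only used for illustration:

```python
import json


def extract_genres(raw):
    """Return the genre names (the values) from a JSON object string.

    Assumes `raw` is valid JSON, e.g. '{"/m/0lsxr":"Crime Fiction"}'.
    """
    return list(json.loads(raw).values())


print(extract_genres('{"/m/0lsxr":"Crime Fiction"}'))  # ['Crime Fiction']
```

Parsing the string this way avoids having to describe the key format with a regular expression at all, but it will raise an error on rows that are not well-formed JSON.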

Try this

import re
word = "/m/0lsxr:Crime Fiction"
print(re.sub(r'.*:(\w*)', r'\1', word))
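
As for why the code in the question leaves letters behind: in r'/.' the dot is not escaped, so the pattern means "a slash followed by any character". Applied to "/m/0lxr" it deletes "/m" and "/0" and leaves lxr. The sketch below shows that, plus two possible fixes. The key format (/m/ followed by lowercase letters, digits, or underscores) is an assumption based on the examples in the question, and KEY_PATTERN is a name made up for this example:

```python
import re

raw = '{"/m/0lsxr":"Crime Fiction"}'

# r'/.' matches a slash plus ANY one character, so only "/m" and "/0"
# are removed and the rest of the key survives.
leftover = re.sub(r'/.', "", "/m/0lxr")
print(leftover)  # lxr

# Assumed key format: "/m/" followed by lowercase letters, digits, or underscores.
KEY_PATTERN = re.compile(r'/m/[0-9a-z_]+')
without_key = KEY_PATTERN.sub("", "/m/0lxr Crime Fiction").strip()
print(without_key)  # Crime Fiction

# Alternatively, capture the text between the quotes that follow each colon.
genres = re.findall(r':\s*"([^"]*)"', raw)
print(genres)  # ['Crime Fiction']
```

Note that a literal dot, question mark, or brace is escaped with a backslash (for example r'\.'), not a forward slash.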
