1

Hopefully someone can help. I'm trying to use a regular expression to extract the text that follows a pattern in a string, but it doesn't seem to work and I'm not sure why. The regex works fine in Linux...

>>> import re
>>> s = "GeneID:5408878;gbkey=CDS;product=carboxynorspermidinedecarboxylase;protein_id=YP_001405731.1"
>>> x = re.search(r'(?<=protein_id=)[^;]*', s)
>>> print(x)
<_sre.SRE_Match object at 0x000000000345B7E8>
2
  • If the leading GeneID were stripped out, you could build a dict of the key/value pairs and avoid regular expressions altogether: dict(fragment.split("=") for fragment in s.split(';')) Commented Jul 7, 2013 at 12:25
  • Did you read the documentation on how to use re? Commented Jul 7, 2013 at 12:31
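A minimal sketch of the split-based idea from the first comment, assuming the fields are always separated by ';' and that values never contain ';'. The helper name parse_attributes is hypothetical. It uses str.partition instead of split("=") so that the leading GeneID:5408878 fragment, which has no "=", is skipped rather than raising an error inside dict():

```python
s = "GeneID:5408878;gbkey=CDS;product=carboxynorspermidinedecarboxylase;protein_id=YP_001405731.1"

def parse_attributes(text):
    """Split a ';'-separated attribute string into a dict of key/value pairs."""
    pairs = {}
    for fragment in text.split(";"):
        # partition returns (before, separator, after); separator is '' if '=' is absent
        key, sep, value = fragment.partition("=")
        if sep:  # only keep fragments that actually contain '='
            pairs[key] = value
    return pairs

attrs = parse_attributes(s)
print(attrs["protein_id"])
```

Because the GeneID fragment is dropped here, it would need separate handling if that value matters.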

2 Answers

7

Use .group() on the match object to print the matched text:

>>> print(x.group(0))
YP_001405731.1

As Martijn pointed out, you created a match object, so the regular expression is correct. If it had not matched, re.search would have returned None and print(x) would have printed None.
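Since re.search returns None when nothing matches, calling .group() directly on the result can raise AttributeError. A small sketch that guards against that, using the regex from the question (the function name extract_protein_id is made up for illustration):

```python
import re

s = "GeneID:5408878;gbkey=CDS;product=carboxynorspermidinedecarboxylase;protein_id=YP_001405731.1"

def extract_protein_id(text):
    """Return the text after 'protein_id=' up to the next ';', or None if absent."""
    match = re.search(r'(?<=protein_id=)[^;]*', text)
    # re.search gives a match object on success and None on failure
    if match is None:
        return None
    return match.group(0)  # group(0) is the whole matched text

print(extract_protein_id(s))
print(extract_protein_id("gbkey=CDS"))  # no protein_id field in this string
```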


2 Comments

That's what I get for moving out of connectivity.
@MartijnPieters I gave you a mention :). I still respect you as a great pythonic master :D
4

You should probably think about rewriting your regex to find all of the key/value pairs, so you don't have to muck around with specific groups and hard-coded lookbehinds...

import re
# Collect every key=value pair; the leading "GeneID:5408878" has no "=" and is skipped.
kv = dict(re.findall(r'(\w+)=([^;]+)', s))
# {'gbkey': 'CDS', 'product': 'carboxynorspermidinedecarboxylase', 'protein_id': 'YP_001405731.1'}
print(kv['protein_id'])
# YP_001405731.1
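If some records might lack a field, indexing the dict directly raises KeyError. A rough sketch of the same findall approach with dict.get as a fallback; parse_pairs is a hypothetical helper and locus_tag is just an example of a key that this string does not contain:

```python
import re

s = "GeneID:5408878;gbkey=CDS;product=carboxynorspermidinedecarboxylase;protein_id=YP_001405731.1"

def parse_pairs(text):
    """Build a dict from every key=value pair found in a ';'-separated string."""
    # Each findall result is a (key, value) tuple, which dict() accepts directly
    return dict(re.findall(r'(\w+)=([^;]+)', text))

kv = parse_pairs(s)
protein_id = kv.get('protein_id')          # present in s
locus_tag = kv.get('locus_tag', 'n/a')     # absent in s, so the default is used
print(protein_id, locus_tag)
```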
