I'm trying to reformat my data from this:

gi|492845765|ref|WP_005999719.1| DNA methyltransferase [[Eubacterium] infirmum]

into

[[Eubacterium]infirmum]gi|492845765|

That is, I want to keep only the gi number and the organism name (with the organism name in front of the gi number), and get rid of the "extra" information (in this case, the ref number and "DNA methyltransferase").

I would do re.sub(r"(\w+ |\w + |) \w+|\w_\w|\s\w+\s\w\s ([.])", r"\2\1", line)

(or something remotely like that)

However, some other lines of my data have more than two words in the "extra" information. For example:

gi|548229945|ref|WP_022448665.1| dNA (Cytosine-5-)-methyltransferase [Roseburia sp. CAG:303]

How would I write a regular expression to rewrite all of my data so that the organism name comes first, the gi number comes next, and everything else is deleted?

1 Answer

This would probably do what you're asking:

(\w+\|\d+\|)(?:.*\s)(\[\S*)(?:\s)(.+\])

Use \2\3\1 as the replacement pattern. On regex101, $2$3$1 seems to work the same, but Python's re.sub expects the backslash form.

re.sub(r'(\w+\|\d+\|)(?:.*\s)(\[\S*)(?:\s)(.+\])', r'\2\3\1', line)

example: http://regex101.com/r/aP6lB9
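
To check the pattern against both sample lines from the question, a minimal sketch like the following can be run as-is. The function name reformat_line is only illustrative. Note that this pattern drops just the first space inside the brackets, so the second line keeps the space in "sp. CAG:303".

```python
import re

# Pattern and replacement as given in the answer above.
PATTERN = re.compile(r'(\w+\|\d+\|)(?:.*\s)(\[\S*)(?:\s)(.+\])')

def reformat_line(line):
    # Hypothetical helper: put the bracketed organism name
    # in front of the gi field and drop everything in between.
    return PATTERN.sub(r'\2\3\1', line)

samples = [
    "gi|492845765|ref|WP_005999719.1| DNA methyltransferase [[Eubacterium] infirmum]",
    "gi|548229945|ref|WP_022448665.1| dNA (Cytosine-5-)-methyltransferase [Roseburia sp. CAG:303]",
]

for sample in samples:
    print(reformat_line(sample))
# [[Eubacterium]infirmum]gi|492845765|
# [Roseburiasp. CAG:303]gi|548229945|
```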

5 Comments

thank you very much! How would I get rid of the white space within the brackets in the 2nd group?
Ah, yes - I've updated the answer, which should take care of the spaces. :)
This one only captures the [ bracket and nothing after it. I've tried to put ([\w*)|(?:\s)| (added \w* after the front bracket), but it only finds the first word and then stops when it hits a space.
Yes, sorry about that - I was in a hurry and borked it. I've updated the answer, although I think the capture only eliminates the first space, so it might need a little more love.
Yes, it only gets rid of the first space. But thank you very much! The regex site is a very good tool, thank you for showing it to me.
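
One possible way to remove every space inside the brackets, not just the first, is to pass a function to re.sub instead of a replacement string. This is a sketch rather than part of the original answer: the pattern below and the name move_organism_first are assumptions, and it assumes each line starts with gi|digits| and ends with the bracketed organism name, with no whitespace followed by a "[" earlier in the description.

```python
import re

# Assumed layout: "gi|<digits>|" at the start, bracketed organism name at the end.
LINE = re.compile(r'^(gi\|\d+\|).*?\s(\[.*\])\s*$')

def move_organism_first(line):
    # Hypothetical helper, not from the original answer.
    def build(match):
        # Strip all whitespace inside the bracketed organism name,
        # then append the gi field after it.
        organism = re.sub(r'\s+', '', match.group(2))
        return organism + match.group(1)
    return LINE.sub(build, line)

print(move_organism_first(
    "gi|548229945|ref|WP_022448665.1| dNA (Cytosine-5-)-methyltransferase [Roseburia sp. CAG:303]"
))
# [Roseburiasp.CAG:303]gi|548229945|
```

Lines that do not fit the assumed layout are returned unchanged, since re.sub leaves non-matching input alone.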
