0

I want to replace the string

ID12345678_S3_MPRAGE_ADNI_32Ch_2_98_clone_transform_clone_reg_N3Corrected1_mask_cp_strip_durastripped_N3Corrected_clone_lToads_lesions_seg

with

ID12345678

How can I replace this via regex?

I tried this - it didn't work.

import re
re.sub(r'_\w+_\d_\d+_\w+','')

Thank you

  • re.sub(r'_\w+_\d_\d+_\w+','') won't do anything at all -- you need 3 arguments -- and if you want re.match, the second argument should be your long input string, not an empty string. Also, if s is your long input string, a naive solution would be simply s[0:10] or s[0:s.find('_')]. Commented Mar 7, 2015 at 23:22
  • If you just want the number and it's at the beginning of every string, just use find as @jedwards showed. Commented Mar 7, 2015 at 23:24
  • I think you actually want re.match (if the string is guaranteed to be at the start of the input) or re.search otherwise, not re.sub -- provided, at least, you're not trying to do in-line replacement in a much longer string. Commented Mar 7, 2015 at 23:26
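
The points raised in the comments above can be sketched as follows. This is only a minimal illustration, assuming the ID always sits at the start of the string and is followed by an underscore; the variable names (by_find, by_match, call_failed) are made up for the example.

```python
import re

s = ("ID12345678_S3_MPRAGE_ADNI_32Ch_2_98_clone_transform_clone_reg_"
     "N3Corrected1_mask_cp_strip_durastripped_N3Corrected_clone_lToads_lesions_seg")

# re.sub needs three arguments: pattern, replacement, input string.
# Calling it with only two raises a TypeError.
call_failed = False
try:
    re.sub(r'_\w+_\d_\d+_\w+', '')
except TypeError:
    call_failed = True

# Plain string slicing: everything before the first underscore.
# Caveat: find() returns -1 when there is no underscore at all.
by_find = s[:s.find('_')]

# re.match only matches at the start of the string.
by_match = re.match(r'[^_]+', s).group(0)

print(call_failed, by_find, by_match)
```

If the string might contain no underscore, s.split('_', 1)[0] is the safer choice, since it then returns the whole string rather than dropping the last character.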

2 Answers

1

You can use re.sub with the pattern [^_]*, which matches the longest run of characters that contains no underscore. Capture that run in a group, let .* consume the rest of the string, and replace the whole match with the captured group:

>>> s="ID12345678_S3_MPRAGE_ADNI_32Ch_2_98_clone_transform_clone_reg_N3Corrected1_mask_cp_strip_durastripped_N3Corrected_clone_lToads_lesions_seg"
>>> import re
>>> re.sub(r'([^_]*).*',r'\1',s)
'ID12345678'

If the ID can appear anywhere in your string, you can use re.search instead:

>>> re.search(r'ID\d+',s).group(0)
'ID12345678'
>>> s="_S3_MPRAGE_ADNI_ID12345678_32Ch_2_98_clone_transform_clone_reg_N3Corrected1_mask_cp_strip_durastripped_N3Corrected_clone_lToads_lesions_seg"
>>> re.search(r'ID\d+',s).group(0)
'ID12345678'

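Without a regex, if the ID is at the start of the string (as in the original s), you can simply use split():

>>> s="ID12345678_S3_MPRAGE_ADNI_32Ch_2_98_clone_transform_clone_reg_N3Corrected1_mask_cp_strip_durastripped_N3Corrected_clone_lToads_lesions_seg"
>>> s.split('_',1)[0]
'ID12345678'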
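
One caveat with the re.search approach above: re.search returns None when nothing matches, so calling .group(0) directly would raise an AttributeError. A small sketch of a guarded version follows; the helper name extract_id is hypothetical, and it assumes the ID is always the literal "ID" followed by digits.

```python
import re

def extract_id(text):
    """Return the first 'ID' + digits token in text, or None if there is none."""
    m = re.search(r'ID\d+', text)
    return m.group(0) if m else None

print(extract_id("_S3_MPRAGE_ADNI_ID12345678_32Ch_2_98_clone"))
print(extract_id("no identifier here"))
```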

4 Comments

Thank you - the first option works very well. But since my very long ID is in a pandas dataframe read from a csv file, it doesn't work there. Do I have to add something in this case, such as \b for the boundary?
One more question - the \1 refers to the first capturing group right? So how does it know which characters separates capturing groups? Thank you.
I found it out - if a regular expression is used in pandas dataframe.replace() regex=True has to be set.
@Swankyleo Glad to hear that, as I'm not familiar with pandas! ;)
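
On the capture-group question in the comments above: groups are delimited by parentheses in the pattern and numbered by the order of their opening parenthesis, so \1 refers to whatever the first parenthesised part matched. A short sketch using only the standard re module; the two-group pattern and the shortened input string are made up for illustration.

```python
import re

s = "ID12345678_S3_MPRAGE"

# Two groups, each a run of non-underscore characters, separated by '_'.
m = re.match(r'([^_]*)_([^_]*)', s)
first = m.group(1)   # text matched by the first pair of parentheses
second = m.group(2)  # text matched by the second pair

# In the replacement, \1 and \2 refer back to those groups.
# count=1 limits the substitution to the first match.
swapped = re.sub(r'([^_]*)_([^_]*)', r'\2_\1', s, count=1)

print(first, second, swapped)
```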
0

I guess the first part is variable, then

import re
s = "ID12345678_S3_MPRAGE_ADNI_32Ch_2_98_clone_transform_clone_reg_N3Corrected1_mask_cp_strip_durastripped_N3Corrected_clone_lToads_lesions_seg"
print(re.sub(r'_.*$', '', s))  # drop everything from the first underscore onward
