use regex to replace long string

Question

I want to replace the string

ID12345678_S3_MPRAGE_ADNI_32Ch_2_98_clone_transform_clone_reg_N3Corrected1_mask_cp_strip_durastripped_N3Corrected_clone_lToads_lesions_seg

with

ID12345678

How can I replace this via regex?

I tried this - it didn't work.

import re
re.sub(r'_\w+_\d_\d+_\w+','')

Thank you

re.sub(r'_\w+_\d_\d+_\w+','') won't work at all: re.sub needs three arguments (pattern, replacement, input string). If you meant re.match, the second argument should be your long input string, not an empty string. Also, if s is your long input string, a naive solution would be simply s[0:10] or s[0:s.find('_')].
– jedwards, Commented Mar 7, 2015 at 23:22
If you just want the number and it's at the beginning of every string, just use find as @jedwards showed.
– letsc, Commented Mar 7, 2015 at 23:24
I think you actually want re.match (if the ID is guaranteed to be at the start of the input) or re.search otherwise, not re.sub, provided you're not trying to do an in-place replacement within a much longer string.
– jedwards, Commented Mar 7, 2015 at 23:26
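The suggestions in these comments can be sketched as follows. This is only an illustration on a shortened version of the question's string; the variable names are made up for the example. One caveat worth noting: str.find returns -1 when there is no underscore, so the slice would silently drop the last character, which str.partition avoids.

```python
import re

# Shortened version of the string from the question (illustrative only).
s = "ID12345678_S3_MPRAGE_ADNI_32Ch_2_98_clone_lToads_lesions_seg"

# Slice up to the first underscore, as suggested in the comments.
prefix_by_find = s[0:s.find('_')]

# Caveat: with no underscore, find() returns -1 and the slice loses a character.
no_underscore = "ID12345678"
truncated = no_underscore[0:no_underscore.find('_')]

# str.partition returns the whole string when the separator is missing.
prefix_by_partition = s.partition('_')[0]
safe = no_underscore.partition('_')[0]

# re.match only matches at the start of the string and returns None on failure.
m = re.match(r'[^_]+', s)
prefix_by_match = m.group(0) if m else None

print(prefix_by_find, prefix_by_partition, prefix_by_match)
```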

Kasravnd · Accepted Answer · 2015-03-07 23:34:42Z

1

You can use re.sub with the pattern [^_]*, which matches the longest run of characters that contains no _. Capturing that run and letting .* consume the rest of the string means the whole string is replaced by the first group:

>>> s="ID12345678_S3_MPRAGE_ADNI_32Ch_2_98_clone_transform_clone_reg_N3Corrected1_mask_cp_strip_durastripped_N3Corrected_clone_lToads_lesions_seg"
>>> import re
>>> re.sub(r'([^_]*).*',r'\1',s)
'ID12345678'

If the ID can appear anywhere in your string, you can use re.search instead:

>>> re.search(r'ID\d+',s).group(0)
'ID12345678'
>>> s="_S3_MPRAGE_ADNI_ID12345678_32Ch_2_98_clone_transform_clone_reg_N3Corrected1_mask_cp_strip_durastripped_N3Corrected_clone_lToads_lesions_seg"
>>> re.search(r'ID\d+',s).group(0)
'ID12345678'

Without a regex, when the ID is at the start of the string (as in the first s above), you can simply use split():

>>> s.split('_',1)[0]
'ID12345678'
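One thing the re.search calls above do not cover is the case where no ID is present: re.search then returns None, and calling .group(0) on it raises AttributeError. A minimal sketch of a guarded version, assuming IDs always look like "ID" followed by digits (extract_id and ID_PATTERN are hypothetical names chosen for this example):

```python
import re

# Assumption: an ID is the literal "ID" followed by one or more digits.
ID_PATTERN = re.compile(r'ID\d+')

def extract_id(text):
    """Return the first ID-like token in text, or None if there is none."""
    match = ID_PATTERN.search(text)
    return match.group(0) if match else None

print(extract_id("_S3_MPRAGE_ADNI_ID12345678_32Ch_2_98_clone"))
print(extract_id("S3_MPRAGE_ADNI_32Ch"))
```

Returning None lets the caller decide what to do with strings that carry no ID, instead of failing partway through a batch.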

edited Mar 7, 2015 at 23:34

answered Mar 7, 2015 at 23:23

Kasravnd


4 Comments

Annamarie Over a year ago

Thank you - the first option works very well. But my very long ID is in a pandas DataFrame read from a CSV file, and there it doesn't work. Do I have to add something in this case, such as \b for the boundary?

Annamarie Over a year ago

One more question - \1 refers to the first capturing group, right? So how does it know which characters separate capturing groups? Thank you.

Annamarie Over a year ago

I figured it out - if a regular expression is used in pandas DataFrame.replace(), regex=True has to be set.

Kasravnd Over a year ago

@Swankyleo glad to hear that, as I'm not familiar with pandas! ;)
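On the capture-group question raised in the comments above: groups are not separated by any particular character in the input. Each pair of parentheses in the pattern defines one group, and groups are numbered by the position of their opening parenthesis, left to right. A small sketch on a shortened, illustrative string:

```python
import re

# Shortened, illustrative version of the string from the question.
s = "ID12345678_S3_MPRAGE_ADNI_32Ch"

# Three groups: text before the first _, text between the first two _, the rest.
pattern = r'([^_]*)_([^_]*)_(.*)'

m = re.match(pattern, s)
first, second, rest = m.group(1), m.group(2), m.group(3)

# In the replacement string, \1 and \2 refer to what groups 1 and 2 captured.
swapped = re.sub(pattern, r'\2_\1', s)

print(first, second, rest)
print(swapped)
```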

Diego Torres Milano · Answer · 2015-03-07 23:25:49Z

0

I'm guessing the first part varies; in that case you can strip everything from the first underscore onward:

import re
s = "ID12345678_S3_MPRAGE_ADNI_32Ch_2_98_clone_transform_clone_reg_N3Corrected1_mask_cp_strip_durastripped_N3Corrected_clone_lToads_lesions_seg"
# Remove everything from the first underscore to the end of the string.
print(re.sub(r'_.*$', r'', s))

answered Mar 7, 2015 at 23:25

Diego Torres Milano
