0

I am trying to replace variable-length items in a list using regex. For example, the item "HD479659" should be replaced by "HD0000000479659"; I just need to insert seven 0s in between. I have written the following program, but every time I run it I get the following error: "TypeError: object of type '_sre.SRE_Pattern' has no len()". Can you please help me solve this error?

Thank you very much.

Here is the program:

import xlrd
import re
import string

wb = xlrd.open_workbook("3_1.xls")
sh = wb.sheet_by_index(0)
outfile = open('out.txt', 'w')

s_pat = r"HD[1-9]{1}[0-9]{5}"
s_pat1 = r"HD[0]{7}[0-9]{6}"
pat = re.compile(s_pat)
pat1 = re.compile(s_pat1)

for rownum1 in range(sh.nrows):
    str1 = str(sh.row_values(rownum1))
    m1 = []
    m1 = pat.findall(str1)
    m1 = list(set(m1))
    for a in m1:
        a = re.sub(pat, pat1, a)
    print >> outfile, m1

2 Answers

2

I think your solution is more complicated than it needs to be. This one should do the job and is much simpler:

import re

def repl(match):
    return match.group(1) + ("0"*7) + match.group(2)

print re.sub(r"(HD)([1-9]{1}[0-9]{5})", repl, "HD479659")

See also: http://docs.python.org/library/re.html#re.sub
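As a side note, the error in the question appears to come from passing the compiled pattern pat1 as the second argument of re.sub; that argument has to be a replacement string or a function. Here is a minimal sketch of the same replacement using a template string with group references instead of a callback. It is written with print() so it also runs on Python 3, and the names HD_PATTERN and pad_id are made up for illustration:

```python
import re

# Same pattern as above: "HD" followed by six digits, the first one non-zero.
HD_PATTERN = re.compile(r"(HD)([1-9][0-9]{5})")

def pad_id(text):
    # \g<1> and \g<2> refer to the captured groups; the \g<...> form keeps
    # the group number separate from the literal zeros that follow it.
    return HD_PATTERN.sub(r"\g<1>0000000\g<2>", text)

print(pad_id("HD479659"))        # HD0000000479659
print(pad_id("does not match"))  # does not match
```

Text that does not match the pattern is returned unchanged, so there is no need to test for a match first.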

Update:

To transform a list of values, you have to iterate over all values. You don't have to search the matching values first:

import re

values_to_transform = [
    'HD479659',
    'HD477899',
    'HD423455',
    'does not match',
    'but does not matter'
]

def repl(match):
    return match.group(1) + ("0"*7) + match.group(2)

for value in values_to_transform:
    print re.sub(r"(HD)([1-9]{1}[0-9]{5})", repl, value)

The result is:

HD0000000479659
HD0000000477899
HD0000000423455
does not match
but does not matter

7 Comments

Swanky solution. Clearly I haven't used the callback functionality for re.sub before >.< It's only going to work for fixed length things though.
re.sub in general works of course for solutions which are not fixed length. Yes, my solution is for fixed length, but that's exactly what the author of the question was asking for. So where do you see a problem?
His example was fixed length, but the statement said it was variable. *shrugs* I assume nothing as a rule, particularly when things conflict. Just replacing the '7' with '(13 - len(match.group(2)))' would work, though, assuming no leading 0s and that the regex is right, but frankly that looks shady given the example.
Thank you very much, this is exactly what I want. But when I apply this function to a list I get this error: return _compile(pattern, 0).sub(repl, string, count) TypeError: expected string or buffer
Here is the code that I use:

    s_pat = r"(HD)([1-9]{1}[0-9]{5})"
    pat = re.compile(s_pat)

    def repl(match):
        return match.group(1) + ("0"*7) + match.group(2)

    for rownum1 in range(sh.nrows):
        str1 = str(sh.row_values(rownum1))
        m1 = []
        m1 = pat.findall(str1)
        m1 = list(set(m1))
        #print m1
        for a in m1:
            a = re.sub(r"(HD)([1-9]{1}[0-9]{5})", repl, a)
        print m1
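A likely cause of the "expected string or buffer" error in the code from the last comment: once the pattern contains two capturing groups, findall returns a list of tuples such as ('HD', '479659') instead of whole-match strings, so re.sub is handed a tuple. Assigning to a inside the loop also leaves m1 unchanged. The sketch below skips findall and substitutes directly on each cell. The rows list is hypothetical sample data standing in for sh.row_values(...), and print() is used so the code runs on Python 3:

```python
import re

pat = re.compile(r"(HD)([1-9][0-9]{5})")

def repl(match):
    return match.group(1) + ("0" * 7) + match.group(2)

# With two groups, findall yields tuples, not strings:
print(pat.findall("HD479659 and HD477899"))
# [('HD', '479659'), ('HD', '477899')]

# Hypothetical rows standing in for sh.row_values(rownum)
rows = [["HD479659", 12.5], ["other text", "HD423455"]]

converted = []
for row in rows:
    # str() each cell because spreadsheet cells may hold numbers
    converted.append([pat.sub(repl, str(cell)) for cell in row])

print(converted)
# [['HD0000000479659', '12.5'], ['other text', 'HD0000000423455']]
```

Building a new list keeps the converted values; rebinding the loop variable alone would discard them.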
0

What you need to do is extract the variable length portion of the ID explicitly, then pad with 0's based on the desired length - matched length.

If I understand the pattern correctly you want to use the regex

r"HD(?P<zeroes>0*)(?P<num>\d+)"

At that point you can do

results = re.search(...bla...).groupdict()

Which returns the dict {'zeroes': '', 'num':'479659'} in this case. From there you can pad as necessary.
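To make "pad as necessary" concrete, here is a rough sketch that pads the numeric part to a fixed width with str.zfill. The width of 13 digits is an assumption inferred from the example HD0000000479659 in the question, and the names ID_PATTERN, TOTAL_DIGITS and pad_variable are invented here:

```python
import re

ID_PATTERN = re.compile(r"HD(?P<zeroes>0*)(?P<num>\d+)")
TOTAL_DIGITS = 13  # assumption: 7 zeros + 6 digits, as in HD0000000479659

def pad_variable(match):
    # Ignore any leading zeros already present, then left-pad the
    # remaining digits with zeros up to the target width.
    return "HD" + match.group("num").zfill(TOTAL_DIGITS)

print(ID_PATTERN.sub(pad_variable, "HD479659"))  # HD0000000479659
print(ID_PATTERN.sub(pad_variable, "HD4796"))    # HD0000000004796
```

Unlike inserting a fixed run of seven zeros, this handles IDs with fewer than six digits and leaves already padded IDs unchanged.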

It's 5am at the moment or I'd have a better solution for you, but I hope this helps.

1 Comment

Firstly, I would like to thank you very much. But I really couldn't understand your solution. The problem is that I have a list of items in the format "HDxxxxxx", where x is a digit. I need to transform them to the format "HD0000000xxxxxx". I just need to put seven 0s in between.
