Suggestions for better code in python way of searching a string using regex expression

Question

I have a block of code that searches for a specific block of addresses and formats the results in a certain way.

For example, I have an input string "70D76320 BEG 701D135D 702D72FC END EAR0 00000000 0000000". I need to extract the addresses between "BEG" and "END", which are "701D135D" and "702D72FC" in this case, and format them as follows:

[0]0x701D135D  
[1]0x702D72FC

I wrote a script for that purpose:

import re
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--address', help='Parse the input addresses')
args = parser.parse_args()
addressInfo = args.address

filter = re.compile(r'(BEG )((\w{8})\s)+(END )')
btInfo = filter.search(addressInfo)

print ("\n")
addresses = btInfo.group().split()
for idx in range(len(addresses)):
    if((addresses[idx] != 'BEG') and (addresses[idx] != 'END')):
        print ("[%d]0x%s" %(idx-1, addresses[idx]))

When I review the code, it looks more like C/C++ code than Python. Is there a better way to achieve the same result in 'real Python style'?

buran · Accepted Answer · 2019-06-04 06:21:14Z

3

Without re, using split() and enumerate() for the indexes:

def get_addresses(input_string):
    for address in input_string.split(' BEG ')[-1].split(' END ')[0].split(' '):
        yield address

foo = "70D76320 BEG 701D135D 702D72FC END EAR0 00000000 0000000"
for idx, address in enumerate(get_addresses(foo)):
    print(f'[{idx}]0x{address}')

Using f-strings requires Python 3.6+.

answered Jun 4, 2019 at 6:21

buran


3 Comments

r0n9 Over a year ago

This is really cool, thanks. But I have a question. Is there a performance benefit to using a generator to yield the addresses, compared with a plain for loop that takes the elements one by one from the split results? E.g. using for index, address in enumerate(addressInfo.split("BEG ")[-1].split(" END")[0].split()): instead of defining def get_addresses(input_string)

buran Over a year ago

In this particular case (assuming you will not have many addresses) there is no practical difference. In the general case, split() will produce a list in memory, while get_addresses is a generator and will not produce the whole list in memory. In addition, it makes the code more structured and allows the generator function to be tested separately. You can read more about generators here: wiki.python.org/moin/Generators

buran Over a year ago

Actually, I was corrected (outside SO) that str.split() will create the list in memory in any case, so this particular part of my previous comment is not correct. Still, the code is better structured and easier to test.
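
To illustrate the point made in these comments: because str.split() always builds the full list, wrapping it in a generator changes the code structure but not the memory use. A lazier variant could rely on re.finditer, which returns an iterator over matches. The following is only a minimal sketch; the helper name iter_addresses and the assumption that addresses are whitespace-separated tokens are illustrative and not part of the answer.

```python
import re

def iter_addresses(input_string):
    """Lazily yield the tokens between ' BEG ' and ' END ' (hypothetical helper)."""
    start = input_string.find(' BEG ')
    end = input_string.find(' END ', start)
    if start == -1 or end == -1:
        return  # no BEG ... END block found: yield nothing
    # finditer returns an iterator, so no intermediate list of tokens is built
    token = re.compile(r'\S+')
    for match in token.finditer(input_string, start + len(' BEG '), end):
        yield match.group()

foo = "70D76320 BEG 701D135D 702D72FC END EAR0 00000000 0000000"
for idx, address in enumerate(iter_addresses(foo)):
    print(f'[{idx}]0x{address}')
```

For a single short line like the one in the question, the difference is negligible; the sketch only matters if the text between the markers could be very long.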

funie200 · Accepted Answer · 2019-06-04 12:41:44Z

1

How about this:

import re
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--address', help='Parse the input addresses')
addressInfo = parser.parse_args().address

btInfo = re.search(r' BEG (.*?) END ', addressInfo).group(1)
print("\n")

for index, address in enumerate(btInfo.split()):
    print("[{0}]0x{1}".format(index, address))

r' BEG (.*?) END ' will capture everything between BEG and END. By using enumerate() in the for loop, you can loop through the split string and keep track of the index at the same time.

The code will give the following output:

[0]0x701D135D  
[1]0x702D72FC
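
One caveat with the snippet above: re.search returns None when the input contains no BEG ... END block, so calling .group(1) directly on the result would raise an AttributeError. A guarded version might look like the sketch below. The function name format_addresses is hypothetical, and returning an empty list when nothing matches is an assumption about the desired behaviour.

```python
import re

def format_addresses(address_info):
    """Return formatted lines for the addresses between BEG and END."""
    match = re.search(r' BEG (.*?) END ', address_info)
    if match is None:
        return []  # assumption: no BEG ... END block means nothing to print
    # group(1) holds the text between the markers; split() breaks it on whitespace
    return ['[{0}]0x{1}'.format(index, address)
            for index, address in enumerate(match.group(1).split())]

sample = "70D76320 BEG 701D135D 702D72FC END EAR0 00000000 0000000"
for line in format_addresses(sample):
    print(line)
```

Returning the lines instead of printing them inside the function also makes the parsing easy to test separately from argparse.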

edited Jun 4, 2019 at 12:41

answered Jun 4, 2019 at 6:25

funie200


Rakesh · Accepted Answer · 2019-06-04 06:25:15Z

0

Using re.search with Lookbehind & Lookahead and grouping.

Example:

import re

s = "70D76320 BEG 701D135D 702D72FC END EAR0 00000000 0000000"
m = re.search(r"(?<=\bBEG\b)(?P<address_1>.+) (?P<address_2>.+)(?=\bEND\b)", s)
if m:
    print(m.group("address_1").strip())
    print(m.group("address_2").strip())

Output:

701D135D
702D72FC

edited Jun 4, 2019 at 6:25

answered Jun 4, 2019 at 6:13

Rakesh


5 Comments

r0n9 Over a year ago

What is the reason that using re.findall with lookbehind and lookahead is better than using re.search?

Rakesh Over a year ago

You do not need to iterate your data to fetch all the values.

Rakesh Over a year ago

Is your data just one line, "70D76320 BEG 701D135D 702D72FC END EAR0 00000000 0000000"? Or do you have multiple lines?

r0n9 Over a year ago

The input is a one line string

Rakesh Over a year ago

Updated snippet.
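
The pattern in the answer defines exactly two named groups, so it always returns two values. For a variable number of addresses, one possible sketch combines re.search (to isolate the block between the markers) with re.findall (to collect every address inside it). The helper name find_addresses and the assumption that each address is exactly eight hexadecimal digits are illustrative and not taken from the thread.

```python
import re

def find_addresses(line):
    """Return all 8-digit hex tokens between BEG and END (hypothetical helper)."""
    block = re.search(r'\bBEG\b(.*?)\bEND\b', line)
    if block is None:
        return []  # assumption: no BEG ... END block means no addresses
    # findall returns every non-overlapping match as a list of strings
    return re.findall(r'\b[0-9A-Fa-f]{8}\b', block.group(1))

s = "70D76320 BEG 701D135D 702D72FC END EAR0 00000000 0000000"
for idx, address in enumerate(find_addresses(s)):
    print(f"[{idx}]0x{address}")
```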
