
I have a block of code that searches for a specific block of addresses and formats the results in a certain way.

For example, I have an input string "70D76320 BEG 701D135D 702D72FC END EAR0 00000000 0000000". I need to extract the addresses between "BEG" and "END", which are "701D135D" and "702D72FC" in this case, and format them in the following fashion:

[0]0x701D135D  
[1]0x702D72FC  

I wrote a script for that purpose:

import re
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--address', help='Parse the input addresses')
args = parser.parse_args()
addressInfo = args.address

filter = re.compile(r'(BEG )((\w{8})\s)+(END )')
btInfo = filter.search(addressInfo)

print ("\n")
addresses = btInfo.group().split()
for idx in range(len(addresses)):
    if((addresses[idx] != 'BEG') and (addresses[idx] != 'END')):
        print ("[%d]0x%s" %(idx-1, addresses[idx]))

When I review the code, it looks more like C/C++ code than Python. Is there a better way to achieve the same result in the 'real Python style'?

3 Answers

3

Without re, using split() to isolate the block and enumerate() for the indexes:

def get_addresses(input_string):
    for address in input_string.split(' BEG ')[-1].split(' END ')[0].split(' '):
        yield address

foo = "70D76320 BEG 701D135D 702D72FC END EAR0 00000000 0000000"
for idx, address in enumerate(get_addresses(foo)):
    print(f'[{idx}]0x{address}')
  • Note: f-strings require Python 3.6+.

3 Comments

This is really cool, thanks. But I have a question: is there a performance benefit to using a generator with yield over a plain for loop that takes the elements one by one from the split results? E.g. using for index, address in enumerate(addressInfo.split("BEG ")[-1].split(" END")[0].split()): instead of def get_addresses(input_string)
In this particular case (assuming you will not have many addresses) there is no practical difference. In the general case, split() produces a list in memory, while get_addresses is a generator and does not build the whole list in memory. In addition, it makes the code more structured and allows the generator function to be tested separately. You can read more about generators here: wiki.python.org/moin/Generators
Actually, I was corrected (outside SO): str.split() creates the list in memory in any case, so that part of my previous comment is not correct. Still, the code is better structured and easier to test.
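
To illustrate the point in the last comment: because str.split() always builds the full list, wrapping it in a generator saves no memory. A lazier sketch could use re.finditer, which yields matches one at a time. This assumes every address is exactly eight hexadecimal digits, which the question suggests but does not state; the name iter_addresses is hypothetical.

```python
import re

HEX8 = re.compile(r"\b[0-9A-Fa-f]{8}\b")  # assumption: addresses are 8 hex digits

def iter_addresses(text):
    """Lazily yield the 8-hex-digit tokens found between BEG and END."""
    start = text.find("BEG")
    end = text.find("END", start + 3)
    if start == -1 or end == -1:
        return  # no delimited block: yield nothing
    # pos/endpos restrict the scan to the region between the two markers
    for match in HEX8.finditer(text, start + 3, end):
        yield match.group()

foo = "70D76320 BEG 701D135D 702D72FC END EAR0 00000000 0000000"
for idx, address in enumerate(iter_addresses(foo)):
    print(f"[{idx}]0x{address}")
```

For a one-line input with a handful of addresses the difference is negligible; the sketch only matters if the delimited block can be very large.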
1

How about this:

import re
import argparse

parser = argparse.ArgumentParser()
parser.add_argument('--address', help='Parse the input addresses')
addressInfo = parser.parse_args().address

btInfo = re.search(r' BEG (.*?) END ', addressInfo).group(1)
print("\n")

for index, address in enumerate(btInfo.split()):
    print("[{0}]0x{1}".format(index, address))

r' BEG (.*?) END ' captures everything between BEG and END. By using enumerate() in the for loop, you can iterate over the split string and keep track of the index at the same time.

The code will give the following output:

[0]0x701D135D  
[1]0x702D72FC
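
One caveat with the snippet in this answer: re.search returns None when the input has no BEG ... END block, so calling .group(1) directly raises AttributeError. A minimal sketch that guards against this might look as follows; the helper name format_addresses is hypothetical, and the slightly looser pattern (\s+ instead of a single space) is my own choice rather than part of the answer.

```python
import re

def format_addresses(address_info):
    """Return lines such as '[0]0x701D135D' for the tokens between BEG and END."""
    match = re.search(r"\bBEG\s+(.*?)\s+END\b", address_info)
    if match is None:
        return []  # no delimited block found
    return [f"[{index}]0x{address}"
            for index, address in enumerate(match.group(1).split())]

line = "70D76320 BEG 701D135D 702D72FC END EAR0 00000000 0000000"
print("\n".join(format_addresses(line)))
```

Returning a list instead of printing inside the function keeps the parsing separate from the output, which also makes it easier to test.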


0

Using re.search with lookbehind and lookahead, plus named groups.

Example:

import re

s = "70D76320 BEG 701D135D 702D72FC END EAR0 00000000 0000000"
m = re.search(r"(?<=\bBEG\b)(?P<address_1>.+) (?P<address_2>.+)(?=\bEND\b)", s)
if m:
    print(m.group("address_1").strip())
    print(m.group("address_2").strip())

Output:

701D135D
702D72FC
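
Note that the pattern above captures exactly two addresses. If the block may hold any number of them, one possible variation keeps the lookbehind and lookahead to isolate the block and then applies re.findall to it. This is only a sketch: the function name find_block_addresses is hypothetical, and it assumes each address is an eight-character word, as in the question's own pattern.

```python
import re

def find_block_addresses(s):
    """Return every 8-character token between BEG and END, however many there are."""
    # The lookarounds keep BEG and END themselves out of the match
    block = re.search(r"(?<=\bBEG\b).*?(?=\bEND\b)", s)
    if block is None:
        return []
    return re.findall(r"\b\w{8}\b", block.group())

s = "70D76320 BEG 701D135D 702D72FC END EAR0 00000000 0000000"
for address in find_block_addresses(s):
    print(address)
```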

5 Comments

Why is using re.findall with lookbehind and lookahead better than using re.search?
You do not need to iterate over your data to fetch all the values.
Is your data just one line, e.g. "70D76320 BEG 701D135D 702D72FC END EAR0 00000000 0000000"? Or do you have multiple lines?
The input is a one-line string.
Updated snippet.
