I'm trying to parse two types of one-line address strings:
Flat XXX, XXX <Building name>, <City/town>, <State> <Postcode>
DDD <Generic place name>, <Road name> road, <City/town>, <State>
using the following regex:
re.search(r'(Flat \w+)?\W*(.+)\W*([a-zA-Z]{1,2}\d+\s+\d+[a-zA-Z]{1,2})?', address, re.I)
Here XXX is some alphanumeric string and DDD is a number. I expect group 1 to be Flat XXX if the address is of the first type, or None if not; group 2 to be XXX <Building name>, <City/town>, <State> if the address is of the first type, or <Road name> road, <City/town>, <State> if it is of the second type; and group 3 to be the postcode if the address is of the first type, or None if not. The postcode is a UK postcode, for which my regex (not comprehensively accurate, but mostly correct for my purpose) is [a-zA-Z]{1,2}\d+\s+\d+[a-zA-Z]{1,2}. Case should be ignored. There may be no comma between Flat XXX (if present) and <Building name>, and there may be a comma between the city and the postcode (if present).
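To sanity-check the postcode sub-pattern on its own, here is a small sketch that runs it through `re.fullmatch` with `re.I`. It accepts the example postcode `SW14 9XY` in either case, and it also shows one of the gaps behind the "not comprehensively accurate" caveat: outward codes that end in a letter, such as `SW1A 1AA`, are rejected. The names `POSTCODE` and `is_postcode` are illustrative only.

```python
import re

# The question's approximate UK postcode pattern
POSTCODE = r'[a-zA-Z]{1,2}\d+\s+\d+[a-zA-Z]{1,2}'

def is_postcode(text):
    """Return True if the whole string matches POSTCODE, ignoring case."""
    return re.fullmatch(POSTCODE, text, re.I) is not None

print(is_postcode('SW14 9XY'))  # True
print(is_postcode('sw14 9xy'))  # True: case is ignored
print(is_postcode('SW1A 1AA'))  # False: a letter follows the district digits
```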
>>> import re
>>> address1 = 'Flat 29, Victoria House, Redwood Lane, Richmond, London SW14 9XY'
>>> re.search(r'(Flat \w+)?\W*(.+)\W*([a-zA-Z]{1,2}\d+\s+\d+[a-zA-Z]{1,2})?', address1, re.I).groups()
('Flat 29', 'Victoria House, Redwood Lane, Richmond, London SW14 9XY', None)
>>> address2 = '91 Fleet, Major Road, Fleet, Hampshire'
>>> re.search(r'(Flat \w+)?\W*(.+)\W*([a-zA-Z]{1,2}\d+\s+\d+[a-zA-Z]{1,2})?', address2, re.I).groups()
(None, '91 Fleet, Major Road, Fleet, Hampshire', None)
I am not sure what is going wrong, but I think the greedy middle group ..\W*(.+)\W*.. is capturing everything up to the end of the string, including the postcode.
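The behaviour can be seen by comparing the pattern with a copy in which the postcode group is mandatory. With the trailing group optional, `(.+)` runs to the end of the string and the optional group then succeeds by matching nothing, so it reports `None`. Making the group mandatory forces backtracking, but the greedy group gives back as few characters as possible, so it keeps the `S` of `SW14`. This is only an illustration of the matching behaviour, not a fix; `optional_pc` and `mandatory_pc` are illustrative names.

```python
import re

address1 = 'Flat 29, Victoria House, Redwood Lane, Richmond, London SW14 9XY'

# Original pattern: greedy (.+) followed by an optional postcode group
optional_pc = r'(Flat \w+)?\W*(.+)\W*([a-zA-Z]{1,2}\d+\s+\d+[a-zA-Z]{1,2})?'
# Same pattern with the postcode group made mandatory (trailing '?' dropped)
mandatory_pc = r'(Flat \w+)?\W*(.+)\W*([a-zA-Z]{1,2}\d+\s+\d+[a-zA-Z]{1,2})'

print(re.search(optional_pc, address1, re.I).groups())
# ('Flat 29', 'Victoria House, Redwood Lane, Richmond, London SW14 9XY', None)
print(re.search(mandatory_pc, address1, re.I).groups())
# ('Flat 29', 'Victoria House, Redwood Lane, Richmond, London S', 'W14 9XY')
```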
If I make the middle group non-greedy, ..\W*(.+?)\W*.., I get ('Flat 29', 'V', None) for the first type of address, e.g. for address1 = 'Flat 29, Victoria House, Redwood Lane, Richmond, London SW14 9XY', instead of the groups I described above.
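The `'V'` result has the mirror-image cause: a lazy `(.+?)` stops as soon as the rest of the pattern can match, and because everything after it is optional, that happens after a single character. One possible adjustment, offered as a sketch rather than a tested solution, is to anchor the end with `$` so the lazy group has to keep expanding until only an optional separator and an optional postcode remain. The name `ADDRESS_RE` is illustrative, and the sketch does not attempt to strip the `DDD <Generic place name>` prefix from the second address type, so group 2 for `address2` still starts with `91 Fleet`.

```python
import re

# Lazy middle group plus an end anchor ($): (.+?) must now grow until the
# remaining text is an optional separator and an optional postcode at the end.
ADDRESS_RE = re.compile(
    r'(Flat \w+)?\W*(.+?)\W*([a-zA-Z]{1,2}\d+\s+\d+[a-zA-Z]{1,2})?$',
    re.I,
)

address1 = 'Flat 29, Victoria House, Redwood Lane, Richmond, London SW14 9XY'
address2 = '91 Fleet, Major Road, Fleet, Hampshire'

print(ADDRESS_RE.search(address1).groups())
# ('Flat 29', 'Victoria House, Redwood Lane, Richmond, London', 'SW14 9XY')
print(ADDRESS_RE.search(address2).groups())
# (None, '91 Fleet, Major Road, Fleet, Hampshire', None)
```

Because `\W*` sits between the middle group and the postcode, a comma before the postcode (`London, SW14 9XY`) is absorbed there rather than ending up in group 2.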