I am trying to write a regular expression to standardize names.
Use cases:
J. J. Abrams -> JJ Abrams
J J Abrams -> JJ Abrams
J.J Abrams -> JJ Abrams
J.J. Abrams -> JJ Abrams
J  J   Abrams -> JJ Abrams (multiple spaces)
The initials can also appear in the middle or at the end of the name. In general, an initial can be preceded or followed by whitespace, a '.', or a word boundary.
So I came up with the following:
p = re.compile(r'((\b|\s+|\.)[a-z](\.|\s+|\b))', re.I)
When I run the substitution and print the result, it looks wrong:
p.subn(lambda g: g.groups()[0].strip().strip('.'), "J J Abrams")
('JJAbrams', 2)
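Printing each match shows where the space goes. The `\s+` alternative in the trailing group `(\.|\s+|\b)` consumes the space between the last initial and "Abrams", so that space is part of the replaced text and `.strip()` discards it. A small check using only the standard `re` module and the pattern from the question:

```python
import re

p = re.compile(r'((\b|\s+|\.)[a-z](\.|\s+|\b))', re.I)

# Record exactly which substrings the original pattern consumes, and where.
matches = [(m.group(), m.span()) for m in p.finditer("J J Abrams")]
for text, span in matches:
    print(repr(text), span)
# 'J ' (0, 2)
# 'J ' (2, 4)
```

The second match ends at index 4, where "Abrams" starts, so the separating space at index 3 is replaced along with the initial.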
How do I retain the space before (or after) the non-initial part?
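One possible approach, sketched under the assumption that whitespace (not just a dot) separates the initials from the rest of the name, as in every example above: match the whole run of initials at once, including the dots and spaces *between* initials, but never the whitespace *after* the last one. Replacing the run with only its letters then leaves the following space in place. `INITIALS` and `standardize` are illustrative names, not part of the original code.

```python
import re

# One initial = a single letter that stands alone as a word (\b on both sides).
# A run of initials may be separated by dots and/or spaces and may end in a dot.
# The space after the final initial is never part of the match.
INITIALS = re.compile(r"\b[a-z]\b(?:[.\s]*\b[a-z]\b)*\.?", re.I)


def standardize(name: str) -> str:
    # Collapse runs of whitespace first so "J  J   Abrams" behaves like "J J Abrams".
    name = " ".join(name.split())
    # Keep only the letters of each matched run, e.g. "J. J." -> "JJ".
    return INITIALS.sub(lambda m: "".join(ch for ch in m.group() if ch.isalpha()), name)


for raw in ["J. J. Abrams", "J J Abrams", "J.J Abrams", "J.J. Abrams", "J  J   Abrams"]:
    print(standardize(raw))  # JJ Abrams
```

Because the pattern repeats, it is not limited to two initials: "George R. R. Martin" becomes "George RR Martin". A lone initial is normalized too ("John F. Kennedy" becomes "John F Kennedy"). Input with no space after the initials, such as "J.J.Abrams", is outside the assumption above and would come out as "JJAbrams".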
Edit: I should also have made it clear that there can be more than two initials in a name; the above was just one example. Thanks.
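Since the number of initials is not fixed, a non-regex-substitution alternative (a different technique from the single pattern in the question) is to split the name on whitespace and merge consecutive tokens that look like initials. This sketch assumes an "initials token" is one letter, or letters joined by dots, optionally ending in a dot (`J`, `J.`, `J.J`, `J.J.`); `INITIALS_TOKEN` and `standardize_tokens` are hypothetical names.

```python
import re

# A whitespace-delimited token counts as initials if it is a single letter,
# or letters joined by dots, optionally ending in a dot: J, J., J.J, J.J.
INITIALS_TOKEN = re.compile(r"[a-z](?:\.[a-z])*\.?", re.I)


def standardize_tokens(name: str) -> str:
    parts = []
    pending = ""  # letters collected from consecutive initials tokens
    for tok in name.split():  # split() with no argument also absorbs runs of spaces
        if INITIALS_TOKEN.fullmatch(tok):
            pending += tok.replace(".", "")
        else:
            if pending:  # flush the merged initials before a regular word
                parts.append(pending)
                pending = ""
            parts.append(tok)
    if pending:  # initials at the very end of the name
        parts.append(pending)
    return " ".join(parts)


print(standardize_tokens("J. R. R. Tolkien"))  # JRR Tolkien
print(standardize_tokens("Abrams J. J."))      # Abrams JJ
```

Working token by token means a regular word such as "Abrams" is never touched, and the single spaces between the output parts come from `" ".join`, so there is no separator to lose.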