I may be misunderstanding the problem, but I'm envisioning a solution where you iterate over the list of names and dynamically construct a new regexp for each name, and then store all of these regexps in a dictionary to use later:
import re
names = [ 'John Kelly Smith', 'Billy Bob Jones', 'Joe James', 'Kim Smith' ]
regexps={}
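for name in names:
    elements = name.split()
    if len(elements) == 3:
        # First and middle names are optional; each may appear in full,
        # as a bare initial, or as an initial followed by a period.
        pattern = r'(%s(\.|%s)?)?(\ )?(%s(\.|%s)? )?%s$' % (elements[0][0],
                                                            elements[0][1:],
                                                            elements[1][0],
                                                            elements[1][1:],
                                                            elements[2])
    elif len(elements) == 2:
        # The first name is required here, in full or as an initial.
        pattern = r'%s(\.|%s)? %s$' % (elements[0][0],
                                       elements[0][1:],
                                       elements[1])
    else:
        # Skip names that are not two or three words long.
        continue
    regexps[name] = re.compile(pattern)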
jksmith_regexp = regexps['John Kelly Smith']
# Each of the following prints True.
print(bool(jksmith_regexp.match('K. Smith')))
print(bool(jksmith_regexp.match('John Smith')))
print(bool(jksmith_regexp.match('John K. Smith')))
print(bool(jksmith_regexp.match('J. Smith')))
This way you can easily keep track of which regexp will find which name in your text.
And you can also do handy things like this:
if sum(bool(reg.match('K. Smith')) for reg in regexps.values()) > 1:
    print("This string matches multiple names!")
This checks whether a string from your text is ambiguous, that is, whether it matches more than one of the names.
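If it helps to know which names collide rather than just that a collision exists, the same idea can return the matching names directly. This is only a sketch: matching_names is a hypothetical helper, and the regexps dictionary is written out by hand here (in the same shape the loop above produces) so that the snippet runs on its own.

```python
import re

# Hand-written stand-in for the regexps dictionary built by the loop above,
# included only so this snippet is self-contained.
regexps = {
    'John Kelly Smith': re.compile(r'(J(\.|ohn)?)?(\ )?(K(\.|elly)? )?Smith$'),
    'Kim Smith': re.compile(r'K(\.|im)? Smith$'),
    'Joe James': re.compile(r'J(\.|oe)? James$'),
}


def matching_names(text, regexps):
    """Return the sorted list of full names whose pattern matches text.

    Hypothetical helper: not part of the original answer.
    """
    return sorted(name for name, reg in regexps.items() if reg.match(text))


candidates = matching_names('K. Smith', regexps)
if len(candidates) > 1:
    # More than one full name could have produced this string.
    print('Ambiguous: %s' % ', '.join(candidates))
```

An empty list means the string matched none of the known names, and a one-element list means it was resolved unambiguously.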
R.*Goyal is a regular expression that will match all of those names. However, it seems unlikely that you really want to use regular expressions to solve the general problem of grouping names that are likely the same person. R.*Goyal will also match other names, such as Raj Anything Goyal.
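A quick way to see how loose that pattern is, as a sketch only: apart from Raj Anything Goyal, the sample strings below are made up for illustration and are not taken from the question.

```python
import re

loose = re.compile(r'R.*Goyal')

# Illustrative inputs; only 'Raj Anything Goyal' comes from the text above.
samples = [
    'R. Goyal',
    'Raj Goyal',
    'Raj Anything Goyal',
    'Ravi Kumar Goyal',
]

# Collect every sample the loose pattern accepts.
accepted = [s for s in samples if loose.match(s)]

for s in samples:
    print('%-20s %s' % (s, s in accepted))
```

Because .* accepts any run of characters, every sample is accepted, including ones that most likely refer to a different person.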