
I have a list of people's names. The names are unique, but sometimes the first name appears after the last name and vice versa, e.g. list[0]="Albert Einstein" and list[5]="Einstein Albert".
In the end I want one unique entry for each person. I tried edit distance, but the values it returns vary over a wide range, so it isn't useful. Please suggest a good string matching module in Python.
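Raw edit distance is not normalised by the length of the strings, which is one reason its values spread over a wide range. A minimal sketch, assuming a normalised score is acceptable: the standard-library difflib.SequenceMatcher returns a ratio between 0.0 and 1.0, and sorting the lower-cased name parts first makes the score ignore word order. The helper name token_sort_similarity is hypothetical.

```python
from difflib import SequenceMatcher


def token_sort_similarity(a: str, b: str) -> float:
    """Hypothetical helper: similarity in [0.0, 1.0] ignoring word order and case."""
    # Normalise each name: lower-case, split on whitespace, sort the parts.
    key_a = " ".join(sorted(a.lower().split()))
    key_b = " ".join(sorted(b.lower().split()))
    # ratio() returns 2*M/T, where M is the number of matched characters and
    # T is the combined length of both strings, so it stays between 0 and 1.
    return SequenceMatcher(None, key_a, key_b).ratio()


print(token_sort_similarity("Albert Einstein", "Einstein Albert"))  # 1.0
print(token_sort_similarity("Albert Einstein", "Albert Einstien"))  # high, but below 1.0
```

A cutoff (for example, treating scores above 0.9 as the same person) would still have to be chosen by hand; that threshold is an assumption, not something the question specifies.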

  • So what if there are two different people, one called "Arnold Dieter" and one called "Dieter Arnold"? Would your program just assume these are the same person? If yes, what purpose would that serve? If no, I don't see how this is actually possible. Also, what have you tried so far? Show us some code so we can figure out what's wrong with it. Commented Feb 13, 2012 at 20:10
  • I want to allocate a number of machines per user; if there are "duplicate entries", that will be a problem. Commented Feb 13, 2012 at 20:13
  • So what about those two guys, "Arnold Dieter" and "Dieter Arnold"? Do you want to remove one of them although they are two different people? Wouldn't that be unfair? Commented Feb 13, 2012 at 20:14
  • No, but in my situation I know that Arnold Dieter and Dieter Arnold are not different people. What you are saying is correct, but I can assure you that these are the same person. Commented Feb 13, 2012 at 20:17

2 Answers


Another way, which also does not guarantee that the order of the name parts is preserved, even for names that have no duplicate:

>>> name_list = ["Albert Einstein", "Einstein Albert", "Abe Lincoln", "Lincoln Abe"]
>>> list(set(' '.join(sorted(n.split())) for n in name_list))
['Abe Lincoln', 'Albert Einstein']

Algorithm

  1. For each name n, take it apart (n.split()), sort the parts (sorted(n.split())) and rejoin them (' '.join(sorted(n.split()))). Duplicates now have the same representation.
  2. Make a set out of the resulting generator to remove duplicates.
  3. Transform the temporary set back into a list (although this might not be strictly necessary).
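If the original spelling and the order of the list matter, the same sorted-parts key can drive a dictionary instead of a set. A minimal sketch, assuming Python 3.7+ (where dicts keep insertion order) and that the first spelling seen for each person should be kept; unique_names is a hypothetical helper:

```python
def unique_names(names):
    """Hypothetical helper: one entry per person, keeping the first spelling seen."""
    seen = {}
    for name in names:
        # Order-independent key: "Einstein Albert" -> "Albert Einstein".
        key = " ".join(sorted(name.split()))
        # setdefault stores the name only the first time its key appears.
        seen.setdefault(key, name)
    # Dicts preserve insertion order (Python 3.7+), so list order is kept too.
    return list(seen.values())


name_list = ["Albert Einstein", "Einstein Albert", "Abe Lincoln", "Lincoln Abe"]
print(unique_names(name_list))  # ['Albert Einstein', 'Abe Lincoln']
```

Unlike the set-based version, the result here is deterministic: each person appears once, in the position and spelling of their first occurrence.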

>>> x = ["Albert Einstein", "test 1 s 2", "Einstein Albert", "foo bar baz", "baz foo bar"]
>>> list(set(' '.join(sorted(s.split())) for s in x))
['bar baz foo', '1 2 s test', 'Albert Einstein']
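Both answers compare the name parts exactly, so names that differ only in capitalisation (for example "albert einstein" and "Einstein Albert") would still be kept as separate entries. A small sketch, assuming case should be ignored; name_key is a hypothetical helper that builds the comparison key with str.casefold():

```python
def name_key(name: str) -> tuple:
    """Hypothetical helper: case- and order-insensitive key for a name."""
    # casefold() makes the comparison case-insensitive; split() with no
    # argument also ignores runs of spaces and tabs between the parts.
    return tuple(sorted(name.casefold().split()))


x = ["Albert Einstein", "einstein  ALBERT", "foo bar baz", "baz foo bar"]
print(len({name_key(s) for s in x}))  # 2
```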

4 Comments

Sorry, I fail to see how this differs from my solution (except that it uses a list comprehension instead of a genexpr for no reason...).
It doesn't differ significantly, and I agree that it's better with generators. But if you're just trying to rub it in that F.J. came up with the same solution I did while I was testing mine in IDLE, I refuse to be intimidated. :-)
Let me remove those unnecessary parens. Sorry for my harsh comment, didn't mean to intimidate :)
It will work in some cases, but it may still fail. Anyway, thanks for pointing me in the right direction.
