I have a list of people's names. Each person is unique, but sometimes the first name appears after the last name and vice versa, e.g.
list[0]="Albert Einstein" and list[5]="Einstein Albert"
In the end I want one unique entry for each name.
I tried edit distance, but the values it returns vary over a wide range, so it was not useful.
Please suggest a good string matching module in Python.
- So what if there are two different people, one called "Arnold Dieter" and one called "Dieter Arnold"? Would your program just assume they are the same person? If yes, what purpose would that serve? If no, I don't see how this is actually possible. Also, what have you tried so far? Show us some code so we can figure out what's wrong with it. – Niklas B., Feb 13, 2012 at 20:10
- I want to allocate a number of machines per user, so "duplicate entries" would be problematic. – username_4567, Feb 13, 2012 at 20:13
- So what about those two guys, "Arnold Dieter" and "Dieter Arnold"? Do you want to remove one of them although they are two different people? Wouldn't that be unfair? – Niklas B., Feb 13, 2012 at 20:14
- No, but in my situation I know that Arnold Dieter and Dieter Arnold are not different. What you are saying is correct, but I assure you that these are the same person. – username_4567, Feb 13, 2012 at 20:17
2 Answers
Another way, which also doesn't guarantee that the order of the name parts will be preserved when there is no duplicate:
>>> name_list = ["Albert Einstein", "Einstein Albert", "Abe Lincoln", "Lincoln Abe"]
>>> list(set(' '.join(sorted(n.split())) for n in name_list))
['Abe Lincoln', 'Albert Einstein']
Algorithm
- For each name n, take it apart (n.split()), sort the parts (sorted(n.split())) and rejoin them (' '.join(sorted(n.split()))). Duplicates will now have the same representation.
- Make a set out of the resulting generator to remove duplicates.
- Transform the temporary set back into a list (although this might not be strictly necessary).
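If the original spelling of each name should survive, the same steps can be sketched with a dict keyed by the sorted name parts instead of a set. This is only an illustration, not part of the answer above: the function name dedupe_names is hypothetical, and the first-seen ordering of the result assumes Python 3.7 or later, where dicts keep insertion order.

```python
def dedupe_names(names):
    """Return one entry per person, keeping the first spelling seen.

    Two names count as the same person when they consist of the same
    whitespace-separated parts, regardless of order.
    """
    first_seen = {}
    for name in names:
        # Order-independent key: "Einstein Albert" -> ("Albert", "Einstein")
        key = tuple(sorted(name.split()))
        # setdefault stores the name only if the key is not present yet
        first_seen.setdefault(key, name)
    return list(first_seen.values())


name_list = ["Albert Einstein", "Einstein Albert", "Abe Lincoln", "Lincoln Abe"]
print(dedupe_names(name_list))
```

Unlike the set-based one-liner, a name without a duplicate comes back exactly as it was entered.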
Comments
>>> x = ["Albert Einstein", "test 1 s 2", "Einstein Albert", "foo bar baz", "baz foo bar"]
>>> list(set(' '.join(sorted(s.split())) for s in x))
['bar baz foo', '1 2 s test', 'Albert Einstein']
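The split-and-sort key treats "einstein, albert" and "Albert Einstein" as different names, and the order of a set's elements is arbitrary. A rough sketch of a more tolerant key follows, assuming that case and punctuation should be ignored (an assumption, the question does not say so); name_key and unique_name_keys are hypothetical helper names.

```python
import re


def name_key(name):
    """Order-, case- and punctuation-insensitive key for a name."""
    # \w+ keeps runs of letters, digits and underscores, dropping commas etc.
    tokens = re.findall(r"\w+", name.casefold())
    return tuple(sorted(tokens))


def unique_name_keys(names):
    """Return the normalized names, sorted so the output is deterministic."""
    keys = {name_key(n) for n in names}
    return [" ".join(key) for key in sorted(keys)]


x = ["Albert Einstein", "einstein, albert", "foo bar baz", "Baz foo BAR"]
print(unique_name_keys(x))
```

Sorting the keys before joining them makes the result independent of set iteration order.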
4 Comments
Niklas B.
Sorry, I fail to see how this differs from my solution (except that it uses a list comprehension instead of a genexpr for no reason...).
ironchefpython
It doesn't differ significantly, and I agree that it's better with generators. But if you're just trying to rub it in that F.J. came up with the same solution I did while I was testing mine in IDLE, I refuse to be intimidated. :-)
Niklas B.
Let me remove those unnecessary parens. Sorry for my harsh comment, didn't mean to intimidate :)
username_4567
It will work in some cases but may still fail in others. Anyway, thanks for pointing me in the right direction.