87

I want to sort a list of named tuples without having to remember the index of the field name. My solution seems rather awkward, and I was hoping someone would have a more elegant one.

from operator import itemgetter
from collections import namedtuple

Person = namedtuple('Person', 'name age score')
seq = [
    Person(name='nick', age=23, score=100),
    Person(name='bob', age=25, score=200),
]

# sort list by name
print(sorted(seq, key=itemgetter(Person._fields.index('name'))))
# sort list by age
print(sorted(seq, key=itemgetter(Person._fields.index('age'))))

Thanks, Nick

2
  • Is the field name always given as a string or does the solution by @clyfish also work? Commented Aug 23, 2012 at 8:51
  • I wasn't trying to do anything dynamic, so both solutions work perfectly. Commented Aug 23, 2012 at 9:01

5 Answers

111
from operator import attrgetter
from collections import namedtuple

Person = namedtuple('Person', 'name age score')
seq = [Person(name='nick', age=23, score=100),
       Person(name='bob', age=25, score=200)]

Sort list by name

sorted(seq, key=attrgetter('name'))

Sort list by age

sorted(seq, key=attrgetter('age'))

71
sorted(seq, key=lambda x: x.name)
sorted(seq, key=lambda x: x.age)
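
One case where a lambda key is handy is mixing sort directions, because the key can transform a field on the way. A minimal sketch, assuming the field is numeric so it can be negated; the extra 'amy' record is made up to create a tie on age:

```python
from collections import namedtuple

Person = namedtuple('Person', 'name age score')
seq = [
    Person(name='nick', age=23, score=100),
    Person(name='bob', age=25, score=200),
    Person(name='amy', age=25, score=150),  # hypothetical record, added for the tie
]

# Oldest first; ties on age are broken alphabetically by name.
# Negating the age reverses the order for that field only.
by_age_desc_then_name = sorted(seq, key=lambda p: (-p.age, p.name))

print([p.name for p in by_age_desc_then_name])  # ['amy', 'bob', 'nick']
```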

4 Comments

I think this is more elegant than using attrgetter
I prefer attrgetter, but that is just taste. Another advantage: if I were to choose the sort fields dynamically, I could just pass the field name as a string.
@zenpoy Keep in mind attrgetter performs much better, and lambdas aren't usually considered elegant
and sorted(seq, key=lambda x: [x.age, x.name]) sorts by multiple attributes
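
If the sort fields are only known at run time, the string-based attrgetter form mentioned in the comments can be wrapped in a small helper. This is a sketch: `sort_by` is a hypothetical name (not part of any library), and checking the names against `_fields` is an assumption about the behaviour you want.

```python
from collections import namedtuple
from operator import attrgetter

Person = namedtuple('Person', 'name age score')
seq = [Person(name='nick', age=23, score=100),
       Person(name='bob', age=25, score=200)]


def sort_by(records, *field_names, reverse=False):
    """Sort namedtuple records by one or more field names given as strings.

    Hypothetical helper, for illustration only.
    """
    records = list(records)
    if not field_names:
        raise ValueError('at least one field name is required')
    if records:
        # Fail early with a clear message instead of an AttributeError mid-sort.
        unknown = set(field_names) - set(records[0]._fields)
        if unknown:
            raise ValueError(f'unknown field(s): {sorted(unknown)}')
    # attrgetter with several names returns a tuple, which sorts field by field.
    return sorted(records, key=attrgetter(*field_names), reverse=reverse)


print(sort_by(seq, 'name'))
print(sort_by(seq, 'age', 'name', reverse=True))
```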
19

I tested the two alternatives given here for speed, since a comment addressed to @zenpoy raised performance.

Testing script:

import random
from collections import namedtuple
from timeit import timeit
from operator import attrgetter

runs = 10000
size = 10000
random.seed(42)  # call seed(); assigning to it would replace the function without seeding
Person = namedtuple('Person', 'name,age')
seq = [Person(str(random.randint(0, 10 ** 10)), random.randint(0, 100)) for _ in range(size)]

def attrgetter_test_name():
    return sorted(seq.copy(), key=attrgetter('name'))

def attrgetter_test_age():
    return sorted(seq.copy(), key=attrgetter('age'))

def lambda_test_name():
    return sorted(seq.copy(), key=lambda x: x.name)

def lambda_test_age():
    return sorted(seq.copy(), key=lambda x: x.age)

print('attrgetter_test_name', timeit(stmt=attrgetter_test_name, number=runs))
print('attrgetter_test_age', timeit(stmt=attrgetter_test_age, number=runs))
print('lambda_test_name', timeit(stmt=lambda_test_name, number=runs))
print('lambda_test_age', timeit(stmt=lambda_test_age, number=runs))

Results:

attrgetter_test_name 44.26793992166096
attrgetter_test_age 31.98247099677627
lambda_test_name 47.97959511074551
lambda_test_age 35.69356267603864

Using lambda was indeed slower: about 8% slower when sorting by name and about 12% slower when sorting by age.
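
For a quicker, less noisy check on your own machine, a variant along these lines may help. It is a sketch, not the script that produced the numbers above: the smaller `size`, the `number` and `rounds` values, and the `best_time` helper are all assumptions chosen so that it finishes in seconds. The key functions are created once, outside the timed call.

```python
import random
from collections import namedtuple
from operator import attrgetter
from timeit import repeat

random.seed(42)  # seed the generator so the data is reproducible
Person = namedtuple('Person', 'name age')
size = 1000  # assumed: much smaller than the 10000 used above
seq = [Person(str(random.randint(0, 10 ** 10)), random.randint(0, 100))
       for _ in range(size)]

# Build each key function once so that only the sort itself is timed.
keys = {
    'attrgetter name': attrgetter('name'),
    'lambda name': lambda x: x.name,
    'attrgetter age': attrgetter('age'),
    'lambda age': lambda x: x.age,
}


def best_time(key, number=20, rounds=3):
    """Return the fastest of several timing rounds (hypothetical helper)."""
    # The minimum is the round least disturbed by other activity on the machine.
    return min(repeat(lambda: sorted(seq, key=key), number=number, repeat=rounds))


for label, key in keys.items():
    print(f'{label}: {best_time(key):.4f} s')
```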

EDIT:

Further testing shows the results when sorting using multiple attributes. Added the following two test cases with the same setup:

def attrgetter_test_both():
    return sorted(seq.copy(), key=attrgetter('age', 'name'))

def lambda_test_both():
    return sorted(seq.copy(), key=lambda x: (x.age, x.name))

print('attrgetter_test_both', timeit(stmt=attrgetter_test_both, number=runs))
print('lambda_test_both', timeit(stmt=lambda_test_both, number=runs))

Results:

attrgetter_test_both 92.80101586919373
lambda_test_both 96.85089983147456

Lambda is still slower, but by a smaller margin: now about 4%.

Testing is done on Python 3.6.0.
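
The relative slowdowns can be checked directly from the timings reported above. A short sketch (the `slowdown_percent` helper is a made-up name):

```python
# Timings in seconds, copied from the results above: (attrgetter, lambda).
timings = {
    'name': (44.26793992166096, 47.97959511074551),
    'age': (31.98247099677627, 35.69356267603864),
    'both': (92.80101586919373, 96.85089983147456),
}


def slowdown_percent(attrgetter_s, lambda_s):
    """How much slower the lambda run is, as a percentage of the attrgetter run."""
    return (lambda_s / attrgetter_s - 1) * 100


for label, (attrgetter_s, lambda_s) in timings.items():
    print(f'{label}: lambda is {slowdown_percent(attrgetter_s, lambda_s):.1f}% slower')
```

This works out to roughly 8.4% for name, 11.6% for age, and 4.4% for both fields.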

4

Since nobody mentioned itemgetter(), here is how to do it with itemgetter(), which selects fields by position:

from operator import itemgetter
from collections import namedtuple

Person = namedtuple('Person', 'name age score')
seq = [
    Person(name='nick', age=23, score=100),
    Person(name='bob', age=25, score=200),
]

# sort list by name
print(sorted(seq, key=itemgetter(0)))

# sort list by age
print(sorted(seq, key=itemgetter(1)))
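
Because a namedtuple is still a tuple, positional and name-based keys pick out the same values. A small sketch comparing the two on the data above, using only the standard library:

```python
from collections import namedtuple
from operator import attrgetter, itemgetter

Person = namedtuple('Person', 'name age score')
seq = [
    Person(name='nick', age=23, score=100),
    Person(name='bob', age=25, score=200),
]

# Position 1 is 'age' only because of the field order in the definition above.
by_index = sorted(seq, key=itemgetter(1))
by_name = sorted(seq, key=attrgetter('age'))
print(by_index == by_name)  # True

# itemgetter also accepts several positions, here (score, name).
print(sorted(seq, key=itemgetter(2, 0)))
```

The positional form breaks silently if the field order ever changes, which is the trade-off against attrgetter.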


4

This might be a bit too 'magical' for some, but I'm partial to:

# sort list by name
print(sorted(seq, key=Person.name.fget))

Edit: this assumes namedtuple uses the property() built-in to implement the accessors, because it leverages the fget attribute on such a property (see documentation). This may still be true in some implementations but it seems CPython no longer does that, which I think is related to optimization work referenced in https://bugs.python.org/issue32492 (so, since 3.8). Such fragility is the cost of the "magic" I mentioned; namedtuple certainly doesn't promise to use property().

Writing Person.name.__get__ is better (it works before and after the implementation change), but it is maybe not worth the arcaneness compared with just writing it more plainly as lambda p: p.name
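
To see which of these forms works on a given interpreter, a quick check along these lines can help. It is only a sketch: the presence of `fget` is probed rather than assumed, since that is the implementation detail that changed.

```python
from collections import namedtuple

Person = namedtuple('Person', 'name age score')
seq = [
    Person(name='nick', age=23, score=100),
    Person(name='bob', age=25, score=200),
]

# Person.name is a descriptor; its exact type is an implementation detail.
print(type(Person.name).__name__, 'has fget:', hasattr(Person.name, 'fget'))

# __get__ is part of the descriptor protocol, so calling it with an
# instance returns that instance's field value.
by_descriptor = sorted(seq, key=Person.name.__get__)

# The plain spelling, which does not depend on how the accessor is implemented.
by_lambda = sorted(seq, key=lambda p: p.name)

print(by_descriptor == by_lambda)
```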

2 Comments

Could you please add some context to your answer? I tried it but am getting AttributeError: '_collections._tuplegetter' object has no attribute 'fget'.
Sure. This approach used to work but it depended on a particular implementation detail of namedtuple. That seems to have been changed in bugs.python.org/issue32492. Will add an edit to the answer shortly.
