Reinderien

This is much more complicated than it needs to be. You extend a list, then traverse that list to construct a counter, then traverse again to get a sorted sequence, then traverse once more to get a list of keys only. All of this needs to go away, especially the lambda/map style, which is better expressed with comprehensions.

Instead, work with your Counter as a first-class citizen. Don't juggle indices. And spend some quality time reading the Counter documentation.
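To make "first-class citizen" concrete, here is a minimal sketch of the three Counter operations that matter here: update() with an iterable, most_common(), and plain indexing. The job_lists data is hypothetical and only stands in for the 'jobs' fields of the real input.

```python
from collections import Counter

# Hypothetical job lists standing in for each contributor's 'jobs' field.
job_lists = [
    ['writer', 'editor'],
    ['editor'],
    ['editor', 'artist', 'writer'],
]

jobs = Counter()
for job_list in job_lists:
    # update() adds one count per element of the iterable it is given.
    jobs.update(job_list)

# (job, count) pairs, highest count first.
top_two = jobs.most_common(2)

# Indexing a job that was never seen yields 0 instead of raising KeyError,
# and does not insert the key.
missing = jobs['translator']
```

No intermediate list, sort, or index arithmetic is needed; the counter answers both "what are the top jobs" and "how common is this job" directly.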

Suggested

import json
from collections import Counter
from pprint import pprint

with open('contributors_sample.json') as f:
    contributors_file = json.load(f)

jobs = Counter()
for contributor in contributors_file:
    jobs.update(contributor['jobs'])

print('Top jobs:', jobs.most_common(5))

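for contributor in contributors_file:
    # Pair each job with its overall frequency; max() compares the frequency
    # first, and ties fall back to comparing the job names.
    top_freq, contributor['top_job'] = max(
        (jobs[job], job)
        for job in contributor['jobs']
    )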

pprint(contributors_file)

As a more direct but more obscure alternative, the top_job assignment inside the second loop can be written as

    contributor['top_job'] = max(
        contributor['jobs'], key=jobs.__getitem__,
    )
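
The two forms are not quite interchangeable when several jobs tie on frequency, and both raise ValueError on an empty job list. The sketch below compares them on hypothetical in-memory data. The helper names and the default= arguments are my own additions, on the assumption that a contributor could have an empty 'jobs' list.

```python
from collections import Counter

# Hypothetical data: 'writer' and 'artist' tie on overall frequency.
contributors = [
    {'name': 'A', 'jobs': ['writer', 'artist']},
    {'name': 'B', 'jobs': ['artist', 'writer']},
    {'name': 'C', 'jobs': []},
]

jobs = Counter()
for contributor in contributors:
    jobs.update(contributor['jobs'])


def top_job_by_tuple(job_list):
    """Tuple form: ties on frequency are broken by comparing job names."""
    freq, job = max(
        ((jobs[job], job) for job in job_list),
        default=(0, None),  # assumption: empty lists map to None
    )
    return job


def top_job_by_key(job_list):
    """Key form: ties on frequency go to the first such job in the list."""
    return max(job_list, key=jobs.__getitem__, default=None)


by_tuple = [top_job_by_tuple(c['jobs']) for c in contributors]
by_key = [top_job_by_key(c['jobs']) for c in contributors]
```

With this data the tuple form picks 'writer' for both A and B, whereas the key form picks whichever tied job is listed first. Which behaviour is preferable depends on whether the order of a contributor's jobs is meaningful.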
