Problem:
I need to extract data from a JSON file containing information about "contributors". Each contributor has an attribute jobs, which is a list of job positions stored as strings. The program should print the five most popular jobs (across the whole dataset) and assign each contributor an attribute top_job: whichever of that contributor's own jobs is the most frequent in the whole dataset. I would like to use as few extra libraries as possible (json aside). I would greatly appreciate any suggestions on how to make the program more efficient. Thanks in advance!
Sample input:
[{'username': 'bartonmichelle',
  ...
  'jobs': ['Teacher, special educational needs',
           'Water engineer',
           'Intelligence analyst',
           'Automotive engineer',
           'Geoscientist'],
  'id': 173012},
 {'username': 'ahardin',
  ...
  'jobs': ['Water engineer',
           'Private music teacher',
           'Administrator',
           'Television camera operator'],
  'id': 113928}]
Sample output:
[{'username': 'bartonmichelle',
  ...
  'jobs': ['Teacher, special educational needs',
           'Water engineer',
           'Intelligence analyst',
           'Automotive engineer',
           'Geoscientist'],
  'id': 173012,
  'top_job': 'Water engineer'},  # top job added based on job's frequency
 {'username': 'ahardin',
  ...
  'jobs': ['Water engineer',
           'Private music teacher',
           'Administrator',
           'Television camera operator'],
  'id': 113928,
  'top_job': 'Water engineer'}]  # top job added based on job's frequency
My approach:
import json
from collections import Counter

# Collect every job from every contributor into one flat list.
jobs = []
with open('contributors_sample.json', 'r', encoding="utf-8") as f:
    contributors_file = json.load(f)
for contributor in contributors_file:
    jobs.extend(contributor['jobs'])

# Job names ordered from most to least frequent in the whole dataset.
sorted_jobs = list(map(
    lambda sorted_arg: sorted_arg[0],
    sorted(
        Counter(jobs).items(),
        key=lambda tupleobj: tupleobj[1],
        reverse=True
    )
))

# For each contributor, keep the job that ranks highest in sorted_jobs.
for contributor in contributors_file:
    contributors_jobs = contributor['jobs']
    top_job = contributors_jobs[0]
    for job in contributors_jobs[1:]:
        if sorted_jobs.index(job) < sorted_jobs.index(top_job):
            top_job = job
    contributor['top_job'] = top_job

contributors_file
Current execution time:
0.110848s
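One possible direction for the efficiency question, as a minimal sketch rather than a measured improvement: each sorted_jobs.index(...) call scans the list linearly, so looking counts up directly in the Counter (a dictionary) may be cheaper. The function name add_top_jobs and the inline sample data are hypothetical and used only for illustration. The sketch assumes every contributor has at least one job, and it may break ties between equally frequent jobs differently from the sorted-list version (here the first such job in the contributor's own list wins).

```python
from collections import Counter


def add_top_jobs(contributors):
    """Set 'top_job' on each contributor; return the five most common jobs."""
    # Count every job occurrence across the whole dataset in a single pass.
    counts = Counter(job for c in contributors for job in c["jobs"])
    for c in contributors:
        # Dictionary lookup per job instead of a linear list.index scan.
        # Assumption: c["jobs"] is non-empty (max() raises on an empty list).
        c["top_job"] = max(c["jobs"], key=counts.__getitem__)
    # most_common(5) returns (job, count) pairs; keep only the job names.
    return [job for job, _ in counts.most_common(5)]


# Hypothetical in-memory sample mirroring the structure shown above.
sample = [
    {"username": "bartonmichelle",
     "jobs": ["Teacher, special educational needs", "Water engineer",
              "Intelligence analyst", "Automotive engineer", "Geoscientist"],
     "id": 173012},
    {"username": "ahardin",
     "jobs": ["Water engineer", "Private music teacher",
              "Administrator", "Television camera operator"],
     "id": 113928},
]

top_five = add_top_jobs(sample)
print(top_five)
```

With real data, the list returned by json.load(f) could be passed in place of sample. Whether this is actually faster on a given dataset would need to be timed.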