I have the following list, jargs, where each item is a tab-separated record:

jargs = ['10192393\t15\t26\tskin tumour\tDiseaseClass\tD012878', 
         '10192393\t443\t449\tcancer\tDiseaseClass\tD009369',
         '10192393\t483\t496\tcolon cancers\tDiseaseClass\tD003110',
         '10194428\t30\t45\themochromatosis\tModifier\tD016399',
         '10194428\t102\t117\themochromatosis\tSpecificDisease\tD006432',
         '10194428\t119\t145\tHereditary hemochromatosis\tSpecificDisease\tD006432',
         '10194428\t147\t149\tHH\tDiseaseClass\tD006432']

I want to write a program that outputs the following:

ents = 
[
'10192393', {"entities":[(15, 26,"DiseaseClass"), (443, 449, "DiseaseClass"), (483, 496, "DiseaseClass")]}, 
'10194428', {"entities": [(30, 45, "Modifier"), (102, 117, "SpecificDisease"), (119, 145, "SpecificDisease"), (147, 149, "DiseaseClass")]}
]

I tried the following:

ents = [list(set([jargs[i].split('\t')[0] for i in range(len(jargs))]))[0],\
       {"entities": [(int(jargs[i].split('\t')[1]), int(jargs[i].split('\t')[2]),\
       jargs[i].split('\t')[-2]) for i in range(len(jargs))]}]

Unfortunately, this code outputs the following:

['10194428',
 {'entities': [(15, 26, 'DiseaseClass'),
   (443, 449, 'DiseaseClass'),
   (483, 496, 'DiseaseClass'),
   (30, 45, 'Modifier'),
   (102, 117, 'SpecificDisease'),
   (119, 145, 'SpecificDisease'),
   (147, 149, 'DiseaseClass')]}]

This is not the expected output: only one ID is kept, and the entities from both IDs are merged into a single list.

1 Answer

from pprint import pprint

# Group the (start, end, label) triples by document ID.
tmp = {}
for item in jargs:
    id_, v1, v2, _, v3, *_ = item.split("\t")
    # The offsets arrive as strings; convert them to int to match the desired output.
    tmp.setdefault(id_, []).append((int(v1), int(v2), v3))

# Flatten the groups into the requested [id, {"entities": [...]}, id, ...] layout.
ents = []
for k, v in tmp.items():
    ents.append(k)
    ents.append({"entities": v})

pprint(ents)

Prints:

['10192393',
 {'entities': [(15, 26, 'DiseaseClass'),
               (443, 449, 'DiseaseClass'),
               (483, 496, 'DiseaseClass')]},
 '10194428',
 {'entities': [(30, 45, 'Modifier'),
               (102, 117, 'SpecificDisease'),
               (119, 145, 'SpecificDisease'),
               (147, 149, 'DiseaseClass')]}]

3 Comments

I was going to post exactly this. A few clear loops are better than forcing a hot mess into a list comprehension, and you also avoid repeating jargs[i].split('\t').
Perhaps consider from collections import defaultdict?
@JustinEzequiel That's an alternative too - depends on personal preference. For simple tasks like these I just use dict.setdefault.
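
For comparison, the defaultdict alternative raised in the comments might look roughly like the sketch below. It is only an illustration: the shortened jargs sample and the names grouped, start, end, label and spans are chosen here and do not come from the answer. The int() conversion is an assumption based on the integer offsets shown in the question's desired output.

```python
from collections import defaultdict

# Shortened sample in the same tab-separated layout as the question's jargs.
jargs = [
    '10192393\t15\t26\tskin tumour\tDiseaseClass\tD012878',
    '10192393\t443\t449\tcancer\tDiseaseClass\tD009369',
    '10194428\t30\t45\themochromatosis\tModifier\tD016399',
]

# defaultdict(list) creates an empty list the first time an ID is seen,
# which replaces the dict.setdefault(id_, []) call.
grouped = defaultdict(list)
for item in jargs:
    id_, start, end, _text, label, *_ = item.split('\t')
    grouped[id_].append((int(start), int(end), label))

# Flatten into the [id, {"entities": [...]}, id, ...] layout from the question.
ents = []
for id_, spans in grouped.items():
    ents.extend([id_, {"entities": spans}])

print(ents)
```

Both variants rely on dictionaries preserving insertion order (guaranteed since Python 3.7), so the IDs come out in the order they first appear in jargs.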
