I have the following list, jargs, in which each item is a tab-separated record:
jargs = ['10192393\t15\t26\tskin tumour\tDiseaseClass\tD012878',
'10192393\t443\t449\tcancer\tDiseaseClass\tD009369',
'10192393\t483\t496\tcolon cancers\tDiseaseClass\tD003110',
'10194428\t30\t45\themochromatosis\tModifier\tD016399',
'10194428\t102\t117\themochromatosis\tSpecificDisease\tD006432',
'10194428\t119\t145\tHereditary hemochromatosis\tSpecificDisease\tD006432',
'10194428\t147\t149\tHH\tDiseaseClass\tD006432']
I want to write a program that outputs the following:
ents = [
'10192393', {"entities": [(15, 26, "DiseaseClass"), (443, 449, "DiseaseClass"), (483, 496, "DiseaseClass")]},
'10194428', {"entities": [(30, 45, "Modifier"), (102, 117, "SpecificDisease"), (119, 145, "SpecificDisease"), (147, 149, "DiseaseClass")]}
]
I tried the following:
ents = [list(set([jargs[i].split('\t')[0] for i in range(len(jargs))]))[0],\
{"entities": [(int(jargs[i].split('\t')[1]), int(jargs[i].split('\t')[2]),\
jargs[i].split('\t')[-2]) for i in range(len(jargs))]}]
Unfortunately, this code outputs the following:
['10194428',
{'entities': [('15', '26', 'DiseaseClass'),
('443', '449', 'DiseaseClass'),
('483', '496', 'DiseaseClass'),
('30', '45', 'Modifier'),
('102', '117', 'SpecificDisease'),
('119', '145', 'SpecificDisease'),
('147', '149', 'DiseaseClass')]}]
This is not the expected output: all of the entities end up under a single ID instead of being grouped under the ID they belong to.
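For reference, the grouping described in the expected output could be sketched roughly as follows. This is only a minimal sketch, not a definitive solution. It assumes every record has exactly six tab-separated fields, that IDs should appear in first-seen order, and that the result is the flat list alternating ID and dict shown in the expected output. The function name group_entities is hypothetical.

```python
jargs = ['10192393\t15\t26\tskin tumour\tDiseaseClass\tD012878',
         '10192393\t443\t449\tcancer\tDiseaseClass\tD009369',
         '10192393\t483\t496\tcolon cancers\tDiseaseClass\tD003110',
         '10194428\t30\t45\themochromatosis\tModifier\tD016399',
         '10194428\t102\t117\themochromatosis\tSpecificDisease\tD006432',
         '10194428\t119\t145\tHereditary hemochromatosis\tSpecificDisease\tD006432',
         '10194428\t147\t149\tHH\tDiseaseClass\tD006432']


def group_entities(records):
    """Group (start, end, label) tuples by ID (hypothetical helper)."""
    grouped = {}  # regular dicts preserve insertion order in Python 3.7+
    for record in records:
        # Assumption: exactly six tab-separated fields per record.
        doc_id, start, end, _text, label, _code = record.split('\t')
        grouped.setdefault(doc_id, []).append((int(start), int(end), label))

    # Flatten into the alternating [id, {"entities": [...]}, ...] layout.
    ents = []
    for doc_id, spans in grouped.items():
        ents.extend([doc_id, {"entities": spans}])
    return ents


ents = group_entities(jargs)
print(ents)
```

Splitting each record once and collecting the tuples in a dict keyed by ID keeps every entity attached to its own ID, which a single list comprehension over all of jargs does not do.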