Suppose I have 100 files and loop through all of them. Each file contains records of several attributes (the total number of attributes is not known until all the files have been read).
Assume a simple case where, after reading all the files, we obtain 20 different attributes and the following information:
File_001: a1, a3, a5, a2
File_002: a1, a3
File_003: a4
File_004: a4, a2, a6
File_005: a7, a8, a9
...
File_100: a19, a20
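A minimal sketch for this first representation, assuming the per-file results have already been collected into a dict that maps each file name to its attribute list (the names `file_attrs` and `invert_index` are hypothetical). Only the files listed explicitly above are used; the elided ones are not filled in. Inverting the mapping is a single pass with `collections.defaultdict`, and it does not need the number of attributes in advance because a key is created the first time an attribute is seen.

```python
from collections import defaultdict

# Sample data: only the files listed explicitly above; the elided
# files between File_005 and File_100 are not filled in.
file_attrs = {
    "File_001": ["a1", "a3", "a5", "a2"],
    "File_002": ["a1", "a3"],
    "File_003": ["a4"],
    "File_004": ["a4", "a2", "a6"],
    "File_005": ["a7", "a8", "a9"],
    "File_100": ["a19", "a20"],
}

def invert_index(file_attrs):
    """Map each attribute to the sorted list of files that contain it."""
    attr_files = defaultdict(set)  # a set drops repeated (file, attribute) records
    for fname, attrs in file_attrs.items():
        for attr in attrs:
            attr_files[attr].add(fname)
    # Sort the file names so the output order is deterministic.
    return {attr: sorted(files) for attr, files in attr_files.items()}

reverse = invert_index(file_attrs)
print(reverse["a2"])  # ['File_001', 'File_004']
```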
[Update] Or, in another representation, where each line is a single match between one file and one attribute:
File_001: a1
File_001: a3
File_001: a5
File_001: a2
File_002: a1
File_002: a3
File_003: a4
File_004: a4
File_004: a2
File_004: a6
...
File_100: a19
File_100: a20
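For the one-match-per-line representation, a similar sketch can parse each `File_xxx: aN` line into a (file, attribute) pair and then group by the attribute. It assumes the file name is everything before the first colon and the attribute is everything after it; `parse_pairs` and `group_by_attribute` are hypothetical names, and the sample text is the pairs listed above with the `...` elision left out.

```python
from collections import defaultdict

def parse_pairs(lines):
    """Yield (file, attribute) tuples from lines like 'File_001: a1'."""
    for line in lines:
        line = line.strip()
        if ":" not in line:
            continue  # skip blank lines and markers such as '...'
        fname, attr = line.split(":", 1)
        yield fname.strip(), attr.strip()

def group_by_attribute(pairs):
    """Map each attribute to the sorted list of files it appears in."""
    attr_files = defaultdict(set)
    for fname, attr in pairs:
        attr_files[attr].add(fname)
    return {attr: sorted(files) for attr, files in attr_files.items()}

sample = """\
File_001: a1
File_001: a3
File_001: a5
File_001: a2
File_002: a1
File_002: a3
File_003: a4
File_004: a4
File_004: a2
File_004: a6
File_100: a19
File_100: a20
"""

result = group_by_attribute(parse_pairs(sample.splitlines()))
print(result["a4"])  # ['File_003', 'File_004']
```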
How can I generate the "reverse" statistics table, i.e. for each attribute, the list of files that contain it:
a1: File_001, File_002, File_006, File_083
a2: File_001, File_004
...
a20: File_099, File_100
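To print the result in this layout, the attributes need a sort key that compares the numeric suffix as an integer; plain string sorting would put `a19` before `a2`. The sketch below assumes attribute names are a non-digit prefix followed by digits (the helpers `attr_sort_key` and `format_table` are hypothetical; the key falls back to the raw string otherwise), and its sample rows are the three shown above.

```python
import re

def attr_sort_key(attr):
    """Sort 'a2' before 'a19' by comparing the trailing number as an int."""
    m = re.match(r"^(\D*)(\d+)$", attr)
    if m is None:
        return (attr, -1)  # no numeric suffix: sort by the plain string
    return (m.group(1), int(m.group(2)))

def format_table(attr_files):
    """Render {attribute: [files]} as 'attr: file, file' lines."""
    lines = []
    for attr in sorted(attr_files, key=attr_sort_key):
        lines.append("%s: %s" % (attr, ", ".join(attr_files[attr])))
    return "\n".join(lines)

table = {
    "a20": ["File_099", "File_100"],
    "a2": ["File_001", "File_004"],
    "a1": ["File_001", "File_002", "File_006", "File_083"],
}
print(format_table(table))
# a1: File_001, File_002, File_006, File_083
# a2: File_001, File_004
# a20: File_099, File_100
```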
How can I do this in Python (2.7.x), with or without pandas? (I think pandas might help.)
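If pandas is available, one possible sketch assumes the matches are loaded as one row per (file, attribute) pair, as in the second representation. `groupby` then collects the files for each attribute, and `pd.crosstab` gives an attribute-by-file count matrix as an alternative "statistics table". The column names `file` and `attribute` are assumptions, and only the first four files are used as sample data.

```python
import pandas as pd

# One row per (file, attribute) match, i.e. the second representation above.
pairs = [
    ("File_001", "a1"), ("File_001", "a3"), ("File_001", "a5"), ("File_001", "a2"),
    ("File_002", "a1"), ("File_002", "a3"),
    ("File_003", "a4"),
    ("File_004", "a4"), ("File_004", "a2"), ("File_004", "a6"),
]
df = pd.DataFrame(pairs, columns=["file", "attribute"])

# Sort by file name first; groupby keeps that order inside each group,
# so every list of files comes out sorted.
reverse = (df.drop_duplicates()
             .sort_values("file")
             .groupby("attribute")["file"]
             .apply(list))

# Alternative view: rows are attributes, columns are files, cells count matches.
matrix = pd.crosstab(df["attribute"], df["file"])

print(reverse["a4"])  # ['File_003', 'File_004']
```

`reverse.to_dict()` turns the result back into a plain dict if the rest of the code expects one.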