I have a CSV file in which the first column contains an identifier and the second column contains the associated data. Each identifier is repeated an arbitrary number of times, so the file looks like this:
data1,123
data1,345
data1,432
data2,654
data2,431
data3,947
data3,673
I would like to merge the records so that there is a single record for each identifier, like this:
data1,123,345,432
data2,654,431
data3,947,673
Is there an efficient way to do this in Python or NumPy? Dictionaries appear to be out due to duplicate keys. At the moment I have the lines in a list of lists, and I loop through them, comparing each row's value at index 0 with that of the previous row, but this is very clumsy. Thanks for any help.
{'data1': [123, 345, 432], 'data2': [654, 431], 'data3': [947, 673]}
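A dictionary like the one above does work once each identifier maps to a list of values instead of a single value, so repeated keys just append to the same list. Here is a minimal sketch of that idea using `collections.defaultdict` and the standard `csv` module. It assumes the second column always holds integers and that the file fits in memory; `SAMPLE` and `merge_records` are names made up for this illustration, and the sample string stands in for the real file.

```python
import csv
import io
from collections import defaultdict

# Sample input copied from the question; in practice pass an open file instead.
SAMPLE = """data1,123
data1,345
data1,432
data2,654
data2,431
data3,947
data3,673
"""


def merge_records(lines):
    """Group the second-column values by the identifier in the first column."""
    merged = defaultdict(list)
    for row in csv.reader(lines):
        if not row:  # skip blank lines
            continue
        key, value = row[0], row[1]
        merged[key].append(int(value))  # assumption: values are integers
    return dict(merged)


merged = merge_records(io.StringIO(SAMPLE))
print(merged)

# Write one merged record per identifier, in first-seen order.
out = io.StringIO()
writer = csv.writer(out, lineterminator="\n")
for key, values in merged.items():
    writer.writerow([key, *values])
print(out.getvalue(), end="")
```

For a real file, something like `with open(path, newline="") as f: merged = merge_records(f)` should work the same way. Because this does not rely on identical identifiers being adjacent, it also handles files where the rows for one identifier are scattered.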