12245933,1418,1
12245933,1475,2
134514060,6112,3
134514064,10096,4
12245933,1536,5
...
134514097,16200,38
12245933,1475,39
I want to know, for every value of row[0], the distance between recurrences of the same value in row[1].
For example:
12245933 has the value 1475 on line 2 and on line 39.
I want to find all occurrences of 1475 for 12245933 in the file.
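Code I tried:
# datafile parser
import csv
from collections import defaultdict

def parse_data(file):
    pc_elements = defaultdict(list)    # row[0] value -> line numbers where it occurs
    addr_elements = defaultdict(list)  # row[1] value -> line numbers where it occurs
    with open(file, newline='') as f:
        # strip NUL characters, which csv.reader cannot handle
        csvin = csv.reader((x.replace('\0', '') for x in f), delimiter=',')
        for line_number, row in enumerate(csvin):
            try:
                pc_elements[int(row[0])].append(line_number)
                addr_elements[int(row[1])].append(line_number)
            except (ValueError, IndexError):
                print(row)  # report the malformed row and keep going
    return pc_elements, addr_elements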
Maybe we can also store row[1] in the pc_elements dict and get the indexes from that?
So you have an id, and then you have data for this id. You can create a dict with the ids as keys, where each value is a list of tuples representing your data, i.e. {12245933: [(1418, 1), (1475, 2)]}, etc.
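A minimal sketch of that dict-of-tuples idea, in case it helps. It assumes the third column is the line number (which matches "line 2" and "line 39" in the example) and that every usable row has three integer fields. The function names `group_by_id` and `recurrence_distances` are made up for illustration, and the sample rows are only the ones shown in the question.

```python
import csv
import io
from collections import defaultdict


def group_by_id(lines):
    """Map each id (column 0) to a list of (value, line_number) tuples."""
    by_id = defaultdict(list)
    for row in csv.reader(lines):
        if len(row) < 3:
            continue  # skip blank or malformed rows
        # Assumption: column 2 holds the line number of the record.
        by_id[int(row[0])].append((int(row[1]), int(row[2])))
    return by_id


def recurrence_distances(by_id):
    """For each (id, value) pair seen more than once, return the sorted
    line numbers and the gaps between consecutive occurrences."""
    result = {}
    for id_, pairs in by_id.items():
        lines_by_value = defaultdict(list)
        for value, line_number in pairs:
            lines_by_value[value].append(line_number)
        for value, line_numbers in lines_by_value.items():
            if len(line_numbers) > 1:
                line_numbers.sort()
                gaps = [b - a for a, b in zip(line_numbers, line_numbers[1:])]
                result[(id_, value)] = (line_numbers, gaps)
    return result


# Sample rows taken from the question (the elided "..." rows are left out).
sample = io.StringIO(
    "12245933,1418,1\n"
    "12245933,1475,2\n"
    "134514060,6112,3\n"
    "134514064,10096,4\n"
    "12245933,1536,5\n"
    "134514097,16200,38\n"
    "12245933,1475,39\n"
)
by_id = group_by_id(sample)
distances = recurrence_distances(by_id)
print(distances)  # {(12245933, 1475): ([2, 39], [37])}
```

For a real file you would pass an open file object instead of the `io.StringIO` sample. Keying the result on the (id, value) pair is one possible choice; a nested dict would work just as well if you mostly look things up by id first.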