2
12245933,1418,1
12245933,1475,2
134514060,6112,3
134514064,10096,4
12245933,1536,5
...
134514097,16200,38
12245933,1475,39

For every value in row[0], I want to know the distance between recurrences of the same value in row[1].

For example:

12245933 has the value 1475 on line 2 and again on line 39.
I want to find every occurrence of 1475 for 12245933 in the file.

Code I tried.

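# datafile parser
import csv
from collections import defaultdict

def parse_data(file):
    pc_elements = defaultdict(list)    # row[0] value -> line numbers
    addr_elements = defaultdict(list)  # row[1] value -> line numbers
    with open(file, newline='') as f:
        # strip NUL characters before handing the lines to csv.reader
        csvin = csv.reader((x.replace('\0', '') for x in f), delimiter=',')
        for line_number, row in enumerate(csvin):
            try:
                pc, addr = int(row[0]), int(row[1])
            except (ValueError, IndexError):
                print(row)  # report malformed rows but keep counting lines
                continue
            pc_elements[pc].append(line_number)
            addr_elements[addr].append(line_number)
    return pc_elements, addr_elements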

Maybe row[1] could also be stored in the pc_elements dict, and the indexes retrieved from there?

6
  • A dictionary can't have duplicate keys. Commented Feb 28, 2014 at 5:10
  • Is there any other way than using lists? Lists take a long time to process as their size increases. Commented Feb 28, 2014 at 5:11
  • I assume the first column is a kind of ID and the rest is the data for that ID. You can create a dict with the IDs as keys, where each value is a list of tuples representing your data, i.e. {12245933: [(1418, 1), (1475, 2)]}, etc. Commented Feb 28, 2014 at 5:15
  • You can concatenate the two keys, i.e. number:other, and use that as the dictionary key. Commented Feb 28, 2014 at 5:15
  • Example: d['12245933:1475'] += 1 Commented Feb 28, 2014 at 5:16

2 Answers

5

Use tuples as your dictionary keys:

In [63]: d='''
    ...: 12245933,1418,1
    ...: 12245933,1475,2
    ...: 134514060,6112,3
    ...: 134514064,10096,4
    ...: 12245933,1536,5
    ...: 134514097,16200,38
    ...: 12245933,1475,39
    ...: '''

In [64]: from collections import defaultdict
    ...: dic=defaultdict(list)
    ...: for l in d.split():
    ...:     tup=tuple(int(i) for i in l.split(','))
    ...:     dic[tup[:2]].append(tup[2])

In [65]: dic[(12245933, 1475)]
Out[65]: [2, 39]
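
Since the question asks for the distance between recurrences, the line-number lists can be turned into gaps between consecutive occurrences. This is only a minimal sketch, assuming the third column is the line number as in the sample data; `SAMPLE` and `recurrence_distances` are hypothetical names introduced for illustration, not part of the code above.

```python
from collections import defaultdict

# Sample rows from the question: row[0], row[1], line number.
SAMPLE = """\
12245933,1418,1
12245933,1475,2
134514060,6112,3
134514064,10096,4
12245933,1536,5
134514097,16200,38
12245933,1475,39
"""

def recurrence_distances(text):
    """Map each (row[0], row[1]) pair to the gaps between its occurrences."""
    positions = defaultdict(list)
    for line in text.split():
        pc, addr, line_no = (int(field) for field in line.split(','))
        positions[(pc, addr)].append(line_no)
    # Difference between each occurrence and the one before it.
    return {
        key: [later - earlier for earlier, later in zip(line_nos, line_nos[1:])]
        for key, line_nos in positions.items()
    }

distances = recurrence_distances(SAMPLE)
print(distances[(12245933, 1475)])  # [37]
print(distances[(12245933, 1418)])  # []
```

A pair that occurs only once is still present as a key; it simply has an empty list of distances.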

2 Comments

Would this fail to return the tuples that occur only once?
@pistal What? I'm not sure what you mean.
1

Use nested dictionaries. Map 12245933 to a dictionary that maps 1475 to a list of the line numbers where the value occurs.

So your final dictionary would look like {12245933: {1475: [2, 39]}}
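
One possible way to build that structure is a defaultdict whose default value is itself a defaultdict of lists. This is a sketch under the assumption that the rows have already been parsed into integer triples; `rows` and `nested` are illustrative names only.

```python
from collections import defaultdict

# Rows from the question, already parsed: (row[0], row[1], line number).
rows = [
    (12245933, 1418, 1),
    (12245933, 1475, 2),
    (134514060, 6112, 3),
    (134514064, 10096, 4),
    (12245933, 1536, 5),
    (134514097, 16200, 38),
    (12245933, 1475, 39),
]

# Outer key: row[0]; inner key: row[1]; value: line numbers in file order.
nested = defaultdict(lambda: defaultdict(list))
for pc, addr, line_no in rows:
    nested[pc][addr].append(line_no)

print(nested[12245933][1475])    # [2, 39]
print(sorted(nested[12245933]))  # [1418, 1475, 1536]
```

Compared with tuple keys, the nested form makes it cheap to list every row[1] value seen for a given row[0].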
