
I have generated two multi-component lists with the following script:

list1 = list()
for line in infile1.readlines():
    list1.append(line.split('\t'))

list2 = list()
for line in infile2.readlines():
    list2.append(line.split('\t'))

The lists look like this:

list1 = ('1960', 'chr17', '+', 'RNF213'), ('1963', 'chr16', '+', 'SF3B3'), ('1964', 'chr4', '-', 'GPRIN3')...

list2 = ('1482', 'miR-K12-1'), ('1018', 'miR-K12-4-5p'), ('1960', 'miR-K12-12')...

The first element from the first entry in list1 (in this case "1960") will match the first element of one or more entries in list2. What I would like to do is locate each match and then add the last element of the list2 entry to the list1 entry. An example of the desired output would be:

('1960', 'chr17', '+', 'RNF213', 'miR-K12-12')

I have tried this, but it returns nothing:

result = []
for list1[0] in list1:
    if list1[0] == list2[0]:
        result.append((list1[0:], list2[1]))
  • What happens if there is more than one matching entry in list2? Commented Sep 4, 2014 at 15:50
  • I'm assuming if there are multiple matches they should all be appended. Commented Sep 4, 2014 at 15:54

2 Answers

4

Put the values from list2 into a dictionary, with each unique value in the first column mapping to a list of the values from the second column. Because you have tab-separated values, you should really use the csv module here:

import csv

lines2 = {}

with open(filename2, 'rb') as infile2:
    reader = csv.reader(infile2, delimiter='\t')
    for row in reader:
        lines2.setdefault(row[0], []).append(row[1])

dict.setdefault() sets a default value (a list object here) if the key is not yet present in the dictionary. This allows us to append to an empty list for the first value, then subsequently to the already-existing list for the rest.
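As a minimal sketch of that behaviour (written for Python 3, with made-up sample rows; the second '1960' entry and its name 'miR-K12-3' are hypothetical, added only to show what happens with more than one match):

```python
# Hypothetical sample rows; '1960' appears twice to show multiple matches.
rows = [('1482', 'miR-K12-1'), ('1960', 'miR-K12-12'), ('1960', 'miR-K12-3')]

lines2 = {}
for key, name in rows:
    # The first time a key is seen, setdefault stores a new empty list and
    # returns it; afterwards it returns the list already stored for that key.
    lines2.setdefault(key, []).append(name)

print(lines2)
# {'1482': ['miR-K12-1'], '1960': ['miR-K12-12', 'miR-K12-3']}
```

Every value ends up in the list for its key, so multiple matches are all kept rather than overwriting one another.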

Now you can trivially look up matching lines when processing the other file:

with open(filename1, 'rb') as infile1:
    reader = csv.reader(infile1, delimiter='\t')
    for row in reader:
        row += lines2.get(row[0], [])
        print row

Demo:

>>> import csv
>>> list1 = ['\t'.join(r) for r in [('1960', 'chr17', '+', 'RNF213'), ('1963', 'chr16', '+', 'SF3B3'), ('1964', 'chr4', '-', 'GPRIN3')]]
>>> list2 = ['\t'.join(r) for r in [('1482', 'miR-K12-1'), ('1018', 'miR-K12-4-5p'), ('1960', 'miR-K12-12')]]
>>> lines2 = {}
>>> reader = csv.reader(list2, delimiter='\t')
>>> for row in reader:
...     lines2.setdefault(row[0], []).append(row[1])
... 
>>> lines2
{'1482': ['miR-K12-1'], '1960': ['miR-K12-12'], '1018': ['miR-K12-4-5p']}
>>> reader = csv.reader(list1, delimiter='\t')
>>> for row in reader:
...     row += lines2.get(row[0], [])
...     print row
... 
['1960', 'chr17', '+', 'RNF213', 'miR-K12-12']
['1963', 'chr16', '+', 'SF3B3']
['1964', 'chr4', '-', 'GPRIN3']
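
The code above is Python 2. For Python 3, a rough equivalent might look like the sketch below. The function name merge_rows is hypothetical, and io.StringIO objects stand in for the two real files so the example is self-contained:

```python
import csv
import io

def merge_rows(infile1, infile2):
    """Append matching second-column values from infile2 to each row of infile1."""
    lookup = {}
    for row in csv.reader(infile2, delimiter='\t'):
        lookup.setdefault(row[0], []).append(row[1])
    # Rows without a match are returned unchanged.
    return [row + lookup.get(row[0], [])
            for row in csv.reader(infile1, delimiter='\t')]

# In-memory stand-ins for the two tab-separated files.
file1 = io.StringIO("1960\tchr17\t+\tRNF213\n1963\tchr16\t+\tSF3B3\n")
file2 = io.StringIO("1482\tmiR-K12-1\n1960\tmiR-K12-12\n")

merged = merge_rows(file1, file2)
for row in merged:
    print(row)
```

With real files in Python 3, open them in text mode with newline='' (for example open(filename1, newline='')) instead of 'rb', as the csv module documentation recommends.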

2

EDIT: Don't use this method. I'm leaving it up though because someone else might be able to learn from @Martijn's comments.

list1 = [('1960', 'chr17', '+', 'RNF213'), ('1963', 'chr16', '+', 'SF3B3'), ('1964', 'chr4', '-', 'GPRIN3')]
list2 = [('1482', 'miR-K12-1'), ('1018', 'miR-K12-4-5p'), ('1960', 'miR-K12-12')]

results = []
for x in list1:
    for y in list2:
        if x[0] == y[0]:
            results.append( x + (y[-1], ))
print results
>>>
[('1960', 'chr17', '+', 'RNF213', 'miR-K12-12')]

5 Comments

This does way too much work. It takes M * N iterations (where M and N are the sizes of the two lists). Using a dictionary gives you an M + N solution instead; i.e. you loop once over each list, sequentially.
If list1 is 10,000 elements long and list2 contains 5,000 elements, your version requires 50 million iterations. Mine takes just 15,000.
Thanks for elucidating. I've got some old code to go update!
Hmm want to give +1 for the "don't use this method"
Thank you! I tried your method and it worked perfectly.
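
The iteration counts discussed in the comments above can be checked with a small Python 3 sketch. The data here is synthetic and the sizes are scaled down to 1,000 and 500 entries so that it runs quickly; the names gene0, mir0 and so on are placeholders, not real identifiers:

```python
M, N = 1000, 500
list1 = [(str(i), 'gene%d' % i) for i in range(M)]
list2 = [(str(i), 'mir%d' % i) for i in range(N)]

# Nested loops: every entry of list1 is compared with every entry of list2.
nested_steps = 0
nested_result = []
for x in list1:
    for y in list2:
        nested_steps += 1
        if x[0] == y[0]:
            nested_result.append(x + (y[-1],))

# Dictionary: one pass over list2 to build the lookup, one pass over list1.
dict_steps = 0
lookup = {}
for y in list2:
    dict_steps += 1
    lookup.setdefault(y[0], []).append(y[-1])
dict_result = []
for x in list1:
    dict_steps += 1
    for name in lookup.get(x[0], []):
        dict_result.append(x + (name,))

print(nested_steps, dict_steps)      # 500000 1500
print(nested_result == dict_result)  # True
```

Both approaches produce the same matched rows here; only the amount of work differs (M * N steps versus M + N).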
