
I have a Python program that takes two files as input and calculates the similarity between them. I want to run it on every possible combination of files in a directory. How can I do this in Python, building on the script I already have?

I know there are tools such as glob which iterate through an entire file. However, how can I also generate all of the different file combinations?

Also, as @hcwhsa and @Ashish Nitin Patil suggested, how can itertools be combined with glob?

Thank you for any insight.

Further detail:

My code requires two inputs with the same format (I have a directory of approximately 50 of these files). Each input is a tab-separated file with three columns (value1, value2, weight). Essentially, I use this information to calculate the Jaccard coefficient with this function:

def compute_jaccard_index(set_1, set_2):
    # |A ∩ B| / |A ∪ B|; float() avoids integer division in Python 2
    return len(set_1.intersection(set_2)) / float(len(set_1.union(set_2)))
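The question does not say which columns make up each set. As a minimal sketch, assuming each row's (value1, value2) pair is one set element and the weight column is ignored, a file could be loaded like this. The helper name read_pairs is hypothetical, not part of the original script.

```python
import csv

def compute_jaccard_index(set_1, set_2):
    return len(set_1.intersection(set_2)) / float(len(set_1.union(set_2)))

def read_pairs(path):
    """Hypothetical helper: read a 3-column tab-separated file into a set.

    Assumes the columns are value1, value2, weight and that only the
    (value1, value2) pair matters; the weight is ignored.
    """
    pairs = set()
    with open(path, newline="") as f:
        for row in csv.reader(f, delimiter="\t"):
            if len(row) >= 2:  # skip blank or malformed lines
                pairs.add((row[0], row[1]))
    return pairs
```

Usage would then be compute_jaccard_index(read_pairs('input_file1'), read_pairs('input_file2')). If the weight should count, the set elements would need to include it (or a weighted Jaccard variant would be needed), which is a different calculation.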

I want to calculate this coefficient for every possible pair of files in the directory. Currently, I open each file by name:

with open('input_file1', 'r') as infile_A:
    with open('input_file2', 'r') as infile_B:
        pass  # read both files and compute the Jaccard index here

My goal is to iterate the function over all possible combinations of files in the directory.
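Because the Jaccard index is symmetric (J(A, B) = J(B, A)) and comparing a file with itself is uninformative, itertools.combinations(..., 2) yields exactly the pairs worth computing: each unordered pair once, with no self-pairs. A small illustration with placeholder file names:

```python
from itertools import combinations, permutations

names = ["a.tsv", "b.tsv", "c.tsv"]  # placeholder file names

# Unordered pairs, no repeats, no self-pairs: n * (n - 1) / 2 of them
pairs = list(combinations(names, 2))
print(pairs)
# [('a.tsv', 'b.tsv'), ('a.tsv', 'c.tsv'), ('b.tsv', 'c.tsv')]

# Ordered pairs would compute every similarity twice
print(len(list(permutations(names, 2))))  # 6
```

For a directory of about 50 files, combinations gives 50 * 49 / 2 = 1225 pairs.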

  • itertools.combinations Commented Nov 14, 2013 at 16:37
  • That is exactly what the code in my answer gives you - all filename combinations of all files in a given folder. Am I missing something? Commented Nov 14, 2013 at 17:01
  • No, that is exactly what I need. Will this then pass each file combination to the program as its inputs? That is where I was unsure whether I also needed something like glob. With your solution, will all possible combinations of input1 and input2 be created and used directly by the program? That is the main question; I am sorry if I did not express myself clearly. Commented Nov 14, 2013 at 17:07
  • Can you provide some sample input and the expected output? I still think that my answer addresses the first part of your question: it outputs every possible combination of files in a folder. If I understand glob correctly, it is a tool that selects files whose names match a pattern, not a tool that iterates over a file. Commented Nov 14, 2013 at 17:14
  • If you only want files with a specific extension, you can change the filenames line to something like this: filenames = [os.path.join(path, entry) for entry in entries if os.path.isfile(os.path.join(path, entry)) and entry.split('.')[-1] == 'py'] Commented Nov 14, 2013 at 19:03
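The extension check in this comment can also be written with os.path.splitext. One difference: entry.split('.')[-1] == 'py' also matches a file literally named py (no dot at all), while splitext returns an empty extension for it. A sketch, where list_files_with_extension is a hypothetical helper name and '.txt' is only an example extension:

```python
import os

def list_files_with_extension(path, ext):
    """Hypothetical helper: sorted full paths of regular files in path ending in ext."""
    result = []
    for entry in sorted(os.listdir(path)):
        full = os.path.join(path, entry)
        # splitext('notes.txt') -> ('notes', '.txt'); splitext('txt') -> ('txt', '')
        if os.path.isfile(full) and os.path.splitext(entry)[1] == ext:
            result.append(full)
    return result
```

The result can be passed straight to combinations(list_files_with_extension(path, '.txt'), 2).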

2 Answers


This snippet compares every pair of files in path:

import os
from itertools import combinations

path = r'path/to/dir'
entries = os.listdir(path)
filenames = [os.path.join(path, entry) for entry in entries if os.path.isfile(os.path.join(path, entry))]

for (file1, file2) in combinations(filenames, 2):
    with open(file1) as f1, open(file2) as f2:
        # Compare the files here
        pass

In Python 3, it can be done a bit more elegantly:

import os
from itertools import combinations

path = r'path/to/dir'
root, _, rel_filenames = next(os.walk(path))
full_filenames = [os.path.join(root, f) for f in rel_filenames]

for (file1, file2) in combinations(full_filenames, 2):
    with open(file1) as f1, open(file2) as f2:
        # Compare the files here
        pass

2 Comments

This is great, but I still don't see clearly how to link this with glob.
I haven't used glob, so I wouldn't know that. What is it specifically you want to do? It might help if you provide us with some of the code you already have.
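To connect this to glob as asked above: glob.glob returns the paths that match a shell-style pattern, so it can replace the os.listdir filtering, and its result feeds straight into itertools.combinations. A minimal sketch, assuming the input files end in .txt (adjust the pattern) and reusing the Jaccard function from the question; read_pairs and pairwise_jaccard are hypothetical helpers, and read_pairs assumes each (value1, value2) pair is one set element.

```python
import csv
import glob
import os
from itertools import combinations

def compute_jaccard_index(set_1, set_2):
    return len(set_1.intersection(set_2)) / float(len(set_1.union(set_2)))

def read_pairs(path):
    # Hypothetical loader: assumes tab-separated value1, value2, weight
    # and keeps only the (value1, value2) pair of each row
    with open(path, newline="") as f:
        return {(row[0], row[1]) for row in csv.reader(f, delimiter="\t") if len(row) >= 2}

def pairwise_jaccard(directory, pattern="*.txt"):
    # glob expands the pattern to full paths; sorted() gives a stable order
    paths = sorted(glob.glob(os.path.join(directory, pattern)))
    sets = {p: read_pairs(p) for p in paths}  # read each file only once
    results = {}
    for file1, file2 in combinations(paths, 2):
        results[(file1, file2)] = compute_jaccard_index(sets[file1], sets[file2])
    return results
```

Usage: for (f1, f2), score in pairwise_jaccard('path/to/dir').items(): print(f1, f2, score). Loading every file once up front avoids re-reading each file for every pair it appears in (49 times each for 50 files).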
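import itertools
import os
path = os.getcwd()  # replace with your directory path
for file_1, file_2 in itertools.combinations(os.listdir(path), 2):
    print(file_1, file_2)
    # compare the files; os.listdir returns bare names, so open
    # os.path.join(path, file_1) rather than file_1 itself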

Replace os.getcwd() with your directory path.

2 Comments

Will using this, for example, create all possible combinations of input1 and input2?
This will also output combinations that include subdirectories.
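One way to keep subdirectories out of the pairs on Python 3.6+ is os.scandir, whose entries have an is_file() method and a ready-made full path, so no separate os.path.join / os.path.isfile call is needed per name. A sketch, where regular_files and file_pairs are hypothetical helper names:

```python
import os
from itertools import combinations

def regular_files(path):
    """Hypothetical helper: sorted full paths of regular files directly in path."""
    with os.scandir(path) as it:  # context-manager form needs Python 3.6+
        return sorted(entry.path for entry in it if entry.is_file())

def file_pairs(path):
    # Every unordered pair of regular files; subdirectories are never included
    return list(combinations(regular_files(path), 2))
```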
