4

I am trying to find Strongly Connected Components in a large graph implementing Kosaraju's algorithm. It requires running a DFS on a graph in reverse, and then forward. If you're interested the list of edges for this graph are here: https://dl.dropboxusercontent.com/u/28024296/SCC.txt.tar.gz

I can't implement this recursively in Python, it exceeds its recursive limits and crashes if I increase them. I'm trying to implement through iteration.

Below is my code for 1. Loading the graph in reverse into a Dictionary, and 2. Running the DFS on it iteratively for each node from n -> 1.

This code runs perfect for small sample graphs but just doesn't run for this large graph. I get it's inefficient but any tips on how to make it work?

def reverseFileLoader():

    graph = collections.defaultdict(lambda: {'g': [], 's': False, 't': None, 'u': None })
    for line in open('/home/edd91/Documents/SCC.txt'):
        k, v = map(int, line.split())
        graph[v]['g'].append(k)

    return graph

def DFS(graph, i):
    global t
    global source
    stack = []
    seen = []
    seen.append(i)
    stack.append(i)

    while stack:
        s = stack[-1]
        j = len(graph[s]['g']) - 1
        h = 0
        while (j >= 0):
            if graph[s]['g'][j] not in seen and graph[graph[s]['g'][j]]['t'] == None:
                seen.append(graph[s]['g'][j])
                stack.append(graph[s]['g'][j])
                h += 1
            j -= 1

        if h == 0:
            if graph[s]['t'] == None:
                t += 1
                graph[s]['u'] = source
                graph[s]['t'] = t 
            stack.pop()

def DFSLoop(graph):
    global t
    t = 0
    global source
    source = None
    i = len(graph)
    while (i >= 1):
        print "running for " + str(i)
        source = i
        DFS(graph, i)
        i -= 1

1 Answer 1

2

Kosaraju's algorithm probably requires that checking whether an element has been seen is an O(1) operation. But your seen data structure has O(n) time membership testing. Converting seen from a list to a set makes the code execute in a few seconds on my system (after also removing the prints which took up most of the remaining execution time).

For completeness the changes you need to make are

  • Change seen = [] to seen = set()
  • Change each seen.append(...) to seen.add(...).
Sign up to request clarification or add additional context in comments.

2 Comments

Thanks @trevor-merrifield ! Isn't it amazing such small changes make the code run, that's so helpful. I assume a set is hashed somehow?
Yep the source code setobject.h setobject.c shows it is like a hash table without values. Hence the O(1)-ish lookup time.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.