
I'm confused about how to calculate the worst-case complexity of a naive DFS that enumerates all paths from one node to another in a DAG.

For example, in the following DAG, node A acts as the start node. If we always pick the last node, in this case node D, as the end node, we can use DFS to enumerate all paths from A to D.

[Figure: a DAG with nodes A, B, C, D and edges A -> B, A -> C, A -> D, B -> C, B -> D, C -> D]

In this case, DFS goes through the following paths:

path 1: A -> B -> C -> D
path 2: A -> B -> D
path 3: A -> C -> D
path 4: A -> D

The total computational cost is 8, because the first path takes 3 iterations, the second 2, the third 2, and the last 1.
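Here is a small sketch in Python (the adjacency list is my own encoding of the example DAG above) that enumerates every path from A to D with a naive DFS and adds up the edge traversals, confirming the total of 8:

graph = {
    "A": ["B", "C", "D"],
    "B": ["C", "D"],
    "C": ["D"],
    "D": [],
}

def all_paths(node, target, path, paths):
    # Naive DFS: extend the current path along every outgoing edge.
    if node == target:
        paths.append(list(path))
        return
    for nxt in graph[node]:
        path.append(nxt)
        all_paths(nxt, target, path, paths)
        path.pop()

paths = []
all_paths("A", "D", ["A"], paths)
for p in paths:
    print(" -> ".join(p))
print("total edge traversals:", sum(len(p) - 1 for p in paths))  # 3 + 2 + 2 + 1 = 8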

Now suppose we expand this graph by adding more nodes in the same pattern.

Assuming we have N nodes, what is the time complexity as a function of N?

The way I calculate the total number of paths is this: every time we add a new node to an N-node DAG to replace node A as the new start node, we add N new edges, because we need edges going from the new start node to every node that already exists.

If we let P(N) be the total number of paths, we have
P(N) = P(N-1) + P(N-2) + P(N-3) + .... + P(1)

then, since P(N-1) = P(N-2) + P(N-3) + ... + P(1), you have
P(N) = 2 * P(N-1)

then
P(N) = O(2^N)
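As a quick sanity check, here is a small Python sketch (my own numbering: node 0 is the start, node N-1 is the end, and every node has an edge to every later node) that counts the paths by brute force and compares them with 2^(N-2), which is what the recurrence gives with P(1) = 1:

def count_paths(n):
    # Count all paths from node 0 to node n-1 in the DAG where
    # node i has an edge to every node j > i.
    def dfs(i):
        if i == n - 1:
            return 1
        return sum(dfs(j) for j in range(i + 1, n))
    return dfs(0)

for n in range(2, 10):
    print(n, count_paths(n), 2 ** (n - 2))  # the two counts agree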

However, not every path costs O(1) to traverse; for example, the longest path goes through all the nodes and costs O(N) by itself, so the actual total cost is higher than the O(2^N) path count.

So what could that be then?

  • After the first path, shouldn't it go through the path A --> B --> D as well? Or do you store the visited vertices to avoid visiting the same vertex more than once? If that is the case, there wouldn't be a second and third path, though. Could you clarify these points? Commented Jun 12, 2022 at 8:26
  • yes, good catch, modified Commented Jun 14, 2022 at 5:42
  • Why do we have two edges from A to B and B to C, but not from C to D? What is the pattern here? Commented Jun 14, 2022 at 10:25

2 Answers

Current Algorithm and Time Complexity

As far as I understand, your current algorithm follows the steps below:

  1. Start from a start node specified by us.
  2. At each node, push the adjacent vertices onto a stack. Then pop the top element from the stack and repeat step 2.
  3. Terminate execution when the stack is empty.

Our graph is a DAG, so there are no cycles and the algorithm is guaranteed to terminate eventually.

For further examination, you mentioned expanding the graph in the same manner. This means that whenever we add a new node, we have to create an edge between it and every node already in the graph.

Let's say you start from the first node. In this case, the time complexity will be:

T(1) = [1+T(2)] + [1+T(3)] + ... + [1+T(n-2)] + [1+T(n-1)] + [1+T(n)]
with T(n) = 0, the last terms evaluate to
[1+T(n)] = 1
[1+T(n-1)] = 1 + [1+T(n)] = 2
[1+T(n-2)] = 1 + [1+T(n-1)] + [1+T(n)] = 4
(observe the pattern: [1+T(n-k)] = 2^k, so [1+T(2)] = 2^(n-2))
T(1) = 2^(n-2) + 2^(n-3) + ... + 2^1 + 2^0
T(1) = 2^(n-1) - 1
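A quick empirical check of this closed form (my own Python sketch of the stack-based traversal described above, on the graph where node i has an edge to every node j > i):

def naive_dfs_work(n):
    # Stack-based DFS with no visited set, on the DAG where node i
    # has an edge to every node j > i.  Returns the number of edges
    # traversed, i.e. the total number of pushes, which is T(1).
    adj = {i: list(range(i + 1, n)) for i in range(n)}
    stack = [0]
    edges_traversed = 0
    while stack:
        node = stack.pop()
        for nxt in adj[node]:
            edges_traversed += 1
            stack.append(nxt)
    return edges_traversed

for n in range(2, 10):
    print(n, naive_dfs_work(n), 2 ** (n - 1) - 1)  # the two columns agree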

Eventually it turns out that the time complexity is O(2^N) for this algorithm, which is pretty bad, because this approach is extraordinarily brute force. Let's come up with an optimized algorithm that stores information about visited vertices.

Note:

I spotted that there are two edges from A to B and from B to C, but not from C to D. I couldn't figure out the pattern there, but if those extra edges are intended, it means the algorithm requires even more operations than 2^N. Either way, naive DFS is a bad algorithm; I strongly recommend implementing the one below.

Optimized Algorithm and Time Complexity

To optimize your algorithm, you can follow the steps below:

  0. Create a boolean array to mark each node as visited when you visit it. Initialize every index to false.
  1. Start from a start node specified by us.
  2. At each node, push the adjacent vertices that are not yet marked as visited onto the stack and mark them as visited. Then pop the top element from the stack and repeat step 2.
  3. Terminate execution when the stack is empty.

This time, by keeping an array that marks visited nodes, we avoid visiting the same node more than once. This algorithm traditionally has a time complexity of O(N + E), where N is the number of nodes and E is the number of edges. In your case the graph has an edge between every pair of nodes (it would be a complete graph if it were undirected), so E ~ N^2, meaning your time complexity will be O(N^2).
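Here is a minimal sketch of this visited-array version in Python (iterative and stack-based, following the steps above; the example adjacency list is my own stand-in for your graph, with nodes 0..3 playing the roles of A..D):

def dfs_visited(adj, start):
    # Iterative DFS that marks nodes as visited when they are pushed,
    # so each node is pushed at most once: O(N + E) overall.
    n = len(adj)
    visited = [False] * n          # step 0: all nodes unvisited
    order = []
    stack = [start]                # step 1: start node
    visited[start] = True
    while stack:                   # steps 2-3: pop, push unvisited neighbours
        node = stack.pop()
        order.append(node)
        for nxt in adj[node]:
            if not visited[nxt]:
                visited[nxt] = True
                stack.append(nxt)
    return order

adj = [[1, 2, 3], [2, 3], [3], []]
print(dfs_visited(adj, 0))         # each node appears exactly once

Note that because every node is pushed at most once, this version visits nodes rather than enumerating every path, which is exactly why it avoids the exponential blow-up.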

Comments

Thanks for spotting the graph error, corrected
So, extending from your function above, T(1) = T(2) + T(3) + ... + T(n-3) + T(n-2) + T(n-1) + T(n). When you calculate the computational cost, you also have the cost of going from the first node to each node, so it is actually T(1) = 1 + T(2) + 1 + T(3) + ... + 1 + T(n-3) + 1 + T(n-2) + 1 + T(n-1) + 1 + T(n) = n - 1 + T(2) + T(3) + ... + T(n-3) + T(n-2) + T(n-1) + T(n), and this will make it even worse than 2^n
There was a mistake in the notation of my calculation. In the previous version I made T(n) + 0 equal 1, which was incorrect given that one line above I stated T(n) = 0. It should have been [T(n) + 1] + 0. I corrected the notation. However, my result was correct (I implicitly considered the cost from each node to the other nodes), so the time complexity of this operation should still end up in the same 2^n class. I hope I'm not missing something at this point.
I think it is a bit different, posted my calculation.
Your calculation is very similar to mine and the result turns out to be 2^N. They are the same solution; I don't see a difference.

I think I figured it out.

So if we consider that each new node adds an edge from itself to every node already in the DAG, traversing each of those new edges has a cost of 1.

Then we have (using C for the computational cost):

C(N) = [1 + C(N-1)] + [1 + C(N-2)] + [1 + C(N-3)] + .... + [1 + C(1)]

then, since C(N-1) = [1 + C(N-2)] + ... + [1 + C(1)], you have

C(N) = 2 * C(N-1) + 1
     = 2^2 * C(N-2) + 2 + 1
     = 2^3 * C(N-3) + 2^2 + 2 + 1
     = 2^(N-1) * C(1) + 2^(N-2) + ...... + 2 + 1

With C(1) = 0, this becomes the sum of a geometric progression with ratio 2 whose largest term is 2^(N-2), so the total is 2^(N-1) - 1.

Thus O(2^N) is the answer.
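A small script (my own) to check this recurrence against the closed form 2^(N-1) - 1:

from functools import lru_cache

@lru_cache(maxsize=None)
def C(n):
    # Cost recurrence from above: C(1) = 0, C(N) = sum over k < N of (1 + C(k)).
    if n == 1:
        return 0
    return sum(1 + C(k) for k in range(1, n))

for n in range(1, 12):
    print(n, C(n), 2 ** (n - 1) - 1)  # the columns agree: C(N) = 2^(N-1) - 1 = O(2^N)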
