
I'm confused about how to calculate the worst-case complexity of a naive DFS that enumerates all paths from one node to another in a DAG.

For example, in the following DAG, node A acts as the start node. If we always pick the last node, in this case node D, as the end node, we can use DFS to enumerate all paths from A to D.

[Figure: a DAG with nodes A, B, C, D and edges A -> B, A -> C, A -> D, B -> C, B -> D, C -> D]

In this case, DFS goes through the following paths:

path 1: A -> B -> C -> D
path 2: A -> B -> D
path 3: A -> C -> D
path 4: A -> D

The total computational cost is 8, because the first path takes 3 iterations, the second 2, the third 2, and the last 1.
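Here is a small sketch in Python (the adjacency list is my own encoding of the example DAG above) that enumerates every path from A to D with a naive DFS and adds up the edge traversals, confirming the total of 8:

graph = {
    "A": ["B", "C", "D"],
    "B": ["C", "D"],
    "C": ["D"],
    "D": [],
}

def all_paths(node, target, path, paths):
    # Naive DFS: extend the current path along every outgoing edge.
    if node == target:
        paths.append(list(path))
        return
    for nxt in graph[node]:
        path.append(nxt)
        all_paths(nxt, target, path, paths)
        path.pop()

paths = []
all_paths("A", "D", ["A"], paths)
for p in paths:
    print(" -> ".join(p))
print("total edge traversals:", sum(len(p) - 1 for p in paths))  # 3 + 2 + 2 + 1 = 8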

Now suppose we expand this graph by adding more nodes in the same pattern.

Assuming we have N nodes, what is the time complexity as a function of N?

The way I calculate the total number of paths is this: every time we add a new node to an N-node DAG to replace node A as the new start node, we add N new edges, because we need edges going from the new start node to every node that already exists.

If we let P(N) be the total number of paths, we have
P(N) = P(N-1) + P(N-2) + P(N-3) + .... + P(1)

then, since P(N-1) = P(N-2) + P(N-3) + ... + P(1), you have
P(N) = 2 * P(N-1)

then
P(N) = O(2^N)
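As a quick sanity check, here is a small Python sketch (my own numbering: node 0 is the start, node N-1 is the end, and every node has an edge to every later node) that counts the paths by brute force and compares them with 2^(N-2), which is what the recurrence gives with P(1) = 1:

def count_paths(n):
    # Count all paths from node 0 to node n-1 in the DAG where
    # node i has an edge to every node j > i.
    def dfs(i):
        if i == n - 1:
            return 1
        return sum(dfs(j) for j in range(i + 1, n))
    return dfs(0)

for n in range(2, 10):
    print(n, count_paths(n), 2 ** (n - 2))  # the two counts agree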

However, not every path costs O(1) to traverse; for example, the longest path goes through all the nodes and costs O(N) by itself, so the actual total cost is higher than the O(2^N) path count.

So what could that be then?

  • After the first path, shouldn't it go through the path A --> B --> D as well? Or do you store the visited vertices to avoid visiting the same vertex more than once? If that is the case, there wouldn't be a second and third path, though. Could you clarify these points? Commented Jun 12, 2022 at 8:26
  • yes, good catch, modified Commented Jun 14, 2022 at 5:42
  • Why do we have two edges from A to B and B to C, but not from C to D? What is the pattern here? Commented Jun 14, 2022 at 10:25

2 Answers

Current Algorithm and Time Complexity

As far as I understand, your current algorithm follows the steps below:

  1. Start from a start node specified by us.
  2. At each node, push the adjacent vertices onto a stack. Then pop the top element from the stack and repeat step 2.
  3. Terminate execution when the stack is empty.

Our graph is a DAG, so there are no cycles and the algorithm is guaranteed to terminate eventually.

For further examination, you mentioned expanding the graph in the same manner. This means that whenever we add a new node, we have to create an edge between it and every node already in the graph.

Let's say you start from the first node. In this case, the time complexity will be:

T(1) = [1+T(2)] + [1+T(3)] + ... + [1+T(n-2)] + [1+T(n-1)] + [1+T(n)]
with T(n) = 0, the last terms evaluate to
[1+T(n)] = 1
[1+T(n-1)] = 1 + [1+T(n)] = 2
[1+T(n-2)] = 1 + [1+T(n-1)] + [1+T(n)] = 4
(observe the pattern: [1+T(n-k)] = 2^k, so [1+T(2)] = 2^(n-2))
T(1) = 2^(n-2) + 2^(n-3) + ... + 2^1 + 2^0
T(1) = 2^(n-1) - 1
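A quick empirical check of this closed form (my own Python sketch of the stack-based traversal described above, on the graph where node i has an edge to every node j > i):

def naive_dfs_work(n):
    # Stack-based DFS with no visited set, on the DAG where node i
    # has an edge to every node j > i.  Returns the number of edges
    # traversed, i.e. the total number of pushes, which is T(1).
    adj = {i: list(range(i + 1, n)) for i in range(n)}
    stack = [0]
    edges_traversed = 0
    while stack:
        node = stack.pop()
        for nxt in adj[node]:
            edges_traversed += 1
            stack.append(nxt)
    return edges_traversed

for n in range(2, 10):
    print(n, naive_dfs_work(n), 2 ** (n - 1) - 1)  # the two columns agree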

Eventually it turns out that the time complexity is O(2^N) for this algorithm, which is pretty bad, because this approach is extraordinarily brute force. Let's come up with an optimized algorithm that stores information about visited vertices.

Note:

I spotted that there are two edges from A to B and from B to C, but not from C to D. I couldn't figure out the pattern there, but if those extra edges are intended, it means the algorithm requires even more operations than 2^N. Either way, naive DFS is a bad algorithm; I strongly recommend implementing the one below.

Optimized Algorithm and Time Complexity

To optimize your algorithm, you can follow the steps below:

  0. Create a boolean array to mark each node as visited when you visit it. Initialize every index to false.
  1. Start from a start node specified by us.
  2. At each node, push the adjacent vertices that are not yet marked as visited onto the stack and mark them as visited. Then pop the top element from the stack and repeat step 2.
  3. Terminate execution when the stack is empty.

This time, by keeping an array that marks visited nodes, we avoid visiting the same node more than once. This algorithm traditionally has a time complexity of O(N + E), where N is the number of nodes and E is the number of edges. In your case the graph has an edge between every pair of nodes (it would be a complete graph if it were undirected), so E ~ N^2, meaning your time complexity will be O(N^2).
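Here is a minimal sketch of this visited-array version in Python (iterative and stack-based, following the steps above; the example adjacency list is my own stand-in for your graph, with nodes 0..3 playing the roles of A..D):

def dfs_visited(adj, start):
    # Iterative DFS that marks nodes as visited when they are pushed,
    # so each node is pushed at most once: O(N + E) overall.
    n = len(adj)
    visited = [False] * n          # step 0: all nodes unvisited
    order = []
    stack = [start]                # step 1: start node
    visited[start] = True
    while stack:                   # steps 2-3: pop, push unvisited neighbours
        node = stack.pop()
        order.append(node)
        for nxt in adj[node]:
            if not visited[nxt]:
                visited[nxt] = True
                stack.append(nxt)
    return order

adj = [[1, 2, 3], [2, 3], [3], []]
print(dfs_visited(adj, 0))         # each node appears exactly once

Note that because every node is pushed at most once, this version visits nodes rather than enumerating every path, which is exactly why it avoids the exponential blow-up.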

Comments

Thanks for spotting the graph error, corrected
So, extending from your function above, T(1) = T(2) + T(3) + ... + T(n-3) + T(n-2) + T(n-1) + T(n). When you calculate the computational cost, you also have the cost of going from the first node to each node, so it is actually T(1) = 1 + T(2) + 1 + T(3) + ... + 1 + T(n-3) + 1 + T(n-2) + 1 + T(n-1) + 1 + T(n) = n - 1 + T(2) + T(3) + ... + T(n-3) + T(n-2) + T(n-1) + T(n), and this will make it even worse than 2^n
There was a mistake in the notation of my calculation. In the previous version I made T(n) + 0 equal 1, which was incorrect given that one line above I stated T(n) = 0. It should have been [T(n) + 1] + 0. I corrected the notation. However, my result was correct (I implicitly considered the cost from each node to the other nodes), so the time complexity of this operation should still end up in the same 2^n class. I hope I'm not missing something at this point.
I think it is a bit different, posted my calculation.
Your calculation is very similar to mine and the result turns out to be 2^N. They are the same solution; I don't see a difference.

I think I figured it out.

So if we consider that each new node adds an edge from itself to every node already in the DAG, traversing each of those new edges has a cost of 1.

Then we have (using C for the computational cost):

C(N) = [1 + C(N-1)] + [1 + C(N-2)] + [1 + C(N-3)] + .... + [1 + C(1)]

then, since C(N-1) = [1 + C(N-2)] + ... + [1 + C(1)], you have

C(N) = 2 * C(N-1) + 1
     = 2^2 * C(N-2) + 2 + 1
     = 2^3 * C(N-3) + 2^2 + 2 + 1
     = 2^(N-1) * C(1) + 2^(N-2) + ...... + 2 + 1

With C(1) = 0, this becomes the sum of a geometric progression with ratio 2 whose largest term is 2^(N-2), so the total is 2^(N-1) - 1.

Thus O(2^N) is the answer.
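A small script (my own) to check this recurrence against the closed form 2^(N-1) - 1:

from functools import lru_cache

@lru_cache(maxsize=None)
def C(n):
    # Cost recurrence from above: C(1) = 0, C(N) = sum over k < N of (1 + C(k)).
    if n == 1:
        return 0
    return sum(1 + C(k) for k in range(1, n))

for n in range(1, 12):
    print(n, C(n), 2 ** (n - 1) - 1)  # the columns agree: C(N) = 2^(N-1) - 1 = O(2^N)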
