20

What would be the efficient algorithm to find if two given binary trees are equal - in structure and content?

4
  • 1
    Meaning they are similar or not ? Commented Sep 27, 2009 at 5:08
  • 1
    Does equal mean structure-wise or/and value-wise? Commented Sep 27, 2009 at 5:22
  • Structure-Wise and Value-Wise. Commented Sep 27, 2009 at 5:23
  • 1
    Two nodes are equal if their data contents are the same and their left children are equal and their right children are equal. That's all you really need to know. Commented Oct 12, 2011 at 1:39

7 Answers 7

24

It's a minor issue, but I'd adapt the earlier solution as follows...

eq(t1, t2) =
  t1.data=t2.data && eq(t1.left, t2.left) && eq(t1.right, t2.right)

The reason is that mismatches are likely to be common, and it is better to detect (and stop comparing) early - before recursing further. Of course, I'm assuming a short-circuit && operator here.

I'll also point out that this is glossing over some issues with handling structurally different trees correctly, and with ending the recursion. Basically, there need to be some null checks for t1.left etc. If one tree has a null .left but the other doesn't, you have found a structural difference. If both have null .left, there's no difference, but you have reached a leaf - don't recurse further. Only if both .left values are non-null do you recurse to check the subtree. The same applies, of course, for .right.

You could include checks for e.g. (t1.left == t2.left), but this only makes sense if subtrees can be physically shared (same data structure nodes) for the two trees. This check would be another way to avoid recursing where it is unnecessary - if t1.left and t2.left are the same physical node, you already know that those whole subtrees are identical.

A C implementation might be...

bool tree_compare (const node* t1, const node* t2)
{
  // Same node check - also handles both NULL case
  if (t1 == t2)  return true;

  // Gone past leaf on one side check
  if ((t1 == NULL) || (t2 == NULL))  return false;

  // Do data checks and recursion of tree
  return ((t1->data == t2->data) && tree_compare (t1->left,  t2->left )
                                 && tree_compare (t1->right, t2->right));
}

EDIT In response to a comment...

The running time for a full tree comparison using this is most simply stated as O(n) where n is kinda the size of a tree. If you're willing to accept a more complex bound you can get a smaller one such as O(minimum(n1, n2)) where n1 and n2 are the sizes of the trees.

The explanation is basically that the recursive call is only made (at most) once for each node in the left tree, and only made (at most) once for each node in the right tree. As the function itself (excluding recursions) only specifies at most a constant amount of work (there are no loops), the work including all recursive calls can only be as much as the size of the smaller tree times that constant.

You could analyse further to get a more complex but smaller bound using the idea of the intersection of the trees, but big O just gives an upper bound - not necessarily the lowest possible upper bound. It's probably not worthwhile doing that analysis unless you're trying to build a bigger algorithm/data structure with this as a component, and as a result you know that some property will always apply to those trees which may allow you a tighter bound for the larger algorithm.

One way to form a tigher bound is to consider the sets of paths to nodes in both trees. Each step is either an L (left subtree) or an R (right subtree). So the root is specified with an empty path. The right child of the left child of the root is "LR". Define a function "paths (T)" (mathematically - not part of the program) to represent the set of valid paths into a tree - one path for every node.

So we might have...

paths(t1) = { "", "L", "LR", "R", "RL" }
paths(t2) = { "", "L", "LL", "R", "RR" }

The same path specifications apply to both trees. And each recursion always follows the same left/right link for both trees. So the recursion visits the paths in the itersection of these sets, and the tightest bound we can specify using this is the cardinality of that intersection (still with the constant bound on work per recursive call).

For the tree structures above, we do recursions for the following paths...

paths(t1) intersection paths(t2) = { "", "L", "R" }

So our work in this case is bounded to at most three times the maximum cost of non-recursive work in the tree_compare function.

This is normally an unnecessary amount of detail, but clearly the intersection of the path-sets is at most as large as the number of nodes in the smallest original tree. And whether the n in O(n) refers to the number of nodes in one original tree or to the sum of the nodes in both, this is clearly no smaller than either the minimum or our intersection. Therefore O(n) isn't such a tight bound, but it's still a valid upper bound, even if we're a bit vague which size we're talking about.

Sign up to request clarification or add additional context in comments.

8 Comments

@Steve: Thank you Steve for precise explanation.
Is the running time of this algorithm O(n^2)?
@Steve314 when the first nodes of the trees are same, the algorithm returns true and the comparison stops ?(please correct me if i am wrong).
@buzzinga - if the root nodes are both the same node, referenced through the same pointer value, then the algorithm correctly returns true and comparison correctly stops. If the root nodes are both the same node, then the trees are both the same tree. The values within the trees must be the same because they're the same tree.
@Steve314 now since the comparison stops, how will the rest of the nodes be compared? and if not compared, how will it be proved that two trees are isomorphic or not?
|
4

Modulo stack overflow, something like

eq(t1, t2) =
    eq(t1.left, t2.left) && t1.data=t2.data && eq(t1.right, t2.right)

(This generalizes to an equality predicate for all tree-structured algebraic data types - for any piece of structured data, check if each of its sub-parts are equal to each of the other one's sub-parts.)

1 Comment

Thank you Brian for giving apt explanation.
3

We can also do any of the two traversals (pre-order, post-order or in-order) and then compare the results of both the trees. If they are same, we can be sure of their equivalence.

1 Comment

It will also work but it's more costly , because you will have to travel each tree twice i.e. O(4n) + O(2n) to go over their traversals. Also if trees are big enough but have different sets of nodes in early levels, you will still have to traverse them till the end instead of breaking out.
1

A more general term for what you are probably trying to accomplish is graph isomorphism. There are some algorithms to do this on that page.

1 Comment

FWIW, tree isomorphism can be checked in linear time.
1

Since it's a proven fact that - it is possible to recreate a binary tree as long as we have the following:

  1. The sequence of nodes that are encountered in an In-Order Traversal.
  2. The sequence of nodes that are encountered in a Pre-Order OR Post-Order Traversal

If two binary trees have the same in-order and [pre-order OR post-order] sequence, then they should be equal both structurally and in terms of values.

Each traversal is an O(n) operation. The traversals are done 4 times in total and the results from the same-type of traversal is compared. O(n) * 4 + 2 => O(n) Hence, the total order of time-complexity would be O(n)

Comments

0

I would write it as follows. The following code will work in most functional language, and even in python if your datatypes are hashable (e.g. not dictionaries or lists):

  • topological equality (same in structure, i.e. Tree(1,Tree(2,3))==Tree(Tree(2,3),1)):

    tree1==tree2 means set(tree1.children)==set(tree2.children)

  • ordered equality:

    tree1==tree2 means tree1.children==tree2.children

(Tree.children is an ordered list of children)

You don't need to handle the base cases (leaves), because equality has been defined for them already.

3 Comments

Could you elaborate how set(tree1.children)==set(tree2.children) would work on Tree(0,Tree(1,Tree(2,3)))==Tree(0,Tree(Tree(1,3),2))?
@J.F.Sebastian: it would work correctly (i.e. return the result "not equal") as follows: Is 0==0 and (1,(2,3))==((1,3),2)? Let's first check if (1,(2,3))==((1,3),2). Is 1==(1,3) and (2,3)==2? No, because a tree is not equal to an atom. Therefore it is not true that (1,(2,3))==((1,3),2). Therefore it is not true that (0,(1,(2,3)))==(0,((1,3),2)). The idea is that set equality is recursively defined on the equality of its elements, and therefore an equality check will innately be recursive.
why do you skip the 1==2 and (2,3)==(1,3) check? Does your set() definition implies a certain order for elements? Otherwise it is very expensive as an implementation. It might be ok as a mere definition (not implying any implementation) though the word topological implies for me that nodes values are not important and only the shape of the trees matters e.g., to a topologist, a coffee cup and a donut are the same thing
0
bool identical(node* root1,node* root2){
    if(root1 == NULL && root2 == NULL)
        return true;
    if(root1==NULL && root2!=NULL || root1!=NULL && root2 == NULL)
        return false;
    if(root1->data == root2->data){
        bool lIdetical = identical(root1->left,root2->left);
        if(!lIdentical)
            return false;
        bool rIdentical = identical(root1->right,root2->identical);
        return lIdentical && rIdentical;
    }
    else{
        printf("data1:%d vs data2:%d",root1->data,root2->data);
        return false;
    }
}

I do not know if this is the most effecient but I think this works.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.