Implementing search functionality in a tree structure with many different types of nodes

Question

I have a tree structure that consists of dozens of types of nodes (each type of node inherits from a NodeBase class).

I would like to perform searches on the tree to return a reference to a specific node. For example, suppose there is some Company tree, which contains Department nodes amongst other types of nodes. Department nodes consist of Employee nodes. It is assumed that an employee must be part of a department, and can be in exactly one department.

Currently, it is designed so that each node has a list of child nodes of type NodeBase. A tree can become quite large, with hundreds of thousands of nodes at times. Insertion/deletion operations are seldom used, while search operations should not take "too long" for these big trees.

Suppose I want to get a reference to an employee node whose employee ID field equals some string that I provide. I don't know which department the employee is in, so I'd have to perform a search through all of the nodes hoping to find a match. Not all nodes have an employee ID field; departments, for example, do not have them.

I am not sure what the best way to implement the search functionality is, given this tree structure design.

There are probably better ways to store the data in the first place (e.g. using a database), but currently I am stuck with a tree.

The first question is: how important is the search functionality in relation to insertion/deletion? Are you going to do a lot of searches and few modifications? If that is the case AND you need the results faster, you can encapsulate several trees, each ordered by one of the criteria.
– SJuan76, Commented Jun 13, 2013 at 20:15
And remember that premature optimization is the root of 90,15% of all evil. Complicate it only if you need to.
– SJuan76, Commented Jun 13, 2013 at 20:15
Some of the trees can have hundreds of thousands of nodes, so it would be useful to implement ways to search quickly. Insertion/deletion will be seldom used. One thing to note is that the nodes are fairly restrictive (e.g. only a Company node would ever have Department child nodes), so if I ever need to search for a department, I don't need to search beyond the departments themselves. Maybe this will help decide what kind of implementation would be good.
– MxLDevs, Commented Jun 13, 2013 at 20:23
I think the Company example is misleading: it's still the very same problem you have with Employee, just that you need to search fewer parents because of the constraint. If your program only needs to search for companies, then it can be helpful; otherwise I think that having separate registries for the different kinds of nodes is the best way to get good search performance, especially since updates to the main tree are infrequent.
– Raffaele, Commented Jun 13, 2013 at 20:28
I think it means you keep N separate sorted indices for the N types of nodes; you must modify the tree code to update the indices when nodes/subtrees are added to or removed from the tree. The searches are performed on the sorted indices. This is also what I wrote in my answer.
– Raffaele, Commented Jun 13, 2013 at 20:48

Raffaele · Accepted Answer · 2013-06-13 20:21:07Z

Data structures are the way you organize your data, and the way you organize data depends on how you actually use those pieces of information.

A tree is the right data structure to answer questions like "get all descendants of node X", but it doesn't help you solve the problem of "find me the object with property X set to Y" (at least not your tree: you could certainly use a tree internally to keep a sorted index, as I explain later).

So I think the best way to solve this is to use two separate data structures to organize the data: a tree made of NodeBase objects to reflect the hierarchical relationship among the nodes, and a sorted index to make searches perform decently. This introduces a synchronization problem, though, because you'll have to keep the two data structures in sync whenever nodes are added or removed. If that doesn't happen too frequently, or if search performance is critical, then this may be the right way.
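The two-structure idea can be sketched roughly as follows. This is a minimal illustration, not the asker's actual code: the NodeBase, Department and Employee classes are hypothetical stand-ins, the IndexedTree wrapper and its method names are assumptions, and the employee ID is taken to be a String as described in the question. All structural changes go through the wrapper so the tree and the index are updated together.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical stand-ins for the question's node classes.
class NodeBase {
    private final List<NodeBase> children = new ArrayList<>();
    List<NodeBase> getChildren() { return children; }
}

class Department extends NodeBase { }

class Employee extends NodeBase {
    private final String id;
    Employee(String id) { this.id = id; }
    String getId() { return id; }
}

// Assumed wrapper: every add/remove goes through it, so the hierarchy
// and the sorted index cannot drift apart.
class IndexedTree {
    private final NodeBase root;
    // Sorted index from employee ID to node; lookups are O(log n).
    private final Map<String, Employee> employeesById = new TreeMap<>();

    IndexedTree(NodeBase root) {
        this.root = root;
        index(root); // index whatever is already in the tree
    }

    NodeBase getRoot() { return root; }

    void addChild(NodeBase parent, NodeBase child) {
        parent.getChildren().add(child);
        index(child); // child may itself be a whole subtree
    }

    void removeChild(NodeBase parent, NodeBase child) {
        if (parent.getChildren().remove(child)) {
            unindex(child);
        }
    }

    // Returns null when no employee has the given ID.
    Employee findEmployee(String id) {
        return employeesById.get(id);
    }

    private void index(NodeBase node) {
        if (node instanceof Employee) {
            Employee e = (Employee) node;
            employeesById.put(e.getId(), e);
        }
        for (NodeBase c : node.getChildren()) index(c);
    }

    private void unindex(NodeBase node) {
        if (node instanceof Employee) {
            employeesById.remove(((Employee) node).getId());
        }
        for (NodeBase c : node.getChildren()) unindex(c);
    }
}

class IndexedTreeDemo {
    public static void main(String[] args) {
        NodeBase company = new NodeBase();
        IndexedTree tree = new IndexedTree(company);
        Department sales = new Department();
        tree.addChild(company, sales);
        tree.addChild(sales, new Employee("E-42"));
        System.out.println(tree.findEmployee("E-42") != null);
    }
}
```

Because insertions and deletions are rare in the scenario described, the extra bookkeeping in addChild and removeChild should cost little compared with scanning the whole tree on every search.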

Community · Accepted Answer · 2017-05-23 10:25:55Z

Assuming that your tree is a DAG (directed acyclic graph), you can traverse it with DFS or BFS, for example. Here's a simple BFS:

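import java.util.LinkedList;
import java.util.Queue;

// Breadth-first search; returns null if no employee has the given ID.
public NodeBase findEmployee(NodeBase root, Integer employeeId) {
    Queue<NodeBase> q = new LinkedList<NodeBase>();
    q.add(root);
    while (!q.isEmpty()) {
        NodeBase node = q.poll();
        if (node instanceof Employee) {
            if (((Employee) node).getId().equals(employeeId)) {
                return node;
            }
        }
        // Not a match: queue the children so they are checked later.
        for (NodeBase child : node.getChildren()) {
            q.add(child);
        }
    }
    return null;
}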

EDIT: Visitor pattern

Or as Brabster suggested, you can use a visitor pattern. A NodeBase should implement an accept(IVisitor visitor) method:

public class NodeBase {
    //your code
    public void accept(IVisitor visitor) {
        visitor.visit(this); 
        for (NodeBase node : getChildren()) {
            node.accept(visitor);
        }
    }
}

IVisitor is just an interface:

public interface IVisitor {
     public void visit(NodeBase node);
}

And you need a proper implementation that will do the search:

public class SearchVisitor implements IVisitor {

     private Integer searchId;

     public SearchVisitor(Integer searchId) {
          this.searchId= searchId;
     }

     @Override
     public void visit(NodeBase node) {
         if (node instanceof Employee) {
             if (((Employee)node).getId().equals(searchId)) {
                  System.out.println("Found the node " + node.toString() + "!");
             }
         }
     }
}

And now, you just simply call it:

NodeBase root= getRoot();
root.accept(new SearchVisitor(getSearchId()));
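The visitor above only prints the match. If the caller needs the reference itself, one possible variant stores the first match in a field. The sketch below is an assumption-laden illustration: NodeBase, Employee and IVisitor are minimal stand-ins mirroring the snippets above (with an Integer ID as in this answer), and FindEmployeeVisitor and getResult are hypothetical names.

```java
import java.util.ArrayList;
import java.util.List;

interface IVisitor {
    void visit(NodeBase node);
}

// Minimal stand-in for the real NodeBase, with the accept method from above.
class NodeBase {
    private final List<NodeBase> children = new ArrayList<>();
    List<NodeBase> getChildren() { return children; }

    void accept(IVisitor visitor) {
        visitor.visit(this);
        for (NodeBase child : children) {
            child.accept(visitor);
        }
    }
}

class Employee extends NodeBase {
    private final Integer id;
    Employee(Integer id) { this.id = id; }
    Integer getId() { return id; }
}

// Hypothetical visitor that remembers the first matching employee.
class FindEmployeeVisitor implements IVisitor {
    private final Integer searchId;
    private Employee result; // stays null until a match is seen

    FindEmployeeVisitor(Integer searchId) {
        this.searchId = searchId;
    }

    @Override
    public void visit(NodeBase node) {
        // Ignore everything once a match has been recorded.
        if (result == null && node instanceof Employee) {
            Employee e = (Employee) node;
            if (e.getId().equals(searchId)) {
                result = e;
            }
        }
    }

    // Returns the match, or null if the traversal found none.
    Employee getResult() { return result; }
}

class VisitorSearchDemo {
    public static void main(String[] args) {
        NodeBase root = new NodeBase();
        NodeBase department = new NodeBase();
        root.getChildren().add(department);
        department.getChildren().add(new Employee(7));

        FindEmployeeVisitor finder = new FindEmployeeVisitor(7);
        root.accept(finder);
        System.out.println(finder.getResult() != null);
    }
}
```

Note that accept still walks the entire tree after a match is found; the visitor merely ignores later nodes. Stopping early would need a different accept contract, for example one that returns a boolean.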

edited May 23, 2017 at 10:25

answered Jun 13, 2013 at 20:17

darijan

3 Comments

JAB Over a year ago

Possible improvement: use something like NodeBase.find(NodeFinder callback, Class<T> type) that would allow each node to call themselves with callback, and if that returns false, to call NodeBase.find with the callback on each of their child nodes that have the correct type. Rather than returning null if no object is found, it might be better to have some sort of singleton NotAMatch instance of NodeBase to avoid NullPointerExceptions, but I haven't implemented something like this before so I'm not really sure if it would be necessary.

JAB Over a year ago

Also, perhaps you could take advantage of multicore processors by having each call to find() start a new thread and have a pool for results and just take the first correct result that you get. Or would that produce too much overhead?

MxLDevs Over a year ago

Overhead should not be an issue unless the magnitude of overhead you're thinking of is not the same as what I'm thinking.

01es · Accepted Answer · 2013-06-13 20:46:46Z

It looks like there are two parts to this question -- decomposition of class hierarchies and the implementation of the search algorithm.

In the Java world there are two possible solutions to the problem of decomposition:

1. Object-oriented decomposition, which has a local nature, and
2. Type-checking decomposition using instanceof and type casting.

Functional languages (including Scala) offer pattern matching, which is really a better approach to implement the type checking decomposition.

Due to the fact that there is a need to work with a data structure (tree) where elements (nodes) can be of varying types, the nature of the decomposition is definitely not local. Thus, the second approach is really the only option.

The search itself can be implemented using, for example, a binary search tree. Such a tree would need to be constructed from your data, with each node placed according to the actual search criterion. This means you'd need as many trees as there are search criteria, which is in essence a way of building indexes. Database engines use more sophisticated structures than a plain binary search tree (typically B-trees, while in-memory libraries often use self-balancing variants such as red-black trees), but the idea is very similar.

BTW the binary search tree would have a homogeneous nature. For example, if the search pertains to Employee by Department, then the search tree would consist only of nodes associated with Employee instances. This removes the decomposition problem.
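As a rough sketch of "one tree per search criterion", the example below builds two homogeneous indexes over Employee nodes in a single pass over the heterogeneous tree. It relies on java.util.TreeMap, which is a red-black tree implementation. The NodeBase and Employee classes are hypothetical stand-ins, the name field is an invented second criterion added purely for illustration, and EmployeeIndexes is an assumed helper name.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;
import java.util.TreeMap;

// Hypothetical stand-ins for the question's node classes.
class NodeBase {
    private final List<NodeBase> children = new ArrayList<>();
    List<NodeBase> getChildren() { return children; }
}

class Employee extends NodeBase {
    private final String id;
    private final String name; // invented field, used as a second criterion
    Employee(String id, String name) { this.id = id; this.name = name; }
    String getId() { return id; }
    String getName() { return name; }
}

// One sorted map per search criterion; both hold only Employee nodes.
class EmployeeIndexes {
    final TreeMap<String, Employee> byId = new TreeMap<>();
    final TreeMap<String, List<Employee>> byName = new TreeMap<>();

    // Walks the whole tree once (iterative depth-first) and fills both maps.
    static EmployeeIndexes build(NodeBase root) {
        EmployeeIndexes idx = new EmployeeIndexes();
        Deque<NodeBase> stack = new ArrayDeque<>();
        stack.push(root);
        while (!stack.isEmpty()) {
            NodeBase node = stack.pop();
            // The only type check happens here, while building.
            if (node instanceof Employee) {
                Employee e = (Employee) node;
                idx.byId.put(e.getId(), e);
                idx.byName.computeIfAbsent(e.getName(), k -> new ArrayList<>()).add(e);
            }
            for (NodeBase child : node.getChildren()) {
                stack.push(child);
            }
        }
        return idx;
    }
}

class EmployeeIndexesDemo {
    public static void main(String[] args) {
        NodeBase company = new NodeBase();
        NodeBase department = new NodeBase();
        company.getChildren().add(department);
        department.getChildren().add(new Employee("E-1", "Ada"));
        department.getChildren().add(new Employee("E-2", "Linus"));

        EmployeeIndexes idx = EmployeeIndexes.build(company);
        System.out.println(idx.byId.get("E-2").getName());
    }
}
```

Once the indexes exist, lookups never touch non-Employee nodes, so no instanceof checks or casts are needed at search time.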
