Coding a type of random walk in Neo4j using the Traversal Framework

Question

I'm currently working on a graph where nodes are connected via probabilistic edges. The weight on each edge defines the probability of existence of the edge.

Here is an example graph to get you started

(A)-[0.5]->(B)
(A)-[0.5]->(C)
(B)-[0.5]->(C)
(B)-[0.3]->(D)
(C)-[1.0]->(E)
(C)-[0.3]->(D)
(E)-[0.3]->(D)

I would like to use the Neo4j Traversal Framework to traverse this graph starting from (A) and return the number of nodes that have been reached based on the probability of the edges found along the way.

Important:

Each node that is reached can only be counted once. -> If (A) reaches (B) and (C), then (C) need not reach (B). On the other hand if (A) fails to reach (B) but reaches (C) then (C) will attempt to reach (B).
The same goes if (B) reaches (C), (C) will not try and reach (B) again.
This is a discrete time step function, a node will only attempt to reach a neighboring node once.
To test the existence of an edge (whether we traverse it) we can generate a random number and verify if it's smaller than the edge weight.

I have already coded part of the traversal description as follows. (Here it is possible to start from multiple nodes but that is not necessary to solve the problem.)

TraversalDescription traversal = db.traversalDescription()
            .breadthFirst()
            .relationships( Rels.INFLUENCES, Direction.OUTGOING )
            .uniqueness( Uniqueness.NODE_PATH )
            .uniqueness( Uniqueness.RELATIONSHIP_GLOBAL )
            .evaluator(new Evaluator() {

              @Override
              public Evaluation evaluate(Path path) {

                // Get current
                Node curNode = path.endNode();

                // If current node is the start node, it doesn't have previous relationship,
                // Just add it to result and keep traversing
                if (startNodes.contains(curNode)) {
                    return Evaluation.INCLUDE_AND_CONTINUE;
                }
                // Otherwise...
                else {
                  // Get current relationhsip
                  Relationship curRel = path.lastRelationship();

                  // Instantiate random number generator
                  Random rnd = new  Random();

                  // Get a random number (between 0 and 1)
                  double rndNum = rnd.nextDouble();


                  // relationship wc is greater than the random number
                  if (rndNum < (double)curRel.getProperty("wc")) {


                    String info = "";
                    if (curRel != null) {
                        Node prevNode = curRel.getOtherNode(curNode);
                        info += "(" + prevNode.getProperty("name") + ")-[" + curRel.getProperty("wc") + "]->";
                    }
                    info += "(" + curNode.getProperty("name") + ")";
                    info += " :" + rndNum;
                    System.out.println(info);

                    // Keep node and keep traversing
                    return Evaluation.INCLUDE_AND_CONTINUE;
                  } else {

                    // Don't save node in result and stop traversing
                    return Evaluation.EXCLUDE_AND_PRUNE;
                  }
                }
              }
            });

I keep track of the number of nodes reached like so:

long score = 0;
for (Node currentNode : traversal.traverse( nodeList ).nodes())
{
    System.out.print(" <" + currentNode.getProperty("name") + "> ");
    score += 1;
}

The problem with this code is that although NODE_PATH is defined there may be cycles which I don't want.

Therefore, I would like to know:

Is there is a solution to avoid cycles and count exactly the number of nodes reached?
And ideally, is it possible (or better) to do the same thing using PathExpander, and if yes how can I go about coding that?

Thanks

Would NODE_GLOBAL uniqueness work for you? That would satisfy your requirement that each node that is reached can only be counted once. That should also prevent cycles. — InverseFalcon
– InverseFalcon, Commented Mar 21, 2018 at 13:22
@InverseFalcon unfortunately it doesn't work as NODE_GLOBAL seems to define a node as visited even if I set EXCLUDE_AND_PRUNE/ with NODE_GLOBAL it prevents the traverser from attempting to reach a node more than once — David
– David, Commented Mar 22, 2018 at 8:48

David · Accepted Answer · 2018-03-22 13:05:55Z

1

This certainly isn't the best answer.

Instead of iterating on nodes() I iterate on the paths, and add the endNode() to a set and then simply get the size of the set as the number of unique nodes.

HashSet<String> nodes = new HashSet<>();
for (Path path : traversal.traverse(nodeList))
    {
        Node currNode = path.endNode();
        String val = String.valueOf(currNode.getProperty("name"));
        nodes.add(val);
        System.out.println(path);
        System.out.println("");
    }
    score = nodes.size();

Hopefully someone can suggest a more optimal solution.

I'm still surprised though that NODE_PATH didn't not prevent cycles from forming.

answered Mar 22, 2018 at 13:05

David

235 bronze badges

Sign up to request clarification or add additional context in comments.

3 Comments

InverseFalcon Over a year ago

That's odd...NODE_PATH should definitely be preventing cycles (at least per path). Can you describe how you're defining/detecting such cycles?

InverseFalcon Over a year ago

I think I see the problem. You called uniqueness() twice. Only a single uniqueness is allowed, so you overrode NODE_PATH with RELATIONSHIP_GLOBAL uniqueness, which can definitely cause cycles between already visited nodes (provided there are multiple relationships to reach them and/or leave them)

David Over a year ago

@InverseFalcon it's likely that only one uniqueness is authorized but since I could not find anything in the documentation I wasn't sure. I'll point out here that I did manage to easily code my algorithm with the Neo4J Core API but since it's not the exact answer to this question I did not post it.

Collectives™ on Stack Overflow

Coding a type of random walk in Neo4j using the Traversal Framework

1 Answer 1

3 Comments

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

3 Comments

Your Answer

Sign up or log in

Post as a guest

Related