I have a very large, weighted graph in Azure Cosmos DB. The numbers of vertices and edges are in the billions, and the database is several TB in size. I am trying to cluster the graph on Spark using a custom clustering algorithm.

I understand this can be done with Spark and GraphFrames. I can also find some older algorithms online that use GraphX and the Pregel framework, but I understand it is better to implement this in GraphFrames now, and I have not been able to find any examples of that. I have watched several videos and read blogs, and I was able to create a small graph and experiment with it using GraphFrames' built-in APIs (LPA, BFS, etc.).

My Questions:

  1. How do I implement graph clustering using GraphFrames? Is there an example of a custom graph clustering algorithm built on GraphFrames that runs in a distributed fashion? Will simply using GraphFrames/DataFrames and writing regular clustering code take care of distributed processing, or do I have to write the code in a particular way (similar to GraphX or Pregel)?

  2. How do I load the entire graph and run my clustering algorithm on it? When I load it into a GraphFrame, will it load the entire data set (several TB) into memory? Does it automatically load only what is necessary, or should I write custom code to load what is needed during processing?
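For context on question 1, here is a minimal, library-free sketch (plain Python, no Spark or GraphFrames) of the vertex-centric message-passing pattern that GraphFrames' `aggregateMessages` and Pregel-style APIs are built around. Each superstep is "send a message along every edge" (in Spark, a join of the edge DataFrame with the vertex DataFrame) followed by "aggregate messages per receiving vertex" (a `groupBy` on the vertex id). Code written in this shape is what Spark can distribute; ordinary loops over a local graph are not. The function names `label_propagation_step` and `cluster`, the weighted label-propagation rule, and the tie-breaking choice are hypothetical illustrations, not the asker's algorithm or GraphFrames' built-in `labelPropagation`.

```python
from collections import defaultdict


def label_propagation_step(labels, edges):
    """One synchronous superstep of weighted label propagation.

    labels: dict mapping vertex id -> current cluster label
    edges:  list of (src, dst, weight) tuples, treated as undirected
    """
    # "Send" phase: every edge carries each endpoint's current label to the
    # other endpoint, together with the edge weight.
    messages = []
    for src, dst, weight in edges:
        messages.append((dst, labels[src], weight))
        messages.append((src, labels[dst], weight))

    # "Aggregate" phase: group messages by receiving vertex and sum the
    # weight received for each candidate label.
    scores = defaultdict(lambda: defaultdict(float))
    for vertex, label, weight in messages:
        scores[vertex][label] += weight

    # "Update" phase: each vertex adopts the label with the largest total
    # weight; ties go to the smallest label so the result is deterministic.
    # Vertices that received no messages keep their current label.
    new_labels = dict(labels)
    for vertex, label_weights in scores.items():
        best_label, _ = min(label_weights.items(), key=lambda kv: (-kv[1], kv[0]))
        new_labels[vertex] = best_label
    return new_labels


def cluster(vertices, edges, max_iter=10):
    # Start with every vertex in its own cluster, then iterate supersteps
    # until no label changes or max_iter is reached.
    labels = {v: v for v in vertices}
    for _ in range(max_iter):
        new_labels = label_propagation_step(labels, edges)
        if new_labels == labels:
            break
        labels = new_labels
    return labels


# Two triangles joined by one weak edge (3-4).
edges = [
    (1, 2, 1.0), (2, 3, 1.0), (1, 3, 1.0),
    (4, 5, 1.0), (5, 6, 1.0), (4, 6, 1.0),
    (3, 4, 0.1),
]
print(cluster([1, 2, 3, 4, 5, 6], edges))
```

Note that synchronous label propagation can oscillate on some graphs, which is why the sketch caps the number of supersteps with `max_iter`.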

Apologies if the questions are basic; I am new to Spark, clustering, and GraphFrames.

  • Hey, this question looks very involved. Do you know specifically which clustering algorithm you want to implement? Commented Sep 1, 2022 at 13:40
  • I am still figuring it out. But does that matter? I was expecting that any graph algorithm could be implemented using GraphFrames. One possible algorithm is link Commented Sep 1, 2022 at 19:07
