How to implement a custom graph clustering algorithm on Spark using GraphFrames?

Asked 3 years, 3 months ago

Modified 3 years, 3 months ago

Viewed 459 times

I have a very large, weighted graph in Azure Cosmos DB. The vertices and edges number in the billions, and the database is several TB in size. I am trying to cluster the graph on Spark using a custom clustering algorithm.

I understand this can be done using Spark and GraphFrames. I can also find some older algorithms online that use GraphX and the Pregel framework, but as I understand it, new code is better implemented with GraphFrames, and I have not been able to find any examples of that. I have watched several videos and read blogs, and I was able to create a small graph and experiment with it using GraphFrames (using built-in APIs such as LPA, BFS, etc.).

My Questions:

1. How do I implement graph clustering using GraphFrames? Is there any example of a custom graph clustering algorithm using GraphFrames that can run in a distributed fashion? Will simply using a GraphFrame/DataFrame and writing regular clustering code take care of distributed processing, or do I have to write it in a certain way (similar to GraphX or Pregel)?
2. How do I load the entire graph and run my clustering algorithm? When I load it into a GraphFrame, will it load the entire data set (several TB) into memory? Does it automatically load only what is necessary, or should I write custom code to load what is needed during processing?
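
To make "written in a certain way (similar to GraphX or Pregel)" concrete, here is a toy, single-machine sketch of the vertex-centric pattern I have in mind: in each superstep every edge sends a message to its endpoints, and every vertex then updates its state from the messages it received. It is plain Python, not GraphFrames or GraphX code. The function name `label_propagation`, the tie-breaking rule, and the sample graph are made up purely for illustration, and weighted label propagation is only a stand-in for whatever clustering algorithm ends up being used.

```python
from collections import defaultdict


def label_propagation(vertices, edges, max_supersteps=10):
    """Toy vertex-centric weighted label propagation on an undirected graph.

    vertices: iterable of hashable, comparable vertex ids
    edges: iterable of (src, dst, weight) tuples
    Returns a dict mapping each vertex id to its cluster label.
    """
    # Every vertex starts in its own cluster.
    labels = {v: v for v in vertices}

    for _ in range(max_supersteps):
        # "Send" phase: each edge delivers (label, weight) to both endpoints.
        inbox = defaultdict(lambda: defaultdict(float))
        for src, dst, weight in edges:
            inbox[dst][labels[src]] += weight
            inbox[src][labels[dst]] += weight

        # "Update" phase: each vertex adopts the label with the largest
        # total incoming weight; ties go to the smallest label so the
        # result is deterministic.
        new_labels = dict(labels)
        for v, votes in inbox.items():
            new_labels[v] = min(votes, key=lambda lab: (-votes[lab], lab))

        # Stop once a full superstep changes nothing.
        if new_labels == labels:
            break
        labels = new_labels

    return labels


# Hypothetical sample graph: two triangles joined by one weak edge.
vertices = ["a", "b", "c", "d", "e", "f"]
edges = [
    ("a", "b", 1.0), ("b", "c", 1.0), ("a", "c", 1.0),
    ("d", "e", 1.0), ("e", "f", 1.0), ("d", "f", 1.0),
    ("c", "d", 0.1),
]
labels = label_propagation(vertices, edges)
print(labels)
```

In this sketch the whole graph sits in local dictionaries, which is exactly what will not work at several TB. The open question is whether the send and update phases have to be expressed through a dedicated message-passing API for Spark to distribute them.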

Apologies if the questions are basic; I am new to Spark, clustering, and GraphFrames.

asked Sep 1, 2022 at 12:50

0xcoder

Hey, this question looks very involved. Do you know which clustering algorithm you want to implement specifically?

– FJ_OC, commented Sep 1, 2022 at 13:40

I am still figuring it out. But does that matter? I was expecting that any graph algorithm can be implemented using GraphFrames. One possible algorithm is link

– 0xcoder, commented Sep 1, 2022 at 19:07
