Below is an example from the GraphFrames user guide: https://graphframes.github.io/graphframes/docs/_site/user-guide.html

The only thing that confuses me is the purpose of "lit(0)" in the condition expression. Is "lit(0)" the value that gets fed into "cnt"? If so, why does it come after ["ab", "bc", "cd"]?

from pyspark.sql.functions import col, lit, when
from pyspark.sql.types import IntegerType
from graphframes.examples import Graphs
from functools import reduce

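# g is the example "friends" graph from the user guide.
# Assumption: an active SparkSession is available as `spark`.
g = Graphs(spark).friends()

chain4 = g.find("(a)-[ab]->(b); (b)-[bc]->(c); (c)-[cd]->(d)")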

chain4.show()

sumFriends = lambda cnt,relationship: when(relationship == "friend", cnt+1).otherwise(cnt)

condition = reduce(lambda cnt,e: sumFriends(cnt, col(e).relationship), ["ab", "bc", "cd"], lit(0))

chainWith2Friends2 = chain4.where(condition >= 2)
chainWith2Friends2.show()

1 Answer

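lit(0) is the initializer of the reduce call. The counter in sumFriends has to start somewhere, so cnt is initialized to 0 before the first edge is examined. It appears after ["ab", "bc", "cd"] because reduce takes its arguments in the order function, iterable, and then the optional initial value.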

condition = reduce(lambda cnt,e: sumFriends(cnt, col(e).relationship), ["ab", "bc", "cd"], lit(0))

# should be equivalent to

condition = sumFriends(lit(0), col("ab").relationship)
condition = sumFriends(condition, col("bc").relationship)
condition = sumFriends(condition, col("cd").relationship)
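The same counting pattern can be tried without Spark. The following is only a plain-Python sketch of the idea: the dictionary `row` and the helper `sum_friends` are hypothetical stand-ins for one row of chain4 and for the sumFriends lambda, and the integer 0 plays the role of lit(0).

```python
from functools import reduce

# Hypothetical stand-in for a single row of chain4:
# each edge name maps to that edge's relationship value.
row = {"ab": "friend", "bc": "follow", "cd": "friend"}

def sum_friends(cnt, relationship):
    # Plain-Python analogue of when(...).otherwise(...):
    # add 1 to the counter only when the edge is a "friend" edge.
    return cnt + 1 if relationship == "friend" else cnt

# The third argument (0) is the starting value of cnt.
count = reduce(lambda cnt, e: sum_friends(cnt, row[e]), ["ab", "bc", "cd"], 0)
print(count)
```

In the Spark version nothing is counted at the moment reduce runs; reduce only builds a nested column expression, which is evaluated per row when the where filter executes.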

3 Comments

Thanks for answering. One more question: how does the function know that cnt should be assigned the initializer?
@gllow That's how the reduce function is defined in Python. Have a look at the code example in the linked docs, especially the line value = initializer followed by value = function(value, element).
The initializer is passed as the first argument of the provided lambda function on its first call.
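To make that concrete, here is a rough sketch of how a reduce with an initializer behaves. It is not the actual CPython implementation; `reduce_sketch`, `_MISSING`, `step`, and `trace` are illustrative names introduced here.

```python
from functools import reduce

_MISSING = object()  # sentinel meaning "no initializer was passed"

def reduce_sketch(function, iterable, initializer=_MISSING):
    it = iter(iterable)
    if initializer is _MISSING:
        value = next(it)      # no initializer: start from the first element
    else:
        value = initializer   # initializer given: this becomes the first cnt
    for element in it:
        value = function(value, element)
    return value

# Record every (cnt, e) pair the function receives.
trace = []

def step(cnt, e):
    trace.append((cnt, e))
    return cnt + 1

result = reduce_sketch(step, ["ab", "bc", "cd"], 0)
print(trace)
print(result)
```

The first recorded pair shows cnt bound to the initializer and e bound to the first list element, which is the behavior described in the comments above.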