
I'm trying to group by value from raw key-value pairs like

[(1, a), (2, a), (3, a), (4, a), (3, b), (1, b), (1, c), (4, c), (4, d)]

I'm able to group by key using the groupByKey() method, but I can't find a way to group by value, so that the result is

a = [1, 2, 3, 4]
b = [3, 1]
c = [1, 4]
d = [4]

I checked the Spark API docs but couldn't find any such method.

4 Answers


Spark's RDDs have a groupBy operator to which you can pass a custom grouping function.

data = sc.parallelize([(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a'), (3, 'b'),
                       (1, 'b'), (1, 'c'), (4, 'c'), (4, 'd')])
data.groupBy(lambda tup: tup[1])

That will group the data by the value (the second element of each tuple). Note that groupBy and groupByKey can cause out-of-memory errors and are expensive operations; see Avoid GroupByKey.
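The grouping semantics can be sketched without a Spark cluster using plain Python (a minimal sketch of what the groupBy above produces; the `pairs` list stands in for the RDD):

```python
from collections import defaultdict

# Plain-Python sketch of data.groupBy(lambda tup: tup[1]):
# group on the second element, keeping the whole tuple in each group.
pairs = [(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a'), (3, 'b'),
         (1, 'b'), (1, 'c'), (4, 'c'), (4, 'd')]

groups = defaultdict(list)
for k, v in pairs:
    groups[v].append((k, v))  # groupBy keeps the full tuple, not just the key

print(dict(groups))
# 'a' maps to [(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a')], and so on
```

Note that groupBy keeps the full tuples; to get just the keys per value, follow it with a mapValues step that projects out the first element.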


1 Comment

But shuffling is still required to aggregate the values, unlike with a sum. Still, this is better since you are not swapping the tuple elements :)

You can do that by reversing the tuples in the RDD:

RDD.map(lambda s: tuple(reversed(s)))  # reversed() alone returns an iterator, not a tuple

[(1, a), (2, a),....]

will become

[(a, 1), (a, 2),....]

Now call groupByKey().

I'm not sure about the efficiency, but it will work :)
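The swap-then-group pipeline can be sketched in plain Python (same input as the question; `pairs` stands in for the RDD, and the loop plays the role of groupByKey):

```python
from collections import defaultdict

pairs = [(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a'), (3, 'b'),
         (1, 'b'), (1, 'c'), (4, 'c'), (4, 'd')]

# The map step: (1, 'a') -> ('a', 1), so the value becomes the key.
swapped = [(v, k) for k, v in pairs]

# The groupByKey step: collect all former keys under each new key.
grouped = defaultdict(list)
for k, v in swapped:
    grouped[k].append(v)

print(dict(grouped))  # {'a': [1, 2, 3, 4], 'b': [3, 1], 'c': [1, 4], 'd': [4]}
```

This matches the output the question asks for.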


input = sc.parallelize([(1, "a"), (2, "a"), (3, "a"), (4, "a"), (1, "b"),
                        (3, "b"), (1, "c"), (4, "c"), (4, "d")])
input.groupByKey().collect()
output1 = input.map(lambda xy: (xy[1], xy[0]))  # Python 3: lambdas cannot unpack tuples
output2 = output1.groupByKey()
output2.collect()



You can use this script; it will group by value.

vals = [(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a'), (3, 'b'), (1, 'b'), (1, 'c'), (4, 'c'), (4, 'd')]

lst = {}
for k, v in vals:
    if v in lst:
        lst[v].append(k)
    else:
        lst[v] = [k]
print(lst)
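The same loop can be written more compactly with dict.setdefault (plain Python, same result as above):

```python
vals = [(1, 'a'), (2, 'a'), (3, 'a'), (4, 'a'), (3, 'b'),
        (1, 'b'), (1, 'c'), (4, 'c'), (4, 'd')]

# setdefault inserts an empty list the first time a value is seen,
# then appends the key to it.
lst = {}
for k, v in vals:
    lst.setdefault(v, []).append(k)

print(lst)  # {'a': [1, 2, 3, 4], 'b': [3, 1], 'c': [1, 4], 'd': [4]}
```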

3 Comments

Traversing the list with an aggregating function like this is not a distributed approach, and may give wrong results in Spark.
It won't even work on an RDD, since RDDs are not iterable like this. On a side note, you can simply use: for v, k in vals: lst.setdefault(k, []).append(v)
This is not a distributed Spark approach.
