I have a dataset that looks like this:

0 -- 1,2,4
1 -- 0,4
2 -- 0,4
4 -- 2,1,0

I want to read each line and transform it into something like this:

// for the line 0 -- 1,2,4
(0,1) <2,4>
(0,2) <1,4>
(0,4) <1,2>

// for the line 1 -- 0,4
(0,1) <4>
(1,4) <0>

// the smaller number always appears first in the pair

That is, I read each line and split it on the "--" delimiter, so from line 1 of the dataset I get 0 and 1,2,4. After that, I want to create pairs. For example, (0,1) becomes a key in the transformed map, and its value should be 2,4.

Once this is done, I want to be able to group the values by key, for example

(0,1) <2,4> <4>

and intersect them to get 4.

Is it possible to do something like this? Is my approach right?

This is the code I have written so far:

var mapOperation = logData.map(x=>x.split("\t")).filter(x => x.length == 2).map(x => (x(0),x(1)))
// reading file and creating the map Example - key 0 value 1,2,4

//from the first map, trying to create pairs
var mapAgainstValue = mapOperation.map{
line =>
val fromFriend = line._1
val toFriendsList = line._2.split(",")
(fromFriend -> toFriendsList)
}

val newMap = mapAgainstValue.map{
line =>
var key ="";
for(userIds <- line._2){
key =line._1.concat(","+userIds);
(key -> line._2.toList)
}

}

The problem is I am not able to call groupByKey on newMap. I am assuming there is some issue with the way I have created the map?

Appreciate any help.

Thanks.

  • Please edit the question and add some more detail, as I am not able to figure out what you actually want. Commented Oct 22, 2016 at 14:25
  • I edited the question. Hope it's better than before. Commented Oct 22, 2016 at 14:53

1 Answer

Your problem can be solved like this:

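    val inputRDD = sc.textFile("inputFile.txt")
    val pairs = inputRDD.flatMap { line =>
      // Split "0 -- 1,2,4" into the user and its comma-separated friend list
      val parts = line.split("--").map(_.trim)
      val firstTerm = parts(0)
      val secondTermAsList = parts(1).split(",").map(_.trim)
      secondTermAsList.map { b =>
        // Compare numerically so the smaller ID always comes first in the key
        val key = if (b.toInt > firstTerm.toInt) (firstTerm, b) else (b, firstTerm)
        // The value is every other friend on the same line
        val value = secondTermAsList diff List(b)
        (key, value)
      }
    }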

This code produces the following output:

+-----+------+
|_1   |_2    |
+-----+------+
|[0,1]|[2, 4]|
|[0,2]|[1, 4]|
|[0,4]|[1, 2]|
|[0,1]|[4]   |
|[1,4]|[0]   |
|[0,2]|[4]   |
|[2,4]|[0]   |
|[2,4]|[1, 0]|
|[1,4]|[2, 0]|
|[0,4]|[2, 1]|
+-----+------+
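
From pairs like these, the group-by-key and intersect step that the question asks about could look roughly like the sketch below. It uses plain Scala collections so that it runs without a Spark context, and the names CommonFriends, rows, pairs and common are made up for illustration. On an RDD of the same (key, value) shape, reduceByKey(_ intersect _) should be able to replace the groupBy and reduce combination.

```scala
object CommonFriends {
  // Sample data from the question, already split into (user, friends)
  val rows: Seq[(Int, Seq[Int])] = Seq(
    0 -> Seq(1, 2, 4),
    1 -> Seq(0, 4),
    2 -> Seq(0, 4),
    4 -> Seq(2, 1, 0)
  )

  // Emit ((smaller, larger), other friends) for every friend on a line
  val pairs: Seq[((Int, Int), Seq[Int])] = rows.flatMap { case (user, friends) =>
    friends.map { f =>
      val key = if (user < f) (user, f) else (f, user)
      (key, friends.filter(_ != f))
    }
  }

  // Group the value lists by key, then intersect the lists in each group
  val common: Map[(Int, Int), Seq[Int]] =
    pairs.groupBy(_._1).map { case (key, group) =>
      key -> group.map(_._2).reduce(_ intersect _)
    }

  def main(args: Array[String]): Unit =
    common.toSeq.sortBy(_._1).foreach { case (k, v) =>
      println(s"$k -> ${v.mkString(",")}")
    }
}
```

For the key (0,1) this intersects <2,4> with <4> and leaves 4, which matches the example in the question.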

I hope this solves your issue!

6 Comments

That is a really good piece of code. Thanks a lot. So, from here I should be able to do a group by key and intersect the value sets.
Just one more thing: where should I add the filter if not every element in the dataset has values after the -- delimiter?
Check whether the list size after splitting is 1. If it is, just return a default value; otherwise do the computation.
I am unable to add an if condition like this: val x = inputRDD.flatMap { a => val list = a.split("\t") if (list.length > 1) { val firstTerm = list(0) val secondTermAsList = list(1).split(",") secondTermAsList.map { b => val key = if (b.toInt > firstTerm.toInt) (firstTerm, b) else (b, firstTerm) val value = secondTermAsList diff List(b) (key, value) } } }
You have to evaluate the if as an expression, so it needs an else branch that returns a value of the same type.
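
To illustrate the last comment, here is a minimal sketch of the if used as an expression, where the else branch returns an empty collection so that lines without friends contribute nothing. SafeParse and toPairs are hypothetical names, and the sketch assumes the "--" delimiter from the answer rather than the tab used in the comment above.

```scala
object SafeParse {
  // Hypothetical helper: turns one input line into ((smaller, larger), others) pairs
  def toPairs(line: String): Seq[((String, String), Seq[String])] = {
    val parts = line.split("--").map(_.trim)
    // Both branches yield a Seq, so the whole if/else is a single expression
    if (parts.length > 1 && parts(1).nonEmpty) {
      val user = parts(0)
      val friends = parts(1).split(",").map(_.trim).toList
      friends.map { f =>
        // Numeric comparison keeps the smaller ID first in the key
        val key = if (f.toInt > user.toInt) (user, f) else (f, user)
        (key, friends.filter(_ != f))
      }
    } else {
      // Nothing after the delimiter: return an empty result for this line
      Seq.empty
    }
  }

  def main(args: Array[String]): Unit = {
    println(toPairs("1 -- 0,4"))
    println(toPairs("3"))
  }
}
```

Passing a function of this shape to flatMap should then drop the incomplete lines without a separate filter step.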