I am trying to convert my Spark code from Java to Scala, but I am finding it very complicated. Is it possible to convert the following Java code to Scala? Thanks!


JavaPairRDD<String,Tuple2<String,String>> newDataPair = newRecords.mapToPair(new PairFunction<String, String, Tuple2<String, String>>() {

            private static final long serialVersionUID = 1L;

            @Override
            public Tuple2<String, Tuple2<String, String>> call(String t) throws Exception {
                MyPerson p = (new Gson()).fromJson(t, MyPerson.class);

                String nameAgeKey = p.getName() + "_" + p.getAge() ;

                Tuple2<String, String> value = new Tuple2<String, String>(p.getNationality(), t);
                Tuple2<String, Tuple2<String, String>> kvp =
                    new Tuple2<String, Tuple2<String, String>>(nameAgeKey.toLowerCase(), value);
                return kvp;
            }
        });

I tried the following, but I am sure I have missed many things. In particular, it is not clear to me how to override a method in Scala. Please suggest an approach or share some examples. Thank you!

val newDataPair = newRecords.mapToPair(new PairFunction<String, String, Tuple2<String, String>>() {

        @Override
        public val call(String t) throws Exception {
            val p = (new Gson()).fromJson(t, MyPerson.class);
            val nameAgeKey = p.getName() + "_" + p.getAge() ;
            val value = new Tuple2<String, String>(p.getNationality(), t);
            val kvp =
                new Tuple2<String, Tuple2<String, String>>(nameAgeKey.toLowerCase(), value);
            return kvp;
        }
    });
  • Please show us what you have tried. (Commented May 12, 2015 at 0:48)
  • This is what I tried: (Commented May 12, 2015 at 1:41)

1 Answer

Literal translations from the Spark Java API to the Spark Scala API typically don't work, because the Java API introduces many artifacts to cope with Java's more limited type system. Two examples in this case: mapToPair in Java is just map in Scala, and Tuple2 has the terser literal syntax (a, b).
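To illustrate those two points in isolation, here is a minimal sketch that uses a plain Scala List as a stand-in for the RDD, so it runs without Spark. The object name TupleSketch and the sample data are made up for illustration.

```scala
object TupleSketch {
  // A plain List stands in for the RDD here (assumption: no Spark on the classpath).
  val records: List[String] = List("Alice", "Bob")

  // (a, b) is syntactic sugar for Tuple2(a, b); both lines build the same value.
  val sugar: (String, Int) = ("alice", 5)
  val explicit: Tuple2[String, Int] = Tuple2("alice", 5)

  // A function literal replaces the anonymous PairFunction class,
  // so there is no call method to override.
  val pairs: List[(String, Int)] =
    records.map(name => (name.toLowerCase, name.length))
}
```

The same shape carries over to an RDD: map with a function that returns a pair yields a collection of pairs.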

Applying that (and some more) to the snippet:

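import com.google.gson.Gson

val newDataPair = newRecords.map { t =>
    // classOf[MyPerson] is the Scala equivalent of Java's MyPerson.class
    val p = (new Gson()).fromJson(t, classOf[MyPerson])
    val nameAgeKey = p.getName + "_" + p.getAge
    val value = (p.getNationality, t)
    // the last expression is the result: (key, (nationality, rawJson))
    (nameAgeKey.toLowerCase, value)
}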

It could be made a bit more concise, but I wanted to keep the same structure as the Java counterpart to make it easier to follow.
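For reference, one way the more concise form might look is sketched below. To keep it runnable without Spark or Gson, it uses a hypothetical Person case class in place of MyPerson and takes the JSON parser as a parameter; both are assumptions for illustration, not part of the original code.

```scala
// Hypothetical stand-in for the MyPerson bean from the question.
final case class Person(name: String, age: Int, nationality: String)

object ConciseSketch {
  // Builds the (key, (nationality, rawJson)) pair for one record.
  // `parse` is a placeholder for something like
  // `new Gson().fromJson(_, classOf[MyPerson])`.
  def toPair(parse: String => Person)(t: String): (String, (String, String)) = {
    val p = parse(t)
    // String interpolation replaces the "+" concatenation;
    // the whole key is lower-cased, as in the Java version.
    (s"${p.name}_${p.age}".toLowerCase, (p.nationality, t))
  }
}
```

With Spark, the body of toPair would simply be the function passed to newRecords.map.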


2 Comments

I got " identifier expected but 'class' found" at MyPerson.class on the line: val p = (new Gson()).fromJson(t, MyPerson.class) ... what am I missing? Thanks!
@Edamame oops, fixed. Another Java vs. Scala thing :-) By the way, I'm not familiar with how Google's Gson interoperates with Scala. It might work as-is; otherwise, you'll have to research further.
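As a small illustration of the fix discussed in these comments: Scala has no `.class` literal, and classOf[T] plays that role. The sketch below uses String rather than MyPerson so that it is self-contained; the object name ClassOfSketch is made up.

```scala
object ClassOfSketch {
  // Scala's classOf[T] is the counterpart of Java's T.class literal.
  val viaClassOf: Class[String] = classOf[String]

  // getClass on an instance returns the same runtime Class object.
  val viaInstance: Class[_ <: String] = "abc".getClass
}
```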
