
I am using Apache Spark to process data.

Given the following Scala code:

    val rdd1 = sc.cassandraTable("player", "playerinfo").select("key1", "value")
    val rdd2 = rdd1.map(row => (row.getString("key1"), row.getLong("value")))

Basically, it converts the RDD 'rdd1' into another RDD, 'rdd2', which holds the rows of 'rdd1' in key-value pair form.

Note that the source data comes from Cassandra: key1 is part of a composite key, and value is the value column.

How do I convert this to Java so that I get a JavaPairRDD<String, Long> using the Spark Java API? I have already generated a cassandraRowsRDD successfully with the Java code below:

    JavaRDD<String> cassandraRowsRDD = javaFunctions(sc).cassandraTable("player", "playerinfo")
            .map(new Function<CassandraRow, String>() {
                @Override
                public String call(CassandraRow cassandraRow) throws Exception {
                    return cassandraRow.toString();
                }
            });

1 Answer


CassandraJavaRDD inherits the mapToPair method. Call it to get a key-value pair RDD in Java:

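    // The value type is Long, matching row.getLong("value") and the requested JavaPairRDD<String, Long>.
    JavaPairRDD<String, Long> cassandraKeyValuePairs = javaFunctions(sc).cassandraTable("player", "playerinfo").mapToPair(
            new PairFunction<CassandraRow, String, Long>() {
                @Override
                public Tuple2<String, Long> call(CassandraRow row) throws Exception {
                    // Key: the "key1" column; value: the "value" column.
                    return new Tuple2<String, Long>(row.getString("key1"), row.getLong("value"));
                }
            }
    );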

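You can also call mapToPair on an RDD you already have. Note, however, that your cassandraRowsRDD holds String values (the result of cassandraRow.toString()), so the function has to be applied to the RDD of CassandraRow objects, before the map to String.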
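The per-row logic itself can be checked without a Spark cluster or a Cassandra instance. The following is only a minimal sketch under stated assumptions: `Row` is a hypothetical stand-in for CassandraRow, `Map.Entry` stands in for `Tuple2`, and the class and method names (`PairMappingSketch`, `toPair`, `toPairs`) are made up for illustration. It is not the connector's API.

```java
import java.util.AbstractMap.SimpleImmutableEntry;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PairMappingSketch {

    /** Hypothetical stand-in for CassandraRow: a map from column name to value. */
    static final class Row {
        private final Map<String, Object> columns = new HashMap<>();

        Row(String key1, long value) {
            columns.put("key1", key1);
            columns.put("value", value); // autoboxed to Long
        }

        String getString(String name) {
            return (String) columns.get(name);
        }

        long getLong(String name) {
            return (Long) columns.get(name);
        }
    }

    /** Same shape as the body of PairFunction.call: one row in, one (key, value) pair out. */
    static Map.Entry<String, Long> toPair(Row row) {
        return new SimpleImmutableEntry<String, Long>(row.getString("key1"), row.getLong("value"));
    }

    /** Applies toPair to every row, as mapToPair would do across an RDD. */
    static List<Map.Entry<String, Long>> toPairs(List<Row> rows) {
        List<Map.Entry<String, Long>> pairs = new ArrayList<>();
        for (Row row : rows) {
            pairs.add(toPair(row));
        }
        return pairs;
    }

    public static void main(String[] args) {
        // Sample rows are invented for the sketch, not taken from the playerinfo table.
        List<Row> rows = Arrays.asList(new Row("alice", 10L), new Row("bob", 20L));
        for (Map.Entry<String, Long> pair : toPairs(rows)) {
            System.out.println(pair.getKey() + " -> " + pair.getValue());
        }
    }
}
```

In the real job, the body of `toPair` is what goes inside `PairFunction.call`, with `Tuple2<String, Long>` as the return type instead of `Map.Entry<String, Long>`.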
