How can I convert RDD[String] and Array[String] to String?

I am getting the error below:

<console>:34: error: type mismatch;
found   : org.apache.spark.rdd.RDD[String]
required: String

The idea is to get the distinct dates from a column in a SchemaRDD and prepend the constant string /home/tmp/date= to each one. The concatenated result should be:

val path = /home/tmp/date=20140901,/home/tmp/date=20140902,/home/tmp/date=20140903,/home/tmp/date=20140904,... and so on

path will then be passed to sc.textFile(path) to read the entire dataset.

This is the step, while reading the data, where I get the type mismatch error.

1 Answer

Here's one approach. First, set up the example:

val prefix = "/home/tmp/date="
val dates =  Array("20140901", "20140902", "20140903", "20140904")
val datesRDD = sc.parallelize(dates, 2)

Prepending the prefix is easy:

val datesWithPrefixRDD = datesRDD.map(s => prefix + s)
datesWithPrefixRDD.foreach(println)

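This produces the following (the line order can vary between runs, because the partitions are printed in parallel):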

/home/tmp/date=20140901
/home/tmp/date=20140903
/home/tmp/date=20140902
/home/tmp/date=20140904

But you asked for a single string. The obvious first attempt has some comma problems:

val bad = datesWithPrefixRDD.fold("")((s1, s2) => s1 + ", " + s2)
println(bad)

This produces:

, , /home/tmp/date=20140901, /home/tmp/date=20140902, , /home/tmp/date=20140903, /home/tmp/date=20140904

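The problem is that Spark's RDD fold() method applies the zero value (the empty string I provided) more than once: once at the start of each partition, and once more when the per-partition results are combined. Each of those empty strings picks up a separator. But we can deal with empty strings: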

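// Skip an empty operand on either side, so that neither the zero value
// nor the result of an empty partition contributes a separator.
val good = datesWithPrefixRDD.fold("")((s1, s2) =>
  (s1, s2) match {
    case ("", s) => s
    case (s, "") => s
    case (a, b) => a + ", " + b
  })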
println(good)

Then we get:

/home/tmp/date=20140901, /home/tmp/date=20140902, /home/tmp/date=20140903, /home/tmp/date=20140904
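
To see where the extra separators in the first attempt come from, here is a minimal sketch that models the two-stage fold in plain Scala, with no Spark involved. The names partitions, zero, op, perPartition and simulated are hypothetical, and the split of the four dates across two partitions is an assumption chosen to match the example above; on a real cluster the order in which partition results are combined can differ.

```scala
// Plain-Scala model of a two-partition RDD (assumed layout, no Spark needed).
val partitions: Seq[Seq[String]] = Seq(
  Seq("/home/tmp/date=20140901", "/home/tmp/date=20140902"),
  Seq("/home/tmp/date=20140903", "/home/tmp/date=20140904")
)

// The zero value and combining function from the first attempt.
val zero = ""
val op: (String, String) => String = (s1, s2) => s1 + ", " + s2

// Stage 1: each partition is folded on its own, starting from the zero value,
// so every partition result begins with a stray ", ".
val perPartition: Seq[String] = partitions.map(_.foldLeft(zero)(op))

// Stage 2: the partition results are folded together, again starting from
// the zero value, which adds one more stray ", " per partition result.
val simulated: String = perPartition.foldLeft(zero)(op)

println(simulated)
```

Under these assumptions the model prints the same string as the first attempt, with a leading ", , " and a doubled separator between the two partitions.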

EDIT: Actually, reduce() produces a tidier answer: it takes no zero value, so the "extra comma" problem never arises. (Note that reduce() throws an exception if the RDD is empty.)

val alternative = datesWithPrefixRDD.reduce((s1, s2) => s1 + ", " + s2)
println(alternative)

Again we get:

/home/tmp/date=20140901, /home/tmp/date=20140902, /home/tmp/date=20140903, /home/tmp/date=20140904
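
If the list of distinct dates is small enough to bring back to the driver, another option is to collect the RDD into an Array[String] and join it with mkString, which also covers the Array[String] half of the question. The sketch below uses a hypothetical local array, collected, as a stand-in for what datesWithPrefixRDD.distinct().collect() would return in spark-shell, so that it runs without Spark. It joins with a bare comma and no space, as in the question: sc.textFile accepts a comma-separated list of paths, and a space after the comma would likely be treated as part of the next path.

```scala
// Stand-in for datesWithPrefixRDD.distinct().collect() in spark-shell
// (assumption: the distinct dates fit comfortably in driver memory).
val collected: Array[String] = Array(
  "/home/tmp/date=20140903",
  "/home/tmp/date=20140901",
  "/home/tmp/date=20140904",
  "/home/tmp/date=20140902"
)

// collect() makes no promise about order here, so sort for a stable result,
// then join with "," and no space to build the comma-separated path list.
val path: String = collected.sorted.mkString(",")

println(path)

// In spark-shell, with a live SparkContext `sc`, the dataset could then be
// read with: sc.textFile(path)
```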