Here's one approach. First, set up the example:
val prefix = "/home/tmp/date="
val dates = Array("20140901", "20140902", "20140903", "20140904")
val datesRDD = sc.parallelize(dates, 2)
Prepending the prefix is easy:
val datesWithPrefixRDD = datesRDD.map(s => prefix + s)
datesWithPrefixRDD.foreach(println)
This produces:
/home/tmp/date=20140901
/home/tmp/date=20140903
/home/tmp/date=20140902
/home/tmp/date=20140904
(Note that the lines can come out in any order, since foreach() runs on the partitions in parallel.) But you asked for a single string. The obvious first attempt has some comma problems:
val bad = datesWithPrefixRDD.fold("")((s1, s2) => s1 + ", " + s2)
println(bad)
This produces:
, , /home/tmp/date=20140901, /home/tmp/date=20140902, , /home/tmp/date=20140903, /home/tmp/date=20140904
The problem is that Spark's RDD fold() method seeds the concatenation with the empty string I provided more than once: once for each partition, and once more when combining the partition results. But we can deal with the empty strings:
val good = datesWithPrefixRDD.fold("")((s1, s2) =>
  s1 match {
    case "" => s2
    case s => s + ", " + s2
  })
println(good)
Then we get:
/home/tmp/date=20140901, /home/tmp/date=20140902, /home/tmp/date=20140903, /home/tmp/date=20140904
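To see exactly where those extra commas came from, the fold semantics can be simulated on plain Scala collections, no Spark required. This is a sketch that assumes the RDD's two partitions split the four dates evenly, as sc.parallelize(dates, 2) would:

```scala
// Pretend the RDD's two partitions are these two lists:
val partitions = List(
  List("/home/tmp/date=20140901", "/home/tmp/date=20140902"),
  List("/home/tmp/date=20140903", "/home/tmp/date=20140904"))

// fold() seeds each partition with the zero value ("")...
val perPartition = partitions.map(_.fold("")((s1, s2) => s1 + ", " + s2))
// ...and seeds the cross-partition merge with it once more:
val merged = perPartition.fold("")((s1, s2) => s1 + ", " + s2)
println(merged)
// => ", , /home/tmp/date=20140901, /home/tmp/date=20140902, , /home/tmp/date=20140903, /home/tmp/date=20140904"
```

Three applications of the zero value, three spurious separators: exactly the bad output we saw above.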
EDIT: Actually, reduce() produces a tidier answer because there is no zero value at all, so the extra commas never appear:
val alternative = datesWithPrefixRDD.reduce((s1, s2) => s1 + ", " + s2)
println(alternative)
Again we get:
/home/tmp/date=20140901, /home/tmp/date=20140902, /home/tmp/date=20140903, /home/tmp/date=20140904
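For comparison, on a plain local collection the idiomatic join is mkString. If the RDD is small enough to fit in driver memory, one could also collect() it and join locally; here's the local sketch (the collect() line is a hypothetical variant, not run here):

```scala
// Plain Scala equivalent: build the strings locally and join with mkString.
val prefix = "/home/tmp/date="
val dates = Array("20140901", "20140902", "20140903", "20140904")
val joined = dates.map(prefix + _).mkString(", ")
println(joined)
// Hypothetical Spark variant (assumes the data is small):
// val joined = datesWithPrefixRDD.collect().mkString(", ")
```

The fold/reduce versions do the concatenation on the executors, which matters more as the data grows; mkString after collect() is just a convenience for small results.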