Based on the information you've provided in your question, the following can be a solution:
import sqlContext.implicits._
val str1 = "{\"data\":\"abc\", \"field1\":\"def\"}\n{\"data\":\"123\", \"field1\":\"degf\"}\n{\"data\":\"87j\", \"field1\":\"hzc\"}\n{\"data\":\"efs\", \"field1\":\"ssaf\"}"
val str2 = "{\"data\":\"fsg\", \"field1\":\"agas\"}\n{\"data\":\"sgs\", \"field1\":\"agg\"}\n{\"data\":\"sdg\", \"field1\":\"agads\"}"
val input = Seq(str1, str2)
// split each input string into individual JSON lines, then pull out the value of each field and strip the JSON punctuation
val rddData = sc.parallelize(input).flatMap(_.split("\n"))
  .map(line => line.split(","))
  .map(array => (array(0).split(":")(1).trim.replaceAll("\\W", ""), array(1).split(":")(1).trim.replaceAll("\\W", "")))
rddData.toDF("data", "field1").show
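Running the above should print something like this (row order may vary with partitioning):
+----+------+
|data|field1|
+----+------+
| abc|   def|
| 123|  degf|
| 87j|   hzc|
| efs|  ssaf|
| fsg|  agas|
| sgs|   agg|
| sdg| agads|
+----+------+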
Edit:
You can leave out the field names and just use .toDF(), but that would give the columns default names derived from your data (_1, _2, and so on).
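For example, a minimal sketch on the same tuple RDD from the first snippet, showing the default names:
rddData.toDF().show   // the columns come out named _1 and _2 instead of data and field1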
Instead, you can define a schema and build the DataFrame from it as below (you can add more fields):
import org.apache.spark.sql.Row
import org.apache.spark.sql.types.{StructType, StructField, StringType}
// same parsing as before, but each record becomes a Row instead of a tuple
val rddData = sc.parallelize(input).flatMap(_.split("\n"))
  .map(line => line.split(","))
  .map(array => Row.fromSeq(Seq(array(0).split(":")(1).trim.replaceAll("\\W", ""), array(1).split(":")(1).trim.replaceAll("\\W", ""))))
val schema = StructType(Array(
  StructField("data", StringType, true),
  StructField("field1", StringType, true)))
sqlContext.createDataFrame(rddData, schema).show
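If you want to confirm that the schema was applied, a quick sketch (reusing the same rddData and schema as above) is to print the schema of the resulting DataFrame:
val df = sqlContext.createDataFrame(rddData, schema)
df.printSchema
// root
//  |-- data: string (nullable = true)
//  |-- field1: string (nullable = true)
df.show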
Or
You can create the Dataset directly, but you will need a case class (you can add more fields) as below:
// same parsing again, but each record is mapped to the case class, so .toDS gives a typed Dataset
val dataSet = sc.parallelize(input).flatMap(_.split("\n"))
  .map(line => line.split(","))
  .map(array => Dinasaurius(array(0).split(":")(1).trim.replaceAll("\\W", ""),
    array(1).split(":")(1).trim.replaceAll("\\W", ""))).toDS
dataSet.show
The case class for the above Dataset is (it has to be defined before the code that uses it):
case class Dinasaurius(data: String, field1: String)
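As a small illustrative example of what the typed Dataset gives you, you can use the Dinasaurius fields directly in lambdas:
dataSet.filter(_.data.startsWith("a")).show   // keeps only the row where data starts with "a" (here "abc")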
I hope this answers all your questions.
@Charif: I use flatMap here so that each newline-separated JSON line becomes its own record before parsing.