
I load my CSV into a DataFrame and then convert it to a Dataset, but it shows errors like this:

Multiple markers at this line:
- Unable to find encoder for type stored in a Dataset. Primitive types (Int, String, etc) and Product types (case classes) are supported by importing spark.implicits._ Support for serializing other types will be added in future releases.
- not enough arguments for method as: (implicit evidence$2: org.apache.spark.sql.Encoder[DataSet.spark.aacsv])org.apache.spark.sql.Dataset[DataSet.spark.aacsv]. Unspecified value parameter evidence$2

How do I resolve this? My code is:

import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.sql.SQLContext
import org.apache.spark.sql.types.{StringType, StructField, StructType}

case class aaCSV(
    a: String,
    b: String
)

object WorkShop {

  def main(args: Array[String]): Unit = {
    val conf = new SparkConf()
      .setAppName("readCSV")
      .setMaster("local")
    val sc = new SparkContext(conf)
    val sqlContext = new SQLContext(sc)
    val customSchema = StructType(Array(
        StructField("a", StringType, true),
        StructField("b", StringType, true)))

    val df = sqlContext.read
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .schema(customSchema)
      .load("/xx/vv/ss.csv")
    df.printSchema()
    df.show()
    val googleDS = df.as[aaCSV]
    googleDS.show()

  }

}

Now I have changed the main function like this:

def main(args: Array[String]): Unit = {
  val conf = new SparkConf()
    .setAppName("readCSV")
    .setMaster("local")
  val sc = new SparkContext(conf)
  val sqlContext = new SQLContext(sc)
  import sqlContext.implicits._

  val sa = sqlContext.read.csv("/xx/vv/ss.csv").as[aaCSV]
  sa.printSchema()
  sa.show()
}

But it throws the error: Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve 'Adj_Close' given input columns: [_c1, _c2, _c5, _c4, _c6, _c3, _c0]; line 1 pos 7. What should I do?

Next, I need to execute my method at a given time interval using the Spark scheduler, and I have been referring to this link: https://spark.apache.org/docs/latest/job-scheduling.html#scheduling-within-an-application. Kindly help us.
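Based on that page, I think the in-application setup would look something like the sketch below (the pool name "pool1" and the fairscheduler.xml path are placeholders, and I am not sure this covers time-based execution at all, since fair scheduling only controls how concurrent jobs share resources):

// Sketch of "scheduling within an application" from the linked docs.
// Note: running a job every N minutes would still need an external trigger
// (e.g. cron or a Java ScheduledExecutorService); this only shares resources
// between concurrent jobs inside one SparkContext.
val conf = new SparkConf()
  .setAppName("readCSV")
  .setMaster("local")
  .set("spark.scheduler.mode", "FAIR") // switch job scheduling from FIFO to FAIR
  // Placeholder path: an XML file defining named pools and their weights.
  .set("spark.scheduler.allocation.file", "/path/to/fairscheduler.xml")
val sc = new SparkContext(conf)

// Jobs submitted from this thread now run in the named pool.
sc.setLocalProperty("spark.scheduler.pool", "pool1")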

  • "not enough arguments for method"... What method? Where's your code? Commented Oct 17, 2016 at 6:48
  • Hmm. Please do not use the comments for code. Edit your question and format it appropriately. Thanks. Commented Oct 17, 2016 at 7:18
  • @Sarathkumar Vulchi: Can you try adding the line import sqlContext.implicits._ before you convert the df to a ds? Commented Oct 17, 2016 at 9:12
  • @Shankar: I saw your reply and only then tried it; it's working fine now. Thanks, buddy. Commented Oct 17, 2016 at 10:03

2 Answers


Do you have a header (column names) in your CSV files? If yes, try adding .option("header", "true") to the read statement. Example: sqlContext.read.option("header", "true").csv("/xx/vv/ss.csv").as[aaCSV].

The blog below has different examples for DataFrames and Datasets: http://technippet.blogspot.in/2016/10/different-ways-of-creating.html
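For example, a minimal end-to-end sketch (it assumes the header names in the CSV match the case class fields a and b exactly):

import sqlContext.implicits._  // brings Encoder[aaCSV] into scope for .as[]

val ds = sqlContext.read
  .option("header", "true")    // use the first CSV line as column names
  .csv("/xx/vv/ss.csv")
  .as[aaCSV]

ds.printSchema()
ds.show()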


1 Comment

Thanks, buddy. It's working fine.

Try adding the import below before you convert the DataFrame to a Dataset:

import sqlContext.implicits._

(Or, if you are on Spark 2.x with a SparkSession named spark, import spark.implicits._ instead; note that the SparkContext sc itself has no implicits member.)

For more info on working with Datasets, see https://spark.apache.org/docs/latest/sql-programming-guide.html#creating-datasets
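Placement matters: the import needs a concrete sqlContext value in scope, so it goes inside the method, after the SQLContext is created. A minimal sketch of your original flow with only that line added:

val sc = new SparkContext(new SparkConf().setAppName("readCSV").setMaster("local"))
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._   // supplies the implicit Encoder[aaCSV] that .as[] needs

val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .load("/xx/vv/ss.csv")
val googleDS = df.as[aaCSV]     // compiles now: the evidence$2 parameter is found implicitly
googleDS.show()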

2 Comments

Thanks a lot, buddy.
I tried another approach, val sa = sqlContext.read.csv("/home/kenla/Spark_Samples/google.csv").as[googleCSV], but it throws the error "Exception in thread "main" org.apache.spark.sql.AnalysisException: cannot resolve 'Date' given input columns: [_c3, _c4, _c0, _c1, _c5, _c6, _c2];". Kindly help us.
