
I'm new to Spark/Scala/DataFrames. I'm using Scala 2.10.5 and Spark 1.6.0. I'm trying to load a CSV file and create a DataFrame from it. Using the Scala shell, I execute the statements below in order. When I reach the final one, df.show(), I get an error that says:

error: value show is not a member of org.apache.spark.sql.DataFrameReader

Could someone advise what I might be missing? I understand I don't need to import SparkContext when using the REPL (shell), since sc is created automatically, but any ideas what I'm doing wrong?

1. import org.apache.spark.sql.SQLContext

2. import sqlContext.implicits._

3. val sqlContext = new SQLContext(sc)

4. val csvfile = "path_to_filename in hdfs...."

5. val df = sqlContext.read.format(csvfile).option("header", "true").option("inferSchema", "true")

6. df.show()

1 Answer


Try this:

val df = sqlContext.read.option("header", "true").option("inferSchema", "true").csv(csvfile)

sqlContext.read gives you a DataFrameReader, and option and format each set some options and return the DataFrameReader again, so the calls chain. You need to call one of the methods that returns a DataFrame (like csv) before you can do things like show with it. Note that in your original step 5, the file path was passed to format, which expects a data source name, not a path.

See https://spark.apache.org/docs/latest/api/scala/index.html#org.apache.spark.sql.DataFrameReader for more info.


5 Comments

Thank you!! I'm going to try that now
...I just tried your suggestion and it's giving me the error: value csv is not a member of org.apache.spark.sql.DataFrameReader. Do you think it's because I'm not importing something I should?
Ah, looks like that method was added relatively recently. Sorry, don't have a 1.6.0 installation handy to test... maybe try .format("csv").load(csvFile) instead?
omg thank you thank you! I even tried spark-shell --packages com.databricks:spark-csv_2.10:1.5.0 but that didn't work, but your suggestion did. I'm working on a project that's due tomorrow so I hope you wouldn't mind me reaching out again with more questions (if any)!
Actually, to clarify, I also needed to add spark-shell --packages com.databricks:spark-csv_2.10:1.5.0 and then use your suggestion in order for it to work. Thanks again!
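Pulling the comment thread together: on Spark 1.6 the built-in csv method does not exist yet, so the external spark-csv package is needed, and load replaces csv. A minimal sketch of the full working session follows; it assumes a Spark 1.6 spark-shell (which provides sc), and the file path is the question's placeholder, so it can only run against an actual Spark installation:

```scala
// Launch the shell with the spark-csv package (Scala 2.10 build),
// as noted in the comments:
//   spark-shell --packages com.databricks:spark-csv_2.10:1.5.0

import org.apache.spark.sql.SQLContext

// sc is created automatically by spark-shell
val sqlContext = new SQLContext(sc)
import sqlContext.implicits._

// placeholder path from the question -- substitute your HDFS path
val csvfile = "path_to_filename in hdfs...."

// format() names the data source; load() actually reads the file
// and returns a DataFrame. If plain "csv" is not recognized on 1.6,
// the package's full name "com.databricks.spark.csv" should work.
val df = sqlContext.read
  .format("com.databricks.spark.csv")
  .option("header", "true")
  .option("inferSchema", "true")
  .load(csvfile)

df.show()
```

On Spark 2.x and later this machinery is built in: SparkSession replaces SQLContext and spark.read.csv(path) works directly, with no external package.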
