
I am using the following code to load a CSV file that contains free-text notes.

val data = sqlContext
      .read
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .option("inferSchema", "true")
      .option("parserLib", "UNIVOCITY")
      .load(dataPath)
      .na.drop()

The notes are not in any specific format. While loading, I get this error:

com.univocity.parsers.common.TextParsingException: Error processing input: null
Identified line separator characters in the parsed content. This may be the cause of the error. The line separator in your parser settings is set to '\n'. 

I'd appreciate any help. Thanks.

  • Do any records in the file span more than one line, i.e. does your data have \n characters inside what you consider to be "one record"? Commented Nov 29, 2016 at 19:57
  • The problem, the way I see it, is with "" or [[ kinds of characters. And yes, there are \n characters. But I need Spark to treat /t/n as one record. Why is it treating \n... Commented Nov 29, 2016 at 20:21
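The first comment's question (does a record span more than one line?) can be checked with a small pure-Scala sketch. It assumes RFC 4180-style quoting (fields wrapped in double quotes, a literal quote escaped as "") and '\n' line endings; `CsvRecords.splitRecords` is a hypothetical helper written for illustration, not how spark-csv or univocity is implemented. It treats '\n' as a record separator only outside quotes, so a note containing a newline stays in one record, whereas a reader that splits purely on '\n' would break it apart.

```scala
import scala.collection.mutable.ListBuffer

// Hypothetical helper (not part of Spark or univocity).
// Assumes RFC 4180-style quoting and '\n' line endings.
object CsvRecords {
  // Splits CSV text into records; '\n' ends a record only outside double quotes.
  def splitRecords(text: String): List[String] = {
    val records = ListBuffer.empty[String]
    val current = new StringBuilder
    var inQuotes = false
    for (c <- text) {
      if (c == '"') {
        // An escaped quote ("") toggles the state twice, leaving it unchanged.
        inQuotes = !inQuotes
        current.append(c)
      } else if (c == '\n' && !inQuotes) {
        // Newline outside quotes: the current record is complete.
        records += current.toString
        current.clear()
      } else {
        // Ordinary character, or a newline embedded in a quoted field.
        current.append(c)
      }
    }
    // Keep a final record that has no trailing newline.
    if (current.length > 0) records += current.toString
    records.toList
  }
}
```

For example, `CsvRecords.splitRecords("id,note\n1,\"a\nb\"")` returns two records (the header and one data row), while splitting the same text on "\n" yields three pieces.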

1 Answer


I don't have the privilege to comment on the question, so I'm adding an answer.

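Since you are already calling na.drop(), you may want to use option("mode", "DROPMALFORMED") as well. In spark-csv, this mode drops lines that have fewer or more tokens than expected, or tokens that do not match the schema.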

val data = sqlContext
      .read
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .option("inferSchema", "true")
      .option("mode", "DROPMALFORMED") // skip lines the parser considers malformed
      .option("parserLib", "UNIVOCITY")
      .load(dataPath)
      .na.drop()
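To make the relationship between the two options concrete, here is a conceptual model in plain Scala. It is NOT Spark's implementation: `RowFilters`, `dropMalformed` and `naDrop` are hypothetical names, rows are assumed to be already tokenized, and a null cell is modeled as `None`. `dropMalformed` models only the token-count part of DROPMALFORMED (schema-mismatch checks are left out); `naDrop` models the default behavior of na.drop(), which removes rows containing any null.

```scala
// Conceptual model only, not Spark's implementation.
object RowFilters {
  // A parsed row; None stands in for a null cell.
  type Row = Seq[Option[String]]

  // Roughly the token-count check of DROPMALFORMED: keep only rows whose
  // number of cells matches the header's number of columns.
  def dropMalformed(header: Seq[String], rows: Seq[Row]): Seq[Row] =
    rows.filter(_.length == header.length)

  // Roughly na.drop() with default settings: keep only rows with no null cell.
  def naDrop(rows: Seq[Row]): Seq[Row] =
    rows.filter(_.forall(_.isDefined))
}
```

In this model, a row with the right number of cells but a null value survives `dropMalformed` and is removed only by `naDrop`, which is why the two filters complement each other.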

BTW, the Databricks spark-csv reader is built into Spark 2.0+, so there you can use the built-in csv format instead of com.databricks.spark.csv.
