Spark - CSV text loading parsing error

Question

I am using the following code to load a CSV file that contains free-text notes.

val data = sqlContext
      .read
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .option("inferSchema", "true")
      .option("parserLib", "UNIVOCITY")
      .load(dataPath)
      .na.drop()

The notes are not in any specific format. During loading I get this error:

com.univocity.parsers.common.TextParsingException: Error processing input: null
Identified line separator characters in the parsed content. This may be the cause of the error. The line separator in your parser settings is set to '\n'.

I'd appreciate any help. Thanks.

Do any records in the file span more than one line, i.e. does your data have \n characters inside what you consider to be "one record"?
– radumanolescu, Commented Nov 29, 2016 at 19:57
The problem, as I see it, is with characters like "" or [[. And yes, there are \n characters, but I need Spark to treat /t/n as one record. Why is it treating \n...
– Ahsan, Commented Nov 29, 2016 at 20:21

Mohammed Rafi · Accepted Answer · 2016-11-30 03:47:23Z

2

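I do not have the privilege to comment on the question, so I am adding an answer.

Since you are already calling na.drop(), you may also want to set option("mode", "DROPMALFORMED"), which tells the reader to drop rows it cannot parse instead of failing on them.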

val data = sqlContext
      .read
      .format("com.databricks.spark.csv")
      .option("header", "true")
      .option("inferSchema", "true")
      .option("mode", "DROPMALFORMED")
      .option("parserLib", "UNIVOCITY")
      .load(dataPath)
      .na.drop()

By the way, the Databricks spark-csv functionality is built into Spark 2.0+.
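
For readers on Spark 2.0 or later, the same load could be sketched with the built-in CSV reader. This is only a sketch under stated assumptions, not the code from the answer: the object name LoadNotesCsv, the csvOptions map and the load helper are hypothetical names, and the parserLib option is omitted. The multiLine option is an addition that the answer does not mention; it is available from Spark 2.2 and only helps when the embedded newlines sit inside quoted fields, which may or may not be true of the notes column.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object LoadNotesCsv {
  // Options for Spark's built-in CSV source (Spark 2.0+).
  // "multiLine" is an assumption beyond the answer and needs Spark 2.2+.
  val csvOptions: Map[String, String] = Map(
    "header"      -> "true",
    "inferSchema" -> "true",
    "mode"        -> "DROPMALFORMED", // drop rows the parser cannot read
    "multiLine"   -> "true"           // allow quoted fields to contain newlines
  )

  // Reads the CSV at dataPath and drops rows containing null values,
  // mirroring the .na.drop() call in the question.
  def load(spark: SparkSession, dataPath: String): DataFrame =
    spark.read.options(csvOptions).csv(dataPath).na.drop()

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("load-notes-csv")
      .master("local[*]") // local mode, for trying the sketch out
      .getOrCreate()
    val data = load(spark, args(0)) // args(0): path to the CSV file
    data.printSchema()
    spark.stop()
  }
}
```

If the notes are not quoted in the file, multiLine will not change how the records are split, and the file itself would need to be fixed or pre-processed first.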

