I am trying to remove the header from a given input file, but I couldn't make it work. This is what I have written. Can someone help me remove the header line from a txt or csv file?

import org.apache.spark.{SparkConf, SparkContext}

object SalesAmount {

  def main(args: Array[String]): Unit = {

    val conf = new SparkConf().setAppName(getClass.getName).setMaster("local")

    val sc = new SparkContext(conf)

    val salesRDD = sc.textFile(args(0),2)

    val salesPairRDD = salesRDD.map(rec => {
      val fieldArr = rec.split(",")
      (fieldArr(1), fieldArr(3).toDouble)
    })

    val totalAmountRDD = salesPairRDD.reduceByKey(_+_).sortBy(_._2,false)



    val discountAmountRDD = totalAmountRDD.map(t => {
      if (t._2 > 1000) (t._1,t._2 * 0.9)
      else t
    })

    discountAmountRDD.foreach(println)


  }

}
  • Spark has a built-in CSV file reader in the DataFrame API (Spark SQL) that understands that those files may have headers, so you don't need to parse the text yourself. Also, the DataFrame API is generally recommended over the RDD API for end-user code. Commented Nov 4, 2019 at 8:35
  • Thanks for your reply. So is this not possible using RDD? Commented Nov 4, 2019 at 8:57
  • It is possible, of course. Commented Nov 4, 2019 at 9:20
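
For reference, the built-in CSV reader mentioned in the first comment could be used roughly as follows. This is only a sketch: the object name SalesAmountDF and the helper discountedTotals are made up for illustration, and the columns are addressed by position (second and fourth) because the real header names are not shown in the question.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, sum, when}

object SalesAmountDF {

  // Hypothetical helper: totals per key with the same 10% discount rule
  // as in the question, read through the DataFrame CSV reader.
  def discountedTotals(spark: SparkSession, path: String): DataFrame = {
    val sales = spark.read
      .option("header", "true") // first line is treated as column names, not data
      .csv(path)

    // Assumption: same column positions as fieldArr(1) and fieldArr(3).
    val keyCol = sales.columns(1)
    val amountCol = sales.columns(3)

    sales
      .groupBy(col(keyCol))
      // Without schema inference every column is a string, so cast before summing.
      .agg(sum(col(amountCol).cast("double")).as("total"))
      .orderBy(col("total").desc)
      .withColumn("total",
        when(col("total") > 1000, col("total") * 0.9).otherwise(col("total")))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("SalesAmountDF")
      .master("local")
      .getOrCreate()

    discountedTotals(spark, args(0)).show()
    spark.stop()
  }
}
```

With the header option set, the reader consumes the header line itself, so no manual skipping is needed.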

1 Answer

Skipping the first row when manually parsing text files using the RDD API is a bit tricky:

val salesPairRDD =
  salesRDD
    .mapPartitionsWithIndex((i, it) => if (i == 0) it.drop(1) else it)
    .map(rec => {
      val fieldArr = rec.split(",")
      (fieldArr(1), fieldArr(3).toDouble)
    })

The header line will be the first item in the first partition, so mapPartitionsWithIndex is used to iterate over the partitions and to skip the first item if the partition index is 0.
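
To see what that function does without starting Spark, here is a small plain-Scala sketch. It is not Spark code: the nested Seq is a hypothetical stand-in for an RDD's partitions, and the sample rows and the names DropHeaderSketch, dropHeader and withoutHeader are invented for illustration.

```scala
object DropHeaderSketch {

  // Same shape as the function passed to mapPartitionsWithIndex:
  // (partition index, iterator over that partition's records) => new iterator.
  def dropHeader(index: Int, records: Iterator[String]): Iterator[String] =
    if (index == 0) records.drop(1) else records

  // Hypothetical stand-in for an RDD: each inner Seq plays the role of one partition.
  val partitions: Seq[Seq[String]] = Seq(
    Seq("id,product,qty,amount", "1,pen,2,10.0"),
    Seq("2,book,1,25.5", "3,pen,4,20.0")
  )

  // Apply the function to every partition together with its index.
  def withoutHeader: Seq[String] =
    partitions.zipWithIndex.flatMap { case (part, i) => dropHeader(i, part.iterator) }

  def main(args: Array[String]): Unit =
    withoutHeader.foreach(println)
}
```

Only the first record of partition 0 is dropped; all other partitions pass through unchanged. Note that this relies on the input being a single file: if the path matches several files that each start with a header, only the header at the start of partition 0 is removed.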
