I am new to Scala and Spark. I am writing a simple program that should remove rows containing empty/null values (without using a DataFrame). I tried to do this with filter, but it is not working. Can you tell me where I am making a mistake?

Data:

Bypass Road (film),2019,137,Drama|Thriller,7.1,51
Satellite Shankar,2019,135,Action|Drama,4.6,34
Jhalki,2019,0,Drama,,
Marjaavaan,2019,0,Action|Romance,,
Motichoor Chaknachoor,2019,150,Comedy|Romance,,
Keep Safe Distance (film),2019,0,Action|Thriller,,

I am trying to remove rows with empty values, such as Keep Safe Distance (film),2019,0,Action|Thriller,,

Code:

import org.apache.spark.sql.SparkSession

val sparkSession = SparkSession.builder().appName("MovieAnalyzer")
  .master("local")
  .getOrCreate()

val dataRDD = sparkSession.sparkContext.textFile("src/test/resources/movies.csv")

// remove the header
val header = dataRDD.first()
val movie_list = dataRDD
  .filter(line => line != header)
  .filter(r => !r.contains(""))
  .map(r => r.replace("\\N", "0"))
movie_list.collect().foreach(println)

The above code does not print any data from the CSV file. Please let me know what the problem with my code is.

1 Answer

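The problem is .filter(r => !r.contains("")). Every string contains the empty string, so r.contains("") is always true and the negated filter discards every line.

Split each line into its fields and check those instead. The -1 limit keeps trailing empty fields, which a plain split(",") would drop: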

dataRDD.filter(s=> !s.split(",",-1).contains("")).foreach(println(_))

The output will be:

Bypass Road (film),2019,137,Drama|Thriller,7.1,51
Satellite Shankar,2019,135,Action|Drama,4.6,34
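
To see why the original check fails and the split-based one works, here is a minimal sketch in plain Scala with no Spark involved. The object name EmptyFieldCheck and the helper isComplete are made up for illustration; the rows are copied from the question.

```scala
object EmptyFieldCheck {
  // Sample rows copied from the question's data.
  val rows: Seq[String] = Seq(
    "Bypass Road (film),2019,137,Drama|Thriller,7.1,51",
    "Satellite Shankar,2019,135,Action|Drama,4.6,34",
    "Jhalki,2019,0,Drama,,"
  )

  // True when no comma-separated field is empty.
  // The -1 limit keeps trailing empty fields, which split(",") would drop.
  def isComplete(line: String): Boolean =
    !line.split(",", -1).contains("")

  def main(args: Array[String]): Unit = {
    // Every string contains the empty string, so this holds for all rows.
    println(rows.forall(_.contains("")))
    // Without the limit the trailing empty fields vanish (4 fields, not 6).
    println("Jhalki,2019,0,Drama,,".split(",").length)
    println("Jhalki,2019,0,Drama,,".split(",", -1).length)
    // Only the two fully populated rows remain.
    rows.filter(isComplete).foreach(println)
  }
}
```

The same predicate can be passed to an RDD's filter unchanged, since it only operates on a single String.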

If an RDD is not required, I would use a DataFrame and then DataFrameNaFunctions, as below:

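// sparkSession is the SparkSession created in the question
val df = sparkSession.read.csv("data/movies.csv")
// drop every row that contains at least one null value
df.na.drop().show()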

output:

+------------------+----+---+--------------+---+---+
|               _c0| _c1|_c2|           _c3|_c4|_c5|
+------------------+----+---+--------------+---+---+
|Bypass Road (film)|2019|137|Drama|Thriller|7.1| 51|
| Satellite Shankar|2019|135|  Action|Drama|4.6| 34|
+------------------+----+---+--------------+---+---+
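
If only some columns matter, na.drop also accepts a list of column names. The following is a rough sketch rather than a drop-in solution: it assumes Spark is on the classpath, uses an in-memory Dataset of lines in place of the CSV file, and relies on the default _c0 to _c5 column names shown above. The object name NaDropSketch and the method completeRows are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object NaDropSketch {
  // Rows copied from the question; they stand in for movies.csv.
  val lines: Seq[String] = Seq(
    "Bypass Road (film),2019,137,Drama|Thriller,7.1,51",
    "Satellite Shankar,2019,135,Action|Drama,4.6,34",
    "Motichoor Chaknachoor,2019,150,Comedy|Romance,,"
  )

  def completeRows(spark: SparkSession): DataFrame = {
    import spark.implicits._
    // The CSV reader turns empty fields into nulls by default.
    val df = spark.read.csv(lines.toDS())
    // Drop a row only when one of the listed columns is null.
    df.na.drop(Seq("_c4", "_c5"))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("NaDropSketch")
      .master("local[*]")
      .getOrCreate()
    completeRows(spark).show()
    spark.stop()
  }
}
```

Restricting the check to named columns keeps rows that are incomplete elsewhere, which may or may not be what you want for this data set.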

1 Comment

Thanks, it worked. Can you please explain why my code didn't work? What's wrong with it?
