My RDD contains entire CSV records, and I can't find a way to exclude particular columns from it.

I tried drop().

For example, the CSV file has three columns: no, name, and age.

Now I need to exclude two columns, no and name:

val excluColumns = "no,name"
rdd.drop(excluColumns)

This causes an error in my code.

I am new to Spark. Can anyone guide me on how to do this?

EDIT-1

val cols="no,name"
val excluColumns= Seq(cols)
df.drop(excluColumns:_*)
  .show()

This leads to a conversion issue.

  • Do you have an RDD or a DataFrame? Commented Apr 18, 2018 at 6:51
  • I have an RDD that holds the entire records. Commented Apr 18, 2018 at 6:52
  • Can you share how you created the RDD? RDDs don't have column names. Commented Apr 18, 2018 at 6:53
  • My RDD is created like this: rdd=spark.read("CSV File") Commented Apr 18, 2018 at 6:56

2 Answers

RDDs don't have column names, so you will have to read the file as a DataFrame and use drop (assuming the file has a header row).

The file should look like this:

no,name,age
1,bill,23
2,charles,24
3,gates,45

Read it into a DataFrame as follows:

val df = sqlContext.read.format("com.databricks.spark.csv").option("header", true).load("File.csv")

which should give you

+---+-------+---+
|no |name   |age|
+---+-------+---+
|1  |bill   |23 |
|2  |charles|24 |
|3  |gates  |45 |
+---+-------+---+

Then you can create a sequence of the columns to be dropped and use it as below:

val excluColumns= "no,name".split(",")
df.drop(excluColumns:_*)
  .show()

This should give you only the age column:

+---+
|age|
+---+
| 23|
| 24|
| 45|
+---+
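
If the column names arrive as one comma-separated string, as in the question, it may help to see why wrapping that string in Seq() is not the same as splitting it. The sketch below is plain Scala with no Spark dependency; parseColumns is a hypothetical helper name introduced here for illustration, and the trimming and empty-entry filtering are assumptions about how the input string might look.

```scala
object ColumnListSketch {
  // Hypothetical helper (not part of Spark): turns "no, name" into List("no", "name").
  // Trims whitespace around each name and skips empty entries.
  def parseColumns(cols: String): List[String] =
    cols.split(",").map(_.trim).filter(_.nonEmpty).toList

  def main(args: Array[String]): Unit = {
    val cols = "no,name"

    // Seq(cols) wraps the whole string as ONE element: the single name "no,name".
    val wrapped = Seq(cols)
    println(wrapped.length) // prints 1

    // Splitting yields one element per column name.
    val excluColumns = parseColumns(cols)
    println(excluColumns.mkString("[", ", ", "]")) // prints [no, name]

    // With a DataFrame df in scope, the list can then be expanded as varargs:
    // df.drop(excluColumns: _*)
  }
}
```

The split result can be passed to drop in the same way as the array shown above.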

2 Comments

As in my question, I have the column names in a single string val. How do I pass that string to Seq()? It causes an issue.
I have added EDIT-1 to my question. Also, my column names come as a single string, not as separate values.
StringWriter sw = new StringWriter();
sw.WriteLine("\"Id No\",\"Customer Name\",\"Customer Mobile No\",\"Customer BusinessZone\"");
Response.ClearContent();
Response.AddHeader("content-disposition", "attachment;filename=Security_User.csv");
Response.ContentType = "text/csv";
foreach (var user in _securityUserService.GetAllCustomer())
{
    sw.WriteLine(string.Format("\"{0}\",\"{1}\",\"{2}\",\"{3}\"",
                               user.Id,
                               user.FullName,
                               user.Phone,
                               user.BusinessZones.Name));
}

Response.Write(sw.ToString());

Response.End();

1 Comment

Is this possible in Spark?
