columnToDelete=[empDFTems2.name,empDFTems.gender]

listjoin = empDFTems.join(empDFTems2, (empDFTems["emp_id"]==empDFTems2["emp_id"]), "left").drop(*columnToDelete)

It won't take the list, since its entries are DataFrame column references rather than plain strings. How can I programmatically drop all of the listed columns from the joined DataFrame, reading which columns to drop from the list?

  • According to the documentation it should work with a list of strings. That might be why your call is not working. spark.apache.org/docs/latest/api/python/reference/api/… Commented Jun 10, 2022 at 5:14
  • I'm trying to make it work with a list of df.name column references instead of just a list of strings, since plain string names would delete same-named columns from both sides of the joined DataFrame. Commented Jun 10, 2022 at 5:16
  • drop() can take both string names and actual columns, as in your case. What error are you getting? Commented Jun 10, 2022 at 5:20
  • If you look at the code snippet, I'm having trouble doing .drop(*columnToDelete), where columnToDelete is a list of DataFrame column references: columnToDelete=[empDFTems2.name,empDFTems.gender] Commented Jun 10, 2022 at 6:24

1 Answer


According to the PySpark docs, you can only pass a list of column names as strings; if you want to pass a Column object, only a single value is accepted.

drop(*cols)

Returns a new DataFrame that drops the specified column. This is a no-op if schema doesn’t contain the given column name(s).

Parameters: cols – a string name of the column to drop, or a Column to drop, or a list of string names of the columns to drop.

I would suggest selecting the required columns rather than dropping the unnecessary ones. select accepts a list of column objects, so it should work fine.

empDFTems.show()
+------+-------+------+---+-----+
|emp_id|   name|gender|age| dept|
+------+-------+------+---+-----+
|     1| alis R|     F| 34|   IT|
|     2|Robin M|     M| 44|Sales|
+------+-------+------+---+-----+

empDFTems2.show()
+------+-----+------+-------+------+
|emp_id| name|gender|country|active|
+------+-----+------+-------+------+
|     1| alis|Female|     34|    IT|
|     2|Robin|  Male|     44| Sales|
+------+-----+------+-------+------+

#columnToDelete=[empDFTems2.name,empDFTems.gender]
# let's keep the columns to delete in separate lists, one per DataFrame
empDFTems2columnToDelete = ["name"]
empDFTemscolumnToDelete = ["gender"]

selectCols = [empDFTems[i] for i in empDFTems.columns if i not in empDFTemscolumnToDelete] + [empDFTems2[i] for i in empDFTems2.columns if i not in empDFTems2columnToDelete]

listjoin = empDFTems.join(empDFTems2, (empDFTems["emp_id"]==empDFTems2["emp_id"]), "left").select(selectCols)

listjoin.show()

+------+-------+---+-----+------+------+-------+------+
|emp_id|   name|age| dept|emp_id|gender|country|active|
+------+-------+---+-----+------+------+-------+------+
|     1| alis R| 34|   IT|     1|Female|     34|    IT|
|     2|Robin M| 44|Sales|     2|  Male|     44| Sales|
+------+-------+---+-----+------+------+-------+------+
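The filtering logic behind selectCols can be sanity-checked with plain Python lists, no Spark session required. The column names below simply mirror the example DataFrames above; this is an illustration of the pattern, not the original code:

```python
# Column names mirroring empDFTems and empDFTems2 from the example above
left_cols = ["emp_id", "name", "gender", "age", "dept"]
right_cols = ["emp_id", "name", "gender", "country", "active"]

left_drop = ["gender"]   # plays the role of empDFTemscolumnToDelete
right_drop = ["name"]    # plays the role of empDFTems2columnToDelete

# Same filtering pattern as the selectCols comprehension
keep = [c for c in left_cols if c not in left_drop] + \
       [c for c in right_cols if c not in right_drop]

print(keep)
# → ['emp_id', 'name', 'age', 'dept', 'emp_id', 'gender', 'country', 'active']
```

The resulting order matches the columns in the show() output above.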

Alternatively, if you must use drop, the only option is to loop over the columns:

columnToDelete=[empDFTems2.name,empDFTems.gender]
listjoin = empDFTems.join(empDFTems2, (empDFTems["emp_id"]==empDFTems2["emp_id"]), "left")
for i in columnToDelete:
    listjoin = listjoin.drop(i)
    
listjoin.show()
+------+-------+---+-----+------+------+-------+------+
|emp_id|   name|age| dept|emp_id|gender|country|active|
+------+-------+---+-----+------+------+-------+------+
|     1| alis R| 34|   IT|     1|Female|     34|    IT|
|     2|Robin M| 44|Sales|     2|  Male|     44| Sales|
+------+-------+---+-----+------+------+-------+------+
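The drop-in-a-loop above is a fold, so functools.reduce expresses it in one line if you prefer. The sketch below uses a tiny stand-in class instead of a real DataFrame so it runs without a Spark session; FakeDF and its column names are illustrative, not part of the original code:

```python
from functools import reduce

class FakeDF:
    """Minimal stand-in for a Spark DataFrame: tracks column names only."""
    def __init__(self, cols):
        self.columns = list(cols)

    def drop(self, col):
        # Like Spark's drop: returns a new frame; no-op if the column is absent
        return FakeDF([c for c in self.columns if c != col])

joined = FakeDF(["emp_id", "name", "gender", "age", "dept", "country", "active"])
to_delete = ["name", "gender"]

# Equivalent to:  for c in to_delete: joined = joined.drop(c)
result = reduce(lambda df, c: df.drop(c), to_delete, joined)

print(result.columns)
# → ['emp_id', 'age', 'dept', 'country', 'active']
```

With a real DataFrame the same reduce call works unchanged, and the list may hold Column objects as in the loop above, since drop is called with one item at a time.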

1 Comment

Nice. Yes, I was interested in the loop version, since it's working. My only question is how it retains the DataFrame column context after the join. I know Spark does lazy evaluation, but I'm curious how reliably this works; I don't want it unexpectedly dropping columns it isn't supposed to.
