40

I have a large number of columns in a PySpark DataFrame, say 200. I want to select all the columns except, say, 3-4 of them. How do I select these columns without having to manually type the names of all the columns I want to keep?

3
  • use drop with columns you'd like to exclude. Commented Jun 13, 2018 at 13:14
  • 3
    df.select([c for c in df.columns if c not in {'GpuName','GPU1_TwoPartHwID'}]) Commented Jun 13, 2018 at 14:18
  • 2
    Possible duplicate of How to exclude multiple columns in Spark dataframe in Python Commented Jun 13, 2018 at 14:18

5 Answers

71

In the end, I settled on the following:

  • Drop:

    df.drop('column_1', 'column_2', 'column_3')

  • Select:

    df.select([c for c in df.columns if c not in {'column_1', 'column_2', 'column_3'}])


1 Comment

this works like a charm
7

PySpark SQL: "SELECT * EXCEPT(col6, col7, col8)"
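A minimal sketch of one way to run that, assuming a Spark/Databricks SQL version that supports the EXCEPT clause in a star expansion; the temp view name my_table is only illustrative:

df.createOrReplaceTempView("my_table")  # "my_table" is a hypothetical view name
result = spark.sql("SELECT * EXCEPT (col6, col7, col8) FROM my_table")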

1 Comment

Exactly what I'm looking for, and it works as expected. Looks like it's widely implemented as well. Can't believe I've never come across this.
4

This might be helpful:

df_cols = list(set(df.columns) - {'<col1>', '<col2>', ...})

df.select(df_cols).show()


3
cols_to_drop = [...]  # list of column names to drop, possibly built programmatically
df.drop(*cols_to_drop)

Useful if the list of columns to drop is long, or if the list can be derived programmatically.
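A minimal sketch of deriving that list programmatically, assuming (purely for illustration) that the unwanted columns share a tmp_ prefix:

# build the drop list from a naming convention instead of typing each name
cols_to_drop = [c for c in df.columns if c.startswith("tmp_")]
df_trimmed = df.drop(*cols_to_drop)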


2

Another way to do it in PySpark:

import pyspark.sql.functions as F

df.select(F.expr("* EXCEPT(column_1, column_2)"))

or

df.selectExpr("* EXCEPT(column_1, column_2)")
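Note that support for the EXCEPT star clause depends on the Spark/Databricks SQL version you are running; if it isn't available, the drop or select approaches above still work.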

