
I have a dataframe which looks like the one below:

id   1id  id2  ac1  2ac tre tye

I want to delete the columns whose names contain "id" or "ac" and retain the others.

How can I achieve this in PySpark?

I tried select statements, but they don't work.

How should I use a regex on the column names here?

1 Answer

Use a simple list comprehension over `df.columns`:

  • Using Select

    from pyspark.sql.functions import col

    df.select(*[col(c) for c in df.columns if not ("id" in c or "ac" in c)]).show()
    
  • Using Drop

    df.drop(*[c for c in df.columns if "id" in c or "ac" in c]).show()
    
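The column-filtering logic in both variants is plain Python over the list of column names, so it can be checked without a Spark session. The sketch below uses the column names from the question and also shows a regex variant with `re.search`, since the question asks about regexes; the names `keep`, `drop`, and `keep_re` are illustrative, not part of any API.

```python
import re

# Column names from the question's dataframe.
columns = ["id", "1id", "id2", "ac1", "2ac", "tre", "tye"]

# Columns to keep: names containing neither "id" nor "ac"
# (this is the predicate passed to df.select in the answer).
keep = [c for c in columns if not ("id" in c or "ac" in c)]

# Columns to drop: names containing "id" or "ac"
# (this is the predicate passed to df.drop in the answer).
drop = [c for c in columns if "id" in c or "ac" in c]

# Regex variant: re.search matches the pattern anywhere in the name.
keep_re = [c for c in columns if not re.search(r"id|ac", c)]

print(keep)     # ['tre', 'tye']
print(drop)     # ['id', '1id', 'id2', 'ac1', '2ac']
print(keep_re)  # ['tre', 'tye']
```

In PySpark, either list would then be splatted into `df.select(*keep)` or `df.drop(*drop)`; `drop` accepts plain column-name strings, so no `col()` wrapper is needed there.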