
I am trying to find a substring across all columns of my Spark DataFrame using PySpark. I currently know how to search for a substring in a single column using filter and contains:

df.filter(df.col_name.contains('substring'))

How do I extend this statement, or use another approach, to search multiple columns for substring matches?

2 Answers


You can generalize the statement to filter across all columns in one go:

from pyspark.sql.functions import col, when

# Null out any cell that does not contain the substring, then drop
# rows that contain a null.
df = df.select([when(col(c).contains('substring'), col(c)).alias(c)
                for c in df.columns]).na.drop()
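
Note that na.drop() defaults to how='any', so this keeps only rows where every column contains the substring. A variant that keeps rows where at least one column matches (with the caveat that non-matching cells come back as null) would be:

# Keep rows where at least one column matched; unmatched cells stay null.
df = df.select([when(col(c).contains('substring'), col(c)).alias(c)
                for c in df.columns]).na.drop(how='all')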

OR

You can simply loop over the columns and apply the same filter:

# Note: chaining filters like this ANDs the conditions, so a row is kept
# only if every column contains the substring.
for col in df.columns:
    df = df.filter(df[col].contains("substring"))
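
If you instead want rows where any column matches, one option (a sketch, assuming all columns are string-typed) is to build a single OR'd condition and filter once:

from functools import reduce
from pyspark.sql.functions import col

# OR the per-column conditions together and filter in a single pass.
any_match = reduce(lambda a, b: a | b,
                   [col(c).contains("substring") for c in df.columns])
df = df.filter(any_match)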


You can search through each column in turn, filling a second DataFrame and unioning the results, like this:

columns = ["language", "else"]
data = [
    ("Java", "Python"),
    ("Python", "100000"),
    ("Scala", "3000"),
]
df = spark.createDataFrame(data).toDF(*columns)
df.cache()  # the source DataFrame is scanned once per column, so cache it
df.show()

# Start from an empty DataFrame with the same schema, then union in the
# matching rows from each column's filter.
schema = df.schema
df2 = spark.createDataFrame(data=[], schema=schema)

for col in df.columns:
    df2 = df2.unionByName(df.filter(df[col].like("%Python%")))

df2.show()
+--------+------+
|language|  else|
+--------+------+
|  Python|100000|
|    Java|Python|
+--------+------+

The result contains the first two rows, because each of them has the value 'Python' in at least one column.
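
One caveat: a row that matches in more than one column will appear in the union once per matching column. If that matters, you can deduplicate the result:

# A row matching in several columns appears once per match; deduplicate.
df2 = df2.dropDuplicates()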

1 Comment

This one works for me! df = df.filter(df[col].contains("substring")) gives no results, even when changed to df = df.filter(df[col].like("%substring%")). What difference does unionByName make?
