I'm trying to find an exact string match in a column of an employee DataFrame:

Employee  days_present
Alex      1,2,11,23,
John      21,23,25,28

I need to find which employees were present on the 2nd, based on the days_present column. Expected output: Alex

Below is what I have tried:

    df = spark.sql("select * from employee where days_present RLIKE '2'")
    df.show()

This returns both Alex and John.

I would also like to find out who was present on both the 2nd and the 11th; in this case the expected output is only Alex.

  • What if we split() the days_present field on commas and then explode the resulting array field? That way, you'll have one row per employee for each day present. Commented Mar 26, 2020 at 13:33

1 Answer

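Starting with Spark 2.4, we can use the array_intersect function: split days_present into an array, intersect it with the array of required days, and keep the rows where the size of the intersection is >= 2 (one match for each required day).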

Example:

df.show()
+--------+------------+
|Employee|days_present|
+--------+------------+
|    Alex|   1,2,11,23|
|    John| 21,23,25,28|
+--------+------------+
#DataFrame[Employee: string, days_present: string]

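from pyspark.sql.functions import array, array_intersect, col, lit, size, split

# Split the string into an array, intersect it with the required days,
# and keep only the rows where both days are present.
df.withColumn("tmp", split(col("days_present"), ",")).\
withColumn("intersect", array_intersect(col("tmp"), array(lit("2"), lit("11")))).\
filter(size("intersect") >= 2).\
drop("tmp", "intersect").\
show()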

#+--------+------------+
#|Employee|days_present|
#+--------+------------+
#|    Alex|   1,2,11,23|
#+--------+------------+

In spark-sql:

df.createOrReplaceTempView("tmp")

spark.sql("""select Employee, days_present
from (select *, size(array_intersect(split(days_present, ","), array("2", "11"))) as n_matched from tmp) e
where n_matched >= 2""").show()

#+--------+------------+
#|Employee|days_present|
#+--------+------------+
#|    Alex|   1,2,11,23|
#+--------+------------+
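
As for why the RLIKE '2' attempt in the question returns both rows: an unanchored regex matches the character 2 anywhere in the string, including inside 21 or 23. The plain-Python sketch below illustrates the difference using the re module on the same sample values; the anchored pattern (^|,)2(,|$) is my own suggestion, not something from the question, and it should behave the same way if used with RLIKE in Spark SQL, though I have only reasoned about it here with Python's regex engine.

```python
import re

rows = {"Alex": "1,2,11,23", "John": "21,23,25,28"}

# Unanchored pattern: matches any "2" character, like RLIKE '2'.
loose = [name for name, days in rows.items() if re.search(r"2", days)]

# Anchored pattern: "2" must be a whole comma-delimited token.
exact = [name for name, days in rows.items() if re.search(r"(^|,)2(,|$)", days)]

print(loose)  # ['Alex', 'John']
print(exact)  # ['Alex']
```

The split-based approaches above avoid this problem altogether, because they compare whole array elements rather than substrings.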