I am new to Spark/Scala. I have a master DataFrame with over 100 million records:
+--------+
|  ttm_id|
+--------+
|39622109|
|39622178|
|39578322|
+--------+
And a changelist DataFrame with around 40 million records:
+----------+--------+
|__change__|  ttm_id|
+----------+--------+
|    DELETE|18001570|
|    DELETE|   50520|
|    DELETE|  144440|
|    DELETE|   93130|
|    DELETE|   93140|
+----------+--------+
How would I go about comparing these two DataFrames so that, if __change__ = DELETE and masterlist.ttm_id = changelist.ttm_id, the matching ttm_id record is removed from the master list?
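To make the rule concrete, here is a minimal sketch of what I think it amounts to, assuming it can be expressed as a left anti join (keep only the master rows with no matching DELETE row in the changelist). The object name RemoveDeleted, the helper removeDeleted, and the local SparkSession are hypothetical names for illustration, and the second changelist row is made up so that one ID actually matches.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

object RemoveDeleted {
  // Hypothetical helper: drops every master row whose ttm_id appears
  // in the changelist with __change__ = DELETE.
  def removeDeleted(master: DataFrame, changes: DataFrame): DataFrame = {
    // Keep only the DELETE rows, and only the join key.
    val deletes = changes
      .filter(col("__change__") === "DELETE")
      .select("ttm_id")
    // left_anti returns the rows of `master` that have no match in `deletes`.
    master.join(deletes, Seq("ttm_id"), "left_anti")
  }

  def main(args: Array[String]): Unit = {
    // Local session for illustration only (assumption).
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("remove-deleted")
      .getOrCreate()
    import spark.implicits._

    val master = Seq(39622109L, 39622178L, 39578322L).toDF("ttm_id")
    // The 39622178 row is a made-up sample so that one master ID matches.
    val changes = Seq(("DELETE", 18001570L), ("DELETE", 39622178L))
      .toDF("__change__", "ttm_id")

    removeDeleted(master, changes).show()
    spark.stop()
  }
}
```

With these sample rows, only the 39622178 record should be dropped from the master list. I have not tried this at the real data volumes.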
Thanks!