There are two DataFrames. One, df1, contains events, and one of its columns is ID.
The other, df2, contains just IDs.
What is the best way to create df3, which contains only the rows of df1 whose ID is not present in df2?
It looks like this type of query (a NOT IN subquery) is not supported in Spark SQL:
sqlContext.sql(""" SELECT * FROM table_df1
WHERE ID NOT IN (SELECT ID FROM table_df2) """)