Let's say I have a Spark 2.x application, which has speculation enabled (spark.speculation=true), which writes data to a specific location on HDFS.
Now if the task (which writes data to HDFS) takes long, Spark would create a copy of the same task on another executor, and both the jobs would be running in parallel.
How does Spark handle this? Obviously both the tasks shouldn't be trying to write data at the same file location at the same time (which seems to be happening in this case).
Any help would be appreciated.
Thanks