I have Python code that works both locally and on Databricks. For saving the results, a different function is used depending on where the code is run.
On Databricks, several things are automatically initialized when running a notebook; among these is the pyspark library.
Therefore, in order to make my code also work locally, I import pyspark like so:
import os

if "DATABRICKS_RUNTIME_VERSION" in os.environ:
    from pyspark.sql import functions as F

def save_results_to_databricks(...):
    # do stuff like F.col("relevant")
Now, when writing a test, I would patch F like so:
import unittest
from unittest.mock import patch

class TestSaveResultsToDatabricks(unittest.TestCase):
    @patch("path.to.module.F")
    def test_save_results_to_databricks(self, MockFunctions):
        # test stuff
However, this will throw an error:
AttributeError: does not have the attribute 'F'
So, how do I patch a function that is not available locally?
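The failure can be reproduced without pyspark at all. The sketch below builds a throwaway module (`fake_module` is a hypothetical stand-in for `path.to.module`) that, like a local run, never executed the conditional import, so `patch()` finds no attribute `F`:

```python
import sys
import types
from unittest.mock import patch

# A module that never ran its conditional import has no attribute F,
# so a plain patch() of "fake_module.F" fails with AttributeError.
fake_module = types.ModuleType("fake_module")
sys.modules["fake_module"] = fake_module

error_message = ""
try:
    with patch("fake_module.F"):
        pass
except AttributeError as exc:
    error_message = str(exc)

print(error_message)  # mentions that the module does not have the attribute 'F'
```

This is the same `AttributeError` as above: the attribute simply is not there for `patch()` to replace.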
Have you tried `import pyspark.sql.functions` instead of `from pyspark.sql import functions as F`?

It's kind of a catch-22: `F` isn't in the module being patched until the import statement is executed, which happens only when the test runs, and mocking is done before the tests start, hence the error you get. But `pyspark.sql.functions` should be available and patchable before the tests are started. See the gotcha in case it's not clear.

As for the `SparkSession` that's used during unit tests: just create a fixture and pass it around.

@pytest.fixture(scope="session")
def test_spark():
    return SparkSession.builder.getOrCreate()
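Another option, not spelled out above, is `unittest.mock.patch`'s `create=True` argument, which tells `patch()` to create the missing attribute for the duration of the test instead of raising `AttributeError`. A runnable sketch, where `path_to_module` is a hypothetical stand-in built in-process so the example needs neither Databricks nor pyspark:

```python
import sys
import types
import unittest
from unittest.mock import patch

# Build a stand-in module that, like a local run, has no attribute F.
mod = types.ModuleType("path_to_module")

def save_results_to_databricks(data):
    # On Databricks this would call the real pyspark functions module.
    return mod.F.col("relevant")

mod.save_results_to_databricks = save_results_to_databricks
sys.modules["path_to_module"] = mod

class TestSaveResultsToDatabricks(unittest.TestCase):
    # create=True: patch() creates F on the module for this test only.
    @patch("path_to_module.F", create=True)
    def test_save_results_to_databricks(self, MockFunctions):
        mod.save_results_to_databricks(None)
        MockFunctions.col.assert_called_once_with("relevant")

suite = unittest.defaultTestLoader.loadTestsFromTestCase(TestSaveResultsToDatabricks)
result = unittest.TextTestRunner().run(suite)
```

Use `create=True` with care: it will also happily "patch" attributes that never exist anywhere, which can hide typos in the patch target.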