I have a unit test (using pytest) that runs my PySpark tests, with the usual conftest.py that creates a SQLContext. I want uuid4 to return the same value in every case, so I patched uuid.uuid4 in my test. If I call uuid.uuid4() from the test function, all is good.

However, when I run the PySpark job, which also calls uuid4, the call is not patched.

My PySpark function (simplified):

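```python
import uuid
from pyspark.sql import Window, functions as F, types as T
from pyspark.sql.functions import udf

def create_uuid_if_needed(current, prev):
    # lag() returns None for the first row of each partition, and `current > None`
    # raises TypeError in Python 3, so that row gets no id (assumed intent).
    if prev is not None and current > prev:
        return str(uuid.uuid4())
    return None


def my_df_func(df):
    # PARTITIONING_KEY and ORDER are column-name constants defined elsewhere.
    my_udf = udf(create_uuid_if_needed, T.StringType())
    my_window = Window.partitionBy(F.col(PARTITIONING_KEY)).orderBy(F.col(ORDER))
    # .over() applies to the window function (lag), not to the UDF's result.
    return df.withColumn('new_col', my_udf(df.col, F.lag(df.col, 1).over(my_window)))
```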
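For readers less familiar with window functions, here is a plain-Python sketch of what my_df_func is meant to compute, assuming lag() is evaluated over the window and its result is passed to the UDF. Rows are given as hypothetical (partition_key, order, col) tuples, and `new_id` is a hypothetical parameter so the output is predictable. It illustrates the semantics of `Window.partitionBy(...).orderBy(...)` plus `lag(col, 1)`; it is not how Spark actually executes the job.

```python
import uuid
from itertools import groupby
from typing import Callable, List, Optional, Sequence, Tuple

Row = Tuple[str, int, int]  # (partition_key, order, col): hypothetical layout


def emulate_my_df_func(
    rows: Sequence[Row],
    new_id: Callable[[], str] = lambda: str(uuid.uuid4()),
) -> List[Tuple[str, int, int, Optional[str]]]:
    out = []
    # orderBy within partitionBy: sort by partition key, then by the ordering column
    ordered = sorted(rows, key=lambda r: (r[0], r[1]))
    for _, partition in groupby(ordered, key=lambda r: r[0]):
        prev = None  # lag(col, 1) is null for the first row of each partition
        for key, order, value in partition:
            new_col = new_id() if prev is not None and value > prev else None
            out.append((key, order, value, new_col))
            prev = value
    return out


rows = [("a", 1, 5), ("a", 2, 7), ("a", 3, 6), ("b", 1, 1), ("b", 2, 2)]
for row in emulate_my_df_func(rows, new_id=lambda: "1-1-1-1"):
    print(row)
```

An id is produced only where the value increased relative to the previous row in the same partition; the first row of each partition never gets one.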

My test looks like this:

```python
import uuid
from unittest.mock import patch

@patch.object(uuid, 'uuid4', return_value='1-1-1-1')
def test_add_activity_period_start_id(mock_uuid4, sql_context, input_fixture):
    # patch.object passes the mock as the first argument; the fixtures come from conftest.py.
    input_df = sql_context.createDataFrame(input_fixture, [... schema...])
    good_uuid = str(uuid.uuid4())
    another_good_uuid = create_uuid_if_needed(2, 1)
    actual_df = my_df_func(input_df)
    ...
```

good_uuid gets the correct value, '1-1-1-1', and so does another_good_uuid, but the UDF version of the same function, applied to the DataFrame, still calls the unpatched uuid4.

What is wrong here? Is it something that the udf() function is doing? Thanks!
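A likely explanation (not confirmed in this thread): `unittest.mock.patch` only replaces `uuid.uuid4` inside the process where the test runs. PySpark executes Python UDFs in separate Python worker processes started by the JVM; the UDF's function is serialized with cloudpickle, and modules such as `uuid` are imported again by name in the worker, unpatched. Calling `create_uuid_if_needed(2, 1)` directly runs in the test process, which is why another_good_uuid is patched. The Spark-free sketch below uses a fresh interpreter launched with `subprocess` as a stand-in for a Spark Python worker; how closely the two behave is an assumption, and `uuid_from_fresh_process` is a hypothetical helper.

```python
import subprocess
import sys
import uuid
from unittest.mock import patch


def uuid_from_fresh_process() -> str:
    # A brand-new interpreter imports its own copy of the uuid module,
    # so a patch applied in this process cannot reach it.
    result = subprocess.run(
        [sys.executable, "-c", "import uuid; print(uuid.uuid4())"],
        capture_output=True,
        text=True,
        check=True,
    )
    return result.stdout.strip()


with patch.object(uuid, "uuid4", return_value="1-1-1-1"):
    in_driver = str(uuid.uuid4())         # patched: same process as the test
    in_worker = uuid_from_fresh_process()  # unpatched: separate process

print(in_driver)
print(in_worker)
```

The first line prints `1-1-1-1`; the second prints a real, random UUID, mirroring what the question observes for the UDF.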

  • Can't you just return the string '1-1-1-1' instead of patching? Anyhow, you are using it as a function decorator here; if you want the patch to apply everywhere, try using it as a test class decorator instead. Commented Apr 10, 2019 at 14:27
  • I can't just return '1-1-1-1', as in prod it should generate a UUID. There is no test class here, just a test function (pytest). I could put it in conftest.py, but I simplified the code to make the problem clear. Commented Apr 10, 2019 at 14:44
  • Your code is going to behave differently in production than in development? This unit test seems useless. You could add an isUnitTest parameter with a default value of False to the create function and return the string '1-1-1-1' when unit testing, but it makes no sense. Commented Apr 10, 2019 at 15:16
  • Disabling randomization in unit tests is pretty standard. Adding isUnitTest taints production code; this is one of the reasons patching exists. Also, this is a stripped-down version of the test, meant to focus on the problem, not the whole test. Commented Apr 10, 2019 at 16:26
  • I am not sure about disabling randomization, but you can control it; that is why tools like Faker exist. Anyway, here is how you mock a random UUID: stackoverflow.com/questions/41186818/… Commented Apr 12, 2019 at 23:19
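One way to combine the "control the randomness" suggestion with the concern about test-only flags in production code is to pass the id generator in as an optional parameter and supply a seeded one in tests. This is a sketch, not something proposed in the thread; `new_id` and `make_seeded_uuid4` are hypothetical names. In Spark, the factory would have to be captured by the function handed to `udf()` (for example a lambda) so that cloudpickle ships it to the workers; each worker would then hold its own copy of the RNG, so ids are reproducible per task rather than globally. That wiring is an assumption and is not shown.

```python
import random
import uuid
from typing import Callable, Optional

IdFactory = Callable[[], uuid.UUID]


def create_uuid_if_needed(current, prev, new_id: Optional[IdFactory] = None) -> Optional[str]:
    # Default None (not uuid.uuid4) so uuid.uuid4 is looked up at call time
    # and patch.object(uuid, "uuid4", ...) keeps working for in-process callers.
    factory = new_id if new_id is not None else uuid.uuid4
    if prev is not None and current > prev:
        return str(factory())
    return None


def make_seeded_uuid4(seed: int) -> IdFactory:
    # Hypothetical helper: deterministic, version-4-shaped UUIDs from a seeded RNG.
    rng = random.Random(seed)

    def new_id() -> uuid.UUID:
        # UUID(int=..., version=4) overwrites the version and variant bits.
        return uuid.UUID(int=rng.getrandbits(128), version=4)

    return new_id


ids_a = make_seeded_uuid4(42)
ids_b = make_seeded_uuid4(42)
first = create_uuid_if_needed(2, 1, new_id=ids_a)
second = create_uuid_if_needed(2, 1, new_id=ids_b)
print(first == second)  # same seed, same sequence
print(create_uuid_if_needed(1, 2, new_id=ids_a))  # value did not increase
```

Production code calls `create_uuid_if_needed(current, prev)` unchanged; only tests pass a factory, so no `isUnitTest` flag is needed.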
