
I have written a Python module that reads a couple of tables from a database using the pd.read_sql method, performs some operations on the data, and writes the results back to the same database using the DataFrame.to_sql method.

Now, I need to write unit tests for the operations involved in the above-mentioned module. As an example, one of the tests would check whether the dataframe obtained from the database is empty; another would check whether the data types are correct, etc. For such tests, how do I create sample data that reflects these errors (such as an empty dataframe or an incorrect data type)? For other modules that do not read from or write to a database, I created a single sample data file (in CSV), read the data, made the necessary manipulations, and tested the different functions. For the module related to database operations, how do I (and more importantly, where do I) create sample data?

I was hoping to make a local data file (as I did for testing the other modules) and then read it using the read_sql method, but that does not seem possible. Creating a local database using PostgreSQL etc. might be possible, but such tests cannot be deployed to clients without requiring them to create the same local databases.

Am I thinking of the problem correctly or missing something?

Thank you

Have you considered using sqlite to create an in-memory table for your test? Commented May 27, 2020 at 5:14

1 Answer

You're thinking about the problem in the right way. Unit tests should not rely on the existence of a database, as that makes them slower, more difficult to set up, and more fragile.

There are (at least) three approaches to the challenge you're describing:

  1. The first, and probably the best one in your case, is to leave read_sql and to_sql out of the tested code. Your code should consist of a 'core' function that accepts a data frame and produces another data frame. You can unit-test this core function using local CSV files, or whatever other data you prefer. In production, you'll have another, very simple, function that just loads data using read_sql, passes it to the 'core' function, gets the result, and writes it using to_sql. You won't be unit-testing this wrapper function - but it's a really simple function and you should be fine.
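     A minimal sketch of this split (the function names, table names, and columns here are made-up placeholders, not the asker's actual schema):

        import pandas as pd

        def transform(df: pd.DataFrame) -> pd.DataFrame:
            """Pure 'core' logic: easy to unit-test with any DataFrame."""
            if df.empty:
                raise ValueError("input frame is empty")
            out = df.copy()
            out["total"] = out["price"] * out["quantity"]
            return out

        def run(con):
            """Thin, untested wrapper that does only the database I/O."""
            df = pd.read_sql("SELECT price, quantity FROM orders", con)
            result = transform(df)
            result.to_sql("order_totals", con, if_exists="replace", index=False)

        # A test feeds transform() data built in-line or read from a CSV:
        sample = pd.DataFrame({"price": [2.0, 3.5], "quantity": [3, 2]})
        assert list(transform(sample)["total"]) == [6.0, 7.0]

     Because transform() never touches a connection, the tests for it need no database at all.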

  2. Use SQLite. The tested function gets a database connection string. In production, that would be a 'real' database. During your tests, it'll be a lightweight SQLite database that you keep in source control or create as part of the test setup.
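     A sketch of the SQLite approach using an in-memory database (the table and column names are illustrative assumptions):

        import sqlite3
        import pandas as pd

        # In the test, swap the production connection for an
        # in-memory SQLite database and load a small fixture into it.
        con = sqlite3.connect(":memory:")
        fixture = pd.DataFrame({"id": [1, 2], "value": [10.5, 20.0]})
        fixture.to_sql("measurements", con, index=False)

        # The code under test can now call pd.read_sql unchanged.
        df = pd.read_sql("SELECT * FROM measurements", con)
        assert not df.empty
        assert df["value"].dtype == "float64"
        con.close()

     Since the sqlite3 module ships with Python, clients can run these tests without installing or configuring any database server.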

  3. The last option, and the most sophisticated one, is to monkey-patch read_sql and to_sql in your test. I think it's overkill in this case. Here's how one can do it:

    import pandas as pd

    def my_func(sql, con):
        # Stand-in for pd.read_sql: ignores its arguments and
        # returns a canned frame for the test.
        print("I'm here!")
        return pd.DataFrame({"a": [1, 2, 3]})

    pd.read_sql = my_func
    df = pd.read_sql("select something ...", "dummy_con")
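A tidier variant of the same monkey-patching idea uses unittest.mock.patch from the standard library, which restores the real read_sql automatically when the with-block exits (the dummy frame here is just an example):

    from unittest import mock
    import pandas as pd

    dummy = pd.DataFrame({"a": [1, 2, 3]})

    # Replace pandas.read_sql only for the duration of the block.
    with mock.patch("pandas.read_sql", return_value=dummy) as fake_read:
        df = pd.read_sql("select something ...", "dummy_con")

    assert df.equals(dummy)
    fake_read.assert_called_once()

Assigning to pd.read_sql directly, as above, works too, but leaves the patch in place for every test that runs afterwards unless you restore it yourself.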
