0

I want to transition from Pandas to Polars in a big Python project. Is there a way to highlight or find all Pandas commands I've written in Visual Studio Code (or another IDE if necessary) so I would see what I'd need to edit?

7
  • 2
    Not sure what that would look like, but one way you can figure out what you need to change is by doing a find & replace to remove all pandas imports in your codebase, and then anything that used pandas will light up as now using undefined functions/variables. (You could also do this file by file if you don't want to break everything all at once.) Commented Oct 31, 2024 at 10:13
  • 1
    @AlexDuchnowski That's not a bad solution. Please consider posting that as an answer instead of a comment. Commented Oct 31, 2024 at 14:12
  • 2
    Can anyone explain to me why my question was closed? I don't really understand it. Commented Oct 31, 2024 at 14:36
  • 1
    @Alex Instead of find & replace, you could remove Pandas from the project venv, then the linter would highlight failed imports. Commented Nov 2, 2024 at 21:53
  • 1
    In the end I just went through the code line by line, which didn’t take too long. Thanks for your suggestions though :). Commented Nov 9, 2024 at 8:46

1 Answer

1

Warning/Tip

Refactoring a codebase to switch dependencies can take more time than expected, and the effort often grows faster than the size of the codebase. To keep anything from exploding, make sure you have solid documentation of what your code does (e.g., comments, docstrings, and a test suite with example input-output pairs) so that as you carry out a controlled burn of your codebase you know what you need to rebuild and how. It also helps if your codebase follows the principles of dependency injection / separation of concerns, so that you can break and rebuild one file or module without causing everything else to collapse with it.

General Approach

To figure out which parts of your code depend on Pandas (or any other library, for that matter), you can generally remove access to that library in some way. An IDE like VS Code will then highlight the files that have undefined variables or predictable name errors, as well as the specific lines in each file that use a variable or method from the library that is no longer available. Apart from using the IDE to detect the resulting issues, you could also use a static type checker like mypy to detect areas of your code that expect Pandas objects but will now be receiving Polars objects instead.
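If you would rather not touch your imports or your environment yet, one run-time variant of "removing access" is to block the module inside the Python process itself. The sketch below relies on the documented behavior that a None entry in sys.modules makes the import fail; block_module and import_fails are hypothetical helper names, and where you call them (for example at the top of a test run) is up to you.

```python
import importlib
import sys


def block_module(name: str) -> None:
    """Make any later `import <name>` fail without uninstalling the package."""
    # Forget copies that were already imported, including submodules
    # such as "pandas.core".
    for loaded in list(sys.modules):
        if loaded == name or loaded.startswith(name + "."):
            del sys.modules[loaded]
    # A None entry tells the import system to raise ModuleNotFoundError.
    sys.modules[name] = None


def import_fails(module_name: str) -> bool:
    """Return True if importing `module_name` raises ImportError."""
    try:
        importlib.import_module(module_name)
    except ImportError:  # ModuleNotFoundError is a subclass of ImportError
        return True
    return False


if __name__ == "__main__":
    block_module("pandas")
    print("pandas blocked:", import_fails("pandas"))
```

This only affects imports that happen after block_module runs. Modules that imported Pandas earlier keep their existing reference, so call it before your own code is imported.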

Concrete Methods

One way to remove access to a library is to do a find & replace across a file or the entire codebase to remove all of its import statements (import pandas, import pandas as pd, from pandas import DataFrame, etc.). If you want to ensure that your project doesn't depend on a package, even implicitly through another library that uses it, then you can uninstall that package (pip uninstall pandas). You could also simply uninstall it from a virtual environment that you're using for development (this is probably preferable to removing it from your machine entirely). The advantage of find & replace is that it can be done one file at a time, but if your codebase is no more than a few (relatively small) files, then you can uninstall right away. Otherwise, you could uninstall once you're convinced you've removed the dependency from every file in the codebase, to check that you're right.
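Before deleting any imports, it can help to list where they are. The following is a minimal sketch using the standard-library ast module; find_pandas_usage and scan_project are hypothetical helper names. It reports Pandas import statements and direct uses of the imported names (pd, DataFrame, ...). It does not follow data flow, so method calls on an existing DataFrame object (such as df.groupby(...)) are not flagged.

```python
import ast
from pathlib import Path


def find_pandas_usage(source: str) -> list[tuple[int, str]]:
    """Return sorted (line number, description) pairs for pandas imports
    and for direct uses of the names those imports bind."""
    tree = ast.parse(source)
    bound_names: set[str] = set()
    hits: set[tuple[int, str]] = set()

    # Pass 1: collect import statements and the names they bind.
    for node in ast.walk(tree):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if alias.name == "pandas" or alias.name.startswith("pandas."):
                    bound_names.add(alias.asname or alias.name.split(".")[0])
                    hits.add((node.lineno, f"import {alias.name}"))
        elif isinstance(node, ast.ImportFrom) and node.level == 0:
            module = node.module or ""
            if module == "pandas" or module.startswith("pandas."):
                for alias in node.names:
                    bound_names.add(alias.asname or alias.name)
                    hits.add((node.lineno, f"from {module} import {alias.name}"))

    # Pass 2: find places where those names are read.
    for node in ast.walk(tree):
        if (
            isinstance(node, ast.Name)
            and isinstance(node.ctx, ast.Load)
            and node.id in bound_names
        ):
            hits.add((node.lineno, f"uses {node.id}"))

    return sorted(hits)


def scan_project(root: str) -> dict[str, list[tuple[int, str]]]:
    """Scan every .py file under `root` (this includes any venv folder inside it)."""
    report = {}
    for path in Path(root).rglob("*.py"):
        try:
            hits = find_pandas_usage(path.read_text(encoding="utf-8"))
        except (SyntaxError, UnicodeDecodeError):
            continue  # skip files that cannot be parsed
        if hits:
            report[str(path)] = hits
    return report


if __name__ == "__main__":
    for filename, hits in scan_project(".").items():
        for lineno, description in hits:
            print(f"{filename}:{lineno}: {description}")
```

The filename:line output format is a choice made for this sketch. Many editors, including the VS Code terminal, let you jump to such locations, but that depends on your setup.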
