
When I try to call PySpark DataFrame methods such as show() in the VS Code Debug Console, I get an evaluation warning (quoted below). I tried to reproduce this warning in other IDEs, but in Spyder and PyCharm the Debug Console can call the PySpark DataFrame methods without any problem.

Evaluating: df.show() did not finish after 3.00 seconds. This may mean a number of things:

  • This evaluation is really slow and this is expected. In this case it's possible to silence this error by raising the timeout, setting the PYDEVD_WARN_EVALUATION_TIMEOUT environment variable to a bigger value.

  • The evaluation may need other threads running while it's running: In this case, it's possible to set the PYDEVD_UNBLOCK_THREADS_TIMEOUT environment variable so that if after a given timeout an evaluation doesn't finish, other threads are unblocked or you can manually resume all threads.

    Alternatively, it's also possible to skip breaking on a particular thread by setting a pydev_do_not_trace = True attribute in the related threading.Thread instance (if some thread should always be running and no breakpoints are expected to be hit in it).

  • The evaluation is deadlocked: In this case you may set the PYDEVD_THREAD_DUMP_ON_WARN_EVALUATION_TIMEOUT environment variable to true so that a thread dump is shown along with this message and optionally, set the PYDEVD_INTERRUPT_THREAD_TIMEOUT to some value so that the debugger tries to interrupt the evaluation (if possible) when this happens.
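For reference, the pydev_do_not_trace alternative from the second bullet would look roughly like the sketch below; background_poll is a hypothetical always-running worker, not code from my project:

    import threading
    import time

    def background_poll():
        # Hypothetical worker that should always be running and in which
        # no breakpoints are expected to be hit.
        while True:
            time.sleep(1)

    worker = threading.Thread(target=background_poll, daemon=True)
    worker.pydev_do_not_trace = True  # per the warning: skip tracing this thread
    worker.start()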

Has anyone encountered similar warnings when debugging PySpark methods in VS Code, and does anyone have a suggestion on how to tackle this issue? I have also included my launch.json settings:

        "type": "python",
        "request": "launch",
        "program": "${file}",
        "env": {"DISPLAY":":1",
                "PYTHONPATH": "${workspaceRoot}",
                "PYDEVD_WARN_EVALUATION_TIMEOUT": "3"},
        "console": "internalConsole"

2 Answers

Pressing F5 (Continue) and going to the next breakpoint/line while the evaluation is hanging in the Debug Console seems to break the deadlock, and the dataframe gets printed.

Note that this was observed for smaller dataframes that do not take much time to evaluate.
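For anyone trying to reproduce this, a minimal local-mode script along the lines of the sketch below shows the same behavior (the session setup and data are illustrative); set a breakpoint on the last line and evaluate df.show() in the Debug Console:

    from pyspark.sql import SparkSession

    # Illustrative local-mode session; any small DataFrame behaves the same way.
    spark = SparkSession.builder.master("local[*]").appName("debug-demo").getOrCreate()
    df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])

    df.count()  # set a breakpoint here, then call df.show() in the Debug Console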


I observed this issue in Spark local mode (single machine). The issue is not limited to show(); it applies to pretty much all DataFrame-related operations.

The workaround I used is to add the following to your code (importing time if needed) and set a breakpoint on the loop line:

import time
for i in range(100): time.sleep(1)

When you are stuck (in a deadlock), just press F5 (Continue); this seems to break the deadlock and the results appear in the Debug Console. The debugger then pauses on that line again, so you can issue your next statement in the Debug Console and keep using this trick, up to 100 times. Of course, you can set the range to more than 100.
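If you would rather not scatter raw loops through your code, the same trick can be wrapped in a small helper; this is just a sketch, and debug_pause is a made-up name:

    import time

    def debug_pause(iterations=100):
        # Set a breakpoint inside this loop; each F5 (Continue) unblocks one
        # pending Debug Console evaluation and then pauses here again.
        for _ in range(iterations):
            time.sleep(1)

    debug_pause()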
