1

TLDR: Is there a Python library that allows me to get a application window frame as an image and rewrite it to the said application?

So the whole story is that I want to write an application using Python that does something similar to Lossless Scaling and Magpie. I want to grab an application window (a videogame window, for example), get the current frame as an image, then use some Machine Learning/Deep Learning algorithm (like FSR or DLSS) to upscale said image, then rewrite the current frame from the application with said upscaled image.

So far, I have been playing around with some upscaling algorithms like the one from Real-ESRGAN, but now my main problem is how to upscale the video game images in real-time. The only thing I found that does something related to what I need to do is PyAutoGUI. But this package only allows you to take screenshots of an application but not rewrite the graphics of said application.

I hope I have clarified my problem; feel free to comment if you still have any questions.

Thank you for reading this post, and have a good day.

1 Answer 1

1

Doing this with Python is going to be very difficult. A lot of the performance involved in this sort of thing is in avoiding as many memory copies as possible, and Python's idiom for string and bytes processing unfortunately makes quite a few additional copies in the course of any idiomatic program. I say this as a die-hard Python fan who is constantly trying to cram Python in everywhere it doesn't belong: you'd be better off doing this in Rust.

Update: After receiving some feedback from some folks with more direct experience in this sort of thing, I may have overstated the difficulty here. Many ML tools in Python provide zero-copy access, you can easily access and manipulate memory-mapped data from numpy and there is even a CUDA protocol for doing this to data in GPU memory, so while it's not exactly easy, as long as your operations are implemented as numpy operations and not as pure-python pixel-by-pixel logic, it shouldn't be much harder than other python machine learning applications which require access to native APIs for accessing their source data.

However, there's no way to access framebuffer data directly from python, so step 1 is going to be writing your own bindings over the relevant DirectX APIs. Since Magpie is open source, you can see which APIs it's using, for example, in its various C++ "Frame Source" backends. For example, this looks relevant: https://github.com/Blinue/Magpie/blob/42cfcba1222b07e4cec282eaff639aead229f123/Runtime/GraphicsCaptureFrameSource.cpp#L87

You can then look those APIs up on MSDN; that one, for example, is here: https://learn.microsoft.com/en-us/uwp/api/windows.graphics.capture.direct3d11captureframepool.createfreethreaded?view=winrt-22621

CFFI is a good choice for writing native wrappers: https://cffi.readthedocs.io/en/latest/

Gluing these together appropriately is left as an exercise for the reader :).

Sign up to request clarification or add additional context in comments.

3 Comments

Yeah, I imagined that doing something like this in python would be very difficult. But anyway, thanks for your response :)
Oh wow, you didn't need to go so far, but that was interesting! I'm going to take a further look into this. Once again, thank you for your response. I appreciate your effort in a more detailed explanation.
Glad you found it useful; if you do manage to do this, please update / comment so posterity can see what the full solution is like :)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.