Manipulate an application window frame using Python

Question

TLDR: Is there a Python library that lets me grab an application window's frame as an image and write a modified image back to that application?

The whole story is that I want to write a Python application that does something similar to Lossless Scaling and Magpie. I want to grab an application window (a video game window, for example), get the current frame as an image, upscale that image with some machine learning/deep learning algorithm (like FSR or DLSS), and then replace the application's current frame with the upscaled image.

So far, I have been playing around with some upscaling algorithms, like the one from Real-ESRGAN, but my main problem now is how to upscale the video game images in real time. The only thing I have found that comes close to what I need is PyAutoGUI, but that package only lets you take screenshots of an application, not rewrite its graphics.

I hope I have clarified my problem; feel free to comment if you still have any questions.

Thank you for reading this post, and have a good day.

Glyph · Accepted Answer · 2023-01-24 20:14:32Z

1

Doing this with Python is going to be very difficult. A lot of the performance involved in this sort of thing is in avoiding as many memory copies as possible, and Python's idiom for string and bytes processing unfortunately makes quite a few additional copies in the course of any idiomatic program. I say this as a die-hard Python fan who is constantly trying to cram Python in everywhere it doesn't belong: you'd be better off doing this in Rust.
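To make the point about copies concrete, here is a minimal sketch using only the standard library. The `frame` buffer and its 1920x1080 BGRA layout are assumptions for illustration, not something a real capture API hands you in this form. Slicing a `bytearray` duplicates the data, while slicing a `memoryview` does not:

```python
# Hypothetical stand-in for one captured 1920x1080 BGRA frame.
WIDTH, HEIGHT, BYTES_PER_PIXEL = 1920, 1080, 4
frame = bytearray(WIDTH * HEIGHT * BYTES_PER_PIXEL)


def top_half_copy(buf):
    """Slicing a bytes/bytearray allocates a new object and copies the data."""
    return buf[: len(buf) // 2]


def top_half_view(buf):
    """Slicing a memoryview gives a window onto the same memory, with no copy."""
    return memoryview(buf)[: len(buf) // 2]


copied = top_half_copy(frame)
viewed = top_half_view(frame)

frame[0] = 255  # simulate the capture source writing a new pixel value

copy_sees_write = copied[0] == 255  # the copy is detached from the source
view_sees_write = viewed[0] == 255  # the view shares memory with the source
```

At roughly 8 MB per frame, a few accidental copies like the first one per frame add up quickly at 60 frames per second.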

Update: After receiving feedback from folks with more direct experience in this sort of thing, I may have overstated the difficulty here. Many ML tools in Python provide zero-copy access, you can easily access and manipulate memory-mapped data from NumPy, and there is even a CUDA protocol for doing this with data in GPU memory. So while it's not exactly easy, as long as your operations are implemented as NumPy operations and not as pure-Python pixel-by-pixel logic, it shouldn't be much harder than other Python machine learning applications that need native APIs to access their source data.
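As a rough illustration of that approach, the sketch below wraps an existing buffer with `np.frombuffer` (which does not copy) and upscales it with whole-array operations instead of a per-pixel Python loop. The tiny 2x3 BGRA frame and the `upscale_nearest` helper are assumptions made up for this example; nearest-neighbour repetition only stands in for a real upscaler such as Real-ESRGAN.

```python
import numpy as np

# Tiny hypothetical BGRA frame; a real one would come from a capture API.
HEIGHT, WIDTH, CHANNELS = 2, 3, 4
raw = bytearray(range(HEIGHT * WIDTH * CHANNELS))  # byte values 0..23

# np.frombuffer wraps the existing memory instead of copying it.
frame = np.frombuffer(raw, dtype=np.uint8).reshape(HEIGHT, WIDTH, CHANNELS)


def upscale_nearest(image, factor):
    """Nearest-neighbour upscale done as two vectorized repeats (no Python loop)."""
    return image.repeat(factor, axis=0).repeat(factor, axis=1)


upscaled = upscale_nearest(frame, 2)

# Writing to the underlying buffer is visible through the array: same memory.
raw[0] = 200
frame_sees_write = bool(frame[0, 0, 0] == 200)
```

The upscaled result is necessarily a new array, but the input side of the pipeline never duplicates the captured bytes.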

However, there's no way to access framebuffer data directly from Python, so step 1 is going to be writing your own bindings over the relevant DirectX APIs. Since Magpie is open source, you can see which APIs it uses in its various C++ "Frame Source" backends. This one, for example, looks relevant: https://github.com/Blinue/Magpie/blob/42cfcba1222b07e4cec282eaff639aead229f123/Runtime/GraphicsCaptureFrameSource.cpp#L87

You can then look those APIs up on MSDN; that one, for example, is here: https://learn.microsoft.com/en-us/uwp/api/windows.graphics.capture.direct3d11captureframepool.createfreethreaded?view=winrt-22621

CFFI is a good choice for writing native wrappers: https://cffi.readthedocs.io/en/latest/
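For a feel of what CFFI code looks like, here is a small sketch that declares a C struct, allocates native memory for it, and exposes that memory to Python without copying. The `frame_t` layout and the `make_frame` helper are hypothetical and are not part of any DirectX or Magpie API. The sketch deliberately loads no native library, so it says nothing about how the capture calls themselves would be bound; a real wrapper would declare the actual API in `ffi.cdef` and load it with something like `ffi.dlopen`.

```python
from cffi import FFI

ffi = FFI()

# Hypothetical frame layout, invented for this example only.
ffi.cdef("""
    typedef struct {
        int width;
        int height;
        uint8_t *pixels;   /* BGRA, 4 bytes per pixel */
    } frame_t;
""")


def make_frame(width, height):
    """Allocate a zero-initialised native pixel buffer and a struct pointing at it."""
    pixels = ffi.new("uint8_t[]", width * height * 4)
    frame = ffi.new("frame_t *", {"width": width, "height": height, "pixels": pixels})
    # The caller must keep `pixels` alive for as long as `frame` is in use.
    return frame, pixels


frame, keepalive = make_frame(4, 2)

# ffi.buffer exposes the native memory through the buffer protocol, without copying.
buf = ffi.buffer(frame.pixels, frame.width * frame.height * 4)

frame.pixels[0] = 255             # write on the "C side"
buf[4:8] = b"\x01\x02\x03\x04"    # write on the Python side

first_byte = bytes(buf[0:1])      # sees the C-side write
second_pixel_green = frame.pixels[5]  # sees the Python-side write
```

Because `buf` supports the buffer protocol, it can be handed to something like `np.frombuffer` so the pixel data stays in native memory end to end.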

Gluing these together appropriately is left as an exercise for the reader :).

edited Jan 24, 2023 at 20:14

answered Jan 24, 2023 at 3:18

Glyph

3 Comments

MysteRys337 Over a year ago

Yeah, I imagined that doing something like this in python would be very difficult. But anyway, thanks for your response :)

MysteRys337 Over a year ago

Oh wow, you didn't need to go so far, but that was interesting! I'm going to take a further look into this. Once again, thank you for your response. I appreciate your effort in a more detailed explanation.

Glyph Over a year ago

Glad you found it useful; if you do manage to do this, please update / comment so posterity can see what the full solution is like :)
