As for how it's done internally: ctypes.windll.kernel32 is effectively WinDLL("kernel32"). windll is a LibraryLoader that creates that WinDLL on first attribute access and caches it. WinDLL inherits from CDLL, whose __init__ opens a handle to kernel32.dll with self._handle = _dlopen(...), where _dlopen comes from from _ctypes import LoadLibrary as _dlopen. LoadLibrary is bound to the load_library C function in the _ctypes runtime module. That function calls LoadLibraryExW and then returns PyLong_FromVoidPtr(...), which wraps the C void pointer (the HMODULE) in a PyObject (a PyLong). So, just as with a regular function implemented in Python, the caller now holds a PyObject, and it assigns that object to self._handle. The first time GetModuleHandle is accessed on kernel32, normal attribute lookup fails and the runtime calls CDLL.__getattr__. This Python code instantiates an object of the library's _FuncPtr type for the name. _FuncPtr inherits from _CFuncPtr, which is _ctypes.CFuncPtr (from _ctypes import CFuncPtr as _CFuncPtr). Strictly speaking, kernel32 exports only GetModuleHandleA and GetModuleHandleW (GetModuleHandle is a C header macro), and windll does not pick one automatically, so in ctypes you have to ask for one of those two names explicitly.
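The objects described above can be inspected from Python. This is a minimal sketch that relies on private ctypes attributes (_handle, _FuncPtr, _CFuncPtr, _dlopen), which are implementation details and may change between versions. So that it also runs outside Windows, it assumes the C library is an acceptable stand-in for kernel32 there; on POSIX, ctypes uses _ctypes.dlopen instead of LoadLibrary.

```python
import ctypes
import ctypes.util
import os
import _ctypes

if os.name == "nt":
    # A fresh equivalent of what ctypes.windll.kernel32 creates (and caches) on first access.
    lib = ctypes.WinDLL("kernel32")
    loader = _ctypes.LoadLibrary   # bound to load_library() in the C runtime
else:
    # Assumption: the C library stands in for kernel32 on non-Windows platforms.
    lib = ctypes.CDLL(ctypes.util.find_library("c"))
    loader = _ctypes.dlopen

# ctypes' private _dlopen alias is exactly the _ctypes loader chosen above.
assert ctypes._dlopen is loader

# CDLL.__init__ stored the native module handle as a plain Python int (a PyLong).
assert isinstance(lib._handle, int) and lib._handle != 0

# CDLL.__init__ also created a per-library subclass of _ctypes.CFuncPtr.
assert ctypes._CFuncPtr is _ctypes.CFuncPtr
assert issubclass(lib._FuncPtr, _ctypes.CFuncPtr)

print(hex(lib._handle), lib._FuncPtr.__mro__)
```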
The type name _ctypes.CFuncPtr belongs to a C object in the runtime library, PyCFuncPtr_Type (a PyTypeObject), and the name string is stored in that object's tp_name member. Remember that under the hood the runtime represents every Python object (including Python functions) as a PyObject and every Python type as a PyTypeObject. Usually this happens implicitly, but here the Python code explicitly creates a function object that can be used to make the native call. Instantiating _FuncPtr runs the constructor set in the type object, PyCFuncPtr_new, which receives a single argument: the tuple ("GetModuleHandle", <the kernel32 WinDLL object>) built by CDLL.__getitem__. Because that argument is a tuple (rather than, say, an integer address or a Python callable to wrap as a callback), PyCFuncPtr_new calls PyCFuncPtr_FromDll. That function creates the new function object and stores in it the function's actual address, obtained by calling GetProcAddress on the module handle that CDLL.__init__ saved in the WinDLL object's _handle attribute.
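The (name, dll) tuple path can be exercised by hand. This is a sketch that mirrors what CDLL.__getitem__ does, again using the private _FuncPtr attribute. It assumes GetModuleHandleW on Windows and strlen from the C library elsewhere, and the helper address() is a hypothetical name introduced only for this illustration. If both objects report the same native address, both went through the same GetProcAddress/dlsym lookup.

```python
import ctypes
import ctypes.util
import os

# Assumption: GetModuleHandleW on Windows, strlen from the C library elsewhere.
if os.name == "nt":
    lib, name = ctypes.WinDLL("kernel32"), "GetModuleHandleW"
else:
    lib, name = ctypes.CDLL(ctypes.util.find_library("c")), "strlen"

# Pass ONE tuple (name, dll) to the function-pointer type, as CDLL.__getitem__ does.
# PyCFuncPtr_new sees a tuple and delegates to PyCFuncPtr_FromDll, which reads
# dll._handle and resolves the symbol (GetProcAddress on Windows, dlsym on POSIX).
manual = lib._FuncPtr((name, lib))
via_item = lib[name]

def address(fp):
    """Hypothetical helper: the native address stored inside a ctypes function pointer."""
    return ctypes.cast(fp, ctypes.c_void_p).value

assert address(manual) == address(via_item)

# A name the library does not export fails inside PyCFuncPtr_FromDll
# and surfaces in Python as AttributeError.
try:
    lib._FuncPtr(("no_such_function_xyz", lib))
except AttributeError as exc:
    print("lookup failed:", exc)
```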
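Calling this function object does not evaluate bytecode the way calling a Python function does. Instead, it invokes the call slot of its type, PyCFuncPtr_call, which converts the arguments (according to argtypes, if set) and passes the stored address to _ctypes_callproc, which performs the native call through libffi. Note that the stored address is the one GetProcAddress returned, i.e. the address of GetModuleHandleW itself, not the address of GetProcAddress.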
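The difference between the two kinds of call can be observed from Python. This is a minimal sketch that assumes GetModuleHandleW(None) on Windows (it returns the base address of the running executable) and strlen(b"hello") from the C library elsewhere. It shows that a Python function carries a code object, whereas the ctypes object has none and inherits its __call__ from _ctypes.CFuncPtr.

```python
import ctypes
import ctypes.util
import os
import types

if os.name == "nt":
    lib = ctypes.WinDLL("kernel32")
    func = lib.GetModuleHandleW        # first access: CDLL.__getattr__ builds the _FuncPtr
    func.argtypes = [ctypes.c_wchar_p]
    func.restype = ctypes.c_void_p     # HMODULE is pointer-sized; the default int restype could truncate it
    result = func(None)                # NULL name -> handle of the running .exe
else:
    # Assumption: strlen from the C library as a portable stand-in.
    lib = ctypes.CDLL(ctypes.util.find_library("c"))
    func = lib.strlen
    func.argtypes = [ctypes.c_char_p]
    func.restype = ctypes.c_size_t
    result = func(b"hello")

def py_func():
    return 42

# A Python function carries a code object whose bytecode the interpreter evaluates...
assert isinstance(py_func, types.FunctionType) and hasattr(py_func, "__code__")

# ...while the ctypes function object has no bytecode: its __call__ slot comes from
# _ctypes.CFuncPtr, which marshals the arguments and calls the stored native address.
assert not hasattr(func, "__code__")
assert type(func).__call__ is ctypes._CFuncPtr.__call__

print(hex(ctypes.cast(func, ctypes.c_void_p).value), result)
```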
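The next time, __getattr__ is not called, because on the first access it stored the new function object on the instance with setattr, so ordinary attribute lookup now finds it.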
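The caching can be checked directly. This sketch uses the same stand-ins as before (GetModuleHandleW on Windows, strlen from the C library elsewhere) and a fresh library object, so nothing is cached at the start. Indexing with lib[name] goes straight to __getitem__, which does not cache, so it builds a new function object every time.

```python
import ctypes
import ctypes.util
import os

# Assumption: GetModuleHandleW on Windows, strlen from the C library elsewhere.
if os.name == "nt":
    lib, name = ctypes.WinDLL("kernel32"), "GetModuleHandleW"
    # The loader caches the library object the same way:
    assert ctypes.windll.kernel32 is ctypes.windll.kernel32
else:
    lib, name = ctypes.CDLL(ctypes.util.find_library("c")), "strlen"

assert name not in vars(lib)        # nothing cached yet, so lookup falls through to __getattr__
first = getattr(lib, name)          # CDLL.__getattr__ -> __getitem__ -> setattr(self, name, func)
assert vars(lib)[name] is first     # now stored in the instance __dict__
assert getattr(lib, name) is first  # found by normal lookup; __getattr__ is not called again

# Indexing bypasses the cache and creates a new function object each time.
assert lib[name] is not lib[name]
```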