I have written a C++ binary on Windows 11 that uses WinHTTP to send a prompt to OpenAI and read the response. The WinHTTP flow has the following stages:
1. Create a session with `WinHttpOpen()`.
2. Create a connection with `WinHttpConnect()`. The server is `api.openai.com`.
3. Create a request with `WinHttpOpenRequest()`. I use `POST` for `/v1/chat/completions`.
4. Send the request to the server with `WinHttpSendRequest()`.
5. Receive a response from the server with `WinHttpReceiveResponse()`.
6. Read the data with `WinHttpQueryDataAvailable()` and `WinHttpReadData()`.
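For reference, the stages above can be sketched with a per-stage timer roughly as follows. This is a minimal sketch, not the exact code from my binary: `timeStage`, `timedChatRequest`, the user-agent string, and the choice of `WINHTTP_ACCESS_TYPE_DEFAULT_PROXY` are assumptions made for illustration, and error handling is reduced to null checks.

```cpp
#include <chrono>
#include <cstdio>
#include <string>
#include <utility>

// Hypothetical helper (not part of WinHTTP): runs one stage, prints its
// duration, and returns the elapsed time in milliseconds.
template <typename F>
double timeStage(const char* name, F&& stage) {
    const auto start = std::chrono::steady_clock::now();
    std::forward<F>(stage)();
    const std::chrono::duration<double, std::milli> elapsed =
        std::chrono::steady_clock::now() - start;
    std::printf("%-28s %9.1f ms\n", name, elapsed.count());
    return elapsed.count();
}

#ifdef _WIN32
#include <windows.h>
#include <winhttp.h>
#pragma comment(lib, "winhttp.lib")

// Sends one JSON body through the six stages, timing each one.
// `apiKey` and `jsonBody` come from the caller; the access type and
// user-agent string are assumptions, not values taken from the post.
bool timedChatRequest(const std::wstring& apiKey, const std::string& jsonBody,
                      std::string& response) {
    HINTERNET session = nullptr, connection = nullptr, request = nullptr;
    BOOL sent = FALSE, received = FALSE;

    timeStage("1 WinHttpOpen", [&] {
        session = WinHttpOpen(L"latency-test/1.0", WINHTTP_ACCESS_TYPE_DEFAULT_PROXY,
                              WINHTTP_NO_PROXY_NAME, WINHTTP_NO_PROXY_BYPASS, 0);
    });
    timeStage("2 WinHttpConnect", [&] {
        if (session)
            connection = WinHttpConnect(session, L"api.openai.com",
                                        INTERNET_DEFAULT_HTTPS_PORT, 0);
    });
    timeStage("3 WinHttpOpenRequest", [&] {
        if (connection)
            request = WinHttpOpenRequest(connection, L"POST", L"/v1/chat/completions",
                                         nullptr, WINHTTP_NO_REFERER,
                                         WINHTTP_DEFAULT_ACCEPT_TYPES, WINHTTP_FLAG_SECURE);
    });
    const std::wstring headers =
        L"Content-Type: application/json\r\nAuthorization: Bearer " + apiKey;
    timeStage("4 WinHttpSendRequest", [&] {
        if (request)
            sent = WinHttpSendRequest(request, headers.c_str(), static_cast<DWORD>(-1),
                                      const_cast<char*>(jsonBody.data()),
                                      static_cast<DWORD>(jsonBody.size()),
                                      static_cast<DWORD>(jsonBody.size()), 0);
    });
    timeStage("5 WinHttpReceiveResponse", [&] {
        if (sent) received = WinHttpReceiveResponse(request, nullptr);
    });
    timeStage("6 Query + ReadData", [&] {
        DWORD available = 0;
        while (received && WinHttpQueryDataAvailable(request, &available) && available > 0) {
            std::string chunk(available, '\0');
            DWORD bytesRead = 0;
            if (!WinHttpReadData(request, &chunk[0], available, &bytesRead)) break;
            response.append(chunk, 0, bytesRead);
        }
    });

    // Release handles in reverse order of creation.
    if (request) WinHttpCloseHandle(request);
    if (connection) WinHttpCloseHandle(connection);
    if (session) WinHttpCloseHandle(session);
    return received != FALSE;
}
#endif
```

Calling `timedChatRequest(key, body, out)` prints one line per stage. As far as I understand, `WinHttpConnect()` only records the target server, and name resolution, the TCP connect, and the TLS handshake happen when the request is sent, so some of that cost is expected to show up under stage 4.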
Step 4 takes 6.8 seconds on average for the simple prompts that I send, such as "How far is Earth from the Sun?" or "How many planets are there in the Solar System?". The rest of the steps take 1 to 30 ms, so the 6800 ms for Step 4 stands out. I tried to pre-warm the connection by running the whole sequence (steps 1 to 6) ahead of time with a dummy prompt ("Say hello to me"), retaining the session and connection handles, and only creating the request (Step 3 onwards) later when I test with variable prompts. Sometimes Step 4 takes 3500 ms, but mostly it takes 6800 ms. During the investigation I found that the connection is using HTTP/1.1, not HTTP/2.
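On the HTTP/1.1 observation, this is a sketch of how the negotiated protocol can be checked and how HTTP/2 can be requested through `WinHttpSetOption()`. The helper names (`protocolLabel`, `requestHttp2`, `protocolUsed`) are hypothetical, the local `kHttp2Flag` constant is an assumption that mirrors `WINHTTP_PROTOCOL_FLAG_HTTP2` from `winhttp.h`, and I have not confirmed that enabling HTTP/2 changes the Step 4 latency.

```cpp
#include <cstdint>
#include <string>

// Assumed to mirror WINHTTP_PROTOCOL_FLAG_HTTP2; verified at compile time on Windows.
constexpr std::uint32_t kHttp2Flag = 0x1;

// Turns the flags reported by WINHTTP_OPTION_HTTP_PROTOCOL_USED into a label.
// No HTTP/2 flag set is treated as HTTP/1.1 or earlier.
const char* protocolLabel(std::uint32_t usedFlags) {
    return (usedFlags & kHttp2Flag) ? "HTTP/2" : "HTTP/1.1 or earlier";
}

#ifdef _WIN32
#include <windows.h>
#include <winhttp.h>
static_assert(kHttp2Flag == WINHTTP_PROTOCOL_FLAG_HTTP2, "flag value mismatch");

// Call on the session handle before WinHttpConnect() so that new
// connections are allowed to negotiate HTTP/2.
bool requestHttp2(HINTERNET session) {
    DWORD protocols = WINHTTP_PROTOCOL_FLAG_HTTP2;
    return WinHttpSetOption(session, WINHTTP_OPTION_ENABLE_HTTP_PROTOCOL,
                            &protocols, sizeof(protocols)) != FALSE;
}

// Call on the request handle after WinHttpReceiveResponse() has succeeded.
const char* protocolUsed(HINTERNET request) {
    DWORD used = 0;
    DWORD size = sizeof(used);
    if (!WinHttpQueryOption(request, WINHTTP_OPTION_HTTP_PROTOCOL_USED, &used, &size))
        return "unknown";
    return protocolLabel(used);
}
#endif
```

Logging `protocolUsed(request)` for both the warm-up request and the real request would show whether the setting took effect on each of them.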
Is there a client-side configuration that would reduce the latency?
With `libcurl` from Linux C++ code, I see 2.3 to 2.4 seconds of latency for similar queries. There are two major differences: (1) from Windows I send the string in UTF-8; (2) the endpoint from Linux is "/responses", while from Windows it is "/v1/chat/completions". I am checking the impact of those differences.
With `libcurl` from Windows (not WSL), the latency is 1.8 to 1.9 seconds. I am unsure what the problem is with `WinHTTP`.