I have written a C++ program that uses WinHTTP on Windows 11 to send a prompt to OpenAI and read the response. The WinHTTP flow has the following stages:

  1. Create a session with WinHttpOpen().
  2. Create a connection with WinHttpConnect(). The server is api.openai.com.
  3. Create a request with WinHttpOpenRequest(), using POST for /v1/chat/completions.
  4. Send the request to the server with WinHttpSendRequest().
  5. Receive a response from the server with WinHttpReceiveResponse().
  6. Read the data with WinHttpQueryDataAvailable() and WinHttpReadData().
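The six stages above can be sketched roughly as follows. This is a minimal, hypothetical helper (`PostPrompt` and `JsonEscape` are illustrative names, not taken from the question) that assumes synchronous WinHTTP, a Windows 10 or later SDK, linking against winhttp.lib, and a placeholder model name "gpt-4o-mini" in the request body. It times only the `WinHttpSendRequest` call with `std::chrono::steady_clock`, and it optionally asks WinHTTP to negotiate HTTP/2 through `WINHTTP_OPTION_ENABLE_HTTP_PROTOCOL`, since the connection was observed to use HTTP/1.1; whether that changes the Step 4 latency is untested. The WinHTTP part is wrapped in `#ifdef _WIN32` so the JSON-escaping helper compiles on any platform.

```cpp
#include <cassert>
#include <cstdio>
#include <string>

// Escapes a UTF-8 prompt for use inside a JSON string literal. Bytes >= 0x80
// (multi-byte UTF-8 sequences) are copied through unchanged.
std::string JsonEscape(const std::string& in) {
    std::string out;
    for (unsigned char c : in) {
        switch (c) {
            case '"':  out += "\\\""; break;
            case '\\': out += "\\\\"; break;
            case '\n': out += "\\n";  break;
            case '\r': out += "\\r";  break;
            case '\t': out += "\\t";  break;
            default:
                if (c < 0x20) {  // remaining control characters -> \u00XX
                    char buf[7];
                    std::snprintf(buf, sizeof buf, "\\u%04x", static_cast<unsigned>(c));
                    out += buf;
                } else {
                    out += static_cast<char>(c);
                }
        }
    }
    return out;
}

#ifdef _WIN32
#include <windows.h>
#include <winhttp.h>
#include <chrono>

// Hypothetical helper: runs stages 1-6 once and returns the raw JSON response.
// sendMs receives the time spent inside WinHttpSendRequest (stage 4), or -1.
// "gpt-4o-mini" is a placeholder model name. Error handling is minimal.
std::string PostPrompt(const std::wstring& apiKey, const std::string& prompt,
                       double& sendMs, bool tryHttp2 = true) {
    sendMs = -1.0;
    std::string response;
    std::string body = "{\"model\":\"gpt-4o-mini\",\"messages\":[{\"role\":\"user\","
                       "\"content\":\"" + JsonEscape(prompt) + "\"}]}";
    std::wstring headers = L"Content-Type: application/json\r\nAuthorization: Bearer " + apiKey;

    // 1. Session
    HINTERNET hSession = WinHttpOpen(L"latency-test/1.0", WINHTTP_ACCESS_TYPE_DEFAULT_PROXY,
                                     WINHTTP_NO_PROXY_NAME, WINHTTP_NO_PROXY_BYPASS, 0);
    if (hSession && tryHttp2) {
        // Opt in to HTTP/2 (Windows 10 1607+); HTTP/1.1 is used if HTTP/2 is not negotiated.
        DWORD protocols = WINHTTP_PROTOCOL_FLAG_HTTP2;
        WinHttpSetOption(hSession, WINHTTP_OPTION_ENABLE_HTTP_PROTOCOL, &protocols, sizeof(protocols));
    }
    // 2. Connection
    HINTERNET hConnect = hSession ? WinHttpConnect(hSession, L"api.openai.com",
                                                   INTERNET_DEFAULT_HTTPS_PORT, 0) : nullptr;
    // 3. Request over TLS
    HINTERNET hRequest = hConnect ? WinHttpOpenRequest(hConnect, L"POST", L"/v1/chat/completions",
                                                       nullptr, WINHTTP_NO_REFERER,
                                                       WINHTTP_DEFAULT_ACCEPT_TYPES,
                                                       WINHTTP_FLAG_SECURE) : nullptr;
    if (hRequest) {
        // 4. Send headers and body; only this call is timed
        auto t0 = std::chrono::steady_clock::now();
        BOOL ok = WinHttpSendRequest(hRequest, headers.c_str(), static_cast<DWORD>(-1),
                                     const_cast<char*>(body.data()), static_cast<DWORD>(body.size()),
                                     static_cast<DWORD>(body.size()), 0);
        sendMs = std::chrono::duration<double, std::milli>(
                     std::chrono::steady_clock::now() - t0).count();
        // 5. Wait for the response headers
        if (ok && WinHttpReceiveResponse(hRequest, nullptr)) {
            // 6. Read the body chunk by chunk
            DWORD avail = 0;
            while (WinHttpQueryDataAvailable(hRequest, &avail) && avail > 0) {
                std::string chunk(avail, '\0');
                DWORD read = 0;
                if (!WinHttpReadData(hRequest, &chunk[0], avail, &read) || read == 0) break;
                response.append(chunk.data(), read);
            }
        }
    }
    if (hRequest) WinHttpCloseHandle(hRequest);
    if (hConnect) WinHttpCloseHandle(hConnect);
    if (hSession) WinHttpCloseHandle(hSession);
    return response;
}
#endif  // _WIN32
```

Because each call opens and closes its own session, this sketch measures a cold run; reusing `hSession` and `hConnect` across calls would correspond to the pre-warmed variant.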

Step 4 takes 6.8 seconds on average for the simple prompts I send, such as "How far is Earth from the Sun?" or "How many planets are there in the Solar System?". The rest of the steps take 1 to 30 ms each, so the 6800 ms for Step 4 stands out.

I tried to pre-warm the connection by running the whole sequence (steps 1 to 6) ahead of time with a dummy prompt ("Say hello to me"), keeping the session and connection open, and only creating the request (Step 3 onward) later when testing with variable prompts. With pre-warming, Step 4 sometimes takes 3500 ms, but it usually still takes 6800 ms. During the investigation I also found that the connection uses HTTP/1.1, not HTTP/2.
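To compare a cold run with a pre-warmed run as described above, the per-stage timings can be collected with a small helper. The sketch below is a hypothetical `StageTimer` class (not from the question) built on `std::chrono::steady_clock`, which is monotonic and therefore better suited to measuring intervals than `system_clock`. Any callable, for example a lambda wrapping `WinHttpSendRequest`, can be timed with `Time()`.

```cpp
#include <cassert>
#include <chrono>
#include <map>
#include <string>

// Records how long each named stage takes, in milliseconds, so that a cold run
// (fresh session and connection) can be compared with a pre-warmed run.
class StageTimer {
public:
    void Start(const std::string& stage) { starts_[stage] = Clock::now(); }

    // Stores and returns the elapsed milliseconds for `stage`;
    // returns -1 if Start() was never called for it.
    double Stop(const std::string& stage) {
        auto it = starts_.find(stage);
        if (it == starts_.end()) return -1.0;
        double ms = std::chrono::duration<double, std::milli>(Clock::now() - it->second).count();
        elapsed_[stage] = ms;
        starts_.erase(it);
        return ms;
    }

    // Last recorded duration for `stage`, or -1 if none was recorded.
    double Elapsed(const std::string& stage) const {
        auto it = elapsed_.find(stage);
        return it == elapsed_.end() ? -1.0 : it->second;
    }

    // Times a callable and forwards its return value; the guard stops the
    // timer even if the callable throws.
    template <typename F>
    auto Time(const std::string& stage, F&& f) -> decltype(f()) {
        Start(stage);
        struct Guard {
            StageTimer* timer;
            std::string name;
            ~Guard() { timer->Stop(name); }
        } guard{this, stage};
        return f();
    }

private:
    using Clock = std::chrono::steady_clock;
    std::map<std::string, Clock::time_point> starts_;
    std::map<std::string, double> elapsed_;
};
```

For example, keeping the session and connection handles from the warm-up run and calling `timer.Time("send", [&] { return WinHttpSendRequest(/* ... */); })` for each new request would record Step 4 separately from Steps 5 and 6.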

Is there a client side configuration to reduce the latency?

  • Using libcurl from C++ on Linux, I see 2.3 to 2.4 seconds of latency for similar queries. There are two major differences: (1) from Windows I send the string as UTF-8, and (2) the Linux code uses the "/responses" endpoint, while the Windows code uses "/v1/chat/completions". I am checking the impact of those changes. Commented Aug 14 at 23:33
  • Using libcurl on Windows (not WSL), the latency is 1.8 to 1.9 seconds, so I am unsure what the problem is with WinHTTP. Commented Aug 18 at 14:48
