Here's what I heard you say:
1. We need to know "recent", but not up-to-the-second, prices of commodities like this one.
2. Displaying "stale" prices carries some cost.
3. Each query to refresh a price incurs some small API cost.
4. Commodity prices do move, but they move "slowly".
You can cache prices in an RDBMS or in Redis,
whichever you're more comfortable with,
as both can offer sensible indexing.
The big things you need to do are
- quantify the cost of (2.) stale price quotes, and
- choose a refresh strategy.
We can evaluate (4.), slowness of price movement,
against historic quotes per commodity, or averaged
across all commodities.
And (3.), the API cost, comes straight from your API contract.
So now we need to write down the cost of (2.) staleness:
of delivering a (t1, price1) quote at time t2 when
the API would reveal that commodity code was actually trading at price2.
Historic price movement will suggest a relationship
between Δt and Δprice.
Knowing t2 - t1 lets you estimate whether Δprice
is likely to be "large".
Balancing that against the known API costs suggests a refresh strategy.
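Here is a minimal sketch of that balancing act. The function names and
cost constants are mine, not anything your vendor provides, and it
assumes both the price drift and the staleness cost are roughly linear,
which is crude but good enough to pick a starting interval.

```python
from statistics import mean

def estimate_drift_rate(history):
    """Average relative price movement per second, from a list of
    (timestamp, price) pairs for one commodity, oldest first."""
    deltas = [
        abs(p2 - p1) / p1 / (t2 - t1)
        for (t1, p1), (t2, p2) in zip(history, history[1:])
        if t2 > t1
    ]
    return mean(deltas) if deltas else 0.0

def refresh_interval(drift_rate, staleness_cost_per_pct, api_cost):
    """Seconds we can serve a cached price before the expected cost of
    showing it stale exceeds the cost of one more API call."""
    expected_cost_per_sec = drift_rate * 100 * staleness_cost_per_pct
    if expected_cost_per_sec <= 0:
        return float("inf")    # price never moves, so never refresh
    return api_cost / expected_cost_per_sec
```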
Whenever you ask the API about a commodity code,
you should definitely record the tuple
(code, price, write_time, write_time),
showing that at write_time it traded at price.
When displaying the price of a commodity,
update that last element with read_time,
so we can track "popular" commodities that
are frequently read by many users
and should be frequently updated.
We might additionally tack on an estimate of "error" or "uncertainty"
after the "read time".
This has the advantage of being indexable,
for efficient queries of the most badly out-of-date commodities.
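In an RDBMS that record might look something like this. SQLite is just
a stand-in, and the table and column names are illustrative; the index
on write_time is what makes "most badly out-of-date" queries cheap.

```python
import sqlite3
import time

conn = sqlite3.connect("prices.db")
conn.executescript("""
CREATE TABLE IF NOT EXISTS quote (
    code        TEXT PRIMARY KEY,
    price       REAL NOT NULL,
    write_time  REAL NOT NULL,   -- when the API reported this price
    read_time   REAL NOT NULL,   -- when a user last viewed it
    uncertainty REAL             -- optional estimate of likely error
);
CREATE INDEX IF NOT EXISTS quote_write_time ON quote (write_time);
""")

def record_quote(code, price, now=None):
    """Store what the API just told us; read_time starts out equal to write_time."""
    now = now or time.time()
    conn.execute(
        "INSERT INTO quote (code, price, write_time, read_time)"
        " VALUES (?, ?, ?, ?)"
        " ON CONFLICT(code) DO UPDATE SET"
        "   price = excluded.price, write_time = excluded.write_time",
        (code, price, now, now),
    )
    conn.commit()

def read_quote(code, now=None):
    """Fetch the cached price and bump read_time, so popularity is tracked."""
    now = now or time.time()
    conn.execute("UPDATE quote SET read_time = ? WHERE code = ?", (now, code))
    conn.commit()
    return conn.execute(
        "SELECT price, write_time FROM quote WHERE code = ?", (code,)
    ).fetchone()
```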
Which brings us to...
refresh strategy
You can query the API
- from a background daemon, and/or
- from the process displaying results to the user
Imagine your vendor's API is "slow" at delivering price results,
too slow for an interactive web user to wait on.
Then you would choose to have a daemon issue all queries,
letting each user process consult the local cache and hope for the best.
If the vendor's API is "fast", then each user process might
choose to send it queries, cache results, and display results.
That brings us to another choice.
You can
- display only cached results to the user, or
- issue queries, await results, then finally display them to the user.
The first leads to predictable UX response times.
If you prefer the second, you still have the option
of making a Kafka pub-sub request to a daemon and awaiting the response,
or doing the API query in the context of the user process.
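Here is a sketch of that read path, reusing the record_quote / read_quote
helpers from above. FAST_API and STALE_AFTER are assumptions you would
tune to your vendor's latency and your staleness cost, not fixed choices.

```python
import time

STALE_AFTER = 600    # ten minutes: an assumed "acceptable" staleness
FAST_API = True      # flip to False if the vendor is too slow to wait on

def display_price(code, fetch_from_api):
    """Return a price for display, refreshing inline only when the
    vendor API is fast enough for an interactive user to wait on."""
    row = read_quote(code)               # cached (price, write_time), or None
    if row is not None:
        price, write_time = row
        fresh_enough = time.time() - write_time < STALE_AFTER
        if fresh_enough or not FAST_API:
            return price                 # cached-only path
    if not FAST_API:
        return None                      # nothing cached; hope for the best
    # fast-API path: query, cache, then display
    price = fetch_from_api(code)
    record_quote(code, price)
    return price
```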
background daemon
The daemon is responsible for minimizing
the (2.) staleness cost function of anticipated display requests,
subject to (3.) constraints on API spending.
Suppose that all commodities are equally volatile at all times,
and that all users are equally important.
Then we pick some "acceptable" staleness,
perhaps a threshold of ten minutes, and the daemon simply
asks the local RDBMS / Redis cache for commodities that are going stale.
It cycles through them, issuing API requests and updating the cache.
If it has trouble keeping up, we can run more than one
instance of the daemon.
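A minimal version of that daemon, again reusing conn and record_quote
from the cache sketch above; the two thresholds are placeholders. Running
several instances safely would also mean partitioning codes or marking
rows in flight, which I've left out.

```python
import time

POLL_EVERY = 30      # seconds between scans of the cache
STALE_AFTER = 600    # the "acceptable" staleness threshold, here ten minutes

def daemon_loop(fetch_from_api):
    """Find quotes that are going stale and refresh them, forever."""
    while True:
        cutoff = time.time() - STALE_AFTER
        rows = conn.execute(
            "SELECT code FROM quote WHERE write_time < ? ORDER BY write_time",
            (cutoff,),
        ).fetchall()
        for (code,) in rows:
            record_quote(code, fetch_from_api(code))  # hit the API, update cache
        time.sleep(POLL_EVERY)
```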
Now suppose that some commodities are more popular than
others, perhaps following a Zipfian distribution.
Use the read_time mentioned above to prioritize commodities.
If ALUP11 is popular, we will cache price1 at time t1,
and then display that same price1 to users at times t2, t3, t4, ....
Your design challenge is to figure out by which time t5
the daemon should have already consulted the API
and cached an updated price.
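One crude way to turn that into a number, assuming you also track a read
rate per commodity (say an exponentially weighted count, not just the
last read_time) and reuse the drift estimate from earlier:

```python
def refresh_deadline(write_time, reads_per_minute, drift_rate,
                     staleness_cost_per_pct, api_cost):
    """Return t5, the time by which the daemon should have re-queried the API.
    Popular (high reads_per_minute) and volatile (high drift_rate) codes get
    earlier deadlines; quiet, rarely viewed ones can wait."""
    reads_per_sec = reads_per_minute / 60.0
    # expected cost per second of leaving the stale price on display
    cost_per_sec = reads_per_sec * drift_rate * 100 * staleness_cost_per_pct
    if cost_per_sec <= 0:
        return float("inf")
    return write_time + api_cost / cost_per_sec
```

The daemon then simply sorts cached quotes by deadline and works through
the earliest ones first.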