
I am building a Django web application and I'd like some advice on caching. I know very little about caching. I've read the caching chapter in the Django book, but am struggling to relate it to my real-world situation.

My application will be a web front-end on a Postgres database containing a largeish amount of data (150GB of server logs).

The database is read-only: the purpose of the application is to give users a simple way to query the data. For example, the user might ask for all rows from server X between dates A and B.

So my database needs to support very fast read operations, but it doesn't need to worry about write operations (much - I'll add new data once every few months, and it doesn't matter how long that takes).

It would be nice if clients making the same request could use a cache, rather than making another call to the Postgres database.
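That "same request, same cache entry" idea boils down to deriving a stable key from the query parameters, so two clients asking the same question look up the same entry. A minimal sketch (the helper name and key prefix are illustrative, not from any particular library):

```python
import hashlib
import json


def cache_key(params):
    """Derive a stable cache key from a query's parameters.

    Sorting the keys makes the result independent of parameter order,
    so identical requests from different clients map to one entry.
    """
    canonical = json.dumps(params, sort_keys=True)
    return "logquery:" + hashlib.sha1(canonical.encode()).hexdigest()
```

The resulting string can be used with any key-value cache backend: check the cache first, and only fall through to Postgres on a miss.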

But I don't know what sort of cache I should be looking at: a web cache, or a database cache. Or even if Postgres is the best choice (I'd just like to use it because it works well with Django, and is so robust). Could anyone advise?

The Django book says memcached is the best cache with Django, but it runs in memory, and the results of some of these queries could be several GB, so memcached might fill up the machine's memory quickly. But perhaps I don't fully understand how memcached operates.
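For reference, wiring Django to memcached is just a settings entry; a sketch of the standard configuration (host and port are illustrative):

```python
# settings.py fragment. memcached's memory use is capped by the daemon's
# own limit (the -m flag), and old entries are evicted LRU-style, so it
# won't fill the machine's memory on its own -- but by default it also
# refuses individual values over 1 MB, so it's a poor fit for caching
# multi-GB result sets directly.
CACHES = {
    "default": {
        "BACKEND": "django.core.cache.backends.memcached.MemcachedCache",
        "LOCATION": "127.0.0.1:11211",
    }
}
```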

  • My advice is to write the simplest program you can and leave any performance problems to a DBA (I mean a real DBA, not a developer posing as one) - a finely tuned database is better than a cache layer in front of the DBMS. Commented Feb 25, 2015 at 20:02
  • HTTP caching could "work" (e.g. each response could be cached for an extended period of time, possibly indefinitely if the data is static). I guess it depends on your definition of "work"; what criteria are you using to compare your options? Commented Feb 26, 2015 at 1:15
  • @PauloScardine sadly I don't have access to a DBA! Writing the simplest program possible is good advice, though :) Commented Feb 26, 2015 at 13:19

2 Answers


Your query should never return several GB of data. There's no practical reason to do so, as the user cannot absorb that much data at a time. Your result set should be paged, so the user sees only 10, 25, however many results at a time. That also lets you limit the query itself to fetch only that many records, starting at an offset derived from the page number.
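The page-to-offset translation above is simple arithmetic that ends up as a SQL LIMIT/OFFSET clause. A sketch, assuming a hypothetical `server_logs` table (table and column names are illustrative):

```python
def page_query(server, start, end, page, page_size=25):
    """Build a paged query against the hypothetical server_logs table.

    Page numbers are 1-based; LIMIT/OFFSET keeps each fetch small no
    matter how many rows match overall.
    """
    offset = (page - 1) * page_size
    sql = (
        "SELECT * FROM server_logs "
        "WHERE server = %s AND ts BETWEEN %s AND %s "
        "ORDER BY ts LIMIT %s OFFSET %s"
    )
    return sql, (server, start, end, page_size, offset)
```

In Django you wouldn't write this by hand: slicing a queryset (`qs[50:75]`) or using `django.core.paginator.Paginator` emits the same LIMIT/OFFSET SQL for you.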

Regardless, caching search result pages is not a particularly good idea. For one, the odds that different users will ever conduct exactly the same search are pretty minimal, so you'll end up wasting RAM caching result sets that will never be used again. Also, something like logs should be real-time: if you return a cached result set, there might be new, relevant results that are not included, undermining the usefulness of your search.




As mentioned above, there are limits to what problems caching can solve. Since you are still building this application, I see no reason not to plug in Django Haystack with Whoosh and see how it performs; switching later to one of the more enterprise-grade search backends is a breeze.
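Getting a first Haystack/Whoosh setup running is a small settings addition; a sketch based on the django-haystack documentation (the index path is illustrative):

```python
# settings.py fragment: point Haystack at the file-based Whoosh engine.
# Whoosh needs no separate server, which makes it handy for trying the
# idea out before moving to Elasticsearch or Solr.
import os

HAYSTACK_CONNECTIONS = {
    "default": {
        "ENGINE": "haystack.backends.whoosh_backend.WhooshEngine",
        "PATH": os.path.join(os.getcwd(), "whoosh_index"),
    },
}
```

Because all backends sit behind the same `HAYSTACK_CONNECTIONS` setting, swapping Whoosh for another engine later means changing this one dict, not your search code.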

