I'm currently working on a project involving keeping track of users and their actions with my database (PostgreSQL as the RDMS), and I have run into an issue when trying to perform COUNT(*) on occurrences of each user. What I want is to be able to, efficiently, count the number of times each user appears from every record, and also be able to achieve looking at counts on a particular date range.
So, the problem is how do we achieve counting the total number of times a user appears from the tables contents, and how do we count the total number on a date range.
What I've tried
As you might know, Postgres doesn't support COUNT(*) very well using indices, so we have to consider other ways to reduce the # of records it looks at in order to speed up the query. So my first approach is to create a table to keep track of the number of times a user has a log message associated with them, and on what day (similar to the idea behind a materialized view, but I dont want continually refresh the materialized view with my count query). Here is what I've come up with:
CREATE TABLE users_counts(user varchar(65536), counter int default 0, day date);
CREATE RULE inc_user_date_count
AS ON INSERT TO main_table
DO ALSO UPDATE users_counts SET counter = counter + 1
WHERE user = NEW.user AND day = DATE(NEW.date_);
What this does is every time a new record is inserted into my 'main_table', we update the current users_counts table to increment the records whose date is equal to the new records date, and the user names are the same.
NOTE: the date_ column in 'main_table' is a timestamp so I must cast the new records date_ to be a DATE type.
The problem is, what if the user column value doesn't already exist in my new table 'users_count' for the current day, then nothing is updated.
Here is my question:
How do I write the rule such that we check if a user exists for the current day, if so increment that counter, otherwise insert new row with user, day, and counter of 1;
I also would like to know if my approach makes sense to do, or is there any ideas I am missing that I just haven't thought about. As my database grows, it is increasingly inefficient to perform counting, so I want to avoid any performance bottlenecks.
EDIT 1: I was able to actually figure this out by creating a separate RULE but I'm not sure if this is correct:
CREATE RULE test_insert AS ON INSERT TO main_table
DO ALSO INSERT INTO users_counts(user, counter, day)
SELECT NEW.user, 1, DATE(NEW.date)
WHERE NOT EXISTS (SELECT user FROM users.log_messages WHERE user = NEW.user_);
Basically, an insert happens if the user doesn't already exist in my CACHED table called user_counts, and the first rule above updates the count.
What I'm unsure of is how do I know when which rule is called first, the update rule or insert.. And there must be a better way, how do I combine the two rules? Can this be done with a function?