
I have a table that will hold about 3 × 10^12 rows (3 trillion), but with only 3 columns.

Each row holds the IDs of two individuals and the similarity between them (a number between 0 and 1, which I multiplied by 100 and stored as a smallint to save space).

For a given individual that I want to look up, I need to summarize these rows and return how many individuals have up to 10% similarity, 20%, 30%, and so on. These thresholds are fixed (in steps of 10) up to identical individuals (100%).

However, as you can imagine, such a query will be very slow, so I thought about:

  • Creating a new table to store the summarized values
  • Creating a VIEW to store these values

Since there are only about 1.7 million individuals, a lookup in the summarized data would not take long (with an index, it returns quite fast). So, what can I do?

I would like to point out that my population will be almost fixed: once the database is fully populated, almost no new rows are expected.

1 Answer


A plain view won't help, because it stores nothing and simply re-runs the underlying query every time. A materialized view sounds like it would fit the bill, if you can afford a sequential scan of the large table whenever the materialized view gets refreshed.

It should probably contain one row per individual with a count for each percentage range.
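To make that row layout concrete, here is a minimal sketch using Python's built-in `sqlite3` module. SQLite has no materialized views, so the sketch stores the aggregate in an ordinary table that is rebuilt on demand; in a database that supports materialized views, the same `SELECT` could serve as the view's definition. The names `similarity`, `id_a`, `id_b`, `sim`, `similarity_summary`, and `bucket_10` to `bucket_100` are hypothetical, and the sketch assumes that every pair is stored in both directions (so grouping by `id_a` covers each individual) and that the ranges do not overlap.

```python
import sqlite3

def bucket_column(hi):
    """SQL for one count column covering the 10-point range that ends at `hi`."""
    lo = hi - 10 if hi > 10 else -1  # assumption: sim = 0 falls in the first range
    return (f"SUM(CASE WHEN sim > {lo} AND sim <= {hi} THEN 1 ELSE 0 END) "
            f"AS bucket_{hi}")

def build_summary(conn):
    """(Re)build one row per individual with a count for each 10-point range."""
    columns = ", ".join(bucket_column(hi) for hi in range(10, 101, 10))
    conn.execute("DROP TABLE IF EXISTS similarity_summary")
    conn.execute(f"""
        CREATE TABLE similarity_summary AS
        SELECT id_a AS individual, {columns}
        FROM similarity
        GROUP BY id_a
    """)
    # Index the summary so a lookup for one individual does not scan it.
    conn.execute("CREATE UNIQUE INDEX similarity_summary_pk "
                 "ON similarity_summary (individual)")

# Tiny stand-in for the large table (hypothetical data).
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE similarity (id_a INTEGER, id_b INTEGER, sim INTEGER)")
conn.executemany("INSERT INTO similarity VALUES (?, ?, ?)",
                 [(1, 2, 5), (1, 3, 10), (1, 4, 37), (1, 5, 100), (2, 3, 95)])
build_summary(conn)

# Lookup for one individual: (individual, bucket_10, ..., bucket_100).
row = conn.execute(
    "SELECT * FROM similarity_summary WHERE individual = 1").fetchone()
print(row)
```

If "up to 20%" is meant cumulatively, the cumulative figures can be derived from these columns by a running sum at query time, so the stored ranges do not need to change.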

Alternatively, you could store the aggregated data in an independent table that is updated by a trigger on the large table whenever something changes there.
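The trigger approach could look roughly like the following sketch, again with Python's `sqlite3` module and hypothetical names (`similarity`, `similarity_counts`, `similarity_ai`, `similarity_ad`). It keeps one row per individual and range instead of one wide row, purely to keep the trigger bodies short, and it only handles `INSERT` and `DELETE`; an `UPDATE` trigger would combine the two. Trigger syntax differs between database systems, so treat this as an illustration of the idea rather than code to copy.

```python
import sqlite3

# Upper bound of the 10-point range a similarity value falls into
# (0-10 -> 10, 11-20 -> 20, ..., 91-100 -> 100); relies on integer division.
BUCKET = "CASE WHEN {v} = 0 THEN 10 ELSE (({v} + 9) / 10) * 10 END"

def create_schema(conn):
    """Create the large table, the counts table, and the maintaining triggers."""
    new_bucket = BUCKET.format(v="NEW.sim")
    old_bucket = BUCKET.format(v="OLD.sim")
    conn.executescript(f"""
        CREATE TABLE similarity (id_a INTEGER, id_b INTEGER, sim INTEGER);
        CREATE TABLE similarity_counts (
            individual INTEGER,
            bucket     INTEGER,
            n          INTEGER NOT NULL DEFAULT 0,
            PRIMARY KEY (individual, bucket)
        );

        -- After each insert: make sure the counter row exists, then bump it.
        CREATE TRIGGER similarity_ai AFTER INSERT ON similarity
        BEGIN
            INSERT OR IGNORE INTO similarity_counts (individual, bucket)
            VALUES (NEW.id_a, {new_bucket});
            UPDATE similarity_counts SET n = n + 1
            WHERE individual = NEW.id_a AND bucket = {new_bucket};
        END;

        -- After each delete: decrement the matching counter.
        CREATE TRIGGER similarity_ad AFTER DELETE ON similarity
        BEGIN
            UPDATE similarity_counts SET n = n - 1
            WHERE individual = OLD.id_a AND bucket = {old_bucket};
        END;
    """)

def counts_for(conn, individual):
    """Return {range upper bound: count} for one individual."""
    return dict(conn.execute(
        "SELECT bucket, n FROM similarity_counts WHERE individual = ?",
        (individual,)))

conn = sqlite3.connect(":memory:")
create_schema(conn)
conn.executemany("INSERT INTO similarity VALUES (?, ?, ?)",
                 [(1, 2, 5), (1, 3, 10), (1, 4, 37), (1, 5, 100)])
conn.execute("DELETE FROM similarity WHERE id_b = 4")
print(counts_for(conn, 1))
```

Since a row-level trigger fires once per row, it is probably better to create it only after the initial bulk load and to fill the counts table once with a single aggregate query.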
