I have a table that will hold about 3 × 10^12 rows (3 trillion), but with only 3 columns.
The table stores the IDs of two individuals and the similarity between them (a number between 0 and 1, which I multiply by 100 and store as a smallint to save space).
For a given individual that I look up, I need to summarize these rows and return how many individuals have up to 10% similarity, up to 20%, up to 30%, and so on. The thresholds are fixed in steps of 10, all the way up to identical individuals (100%).
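To make the expected output concrete, here is a minimal Python sketch of the summary for a single individual. The function name `summarize` and the sample values are made up for illustration, and I am assuming that "up to 10%" means a cumulative count (similarity <= 10 on the 0 to 100 scale).

```python
from bisect import bisect_right

THRESHOLDS = range(10, 101, 10)  # 10, 20, ..., 100


def summarize(similarities):
    """Return {threshold: number of values <= threshold}.

    Assumes cumulative buckets and similarities already scaled to 0..100.
    """
    ordered = sorted(similarities)
    # bisect_right gives the count of elements <= t in a sorted list.
    return {t: bisect_right(ordered, t) for t in THRESHOLDS}


# Hypothetical similarities of one individual against a few others.
sample = [3, 10, 15, 42, 42, 87, 100]
print(summarize(sample))
```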
However, as you can imagine, running this query directly against a table of that size will be very slow, so I thought about either:
- Create a new table to save summarized values
- Create a VIEW to save these values.
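As a rough sketch of the first option, the following uses Python's built-in `sqlite3` module with an in-memory database purely for illustration. The table and column names (`similarity`, `id_a`, `id_b`, `sim`, `similarity_summary`, `le_10` ... `le_100`) are hypothetical, as is the assumption that every pair is stored in both directions and that the counts are cumulative. It is not tied to any particular production DBMS.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical schema: one row per ordered pair, similarity scaled to 0..100.
cur.execute("CREATE TABLE similarity (id_a INTEGER, id_b INTEGER, sim SMALLINT)")
cur.executemany(
    "INSERT INTO similarity VALUES (?, ?, ?)",
    [(1, 2, 5), (1, 3, 25), (1, 4, 100), (2, 1, 5), (2, 3, 60)],
)

# One cumulative counter per threshold, built with portable CASE expressions.
counters = ", ".join(
    f"SUM(CASE WHEN sim <= {t} THEN 1 ELSE 0 END) AS le_{t}"
    for t in range(10, 101, 10)
)

# Summary table: one row per individual, computed once after loading.
cur.execute(
    f"CREATE TABLE similarity_summary AS "
    f"SELECT id_a, {counters} FROM similarity GROUP BY id_a"
)
cur.execute("CREATE UNIQUE INDEX idx_summary_id ON similarity_summary (id_a)")

# Lookup for one individual now touches a single indexed row.
row = cur.execute(
    "SELECT le_10, le_30, le_100 FROM similarity_summary WHERE id_a = ?", (1,)
).fetchone()
print(row)
```

Because the population is expected to stay almost fixed, a summary like this would only need to be rebuilt or updated on the rare occasions when new individuals are loaded.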
Since there are only about 1.7 million individuals, a lookup against such a summary should not be time consuming (with an index it returns quite fast). So, which approach should I take?
I would like to point out that my population will be almost fixed: once the DB is fully populated, almost no new individuals are expected to be added.
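For scale, here is a rough back-of-the-envelope comparison. It assumes that the pair table stores every ordered pair of individuals and that a summary would keep 10 counters per individual (one per threshold). Both are assumptions on my part, not measured figures.

$$
\begin{aligned}
\text{pair rows} &\approx (1.7 \times 10^{6})^{2} = 2.89 \times 10^{12} \approx 3 \times 10^{12} \\
\text{summary counters} &\approx 1.7 \times 10^{6} \times 10 = 1.7 \times 10^{7}
\end{aligned}
$$

So the summary would be roughly five orders of magnitude smaller than the pair table.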