I am having only basic knowledge of SQL and this is my first for the tag SQL, so pardon me if I miss something in the first instance. There is a large database having, one table which is of particular interest, consists of more than 100 Million rows. Client is using PostgreSQL for database.
The structure of this table is something like this (sample data) table name say table
biometric_token ID status State_code
1 ABCD 1 A 1
2 ABCD 2 R 2
3 ABCD 3 A 2
4 BCDE 4 A 1
5 DEFG 5 A 2
6 DEFG 6 A 2
7 DEFG 7 A 3
8 FGHI 8 A 4
- column
biometric_tokenis biometric based id which I can safely assume is really unique per person and he/she cannot fake it. Needless to say, this token is collected in end of the registration process to which other three columns pertain. IDis system generated unique ID given to every individual when he/she approach for registration.statusis eitherapproved - Aorrejected - R.State_codeis code of state/county in which individual is registered.
What I need is to find how many (count only) individuals (i.e. having same biometric_token) have been registered more than once. So the output format be something like
count_of_individuals number_of_times
1 1 3
2 1 2
Explanation
ABCDsuccessfully registered 2 timesDEFGsuccessfully registered 3 times
Code (nested query) that I have tried, that's working but very slow, is given below. I need to find if there is any faster alternative which can successfully retrieve this summary from this huge database
SELECT COUNT( DISTINCT biometric_token) AS count_of_individuals, cnt AS number_of_times
FROM
(SELECT biometric_token, count (ID) AS cnt
FROM table
WHERE status IN ('A')
GROUP BY biometric_token
HAVING cnt > 1)
GROUP BY number_of_times