I'm working with a large, heavily queried table (more than 500 million rows), so it's very important to avoid expensive queries.
Currently I need to get some values that meet a condition (explained in more detail below) and then check whether those values also appear in another group of values (all of this referring to the same field). I'm creating a view of the table using a WITH clause.
Here is the table structure (table employee):
+--------+-------------+-----------+--------+---------+-----------+
| period | employee_id | operation | sub_op | payment | work_zone |
+--------+-------------+-----------+--------+---------+-----------+
Periods have the format 'YYMM'; one period refers to one month.
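Because periods are 'YYMM' values, plain subtraction does not always give the previous month (1101 - 1 = 1100 is not a valid period). A minimal sketch of the month arithmetic in Python, assuming every period falls in the same century; `shift_period` is a hypothetical helper name, not something that exists in the schema:

```python
def shift_period(period: str, months: int) -> str:
    """Shift a 'YYMM' period by a number of months (negative = back in time)."""
    year, month = int(period[:2]), int(period[2:])
    if not 1 <= month <= 12:
        raise ValueError(f"invalid month in period {period!r}")
    # Count months from year 00, month 01, then convert back to 'YYMM'.
    index = year * 12 + (month - 1) + months
    if not 0 <= index < 100 * 12:
        raise ValueError("result falls outside the two-digit year range")
    return f"{index // 12:02d}{index % 12 + 1:02d}"
```

For example, `shift_period("1109", -1)` gives `"1108"`, and `shift_period("1101", -1)` wraps around the year boundary to `"1012"`.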
Of course the table is much larger than this sample, but I only need those fields in the query. Here is a brief explanation of what I need, followed by the query itself.
I need to get every employee_id in the current period with a considerable payment (at least $250) and a specific operation (first I group those operations by their sub_op value). The operation value required is 97, and the query shows how I group it.
I then group those values by work_zone and by the grouped operation values. And now the subqueries start... I need:
- All the values that were not in the previous period.
- All the values that were not in any of the last 36 periods (3 years).
- All the values that were in at least one of the last 36 periods.
- All the values that were in at least one of the last 36 periods, but with a different operation.
- All the values that were in at least one of the last 36 periods, but with a payment lower than $250.
So, here is the query I've got so far (I'm using '1109' as the period):
CREATE OR REPLACE VIEW hired_fired AS
WITH query_hired_fired AS (
    -- Current-period employees with operation 97 and a payment of at least $250,
    -- labelled with an operation group derived from sub_op.
    SELECT work_zone, operation, sub_op, employee_id,
        CASE
            WHEN operation = 97 THEN
                CASE
                    WHEN sub_op IN (1,3,5) THEN 'Cookers'
                    WHEN sub_op IN (2,6) THEN 'Waitress'
                    WHEN sub_op IN (4,7,8,9,10) THEN 'Cashier'
                    WHEN sub_op = 11 THEN 'Security'
                    WHEN sub_op IN (12,13) THEN 'Cleaners'
                    ELSE 'Others'
                END
        END AS opgroup
    FROM employee
    WHERE period = 1109 AND payment >= 250 AND operation = 97
)
-- One scalar subquery per requirement; each one scans employee again.
SELECT 201109 AS periodo, opgroup, work_zone,
    (SELECT COUNT(DISTINCT employee_id) FROM query_hired_fired WHERE employee_id NOT IN (SELECT employee_id FROM employee WHERE period = 1108 AND payment >= 250 AND operation = 97)) AS total,
    (SELECT COUNT(DISTINCT employee_id) FROM query_hired_fired WHERE employee_id NOT IN (SELECT employee_id FROM employee WHERE period BETWEEN 0808 AND 1108 AND payment >= 250 AND operation = 97)) AS absolut,
    (SELECT COUNT(DISTINCT employee_id) FROM query_hired_fired WHERE employee_id IN (SELECT employee_id FROM employee WHERE period BETWEEN 0808 AND 1108 AND payment >= 250 AND operation = 97)) AS reincorporated,
    (SELECT COUNT(DISTINCT employee_id) FROM query_hired_fired WHERE employee_id IN (SELECT employee_id FROM employee WHERE period BETWEEN 0808 AND 1108 AND payment >= 250 AND operation != 97)) AS operation_change,
    (SELECT COUNT(DISTINCT employee_id) FROM query_hired_fired WHERE employee_id IN (SELECT employee_id FROM employee WHERE period BETWEEN 0808 AND 1108 AND payment < 250 AND operation = 97)) AS raised
FROM query_hired_fired
GROUP BY work_zone, opgroup;
So, my question is: is there any way I can write this query without all the subqueries? I think it would take several hours to run, and that is not an option with this table.
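One direction that may avoid the repeated IN / NOT IN subqueries is conditional aggregation: collapse each employee's history into a few 0/1 flags in a single pass over the window, join those flags to the current-period rows, and count per group. The sketch below is only an illustration under several assumptions. It runs against an in-memory SQLite database through Python's sqlite3 module so that it is self-contained; the CTE names current_emp and history, the function names, and the toy rows are all made up for the example; employee_id is assumed to be non-null; and the counts are computed per (work_zone, opgroup) group. The window bounds 808 and 1108 are copied from the query above. It has not been tried on a table of this size.

```python
import sqlite3

# Conditional-aggregation sketch; CTE names and toy data are hypothetical.
QUERY = """
WITH current_emp AS (   -- employees that qualify in the current period
    SELECT DISTINCT employee_id, work_zone,
           CASE
               WHEN sub_op IN (1,3,5)      THEN 'Cookers'
               WHEN sub_op IN (2,6)        THEN 'Waitress'
               WHEN sub_op IN (4,7,8,9,10) THEN 'Cashier'
               WHEN sub_op = 11            THEN 'Security'
               WHEN sub_op IN (12,13)      THEN 'Cleaners'
               ELSE 'Others'
           END AS opgroup
    FROM employee
    WHERE period = :cur AND payment >= 250 AND operation = 97
),
history AS (            -- one row of 0/1 flags per employee, one scan of the window
    SELECT employee_id,
           MAX(CASE WHEN period = :prev AND payment >= 250 AND operation = 97
                    THEN 1 ELSE 0 END) AS in_prev,
           MAX(CASE WHEN payment >= 250 AND operation = 97
                    THEN 1 ELSE 0 END) AS in_window,
           MAX(CASE WHEN payment >= 250 AND operation <> 97
                    THEN 1 ELSE 0 END) AS other_op,
           MAX(CASE WHEN payment < 250 AND operation = 97
                    THEN 1 ELSE 0 END) AS low_pay
    FROM employee
    WHERE period BETWEEN :start AND :prev
    GROUP BY employee_id
)
SELECT c.work_zone, c.opgroup,
       COUNT(DISTINCT CASE WHEN COALESCE(h.in_prev, 0) = 0 THEN c.employee_id END) AS total,
       COUNT(DISTINCT CASE WHEN COALESCE(h.in_window, 0) = 0 THEN c.employee_id END) AS absolut,
       COUNT(DISTINCT CASE WHEN h.in_window = 1 THEN c.employee_id END) AS reincorporated,
       COUNT(DISTINCT CASE WHEN h.other_op = 1 THEN c.employee_id END) AS operation_change,
       COUNT(DISTINCT CASE WHEN h.low_pay = 1 THEN c.employee_id END) AS raised
FROM current_emp c
LEFT JOIN history h ON h.employee_id = c.employee_id   -- no history row means all flags are NULL
GROUP BY c.work_zone, c.opgroup
ORDER BY c.work_zone, c.opgroup
"""


def hired_fired_counts(conn, cur_period, prev_period, window_start):
    """Run the sketch query and return one tuple per (work_zone, opgroup)."""
    params = {"cur": cur_period, "prev": prev_period, "start": window_start}
    return conn.execute(QUERY, params).fetchall()


def demo():
    """Build a tiny invented employee table and run the query for period 1109."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE employee (period INTEGER, employee_id INTEGER, "
        "operation INTEGER, sub_op INTEGER, payment REAL, work_zone TEXT)"
    )
    rows = [
        # (period, employee_id, operation, sub_op, payment, work_zone)
        (1109, 1, 97, 1, 300, "A"),   # also present in 1108
        (1109, 2, 97, 3, 400, "A"),   # no history at all
        (1109, 3, 97, 2, 260, "A"),   # earlier row with a low payment
        (1109, 4, 97, 1, 500, "A"),   # earlier row with another operation
        (1109, 5, 97, 1, 100, "A"),   # filtered out: payment below 250
        (1108, 1, 97, 1, 300, "A"),
        (1005, 3, 97, 2, 100, "A"),
        (912, 4, 50, 1, 300, "A"),
    ]
    conn.executemany("INSERT INTO employee VALUES (?, ?, ?, ?, ?, ?)", rows)
    return hired_fired_counts(conn, 1109, 1108, 808)
```

Calling `demo()` returns one row per (work_zone, opgroup) with the five counts as columns. Whether this is actually faster on the real table would depend on the available indexes (for example on period and employee_id), so the query plan would need checking first.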
Sorry if anything is unclear; I will answer all comments and questions as soon as possible. Thanks.