1

my SQL(with sub-queries) take so long(nearly 24hour). Is using sub-queries is not good for performance?

My table as below

mysql> show create table eventnew;
CREATE TABLE `eventnew` (
  `id` int(50) NOT NULL AUTO_INCREMENT,
  `date` datetime DEFAULT NULL,
  `src_ip` int(10) unsigned DEFAULT NULL,
  `src_port` int(10) unsigned DEFAULT NULL,
  `dst_ip` int(10) unsigned DEFAULT NULL,
  `dst_port` int(10) unsigned DEFAULT NULL,
  `repo_ip` varchar(50) DEFAULT NULL,
  `link` varchar(50) DEFAULT NULL,
  `binary_hash` varchar(50) DEFAULT NULL,
  `sensor_id` varchar(50) DEFAULT NULL,
  `repox_ip` int(10) unsigned DEFAULT NULL,
  `flags` varchar(50) DEFAULT NULL,
  `shellcode` varchar(1000) DEFAULT NULL,
  PRIMARY KEY (`id`),
  KEY `date` (`date`),
  KEY `sensor_id` (`sensor_id`),
  KEY `src_ip` (`src_ip`)
) ENGINE=MyISAM AUTO_INCREMENT=883278 DEFAULT CHARSET=latin1

my SQL as below:

SELECT COUNT( DISTINCT binary_hash ) AS cnt
FROM eventnew
WHERE DATE >=  '2010-10-16'
AND DATE <  '2010-10-17'
AND binary_hash NOT 
IN (

SELECT DISTINCT binary_hash
FROM eventnew
WHERE DATE <  '2010-10-16'
AND binary_hash IS NOT NULL
)

below are result running EXPLAIN:

+----+--------------------+----------+-------+---------------+------+---------+------+--------+-------------+
| id | select_type        | table    | type  | possible_keys | key  | key_len | ref  | rows   | Extra       |
+----+--------------------+----------+-------+---------------+------+---------+------+--------+-------------+
|  1 | PRIMARY            | eventnew | range | date          | date | 9       | NULL |  14296 | Using where |
|  2 | DEPENDENT SUBQUERY | eventnew | range | date          | date | 9       | NULL | 384974 | Using where |
+----+--------------------+----------+-------+---------------+------+---------+------+--------+-------------+

3 Answers 3

2

Using subqueries certainly does affect your performance. For instance, lets say a Table T1 has 'n' records and T2 has 'm' records. when you do a join on T1 and T2, it will take n*m records and then will sort them based on your condition. The same case goes with in keyword as well. and if you have another constraint in subquery, it would further decrease the efficiency. However, using subqueries couldn't be avoided in practice as they are meant to be.

Sign up to request clarification or add additional context in comments.

Comments

0

I'd suggest you use NOT EXISTS instead of NOT IN.

Comments

0

Try this

SELECT COUNT( DISTINCT a.binary_hash ) AS cnt
FROM eventnew a left join eventnew b on (a.binary_hash=b.binary_hash AND b.binary_hash IS NOT NULL AND b.DATE <  '2010-10-16')
WHERE a.DATE >=  '2010-10-16'
AND a.DATE <  '2010-10-17'
and  b.date is null

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.