2

I have a question for you. I have this database with 250.000 recordings, with 2 text fields each containing up to 300 words. And I want do select all the data that meets some criteria and put it in another table. I thought to delete those recording that are not satisfying my condition with this wuery:

DELETE FROM `cables` WHERE 
`data` NOT LIKE "%BRV%" AND
`data` NOT LIKE "%Venezuela%" AND
`data` NOT LIKE "%Caracas%" AND
`data` NOT LIKE "%Hugo Chavez%" AND
`tags` NOT LIKE "%BRV%" AND
`tags` NOT LIKE "%Venezuela%" AND
`tags` NOT LIKE "%Caracas%" AND
`tags` NOT LIKE "%Hugo Chavez%" AND
`header` NOT LIKE "%BRV%" AND
`header` NOT LIKE "%Venezuela%" AND
`header` NOT LIKE "%Caracas%" AND
`header` NOT LIKE "%Hugo Chavez%" AND
`subject` NOT LIKE "%BRV%" AND
`subject` NOT LIKE "%Venezuela%" AND
`subject` NOT LIKE "%Caracas%" AND
`subject` NOT LIKE "%Hugo Chavez%" AND
`tmp` NOT LIKE "%BRV%" AND
`tmp` NOT LIKE "%Venezuela%" AND
`tmp` NOT LIKE "%Caracas%" AND
`tmp` NOT LIKE "%Hugo Chavez%" AND
`identifier` NOT LIKE "%BRV%" AND
`identifier` NOT LIKE "%Venezuela%" AND
`identifier` NOT LIKE "%Caracas%" AND
`identifier` NOT LIKE "%Hugo Chavez%"

Each row is OK if it containt at least one time any of those words. The thing is that I already have 3 hours since it is being in execution, and nothing hapened. I've stopped the proccess and nothing happened. The final resuls should have somewhere around 14000 recordings, What can I do? Thank you!!!!

2
  • Devide and conquer, prioterize your selection and execute one by one in different function Commented Sep 4, 2011 at 20:37
  • Do a little analysis upfront as well. You might find that you can trim your query because Venezuela only appears in the header or subject or only appears concurrently in fields. Just a thought. Commented Sep 4, 2011 at 20:43

4 Answers 4

2

Nothing happened because you stopped it. So, the commit; has not been done.

You should split your query to have a shorter execution.

The NOT LIKE %% are very very expensive!!! (the index, if there are some are not used...)

Sign up to request clarification or add additional context in comments.

Comments

2

try REGEXP, might perform better here.

DELETE FROM `cables` WHERE
  `data` NOT REGEXP 'BRV|Venezuela|Caracas|Hugo Chavez' AND
  `tags` NOT REGEXP 'BRV|Venezuela|Caracas|Hugo Chavez' AND
  `header` NOT REGEXP 'BRV|Venezuela|Caracas|Hugo Chavez' AND
  `subject` NOT REGEXP 'BRV|Venezuela|Caracas|Hugo Chavez' AND
  `tmp` NOT REGEXP 'BRV|Venezuela|Caracas|Hugo Chavez' AND
  `identifier` NOT REGEXP 'BRV|Venezuela|Caracas|Hugo Chavez'

also try it with concatenation (Karolis' suggestion)

DELETE FROM `cables` WHERE
   CONCAT( `data`, `tags`, `header`, `subject`, `tmp`, `identifier` )
   NOT REGEXP 'BRV|Venezuela|Caracas|Hugo Chavez'

1 Comment

Agreed. It would be also a good idea to concatinate all the strings into one, then regular expression should be pretty fast.
1

The problem is that LIKE %text% will not make use of a fulltext index. So a big table with 250.000 entries and a lot of LIKE %% criterias will take an awful long time.

Are you sure you need the leading '%...'? Otherwise you can try to use Boolean search modifier.

Comments

0

Why not reverse your condition to select the records that match instead of deleting the ones that don't? And then insert those into the other table? This might very well be faster, as 3 hours, even for the overuse of not-like conditions, is way too long for 250'000 rows.

INSER INTO `selected_cables`
SELECT * FROM `cables`
WHERE NOT ( 
`data` NOT LIKE "%BRV%" AND
`data` NOT LIKE "%Venezuela%" AND
`data` NOT LIKE "%Caracas%" AND
`data` NOT LIKE "%Hugo Chavez%" AND
`tags` NOT LIKE "%BRV%" AND
`tags` NOT LIKE "%Venezuela%" AND
`tags` NOT LIKE "%Caracas%" AND
`tags` NOT LIKE "%Hugo Chavez%" AND
`header` NOT LIKE "%BRV%" AND
`header` NOT LIKE "%Venezuela%" AND
`header` NOT LIKE "%Caracas%" AND
`header` NOT LIKE "%Hugo Chavez%" AND
`subject` NOT LIKE "%BRV%" AND
`subject` NOT LIKE "%Venezuela%" AND
`subject` NOT LIKE "%Caracas%" AND
`subject` NOT LIKE "%Hugo Chavez%" AND
`tmp` NOT LIKE "%BRV%" AND
`tmp` NOT LIKE "%Venezuela%" AND
`tmp` NOT LIKE "%Caracas%" AND
`tmp` NOT LIKE "%Hugo Chavez%" AND
`identifier` NOT LIKE "%BRV%" AND
`identifier` NOT LIKE "%Venezuela%" AND
`identifier` NOT LIKE "%Caracas%" AND
`identifier` NOT LIKE "%Hugo Chavez%")

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.