I am writing a simple tool to detect duplicate files (i.e. files with identical contents). The mechanism is to generate a hash of each file using the SHA-512 algorithm and store these hashes in a MySQL database. I store the hashes in a BINARY(64) UNIQUE NOT NULL column. Each row holds a unique binary hash, which is used to check whether a file is a duplicate.
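For illustration, the hashing step might look roughly like the sketch below. Python is an assumption here (no language is specified for the tool), and the function name `sha512_digest` and the chunk size are hypothetical. The point is that the raw SHA-512 digest is exactly 64 bytes, which is why BINARY(64) fits.

```python
import hashlib


def sha512_digest(path, chunk_size=1024 * 1024):
    """Return the raw 64-byte SHA-512 digest of the file at `path`.

    `sha512_digest` and `chunk_size` are illustrative names, not part of
    the original tool.
    """
    h = hashlib.sha512()
    with open(path, "rb") as f:
        # Read in chunks so large files are not loaded into memory at once.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    # digest() returns raw bytes (64 of them for SHA-512), not hex text.
    return h.digest()
```

Two files with identical contents produce the same digest, so comparing digests is enough to flag a candidate duplicate.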

-- My questions are --

  1. Can I use an index on a binary column? My table's default collation is latin1 (the default collation).

  2. Which indexing mechanism should I use for high performance, B-Tree or Hash? I need to update or add hundreds of rows per second.

  3. What else should I take care of to get the best performance?

1 Answer

  1. Can I use an index on a binary column? My table's default collation is latin1 (the default collation).

    Yes, you can. Collation is only relevant for character datatypes, not binary datatypes (it defines how characters are compared and ordered). Also, be aware that latin1 is a character encoding, not a collation; in MySQL its default collation is latin1_swedish_ci.

  2. Which indexing mechanism should I use for high performance, B-Tree or Hash? I need to update or add hundreds of rows per second.

    Note that hash indexes are only available with the MEMORY and NDB storage engines, so you may not even have a choice.

    In any event, either would typically be able to meet your performance criteria. For this particular application you only need exact-match lookups, so the ordering that a B-Tree provides gives no benefit, whereas a Hash index is built for equality comparisons and would give better performance. Therefore, if you have the choice, you may as well use Hash.

    See Comparison of B-Tree and Hash Indexes for more information.

  3. What else should I take care of to get the best performance?

    Depends on your definition of "best performance" and your environment. In general, remember Knuth's maxim "premature optimisation is the root of all evil": that is, only optimise when you know that there will be a problem with the simplest approach.
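As a rough sketch of the "simplest approach", the unique index itself can do the duplicate check: attempt the insert and treat a uniqueness violation as "already seen". The example below is an assumption-laden stand-in, not MySQL code. It uses Python's built-in `sqlite3` module so that it runs without a server, and the table name `file_hash` and the helpers `open_store` and `add_hash` are hypothetical. In MySQL the column would be declared BINARY(64) UNIQUE NOT NULL as in the question, and the driver's own integrity error would be caught instead.

```python
import sqlite3


def open_store():
    # In-memory SQLite stands in for the MySQL table here (an assumption made
    # so the sketch is runnable). BLOB plays the role of BINARY(64).
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE file_hash (hash BLOB NOT NULL UNIQUE)")
    return conn


def add_hash(conn, digest):
    """Insert `digest`; return True if it was new, False if a duplicate."""
    try:
        with conn:  # commits on success, rolls back if an exception is raised
            conn.execute("INSERT INTO file_hash (hash) VALUES (?)", (digest,))
        return True
    except sqlite3.IntegrityError:
        # The UNIQUE index rejected the row, so this hash is already stored.
        return False
```

A caller would pass the raw 64-byte SHA-512 digest of each file and treat a `False` return as "duplicate file". This way each file costs one statement instead of a separate lookup followed by an insert.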

1 Comment

I am using the InnoDB storage engine for the hash store table, so the Hash indexing mechanism will not be available for it. I think B-Tree indexing will not be bad.
