1

I need a table to store some ratings. The table has a composite index on (user_id, post_id) and another column that identifies the rating type.

user_id - bigint
post_id - bigint
type - varchar
...

Composite Index (user_id, post_id)

The table has no primary key, because a primary key must be unique while a plain index need not be, and in my case uniqueness is a problem.

For example, I can have:

INSERT INTO tbl_rate
    (user_id,post_id,type)
VALUES
    (24,1234,'like'),
    (24,1234,'love'),
    (24,1234,'other');

Could the missing PRIMARY KEY cause performance problems? Is my table structure good, or do I need to change it?

Thank you

3 Comments
  • The combination of all 3 fields is your PK. Commented Jul 27, 2019 at 14:19
  • Interesting... thank you Commented Jul 27, 2019 at 14:21
  • One cannot really judge an index without knowing what the main queries will be. Commented Nov 11, 2019 at 2:56

2 Answers

4

A few points:

It sounds like you would take whatever is currently unique about the table and make that the primary key. That works. Natural keys have some advantages for querying because of locality (the data for each user is stored in the same area). And because InnoDB clusters the table by the primary key, searching by the primary key columns avoids an extra lookup to the row data.

  1. But using a natural primary key like this one has disadvantages for performance as well.

  2. A very large primary key makes all other indexes very large in InnoDB, because the primary key is included in every secondary index entry.

  3. A natural primary key isn't as fast as a surrogate key for INSERTs: besides being bigger, it can't simply append at the end of the table each time. Each row has to be inserted in the section for that user and post.

  4. Also, if you are searching by time, you will most likely be seeking all over the table with a natural key unless time is its first column. Surrogate keys tend to be local in time and can be just right for some queries.

  5. Using a natural key like yours as a primary key can also be annoying. What if you want to refer to a particular vote? You need several fields. It's also a little difficult to use with many ORMs.

Here's the Answer

I would create your own surrogate key and use it as the primary key rather than rely on InnoDB's internal row ID, because you'll be able to use it for updates and lookups.

ALTER TABLE tbl_rate 
ADD id INT UNSIGNED NOT NULL AUTO_INCREMENT, 
ADD PRIMARY KEY(id);

But if you do create a surrogate primary key, I'd also make your natural key UNIQUE. It costs the same as a plain index, but it enforces correctness.

ALTER TABLE tbl_rate 
ADD UNIQUE ( user_id, post_id, type );

2 Comments

"But, if you do create a surrogate primary key, I'd also make your key a UNIQUE. Same cost but it enforces correctness." -> ALTER TABLE tbl_rate ADD KEY ( user_id, post_id, type ); does not make a unique key; it's just an index.
@GidonWise - Item 2 applies only if there are 2 or more secondary indexes.

3

Could the missing PRIMARY KEY cause performance problems?

Yes, in InnoDB for sure. When a table has no primary key and no non-null unique index, InnoDB generates its own hidden row ID, using the following function defined in dict0boot.ic:

/** Returns a new row id.
@return the new id */
UNIV_INLINE
row_id_t
dict_sys_get_new_row_id(void)
/*=========================*/
{
    row_id_t    id;

    mutex_enter(&(dict_sys->mutex)); 

    id = dict_sys->row_id;

    if (0 == (id % DICT_HDR_ROW_ID_WRITE_MARGIN)) {

        dict_hdr_flush_row_id();
    }

    dict_sys->row_id++;

    mutex_exit(&(dict_sys->mutex));

    return(id);
}

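The main problem in that code is mutex_enter(&(dict_sys->mutex)); which blocks other threads from entering this section while one thread is already running it. That means concurrent inserts into tables without a primary key are serialized at this point, loosely comparable to the table locking MyISAM does, although here the mutex is held only for the duration of this short function.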

Rick James commented:

"% may take a few nanoseconds. That is insignificant compared to everything else. Anyway #define DICT_HDR_ROW_ID_WRITE_MARGIN 256"

Indeed, Rick James, this is insignificant compared to the mutex issue described above. The C/C++ compiler would micro-optimize it further by emitting cheaper CPU instructions. The main performance concern is still the one mentioned above.

Also, the modulo operator (%) is a comparatively expensive CPU instruction. But depending on the C/C++ compiler (and/or its configuration options), it might be optimized when DICT_HDR_ROW_ID_WRITE_MARGIN is a power of two, for example into (0 == (id & (DICT_HDR_ROW_ID_WRITE_MARGIN - 1))), because bitmasking is much faster. I believe DICT_HDR_ROW_ID_WRITE_MARGIN is indeed a power of 2.
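To make the modulo-versus-bitmask point concrete, here is a minimal sketch, not InnoDB code. The helper names needs_flush_mod, needs_flush_mask and check_equivalence are hypothetical, and the margin of 256 is taken from the comment quoted above. It only illustrates that the two checks agree when the margin is a power of two.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Value quoted in the comments; redefined here so the sketch stands alone. */
#define DICT_HDR_ROW_ID_WRITE_MARGIN 256

/* Hypothetical helper: the check in the form used by the snippet above. */
static bool needs_flush_mod(uint64_t id)
{
    return 0 == (id % DICT_HDR_ROW_ID_WRITE_MARGIN);
}

/* Hypothetical helper: the bitmask form. Only valid for a power-of-two margin. */
static bool needs_flush_mask(uint64_t id)
{
    return 0 == (id & (DICT_HDR_ROW_ID_WRITE_MARGIN - 1));
}

/* Asserts that both forms give the same answer over a range of ids. */
static void check_equivalence(void)
{
    for (uint64_t id = 0; id < 4096; id++) {
        assert(needs_flush_mod(id) == needs_flush_mask(id));
    }
}
```

Calling check_equivalence() from a test passes silently when the two forms agree. If the margin were changed to a value that is not a power of two, the assertion would fail, which is why the bitmask form is only a safe substitute under that assumption.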

6 Comments

% may take a few nanoseconds. That is insignificant compared to everything else. Anyway #define DICT_HDR_ROW_ID_WRITE_MARGIN 256
"% may take a few nanoseconds. That is insignificant compared to everything else" @RickJames True, more or less. The C/C++ compiler would still optimize it (much) better, but the thread locking on the mutex was the main performance concern. Thanks for confirming that DICT_HDR_ROW_ID_WRITE_MARGIN is a power of 2; I did remember that correctly, then. I updated the answer to make clear that it is indeed a micro-optimisation compared to the rest.
As I understand the code (thanks to JCole for an explanation somewhere), all tables without a PK share a 6-byte number maintained by this subroutine. 256 values are allocated at a time; dict_hdr_flush_row_id() is probably a costly function. Any leftover values are lost on a shutdown. 2^48 is big enough that "no one" will ever run out of ids.
But the bottom line is... You ought to provide an explicit PK.
OK, I was always under the impression, or somehow remembered incorrectly, that this would be generated on a per-table basis, as I read the source code behind this a long time ago (around the time when InnoDB came to MySQL). I guess I am wrong about that, then; I might check it myself when I have more time at hand. @RickJames