
My table structure is as follows:

```sql
CREATE TABLE IF NOT EXISTS commodity_data (
  dataid bigint(20) unsigned NOT NULL AUTO_INCREMENT,
  commodity smallint(6) NOT NULL,
  market smallint(6) NOT NULL,
  quantity float NOT NULL,
  price_min mediumint(9) NOT NULL,
  price_max mediumint(9) NOT NULL,
  price_modal mediumint(9) NOT NULL,
  date date NOT NULL,
  modified timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMP,
  PRIMARY KEY (dataid)
) ENGINE=InnoDB DEFAULT CHARSET=latin1 AUTO_INCREMENT=7059415;
```

My SELECTs on this table will filter (in the WHERE clause) on one or more of 'commodity', 'market' and 'date'.

My ORDER BY clauses will sort by price_min, price_max or price_modal, and sometimes by most of the other columns.

The table will end up with over 10 million rows and will keep growing by about 5,000 to 10,000 rows a day.

My server is currently a VPS with dual 2.4 GHz Xeon CPUs and 4 GB of RAM.

The only index is currently the primary key on the 'dataid' field.

I have read that indexes can help, and I think they should be on commodity, market and date, but I wanted to check that this is right before going ahead, or whether there is a better way of doing this. The table size will be around 600 MB and growing.

The 'commodity' and 'market' fields hold the IDs of commodities and markets in other tables. I will either LEFT JOIN those tables or, if it's faster, read them into arrays in PHP (simple one-level associative arrays, id => name). There are around 300 commodities and 2,000 markets.
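The LEFT JOIN option could look like the following minimal sketch. The lookup table names `commodities` and `markets`, and their `id` and `name` columns, are hypothetical stand-ins for the real tables, and `id` is assumed to be their primary key.

```sql
-- Hypothetical lookup tables: commodities(id, name) and markets(id, name),
-- each assumed to have id as its primary key.
SELECT d.dataid,
       c.name AS commodity_name,
       m.name AS market_name,
       d.quantity,
       d.price_min,
       d.price_max,
       d.price_modal,
       d.date
FROM commodity_data AS d
LEFT JOIN commodities AS c ON c.id = d.commodity
LEFT JOIN markets     AS m ON m.id = d.market
WHERE d.commodity = 4
  AND d.date BETWEEN '2010-01-01' AND '2010-12-31'
ORDER BY d.price_modal
LIMIT 30;
```

With only a few hundred and a few thousand rows in the lookup tables, each join is a primary-key lookup per result row, so the cost of such a query is dominated by how quickly commodity_data itself can be filtered.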

Currently SELECTs are taking too long; for example, COUNT queries with a WHERE clause take a minute or more.

2 Answers


If you put EXPLAIN in front of your SELECT query, MySQL will display the optimizer's execution plan for it, including which indexes it could use (possible_keys), which one it actually chose (key) and roughly how many rows it expects to examine. That shows you where a missing index would speed up your query.
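For example, for a query that filters on commodity and a date range (the values here are illustrative):

```sql
-- Prefix the query with EXPLAIN to see the plan instead of the results.
EXPLAIN
SELECT *
FROM commodity_data
WHERE commodity IN (4, 8, 9)
  AND date BETWEEN '2010-01-01' AND '2010-12-31'
ORDER BY commodity, market, date
LIMIT 30;
```

In the output, `type: ALL` means a full table scan, `key: NULL` means no index was chosen, and `Using filesort` in `Extra` means the matching rows are sorted after being read rather than read in index order.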


1 Comment

When I run:

SELECT * FROM commodity_data WHERE commodity IN (4, 8, 9) AND DATE BETWEEN '2010-01-01' AND '2010-12-31' ORDER BY commodity ASC, market ASC, DATE ASC LIMIT 0, 30

it takes 25 seconds. The EXPLAIN gives the following:

id: 1
select_type: SIMPLE
table: commodity_data
type: ALL
possible_keys: NULL
key: NULL
key_len: NULL
ref: NULL
rows: 7119981
Extra: Using where; Using filesort

Try to figure out which compound indexes you need: if you are searching on commodity AND market AND date, you should have one single index on all three columns rather than three separate indexes. Column order matters: if, for instance, you sometimes don't include market, the order would probably be INDEX(commodity,date,market), with market last so the index is still usable when it is absent. If the WHERE clauses vary widely, multiple compound indexes for the different cases may help (e.g. INDEX(commodity,date,market) but also INDEX(market,date,commodity)). Keep in mind that each extra index is a performance hit when writing/updating.
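A sketch of the two compound indexes from the examples above, added in one ALTER TABLE statement. The index names are made up for illustration; the column orders follow the answer.

```sql
-- Index names are hypothetical; column order follows the examples above.
ALTER TABLE commodity_data
  ADD INDEX idx_commodity_date_market (commodity, date, market),
  ADD INDEX idx_market_date_commodity (market, date, commodity);
```

Note that once a column is used with a range condition (such as date BETWEEN ...), the columns after it in the index cannot narrow the index range further; at most they help filter index entries. Re-running EXPLAIN afterwards should show one of these indexes under `key` instead of `type: ALL`.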

Still, a minute is quite long: make sure your database can hold the table in memory by setting innodb_buffer_pool_size as high as your server's RAM allows. After that, run EXPLAIN on any queries that still take a long time and take it from there.
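A sketch for comparing the current buffer pool size with the table's actual footprint, using the standard `@@innodb_buffer_pool_size` variable and `information_schema.TABLES`. The 1 GB value in the comments is only an illustrative assumption for a 4 GB server holding a ~600 MB table, not a tuned recommendation.

```sql
-- Current buffer pool size in MB (the question reports 64M).
SELECT @@innodb_buffer_pool_size / 1024 / 1024 AS buffer_pool_mb;

-- Data + index size of the table in MB.
SELECT table_name,
       ROUND((data_length + index_length) / 1024 / 1024) AS size_mb
FROM information_schema.TABLES
WHERE table_schema = DATABASE()
  AND table_name = 'commodity_data';

-- Raising it, e.g. to 1 GB (illustrative value only):
--   MySQL 5.7.5 and later can resize online with
--     SET GLOBAL innodb_buffer_pool_size = 1073741824;
--   older versions need innodb_buffer_pool_size = 1G under [mysqld]
--   in my.cnf, followed by a server restart.
```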

6 Comments

How much of a performance hit are we talking about? I will be entering 5-10k rows per day in one block. Ultimately, I have no issue with INSERTs taking longer (there will be no UPDATEs) as long as they don't take many times longer.
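For a daily block of 5,000-10,000 rows, one common approach is multi-row INSERT statements wrapped in a single transaction, so that with the default innodb_flush_log_at_trx_commit = 1 the log is flushed once at COMMIT rather than after every autocommitted statement. A minimal sketch; the row values are placeholders.

```sql
START TRANSACTION;

-- Placeholder values; a loading script would generate batches of
-- a few hundred to a few thousand rows per INSERT statement.
-- dataid (AUTO_INCREMENT) and modified (DEFAULT) are filled in by MySQL.
INSERT INTO commodity_data
  (commodity, market, quantity, price_min, price_max, price_modal, date)
VALUES
  (4, 101, 12.5,  1800, 2100, 1950, '2011-03-01'),
  (8, 101,  3.0,   900, 1000,  950, '2011-03-01'),
  (9, 205,  7.25, 1200, 1400, 1300, '2011-03-01');

COMMIT;
```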
I'm told that increasing innodb_buffer_pool_size can potentially cause InnoDB corruption in the event of a server crash and have been advised against increasing it. It is currently set to 64M on my server.
5-10k rows per day is next to nothing, so you should be OK there. innodb_buffer_pool_size does not cause corruption. Setting inappropriate values for innodb_flush_log_at_trx_commit and then pulling the power on your server might cause data loss. But by all means, disregard Peter Zaitsev's advice in the link given. It's not like he knows his MySQL stuff or anything [/sarcasm].
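The durability setting mentioned in this comment can be checked directly. With the default value 1, InnoDB writes and flushes its log at every commit; with 0 or 2 the flush happens roughly once per second, so a crash or power loss can lose about the last second of committed transactions. This trade-off is independent of innodb_buffer_pool_size.

```sql
-- 1 (default): write and flush the redo log at every commit (fully durable).
-- 0 or 2: flush about once per second; a crash or power loss can lose
--         roughly the last second of committed transactions.
SHOW VARIABLES LIKE 'innodb_flush_log_at_trx_commit';
```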
I am in no position to disregard any advice. An administrator at my server provider advised me that there are risks (as you stated) if power is suddenly lost. I will look into creating several multiple-column indexes; this seems the most direct way to speed up my SELECTs.
And did this person believe that setting innodb_buffer_pool_size lower reduced that risk? It does not: as long as the flush log setting is unchanged, you can still potentially lose data. Scaling innodb_buffer_pool_size down because of this is madness, unless you are in the very odd situation of having a UPS that can shut the system down cleanly on a power failure but has less than a few seconds of power to do it...