4

I have a big question.

Let's take as an example a database for some company's orders.

Let's say this company handles around 2,000 orders per month, so around 24K orders per year, and they don't want to delete any orders, even ones that are 5 years old (hey, this is just an example; the numbers don't mean anything).

For the sake of good query speed, is it better to have just one table, or would it be faster to have a table for every year?

My idea was to create a new table for the orders each year, named orders_2008, orders_2009, etc.

Could this be a good way to speed up DB queries?

Usually the data in use is from the current year, so the fewer rows, the better. Obviously, this would cause problems whenever I need to search all of the order tables at once, because I'd have to run some complex UNION, but that happens very rarely in normal activity.
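
For example, a report across all the years would end up looking something like this (the column names here are only placeholders):

    -- Hypothetical cross-year report if orders are split per year;
    -- customer_id and total are made-up column names.
    SELECT customer_id, SUM(total) AS total_spent
    FROM (
        SELECT customer_id, total FROM orders_2008
        UNION ALL
        SELECT customer_id, total FROM orders_2009
        UNION ALL
        SELECT customer_id, total FROM orders_2010
    ) AS all_orders
    GROUP BY customer_id;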

I think it's better to have an application that is fast for 95% of the queries and somewhat slow for the rest, rather than an application that is always slow.

My current database has about 130 tables; the new version of my application should have about 200-220 tables, of which roughly 40% would be replicated annually.

Any suggestions?

EDIT: the RDBMS will probably be PostgreSQL, maybe (I hope not) MySQL.

7 Answers

13

Smaller tables are faster. Period.

If you have history that is rarely used, then moving that history into other tables will be faster.

This is what a data warehouse is about -- separate operational data from historical data.

You can run a periodic extract from the operational tables and a load into the historical ones. All the data is kept; it's just segregated.
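
A minimal sketch of that periodic move in PostgreSQL, assuming an orders table with an order_date column and a matching orders_history table (the names are only illustrative):

    -- Move everything older than the current year into the history table.
    BEGIN;

    INSERT INTO orders_history
    SELECT * FROM orders
    WHERE order_date < date_trunc('year', current_date);

    DELETE FROM orders
    WHERE order_date < date_trunc('year', current_date);

    COMMIT;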


7

Before you worry about query speed, consider the costs.

If you split the data into separate tables, you will have to write code that handles that split. Every bit of code you write has a chance to be wrong. You are asking for your code to be buggy in exchange for an unmeasured and imagined performance win.

Also consider the cost of machine time vs. programmer time.

1 Comment

Agree, but my application handles many companies, and each one has its own database, so I'd rather spend a lot of time writing the handler code but have a beep-beep (fast) database afterwards :)

3

If you use indexes properly, you probably need not split it into multiple tables. Most modern DBs will optimize access.
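
For instance, an index on the order date usually keeps current-year queries fast even in one big table (the column name here is just an assumption):

    -- Index on the date column so year-restricted queries don't scan the whole table.
    CREATE INDEX idx_orders_order_date ON orders (order_date);

    -- A current-year query can then use the index:
    SELECT * FROM orders
    WHERE order_date >= '2009-01-01' AND order_date < '2010-01-01';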

Another option you might consider is to have a table for the current year and, at the end of the year, append its data to another table that holds the data for all the previous years.
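
If you go that way, a view over the two tables keeps the occasional all-years query simple (the table names are only a sketch, and both tables are assumed to have the same columns):

    -- One view that spans current and historical orders.
    CREATE VIEW orders_all AS
    SELECT * FROM orders            -- current year
    UNION ALL
    SELECT * FROM orders_history;   -- previous years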

1 Comment

Yes, having just 2 tables (current year and history) was my first idea, but I thought the per-year system would be faster; it just needs better code-writing.

2

I would not split tables by year.

Instead I would archive data to a reporting database every year, and use that when needed.

Alternatively, you could partition the data across drives, thus maintaining performance, although I'm unsure whether this is possible in PostgreSQL.
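
In current PostgreSQL versions (10 and later) this is possible with declarative range partitioning, and each partition can live in its own tablespace on a separate drive; a rough sketch with made-up names:

    -- Parent table partitioned by order date.
    CREATE TABLE orders (
        order_id   bigint        NOT NULL,
        order_date date          NOT NULL,
        total      numeric(12,2)
    ) PARTITION BY RANGE (order_date);

    -- One partition per year, each in its own tablespace/drive
    -- (the tablespaces archive_disk and fast_disk must already exist).
    CREATE TABLE orders_2008 PARTITION OF orders
        FOR VALUES FROM ('2008-01-01') TO ('2009-01-01')
        TABLESPACE archive_disk;

    CREATE TABLE orders_2009 PARTITION OF orders
        FOR VALUES FROM ('2009-01-01') TO ('2010-01-01')
        TABLESPACE fast_disk;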


2

For the volume of data you're looking at, splitting the data seems like a lot of trouble for little gain. Postgres can do partitioning, but the fine manual [1] says that as a rule of thumb you should probably only consider it for tables that exceed the physical memory of the server. In my experience, that's at least a million rows.

  1. http://www.postgresql.org/docs/current/static/ddl-partitioning.html

1 Comment

In real life, the orders table has 3 other sub-tables arranged in a hierarchy. Every orders row has at least 3 rows in the first 'child', every child row has at least 4 rows in the child-of-the-child, and so on. I also know that duplicating a structure like that would be harder, especially to maintain.

0

I agree that smaller tables are faster, but it depends on your business logic whether it makes sense to split a single entity over multiple tables. If you need a lot of code to manage all the tables, then it might not be a good idea.

It also depends on the database which features you can use to tackle this problem. In Oracle a table can be partitioned (on year, for example). The data is physically stored in different tablespaces, which should make it faster to access (as I would assume that all the data of a single year is stored together).

An index will speed things up, but if the data is scattered across the disk, then a lot of block reads are required, which can make it slow.

1 Comment

I'll run on PostgreSQL or... MySQL (probably PostgreSQL).

0

Look into partitioning your tables into time slices. Partitioning works well for log-like tables where no foreign keys point to them.
