4

I have a big question.

Let's take as an example a database for some company's orders.

Let's say this company handles around 2,000 orders per month, so around 24K orders per year, and they don't want to delete any orders, even ones that are 5 years old (hey, this is just an example; the numbers don't mean anything).

For the sake of good query speed, is it better to have just one table, or would it be faster to have a table for every year?

My idea was to create a new table for the orders each year, named orders_2008, orders_2009, etc.

Could this be a good way to speed up DB queries?

Usually the data in use is from the current year, so the fewer rows, the better. Obviously, this would cause problems whenever I need to search all of the order tables at once, because I'd have to run some complex UNION, but that happens very rarely in normal activity.
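
For example, a report across all the years would end up looking something like this (the column names here are only placeholders):

    -- Hypothetical cross-year report if orders are split per year;
    -- customer_id and total are made-up column names.
    SELECT customer_id, SUM(total) AS total_spent
    FROM (
        SELECT customer_id, total FROM orders_2008
        UNION ALL
        SELECT customer_id, total FROM orders_2009
        UNION ALL
        SELECT customer_id, total FROM orders_2010
    ) AS all_orders
    GROUP BY customer_id;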

I think it's better to have an application that is fast for 95% of the queries and somewhat slow for the rest, rather than an application that is always slow.

My current database has about 130 tables; the new version of my application should have about 200-220 tables, of which roughly 40% would be replicated annually.

Any suggestions?

EDIT: the RDBMS will probably be PostgreSQL, maybe (I hope not) MySQL.

7 Answers

13

Smaller tables are faster. Period.

If you have history that is rarely used, then moving that history into other tables will be faster.

This is what a data warehouse is about -- separate operational data from historical data.

You can run a periodic extract from the operational tables and a load into the historical ones. All the data is kept; it's just segregated.
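
A minimal sketch of that periodic move in PostgreSQL, assuming an orders table with an order_date column and a matching orders_history table (the names are only illustrative):

    -- Move everything older than the current year into the history table.
    BEGIN;

    INSERT INTO orders_history
    SELECT * FROM orders
    WHERE order_date < date_trunc('year', current_date);

    DELETE FROM orders
    WHERE order_date < date_trunc('year', current_date);

    COMMIT;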


7

Before you worry about query speed, consider the costs.

If you split the data into separate tables, you will have to write code that handles that split. Every bit of code you write has a chance to be wrong. You are asking for your code to be buggy in exchange for an unmeasured and imagined performance win.

Also consider the cost of machine time vs. programmer time.

1 Comment

Agree, but my application handles many companies, and each one has its own database, so I'd rather spend a lot of time writing the handler code but have a beep-beep (fast) database afterwards :)

3

If you use indexes properly, you probably need not split it into multiple tables. Most modern DBs will optimize access.
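
For instance, an index on the order date usually keeps current-year queries fast even in one big table (the column name here is just an assumption):

    -- Index on the date column so year-restricted queries don't scan the whole table.
    CREATE INDEX idx_orders_order_date ON orders (order_date);

    -- A current-year query can then use the index:
    SELECT * FROM orders
    WHERE order_date >= '2009-01-01' AND order_date < '2010-01-01';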

Another option you might consider is to have a table for the current year and, at the end of the year, append its data to another table that holds the data for all the previous years.
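
If you go that way, a view over the two tables keeps the occasional all-years query simple (the table names are only a sketch, and both tables are assumed to have the same columns):

    -- One view that spans current and historical orders.
    CREATE VIEW orders_all AS
    SELECT * FROM orders            -- current year
    UNION ALL
    SELECT * FROM orders_history;   -- previous years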

1 Comment

Yes, having just 2 tables (current year and history) was my first idea, but I thought the per-year system would be faster; it just needs better code-writing.

2

I would not split tables by year.

Instead I would archive data to a reporting database every year, and use that when needed.

Alternatively, you could partition the data across drives, thus maintaining performance, although I'm unsure whether this is possible in PostgreSQL.
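
In current PostgreSQL versions (10 and later) this is possible with declarative range partitioning, and each partition can live in its own tablespace on a separate drive; a rough sketch with made-up names:

    -- Parent table partitioned by order date.
    CREATE TABLE orders (
        order_id   bigint        NOT NULL,
        order_date date          NOT NULL,
        total      numeric(12,2)
    ) PARTITION BY RANGE (order_date);

    -- One partition per year, each in its own tablespace/drive
    -- (the tablespaces archive_disk and fast_disk must already exist).
    CREATE TABLE orders_2008 PARTITION OF orders
        FOR VALUES FROM ('2008-01-01') TO ('2009-01-01')
        TABLESPACE archive_disk;

    CREATE TABLE orders_2009 PARTITION OF orders
        FOR VALUES FROM ('2009-01-01') TO ('2010-01-01')
        TABLESPACE fast_disk;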


2

For the volume of data you're looking at, splitting the data seems like a lot of trouble for little gain. Postgres can do partitioning, but the fine manual [1] says that as a rule of thumb you should probably only consider it for tables that exceed the physical memory of the server. In my experience, that's at least a million rows.

  1. http://www.postgresql.org/docs/current/static/ddl-partitioning.html

1 Comment

In real life, the orders table has 3 other sub-tables arranged in a hierarchy. Every orders row has at least 3 rows in the first 'child', every child row has at least 4 rows in the child-of-the-child, and so on. I also know that duplicating a structure like that would be harder, especially to maintain.

0

I agree that smaller tables are faster, but it depends on your business logic whether it makes sense to split a single entity over multiple tables. If you need a lot of code to manage all the tables, then it might not be a good idea.

It also depends on the database which features you can use to tackle this problem. In Oracle a table can be partitioned (on year, for example). The data is physically stored in different tablespaces, which should make it faster to access (as I would assume that all the data of a single year is stored together).

An index will speed things up, but if the data is scattered across the disk, then a lot of block reads are required, which can make it slow.

1 Comment

I'll run on PostgreSQL or... MySQL (probably PostgreSQL).

0

Look into partitioning your tables into time slices. Partitioning works well for log-like tables where no foreign keys point to them.
