I have this current setup:
product
product_id | product_name | category_id
category
category_id | category_name
vendor
vendor_id | vendor_name | vendor_status
vendor_price
vendor_id | product_id | vendor_price
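
For reference, here is the setup as rough MySQL DDL. This is just a sketch: the column types and the 1 = active convention for vendor_status are guesses on my part, and the ON DELETE CASCADE matches the cascading delete I describe in the update below.

    CREATE TABLE category (
        category_id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        category_name VARCHAR(255) NOT NULL
    );

    CREATE TABLE product (
        product_id   INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        product_name VARCHAR(255) NOT NULL,
        category_id  INT UNSIGNED NOT NULL,
        FOREIGN KEY (category_id) REFERENCES category (category_id)
    );

    CREATE TABLE vendor (
        vendor_id     INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        vendor_name   VARCHAR(255) NOT NULL,
        vendor_status TINYINT NOT NULL  -- assumed: 1 = active, 0 = suspended
    );

    CREATE TABLE vendor_price (
        vendor_id    INT UNSIGNED NOT NULL,
        product_id   INT UNSIGNED NOT NULL,
        vendor_price DECIMAL(10,2) NOT NULL,
        PRIMARY KEY (vendor_id, product_id),
        FOREIGN KEY (vendor_id)  REFERENCES vendor (vendor_id),
        -- cascade so deleting a product removes its prices too
        FOREIGN KEY (product_id) REFERENCES product (product_id) ON DELETE CASCADE
    );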
As I understand it, according to the "rules" of normalization, there should be two more tables declaring the relationship, like this:
rel_product_vendor_price
product_id | vendor_price_id
rel_vendor_price_vendor
vendor_price_id | vendor_id
The vendor_price table above would then have product_id removed and a vendor_price_id added.
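
In other words, something like this (again a sketch with assumed types; this vendor_price definition would replace the one above):

    CREATE TABLE vendor_price (
        vendor_price_id INT UNSIGNED NOT NULL AUTO_INCREMENT PRIMARY KEY,
        vendor_price    DECIMAL(10,2) NOT NULL
    );

    CREATE TABLE rel_product_vendor_price (
        product_id      INT UNSIGNED NOT NULL,
        vendor_price_id INT UNSIGNED NOT NULL,
        PRIMARY KEY (product_id, vendor_price_id),
        FOREIGN KEY (product_id)      REFERENCES product (product_id),
        FOREIGN KEY (vendor_price_id) REFERENCES vendor_price (vendor_price_id)
    );

    CREATE TABLE rel_vendor_price_vendor (
        vendor_price_id INT UNSIGNED NOT NULL,
        vendor_id       INT UNSIGNED NOT NULL,
        PRIMARY KEY (vendor_price_id, vendor_id),
        FOREIGN KEY (vendor_price_id) REFERENCES vendor_price (vendor_price_id),
        FOREIGN KEY (vendor_id)       REFERENCES vendor (vendor_id)
    );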
I fail to see the point in creating two more tables just to tie things together, as it complicates the queries. The INSERTs in particular become complicated and must be performed in transactions.
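
To illustrate: under the normalized layout, inserting a single price would take three statements in one transaction (a sketch; 99.95, 42, and 7 are placeholder values):

    START TRANSACTION;
    INSERT INTO vendor_price (vendor_price) VALUES (99.95);
    SET @vp_id = LAST_INSERT_ID();      -- id of the price row just inserted
    INSERT INTO rel_product_vendor_price (product_id, vendor_price_id)
        VALUES (42, @vp_id);
    INSERT INTO rel_vendor_price_vendor (vendor_price_id, vendor_id)
        VALUES (@vp_id, 7);
    COMMIT;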
Currently the tables hold more than 300,000 products, each with several vendors offering different prices, which adds up to more than 1.5 million documents in Sphinx.
Am I wrong in my design, or would there be any advantage in changing it to a more normalized design?
UPDATE
I have one more table that holds all the product categories. I have updated the schema above; I forgot it in the initial post.
Generally I split the queries by category and query each category for all the products that belong to it. When a user clicks a product, I query all the prices for that particular product and display them in descending order.
Because a vendor can be suspended (vendor.vendor_status), all queries must be performed with joins leading back to the vendor table.
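
A rough sketch of the per-product price query (assuming vendor_status = 1 means active, and 42 as a placeholder product id):

    SELECT v.vendor_name, vp.vendor_price
    FROM vendor_price AS vp
    JOIN vendor AS v ON v.vendor_id = vp.vendor_id
    WHERE vp.product_id = 42         -- the clicked product
      AND v.vendor_status = 1        -- skip suspended vendors
    ORDER BY vp.vendor_price DESC;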
On inserts, I delete everything in product from a particular vendor; all vendor prices from that vendor get deleted as well due to the foreign key constraint. Then I insert the new rows into product and vendor_price.
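
As a sketch, the refresh for one vendor looks something like this (the schema above does not show how product rows map to a vendor, so going through vendor_price here is an assumption on my part; all ids and values are placeholders):

    START TRANSACTION;
    -- Delete the vendor's products; the ON DELETE CASCADE on
    -- vendor_price.product_id removes the matching prices as well.
    DELETE p
    FROM product AS p
    JOIN vendor_price AS vp ON vp.product_id = p.product_id
    WHERE vp.vendor_id = 7;
    -- Re-insert the fresh data.
    INSERT INTO product (product_id, product_name, category_id)
        VALUES (1001, 'Example product', 3);
    INSERT INTO vendor_price (vendor_id, product_id, vendor_price)
        VALUES (7, 1001, 99.95);
    COMMIT;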
Hope this makes sense.
UPDATE 2
Having run a lot of query tests tonight, I have discovered that keeping vendor_status only in the vendor table slows things down a LOT.
This is because the database has to join vendor_price against vendor every time it selects a price, which matters a great deal when computing, for example:
MIN(vendor_price) AS min_vendor_price, MAX(vendor_price) AS max_vendor_price
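
In context, the aggregate looks roughly like this (a sketch, same status convention assumed as above):

    SELECT vp.product_id,
           MIN(vp.vendor_price) AS min_vendor_price,
           MAX(vp.vendor_price) AS max_vendor_price
    FROM vendor_price AS vp
    JOIN vendor AS v ON v.vendor_id = vp.vendor_id  -- the join that costs so much
    WHERE v.vendor_status = 1
    GROUP BY vp.product_id;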
Keeping a duplicate of vendor_status in each vendor_price row means a LOT of redundant data, but it really speeds up the SELECTs.
From "Query took 7.8040 sec" to "Query took 3.1640 sec".
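
With vendor_status duplicated into vendor_price, the same aggregate can skip the join entirely (sketch, same assumptions):

    SELECT product_id,
           MIN(vendor_price) AS min_vendor_price,
           MAX(vendor_price) AS max_vendor_price
    FROM vendor_price
    WHERE vendor_status = 1   -- duplicated column; no join to vendor needed
    GROUP BY product_id;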
When data sets get this large, I guess it's a matter of balancing query optimization against heavy caching. Normalization really gets in the way of speed, even on today's hardware.