0

I have a table with about 20 columns. 10 of them generally tend to have 1 of 15 or so possible values (different for each column). In addition, one column has a largish string. The table currently has over 3 million rows, and growing. It is about 1GB big (just data)

  1. I would like to split off the large text column, since it is not used often and would probably dramatically reduce the table size, thus improving performance.
  2. I would like to "normalize" all those repeating columns into a separate table (each), so that I can get the list of current values without running 10 distinct queries on 3M rows. It takes a long time.

#2 would be one-to-many relations. #1 could be 1-to-1 or 1-to-many. I don't care.

The question is: Can these be done by pure SQL statements? How? Or do I need to write a program to put the data in the new table and get the PK and insert it in the right column, one row at a time?

Edit
Here is a sample of what I am trying to do:

ID  Field1  Lookup  Text
10  val1    look1   some very long text
11  val2    look2   more very long text
12  val2    look1   NULL
13  val4    look1   some very long text
.
.
.

To this:

ID  Field1  Lookup  Text
10  val1    1       1
11  val2    2       2
12  val2    1       0
13  val4    1       4 (1?)
.
.
.

1 Answer 1

1

For the point 1, it's a one-to-one relationship, so you don't need a foreign key (instead, you'll use the same primary key for both table).

  1. Create your table mytable_text with a PK of the same type (not in auto increment of course), your TEXT column, and, if you're using InnoDB (which is recommend in this case) a foreign key to your first (main) table. You could add ON DELETE CASCADE to the foreign key to ease the maintenance.
  2. INSERT INTO mytable_text (id, large_text_column) (SELECT id, large_text_column FROM mytable)
  3. ALTER TABLE mytable DROP large_text_column

For the point 2, it's a bit longer, but it's also doable in SQL (I'm taking the example of your field "Lookup"). It could be something like that:

  1. Create your secondary table "lookups" with an autoincrement ID, and a field "title" for example (with a UNIQUE INDEX if you feel like it).
  2. INSERT INTO lookups (title) (SELECT DISTINCT Lookup FROM mytable)
  3. SET foreign_key_checks = 0
  4. ALTER TABLE mytable ADD lookup_id INTEGER UNSIGNED [...], ADD FOREIGN KEY [...]
  5. UPDATE mytable SET lookup_id = (SELECT id FROM lookups WHERE lookup.title = mytable.Lookup
  6. ALTER TABLE mytable DROP Lookup
  7. (optionally) ALTER TABLE mytable CHANGE lookup_id Lookup [...] (if you want to keep the same name)
  8. SET foreign_key_checks = 1
Sign up to request clarification or add additional context in comments.

1 Comment

I edited my answer based on your example. Note that with 3M rows the alter statements can take some time.

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.