PostgreSQL: Foreign key in new tables from existing table

Question

I'm very new to PostgreSQL and Python, and I'm having some trouble understanding foreign keys (which I think is what I should use here). I've looked at an example from here, but I don't think it is exactly what I need.

As a simple example, I have some information in an existing table:

    [ID      REFERENCE     REF_AGE   DATA1      DATA 2]
    [1       JOHN          50        50         60    ]
    [2       JOHN          50        55         30    ]
    [3       TOM           60        60         10    ]
    [4       MATT          30        76         57    ]
    [5       MATT          30        45         47    ]

I want to make two new tables from this. The first holds the ID and the data, plus a REF_ID that links to the second new table. The second is a reference table where I can store the other information about each reference (e.g. the age above).

Table 1:

    [ID      REF_ID        DATA1      DATA 2]
    [1       1             50         60    ]
    [2       1             55         30    ]
    [3       2             60         10    ]
    [4       3             76         57    ]
    [5       3             45         47    ]

Table 2:

    [REF_ID     NAME    AGE  ]
    [1          JOHN    50   ]
    [2          TOM     60   ]
    [3          MATT    30   ]

Can anyone show me how to split existing data like this? I need to copy the unique values from the original table's reference column into the new reference table, and insert the corresponding ref_id into the other new table.

Clodoaldo Neto · Accepted Answer · 2016-09-07 09:53:48Z


Here is the recipe. Be aware that you will have a problem if the person names are not unique.

```sql
drop table if exists not_normalized cascade;
create table not_normalized (
    id int, reference text, ref_age int, data1 int, data2 int
);

insert into not_normalized (id, reference, ref_age, data1, data2) values
(1, 'JOHN', 50, 50, 60),
(2, 'JOHN', 50, 55, 30),
(3, 'TOM', 60, 60, 10),
(4, 'MATT', 30, 76, 57),
(5, 'MATT', 30, 45, 47),
(6, null, null, 42, 50);
```

```sql
drop table if exists referenced cascade;
create table referenced (
    ref_id serial primary key,
    name text,
    age int
);
```

Selecting the distinct pair (name, age) minimizes the name collision problem:

```sql
insert into referenced (name, age)
select distinct reference, ref_age
from not_normalized
where (reference, ref_age) is not null
;
table referenced;
```

```
 ref_id | name | age
--------+------+-----
      1 | JOHN |  50
      2 | TOM  |  60
      3 | MATT |  30
```
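To see what `select distinct` on the (name, age) pair does, the same step can be mirrored in plain Python. This is a minimal sketch and is not part of the answer: `build_reference_ids` is a hypothetical helper, and `None` stands in for SQL NULL. A pair is skipped when either value is missing, which matches the row-wise `is not null` filter. Unlike `select distinct` without an `order by`, it numbers pairs in the order they first appear. The extra `("JOHN", 35)` row shows why keying on the pair helps: a second John with a different age gets his own ref_id instead of being merged with the first.

```python
from typing import Dict, Iterable, Optional, Tuple

# (name, age); None plays the role of SQL NULL
Pair = Tuple[Optional[str], Optional[int]]


def build_reference_ids(pairs: Iterable[Pair]) -> Dict[Pair, int]:
    """Hypothetical helper: give each distinct (name, age) pair a ref_id 1, 2, 3, ..."""
    ref_ids: Dict[Pair, int] = {}
    for pair in pairs:
        if None in pair:
            continue  # like WHERE (reference, ref_age) IS NOT NULL: both values required
        if pair not in ref_ids:
            ref_ids[pair] = len(ref_ids) + 1  # first time this pair is seen
    return ref_ids


source = [("JOHN", 50), ("JOHN", 50), ("TOM", 60), ("MATT", 30),
          ("MATT", 30), (None, None), ("JOHN", 35)]
print(build_reference_ids(source))
# {('JOHN', 50): 1, ('TOM', 60): 2, ('MATT', 30): 3, ('JOHN', 35): 4}
```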

```sql
drop table if exists referencer;
create table referencer (
    id serial primary key,
    ref_id int references referenced (ref_id),
    data1 int, data2 int
);
```
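The `references referenced (ref_id)` clause is the foreign key the question asks about. It means every non-NULL `ref_id` in `referencer` must already exist in `referenced`. A NULL `ref_id` is still allowed, which is what lets row 6 through. The sketch below shows this behavior using Python's built-in `sqlite3` as a stand-in for PostgreSQL, so it runs without a server. One difference to keep in mind: SQLite only enforces foreign keys after `PRAGMA foreign_keys = ON`, whereas PostgreSQL always enforces them. `demo_foreign_key` is a hypothetical name.

```python
import sqlite3


def demo_foreign_key() -> list:
    """Try three ref_id values against a foreign key and report which are accepted."""
    conn = sqlite3.connect(":memory:")
    conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces FKs only when this is on
    conn.execute("CREATE TABLE referenced (ref_id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
    conn.execute(
        "CREATE TABLE referencer ("
        " id INTEGER PRIMARY KEY,"
        " ref_id INTEGER REFERENCES referenced (ref_id),"
        " data1 INTEGER, data2 INTEGER)"
    )
    conn.execute("INSERT INTO referenced (ref_id, name, age) VALUES (1, 'JOHN', 50)")
    outcomes = []
    for ref_id in (1, None, 99):  # existing key, NULL, key that does not exist
        try:
            conn.execute(
                "INSERT INTO referencer (ref_id, data1, data2) VALUES (?, 0, 0)", (ref_id,)
            )
            outcomes.append((ref_id, "accepted"))
        except sqlite3.IntegrityError:  # raised as "FOREIGN KEY constraint failed"
            outcomes.append((ref_id, "rejected"))
    conn.close()
    return outcomes


print(demo_foreign_key())
# [(1, 'accepted'), (None, 'accepted'), (99, 'rejected')]
```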

Again, use the age to minimize collisions. The left join keeps rows that have no reference; they get a NULL ref_id:

```sql
insert into referencer (ref_id, data1, data2)
select r.ref_id, data1, data2
from
    not_normalized nn
    left join
    referenced r on r.name = nn.reference and r.age = nn.ref_age
;
table referencer;
```

```
 id | ref_id | data1 | data2
----+--------+-------+-------
  1 |      1 |    50 |    60
  2 |      1 |    55 |    30
  3 |      3 |    76 |    57
  4 |      3 |    45 |    47
  5 |      2 |    60 |    10
  6 |        |    42 |    50
```
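Two details of this step can be checked with a small sketch. First, an inner join drops row 6, because `NULL = NULL` is not true. The left join keeps it with a NULL `ref_id`. Second, because `id` is a `serial`, the new rows are renumbered in whatever order the join returns them. That is why the output above no longer lines up with the original IDs in the question's Table 1.

The variant below copies `nn.id` across instead, so each row keeps its original ID. It runs both steps in Python's `sqlite3` as a stand-in for PostgreSQL; `split` and `SOURCE` are hypothetical names. If you copy explicit IDs into a `serial` column in PostgreSQL, the sequence does not advance. Something like `select setval(pg_get_serial_sequence('referencer', 'id'), (select max(id) from referencer));` is then needed before you rely on the default again.

```python
import sqlite3

# The answer's sample data; None plays the role of SQL NULL
SOURCE = [
    (1, "JOHN", 50, 50, 60),
    (2, "JOHN", 50, 55, 30),
    (3, "TOM", 60, 60, 10),
    (4, "MATT", 30, 76, 57),
    (5, "MATT", 30, 45, 47),
    (6, None, None, 42, 50),
]


def split(join_kind: str) -> list:
    """Run both INSERT ... SELECT steps in SQLite and return (id, name, data1, data2)."""
    if join_kind not in ("JOIN", "LEFT JOIN"):  # only splice a known keyword into the SQL
        raise ValueError(join_kind)
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE not_normalized (id INTEGER, reference TEXT, "
                 "ref_age INTEGER, data1 INTEGER, data2 INTEGER)")
    conn.executemany("INSERT INTO not_normalized VALUES (?, ?, ?, ?, ?)", SOURCE)
    conn.execute("CREATE TABLE referenced (ref_id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
    # Same filter as the answer, spelled column by column because SQLite
    # does not accept the row-wise (a, b) IS NOT NULL form
    conn.execute("INSERT INTO referenced (name, age) "
                 "SELECT DISTINCT reference, ref_age FROM not_normalized "
                 "WHERE reference IS NOT NULL AND ref_age IS NOT NULL")
    conn.execute("CREATE TABLE referencer (id INTEGER PRIMARY KEY, "
                 "ref_id INTEGER REFERENCES referenced (ref_id), "
                 "data1 INTEGER, data2 INTEGER)")
    # Copy nn.id so the rows keep their original IDs instead of being renumbered
    conn.execute("INSERT INTO referencer (id, ref_id, data1, data2) "
                 "SELECT nn.id, r.ref_id, nn.data1, nn.data2 FROM not_normalized nn "
                 f"{join_kind} referenced r "
                 "ON r.name = nn.reference AND r.age = nn.ref_age")
    # Join back to show which name each row points at (None when ref_id is NULL)
    rows = conn.execute("SELECT rc.id, r.name, rc.data1, rc.data2 FROM referencer rc "
                        "LEFT JOIN referenced r ON r.ref_id = rc.ref_id "
                        "ORDER BY rc.id").fetchall()
    conn.close()
    return rows


print(len(split("JOIN")), len(split("LEFT JOIN")))  # 5 6
```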

edited Sep 7, 2016 at 9:53

answered Sep 3, 2016 at 12:46

Clodoaldo Neto



4 Comments

MattGeo Over a year ago

Thank you, this was exactly what I needed! I've since run into another issue, however. Say we add another row to not_normalized that contains data1 and data2 but no reference (or ref_age), so those columns are NULL. How do you deal with this? The referencer output seems to ignore this row entirely, which I assume has to do with trying to join on NULL.

Clodoaldo Neto Over a year ago

@user6789594 Change the inner join to an [outer] left join. Updated answer.

MattGeo Over a year ago

Sorry to be difficult; I didn't really want to create another question. I've had a case come up where the name is filled in (e.g. 'Peter') but the age is NULL. Is there a way to deal with this and still give it a ref_id, with just NULL for the age in the referenced table? It seems (reference, ref_age) is not null is always false if either of the columns is null.
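The observation in this comment matches PostgreSQL's row semantics. `(reference, ref_age) IS NOT NULL` is true only when every field is non-NULL. `NOT ((reference, ref_age) IS NULL)` is true when at least one field has a value, so it keeps the 'Peter' row. The join also has to change: `r.age = nn.ref_age` yields NULL when both ages are NULL, so the usual fix is `r.age IS NOT DISTINCT FROM nn.ref_age`.

The sketch below shows the same idea in Python's `sqlite3`, used as a stand-in for PostgreSQL. There, `IS` is SQLite's NULL-safe equality, playing the role of `IS NOT DISTINCT FROM`. `split_with_partial_nulls` and its `null_safe` flag are hypothetical names for illustration.

```python
import sqlite3


def split_with_partial_nulls(null_safe: bool) -> list:
    """Return (id, matched, name) per source row; 'PETER' has a NULL age."""
    op = "IS" if null_safe else "="  # SQLite's IS ~ PostgreSQL's IS NOT DISTINCT FROM
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE not_normalized (id INTEGER, reference TEXT, "
                 "ref_age INTEGER, data1 INTEGER, data2 INTEGER)")
    conn.executemany(
        "INSERT INTO not_normalized VALUES (?, ?, ?, ?, ?)",
        [(1, "JOHN", 50, 50, 60), (6, None, None, 42, 50), (7, "PETER", None, 11, 12)],
    )
    conn.execute("CREATE TABLE referenced (ref_id INTEGER PRIMARY KEY, name TEXT, age INTEGER)")
    # Keep a pair unless *every* column is NULL
    # (PostgreSQL: WHERE NOT ((reference, ref_age) IS NULL))
    conn.execute("INSERT INTO referenced (name, age) "
                 "SELECT DISTINCT reference, ref_age FROM not_normalized "
                 "WHERE NOT (reference IS NULL AND ref_age IS NULL)")
    rows = conn.execute(
        "SELECT nn.id, r.ref_id IS NOT NULL, r.name "
        "FROM not_normalized nn LEFT JOIN referenced r "
        f"ON r.name {op} nn.reference AND r.age {op} nn.ref_age "
        "ORDER BY nn.id"
    ).fetchall()
    conn.close()
    return rows


print(split_with_partial_nulls(null_safe=True))
# [(1, 1, 'JOHN'), (6, 0, None), (7, 1, 'PETER')]
```

With `null_safe=False`, row 7 comes back unmatched, because `NULL = NULL` is not true.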

MattGeo Over a year ago

And I can't do a simple WHERE reference is not null OR ref_age is not null, as I'm actually passing these columns in (there are many more than this) as a %s, e.g. INSERT INTO %s (%s) SELECT DISTINCT %s FROM %s WHERE (%s);',_tbl_import,column_set,column_set,_tbl_export,column_set
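When the column list is spliced in with %s, the all-NULL test can be written once for any number of columns. In PostgreSQL, `ROW(a, b, ...) IS NULL` is true only when every field is NULL. So `NOT (ROW(...) IS NULL)` keeps any row with at least one value, and `IS NOT DISTINCT FROM` gives a NULL-safe equality for the join.

The sketch below builds those fragments in Python from a list of column names. `quote_ident`, `build_reference_insert` and `null_safe_join_condition` are hypothetical helpers, not an existing API. Quoting follows PostgreSQL's rule for identifiers: wrap the name in double quotes and double any embedded quote. This also makes names case-sensitive, so pass them in the lowercase form PostgreSQL uses for unquoted names. Inside PL/pgSQL, `format()` with `%I` does the same quoting, and from Python, `psycopg2.sql.Identifier` does it for you.

```python
from typing import Sequence


def quote_ident(name: str) -> str:
    """Quote a PostgreSQL identifier: wrap in double quotes, double embedded quotes."""
    return '"' + name.replace('"', '""') + '"'


def build_reference_insert(target: str, source: str, columns: Sequence[str]) -> str:
    """INSERT ... SELECT DISTINCT that skips only rows where every column is NULL."""
    cols = ", ".join(quote_ident(c) for c in columns)
    return (
        f"INSERT INTO {quote_ident(target)} ({cols}) "
        f"SELECT DISTINCT {cols} FROM {quote_ident(source)} "
        # ROW(...) keeps this a row test even when there is a single column
        f"WHERE NOT (ROW({cols}) IS NULL)"
    )


def null_safe_join_condition(left_alias: str, right_alias: str,
                             columns: Sequence[str]) -> str:
    """AND together NULL-safe comparisons; aliases are assumed to be plain lowercase names."""
    return " AND ".join(
        f"{left_alias}.{quote_ident(c)} IS NOT DISTINCT FROM {right_alias}.{quote_ident(c)}"
        for c in columns
    )


print(build_reference_insert("referenced", "not_normalized", ["reference", "ref_age"]))
print(null_safe_join_condition("r", "nn", ["reference", "ref_age"]))
```

This only generates SQL text. Run it with your own connection, and never splice untrusted values into the string unquoted.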
