I'm very new to PostgreSQL and Python and I'm having trouble understanding foreign keys (which I think is what I need here). I've had a look at an example from here, but I don't think it is exactly what I need.

As a simple example, I have some information in an existing table:

    [ID      REFERENCE     REF_AGE   DATA1      DATA2 ]
    [1       JOHN          50        50         60    ]
    [2       JOHN          50        55         30    ]
    [3       TOM           60        60         10    ]
    [4       MATT          30        76         57    ]
    [5       MATT          30        45         47    ]

I want to make two new tables from this: one holding the ID and data, with a ref_id that links to the other new table, and a reference table where I can store the other information about each reference (e.g. the age above).

Table 1:

    [ID      REF_ID        DATA1      DATA2 ]
    [1       1             50         60    ]
    [2       1             55         30    ]
    [3       2             60         10    ]
    [4       3             76         57    ]
    [5       3             45         47    ]

Table 2:

    [REF_ID     NAME    AGE  ]
    [1          JOHN    50   ]
    [2          TOM     60   ]
    [3          MATT    30   ]

Can anyone show me how to split existing data like this? That is, how do I separate the unique values from the original table's reference column into the new reference table, and insert the corresponding ref_id into the other new table?

1 Answer

Here is a recipe. Be aware that it runs into trouble if the person names are not unique, because two different people who share a name cannot be told apart.

    drop table if exists not_normalized cascade;
    create table not_normalized (
        id int, reference text, ref_age int, data1 int, data2 int
    );

    insert into not_normalized (id, reference, ref_age, data1, data2) values
    (1, 'JOHN', 50, 50, 60),
    (2, 'JOHN', 50, 55, 30),
    (3, 'TOM', 60, 60, 10),
    (4, 'MATT', 30, 76, 57),
    (5, 'MATT', 30, 45, 47),
    (6, null, null, 42, 50);

    drop table if exists referenced cascade;
    create table referenced (
        ref_id serial primary key,
        name text,
        age int
    );

Selecting the distinct pair (name, age) minimizes the name collision problem:

    insert into referenced (name, age)
    select distinct reference, ref_age
    from not_normalized
    -- a row comparison with "is not null" is true only when every field is non-null
    where (reference, ref_age) is not null
    ;
    table referenced;
     ref_id | name | age 
    --------+------+-----
          1 | JOHN |  50
          2 | TOM  |  60
          3 | MATT |  30

    drop table if exists referencer;
    create table referencer (
        id serial primary key,
        ref_id int references referenced (ref_id),
        data1 int, data2 int
    );

Again, include the age in the join to minimize collisions. The left join keeps rows that have no reference and leaves their ref_id null:

    insert into referencer (ref_id, data1, data2)
    select r.ref_id, data1, data2
    from
        not_normalized nn
        left join
        referenced r on r.name = nn.reference and r.age = nn.ref_age
    ;
    table referencer;
     id | ref_id | data1 | data2 
    ----+--------+-------+-------
      1 |      1 |    50 |    60
      2 |      1 |    55 |    30
      3 |      3 |    76 |    57
      4 |      3 |    45 |    47
      5 |      2 |    60 |    10
      6 |        |    42 |    50
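
Since the question mentions Python, here is a rough sketch of driving the same recipe from a script and checking that nothing was lost. It is only an illustration under some assumptions: the standard-library sqlite3 module stands in for PostgreSQL so it runs without a server, the row-wise null test is spelled out column by column, and the helper names normalize and rebuild are made up for this example. Unlike the SQL above, it copies the original id across instead of generating a new one, because a serial column numbers rows in whatever order the join returns them (which is why ids 3 to 5 in the output above no longer match the source table).

```python
import sqlite3

# Sample rows from the question, plus the row without a reference.
ROWS = [
    (1, "JOHN", 50, 50, 60),
    (2, "JOHN", 50, 55, 30),
    (3, "TOM", 60, 60, 10),
    (4, "MATT", 30, 76, 57),
    (5, "MATT", 30, 45, 47),
    (6, None, None, 42, 50),
]


def normalize(conn):
    """Split not_normalized into referenced (lookup) and referencer (data)."""
    cur = conn.cursor()
    # Assumption: SQLite column types and autoincrement replace int/serial.
    cur.executescript(
        """
        create table not_normalized (
            id integer, reference text, ref_age integer,
            data1 integer, data2 integer
        );
        create table referenced (
            ref_id integer primary key autoincrement,
            name text, age integer
        );
        create table referencer (
            id integer primary key,
            ref_id integer references referenced (ref_id),
            data1 integer, data2 integer
        );
        """
    )
    cur.executemany("insert into not_normalized values (?, ?, ?, ?, ?)", ROWS)
    # One lookup row per distinct (name, age) pair; incomplete pairs are skipped.
    cur.execute(
        """
        insert into referenced (name, age)
        select distinct reference, ref_age
        from not_normalized
        where reference is not null and ref_age is not null
        """
    )
    # Carry the original id over; the left join keeps rows with no reference.
    cur.execute(
        """
        insert into referencer (id, ref_id, data1, data2)
        select nn.id, r.ref_id, nn.data1, nn.data2
        from not_normalized nn
        left join referenced r
            on r.name = nn.reference and r.age = nn.ref_age
        """
    )
    conn.commit()


def rebuild(conn):
    """Join the two new tables back into the original row shape."""
    return conn.execute(
        """
        select d.id, r.name, r.age, d.data1, d.data2
        from referencer d
        left join referenced r on r.ref_id = d.ref_id
        order by d.id
        """
    ).fetchall()


conn = sqlite3.connect(":memory:")
normalize(conn)
# Round trip: the split tables should reproduce the source rows exactly.
assert rebuild(conn) == ROWS
```

The round-trip check at the end is the useful part: if joining the two new tables back together gives you the original rows, the split did not drop or mislink anything.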

4 Comments

Thank you, this was exactly what I needed! I've since run into another issue, however. Say we add another row to not_normalized which contains data1 and data2 but has no reference (or ref_age), so these columns are NULL. How do you deal with this? The referencer output seems to ignore this row entirely, which I assume has to do with trying to join on NULL.
@user6789594 Change the inner join to a left [outer] join. Updated answer.
Sorry to be difficult, I didn't really want to create another question. I've had an instance come up where name will be filled, e.g. 'Peter', but age will be NULL. Is there a way to deal with this case and still give it a ref_id? Just have NULL for the age in the referenced table? It seems that (reference, ref_age) is not null is never satisfied if either of the columns is null.
And I can't do a simple WHERE reference is not null OR ref_age is not null; as I'm actually passing these columns in (there are many more than this) as a %s, e.g. INSERT INTO %s (%s) SELECT DISTINCT %s FROM %s WHERE (%s);',_tbl_import,column_set,column_set,_tbl_export,column_set
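
For the last two comments, one possible approach is to generate the null-tolerant conditions from the column list in Python, so the statement can still be assembled from variables. In PostgreSQL, (a, b) is not null is true only when every field is non-null, so a partly filled pair such as ('Peter', NULL) needs an "at least one column is non-null" test instead, and the join back needs is not distinct from so that NULL matches NULL. The sketch below is only that, a sketch: quote_ident, any_not_null, null_safe_join and build_insert are hypothetical helpers written for this example, and the quoting is a simplified stand-in. If you are using psycopg2, its sql module (sql.Identifier) is intended for composing identifiers and is probably the safer choice.

```python
def quote_ident(name):
    """Double-quote an SQL identifier, doubling any embedded quotes.

    Note: quoted identifiers are case-sensitive in PostgreSQL.
    """
    return '"' + name.replace('"', '""') + '"'


def any_not_null(columns):
    """Condition that holds when at least one of the columns is non-null."""
    return " OR ".join(f"{quote_ident(c)} IS NOT NULL" for c in columns)


def null_safe_join(left_alias, left_cols, right_alias, right_cols):
    """Join condition that treats two NULLs as equal (PostgreSQL syntax)."""
    return " AND ".join(
        f"{left_alias}.{quote_ident(lc)} IS NOT DISTINCT FROM "
        f"{right_alias}.{quote_ident(rc)}"
        for lc, rc in zip(left_cols, right_cols)
    )


def build_insert(target, target_cols, source, source_cols):
    """Assemble the INSERT ... SELECT DISTINCT for the reference table."""
    target_list = ", ".join(quote_ident(c) for c in target_cols)
    source_list = ", ".join(quote_ident(c) for c in source_cols)
    return (
        f"INSERT INTO {quote_ident(target)} ({target_list}) "
        f"SELECT DISTINCT {source_list} FROM {quote_ident(source)} "
        f"WHERE {any_not_null(source_cols)}"
    )


# Table and column names taken from the answer above.
print(build_insert("referenced", ["name", "age"],
                   "not_normalized", ["reference", "ref_age"]))
print(null_safe_join("nn", ["reference", "ref_age"], "r", ["name", "age"]))
```

With these conditions a row like ('Peter', NULL) gets its own entry in referenced with a NULL age and is matched again by the join, while a row whose reference columns are all NULL is still left out of referenced and keeps a null ref_id.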
