In postgresql, how can I fill in missing values within a column?

Question

I'm trying to figure out how to fill in values that are missing from one column with the non-missing values from other rows that have the same value in a given column. For instance, in the example below, I'd want all the "1" values to be equal to Bob and all of the "2" values to be equal to John.

ID #   | Name
-------|-----
1      | Bob 
1      | (null)
1      | (null)
2      | John
2      | (null)
2      | (null)

EDIT: One caveat is that I'm using postgresql 8.4 with Greenplum and so correlated subqueries are not supported.

Please name your "particular implementation". What version do you use? Also, can there be cases of IDs with more than one distinct name? How should that be handled? Choose the alphabetically first? – Erwin Brandstetter, Commented Jul 21, 2012 at 18:16

@dchandler: Postgres 8.4 does support correlated subqueries (it has for ages, actually). Greenplum must be based on a really old version then. – user330315, Commented Jul 21, 2012 at 21:19

@a_horse_with_no_name, that's interesting to know. It looks like it's Greenplum then! What a shame. – d_a_c321, Commented Jul 22, 2012 at 21:58

wildplasser · Accepted Answer · 2012-07-21 12:03:31Z

CREATE TABLE bobjohn
        ( ID INTEGER NOT NULL
        , zname varchar
        );
INSERT INTO bobjohn(id, zname) VALUES
 (1,'Bob') ,(1, NULL) ,(1, NULL)
,(2,'John') ,(2, NULL) ,(2, NULL)
        ;

UPDATE bobjohn dst
SET zname = src.zname
FROM bobjohn src
WHERE dst.id = src.id
AND dst.zname IS NULL
AND src.zname IS NOT NULL
        ;

SELECT * FROM bobjohn;

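NOTE: if more than one distinct name exists for a given id, this query does not raise an error, but its result is unpredictable: PostgreSQL uses only one of the matching source rows for each target row, and which one is not defined. (It also won't touch records for which no non-null name exists.)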
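To check in advance whether the multiple-names case from the note actually occurs, a grouping query along these lines can be run before the UPDATE. This is a sketch: the temporary table `bobjohn_check` and its sample rows are hypothetical, with id 3 deliberately given two different names.

```sql
-- Hypothetical sample table; id 3 deliberately has two different names.
CREATE TEMP TABLE bobjohn_check (id integer NOT NULL, zname varchar);
INSERT INTO bobjohn_check (id, zname) VALUES
  (1, 'Bob'), (1, NULL),
  (2, 'John'), (2, NULL),
  (3, 'Ann'), (3, 'Eve'), (3, NULL);

-- Ids with more than one distinct non-null name: for these ids the
-- UPDATE ... FROM join matches several source rows.
SELECT id, count(DISTINCT zname) AS distinct_names
FROM bobjohn_check
WHERE zname IS NOT NULL
GROUP BY id
HAVING count(DISTINCT zname) > 1;
-- returns one row: id = 3, distinct_names = 2
```

If this returns no rows, the plain UPDATE above is unambiguous.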

If you are on PostgreSQL 9.1 or later, you can use a CTE to fetch the source tuples (this is equivalent to a subquery, but easier to write and read, IMHO). The CTE also tackles the duplicate-values problem, in a rather crude way:

        --
        -- CTEs don't work in UPDATE queries for PostgreSQL versions below 9.1
        --
WITH uniq AS (
        SELECT id
        -- if there is more than one name for a given id: pick the lowest
        , min(zname) AS zname
        FROM bobjohn
        WHERE zname IS NOT NULL
        GROUP BY id
        )
UPDATE bobjohn dst
SET zname = src.zname
FROM uniq src
WHERE dst.id = src.id
AND dst.zname IS NULL
        ;

SELECT * FROM bobjohn;

Erwin Brandstetter · Accepted Answer · 2012-07-21 18:26:14Z


UPDATE tbl
SET    name = x.name
FROM  (
    SELECT DISTINCT ON (id) id, name
    FROM   tbl
    WHERE  name IS NOT NULL
    ORDER  BY id, name
    ) x
WHERE  x.id = tbl.id
AND    tbl.name IS NULL;

DISTINCT ON does the job alone. No need for additional aggregation.

In case of multiple values for name, the alphabetically first one (according to the current locale) is picked - that's what the ORDER BY id, name is for. If name is unambiguous you can omit that line.

Also, if there is at least one non-null value per id, you can omit WHERE name IS NOT NULL.
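Running the derived table from this answer on its own is a cheap way to preview which name each id will receive before updating anything. A sketch, assuming a hypothetical temporary table `tbl_demo` in which id 2 has two candidate names:

```sql
-- Hypothetical sample table: id 2 has two different non-null names.
CREATE TEMP TABLE tbl_demo (id integer NOT NULL, name text);
INSERT INTO tbl_demo (id, name) VALUES
  (1, 'Bob'), (1, NULL),
  (2, 'John'), (2, 'Adam'), (2, NULL);

-- Same subquery as in the UPDATE: DISTINCT ON keeps the first row per id
-- in ORDER BY order, i.e. the alphabetically first non-null name.
SELECT DISTINCT ON (id) id, name
FROM   tbl_demo
WHERE  name IS NOT NULL
ORDER  BY id, name;
-- returns (1, Bob) and (2, Adam)
```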


cdhowie · Accepted Answer · 2012-07-21 04:07:47Z


If you know for a fact that there are no conflicting values (multiple rows with the same ID but different, non-null names) then something like this will update the table appropriately:

UPDATE some_table AS t1
SET name = (
    SELECT name
    FROM some_table AS t2
    WHERE t1.id = t2.id
      AND name IS NOT NULL
    LIMIT 1
)
WHERE name IS NULL;

If you only want to query the table and have this information filled in on the fly, you can use a similar query:

SELECT
    t1.id,
    (
        SELECT name
        FROM some_table AS t2
        WHERE t1.id = t2.id
          AND name IS NOT NULL
        LIMIT 1
    ) AS name

FROM some_table AS t1;
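Since the asker's Greenplum setup reportedly rejects correlated subqueries, here is a sketch of the same on-the-fly result using a LEFT JOIN against a grouped, non-correlated derived table instead. The temporary table `some_table_demo` and its rows are hypothetical. Because `min(name)` picks the alphabetically first name when an id has several, this variant does not need the no-conflict assumption above.

```sql
-- Hypothetical demo table mirroring some_table from the answer above.
CREATE TEMP TABLE some_table_demo (id integer NOT NULL, name varchar);
INSERT INTO some_table_demo (id, name) VALUES
  (1, 'Bob'), (1, NULL), (1, NULL),
  (2, 'John'), (2, NULL), (2, NULL);

-- The derived table "known" does not reference the outer rows,
-- so no correlated subquery is involved.
SELECT t.id,
       COALESCE(t.name, known.name) AS name  -- keep existing names, fill NULLs
FROM some_table_demo AS t
LEFT JOIN (
    SELECT id, min(name) AS name             -- alphabetically first non-null name
    FROM some_table_demo
    WHERE name IS NOT NULL
    GROUP BY id
) AS known ON known.id = t.id
ORDER BY t.id;
-- returns three (1, Bob) rows followed by three (2, John) rows
```

The LEFT JOIN keeps ids that have no non-null name at all; their name simply stays NULL.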


Comment:

I'm on a particular implementation that doesn't support correlated subqueries. Is there any other way to handle it? – d_a_c321
