2

I'm trying to figure out how to fill in missing values in one column with the non-missing values from other rows that share the same value in another column. For instance, in the example below, I'd want every row with ID 1 to have the name Bob and every row with ID 2 to have the name John:

ID #   | Name
-------|-----
1      | Bob 
1      | (null)
1      | (null)
2      | John
2      | (null)
2      | (null)

EDIT: One caveat is that I'm using PostgreSQL 8.4 with Greenplum, so correlated subqueries are not supported.

3
  • Please name your "particular implementation". What version do you use? Also, can there be IDs with more than one distinct name? How should that be handled? Choose the alphabetically first one? Commented Jul 21, 2012 at 18:16
  • @dchandler: Postgres 8.4 does support correlated subqueries (and has for ages, actually). Greenplum must be based on a really old version then. Commented Jul 21, 2012 at 21:19
  • @a_horse_with_no_name, that's interesting to know. It looks like it's Greenplum, then. What a shame. Commented Jul 22, 2012 at 21:58

3 Answers

3
CREATE TABLE bobjohn
        ( ID INTEGER NOT NULL
        , zname varchar
        );
INSERT INTO bobjohn(id, zname) VALUES
 (1,'Bob') ,(1, NULL) ,(1, NULL)
,(2,'John') ,(2, NULL) ,(2, NULL)
        ;

UPDATE bobjohn dst
SET zname = src.zname
FROM bobjohn src
WHERE dst.id = src.id
AND dst.zname IS NULL
AND src.zname IS NOT NULL
        ;

SELECT * FROM bobjohn;

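NOTE: if more than one name exists for a given id, this query does not give a well-defined result: PostgreSQL uses only one of the matching source rows for each target row, and which one is not predictable. It also won't touch records for which no non-null name exists.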
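
Before running the update, it may be worth checking whether any id actually carries more than one distinct name. A minimal sketch against the bobjohn table defined above (the column alias distinct_names is only illustrative):

```sql
-- List ids that have more than one distinct non-null name.
-- An empty result means the plain UPDATE ... FROM above is unambiguous.
SELECT id, count(DISTINCT zname) AS distinct_names
FROM bobjohn
WHERE zname IS NOT NULL
GROUP BY id
HAVING count(DISTINCT zname) > 1;
```

On the sample rows inserted above this returns no rows, since each id has exactly one non-null name.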

If you are on a Postgres version >= 9, you could use a CTE to fetch the source tuples. This is equivalent to a subquery, but is easier to write and read (IMHO). The CTE also tackles the duplicate-values problem (in a rather crude way):

        --
        -- CTEs don't work in UPDATE queries for Postgres versions below 9
        --
WITH uniq AS (
        -- GROUP BY already yields one row per id, so DISTINCT is not needed
        SELECT id
        -- if there is more than one name for a given id: pick the lowest
        , min(zname) AS zname
        FROM bobjohn
        WHERE zname IS NOT NULL
        GROUP BY id
        )
UPDATE bobjohn dst
SET zname = src.zname
FROM uniq src
WHERE dst.id = src.id
AND dst.zname IS NULL
        ;

SELECT * FROM bobjohn;

1
UPDATE tbl
SET    name = x.name
FROM  (
    SELECT DISTINCT ON (id) id, name
    FROM   tbl
    WHERE  name IS NOT NULL
    ORDER  BY id, name
    ) x
WHERE  x.id = tbl.id
AND    tbl.name IS NULL;

DISTINCT ON does the job alone. No need for additional aggregation.

In case of multiple values for name, the alphabetically first one (according to the current locale) is picked - that's what the ORDER BY id, name is for. If name is unambiguous you can omit that line.

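Also, if there is at least one non-null value per id, you can omit WHERE name IS NOT NULL, provided you keep the ORDER BY: NULLs sort last in ascending order by default, so DISTINCT ON still picks a non-null name.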
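
To illustrate that last point, here is a sketch of the same update without the WHERE name IS NOT NULL filter. It relies on keeping the ORDER BY. NULLS LAST is already PostgreSQL's default for ascending sorts and is spelled out here only to make the assumption visible:

```sql
UPDATE tbl
SET    name = x.name
FROM  (
    -- One row per id; non-null names sort ahead of NULLs,
    -- so a NULL is picked only if the id has no name at all.
    SELECT DISTINCT ON (id) id, name
    FROM   tbl
    ORDER  BY id, name NULLS LAST
    ) x
WHERE  x.id = tbl.id
AND    tbl.name IS NULL;
```

If an id has no non-null name at all, its rows are simply set to NULL again, which changes nothing.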

0

If you know for a fact that there are no conflicting values (multiple rows with the same ID but different, non-null names) then something like this will update the table appropriately:

UPDATE some_table AS t1
SET name = (
    SELECT t2.name
    FROM some_table AS t2
    WHERE t2.id = t1.id
      AND t2.name IS NOT NULL
    LIMIT 1
)
WHERE t1.name IS NULL;

If you only want to query the table and have this information filled in on the fly, you can use a similar query:

SELECT
    t1.id,
    (
        SELECT t2.name
        FROM some_table AS t2
        WHERE t2.id = t1.id
          AND t2.name IS NOT NULL
        LIMIT 1
    ) AS name
FROM some_table AS t1;
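
On platforms that reject correlated subqueries, the same on-the-fly fill can probably be done with a window function instead. This is a sketch only, assuming window functions are available (PostgreSQL has them from 8.4 onward) and, as above, that each id has at most one distinct non-null name:

```sql
SELECT
    id,
    -- min() ignores NULLs, so it returns the one known name for this id
    -- (or the alphabetically lowest one if there happen to be several).
    COALESCE(name, min(name) OVER (PARTITION BY id)) AS name
FROM some_table;
```

Rows that already have a name keep it, and rows whose id has no name at all stay NULL.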

1 Comment

I'm on a particular implementation that doesn't support correlated subqueries. Is there any other way to handle it?
