15

I use PostgreSQL 9.1.2 and I have a basic table as below, where I have the Survival status of an entry as a boolean (Survival) and also in number of days (Survival(Days)).

I have manually added a new column named 1-yr Survival, and now I want to fill in the values of this column for each entry in the table, conditioned on that entry's Survival and Survival(Days) column values. Once completed, the table would look something like this:

Survival    Survival(Days)    1-yr Survival
----------  --------------    -------------
Dead            200                NO
Alive            -                 YES
Dead            1200               YES

The pseudo code to input the conditioned values of 1-yr Survival would be something like:

ALTER TABLE mytable ADD COLUMN "1-yr Survival" text
for each row
if ("Survival" = Dead & "Survival(Days)" < 365) then Update "1-yr Survival" = NO
else Update "1-yr Survival" = YES
end 

I believe this is a basic operation; however, I failed to find the PostgreSQL syntax to execute it. Some search results suggest "adding a trigger", but I am not sure that is what I need. I think my situation here is a lot simpler. Any help/advice would be greatly appreciated.

3
  • Please be more precise. Your version of Postgres? Are you talking about a one-time operation or a continued effort? Is performance crucial? Any reason you want to store redundant data instead of using a view? Commented Aug 29, 2012 at 19:05
  • @Erwin: Sorry, I have added the version to the question now. I use PostgreSQL 9.1.2. It is a one-time effort, and the reason I want to store redundant data is that I export the database in .csv format to use in R or Matlab, and I want the 1-yr Survival information to be readily available as an additional column before I run algorithms. I do not know about views, though; I will investigate that as well. Commented Aug 29, 2012 at 19:12
  • I see. You may be interested in the addition to my answer about COPY then. Commented Aug 29, 2012 at 19:18

2 Answers

13

The one-time operation can be achieved with a plain UPDATE:

UPDATE tbl
SET    one_year_survival = (survival OR survival_days >= 365);

I would advise against using camel case, white space, and parentheses in your names. While allowed between double quotes, they often lead to complications and confusion. Consider the chapter about identifiers and key words in the manual.
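
As a rough sketch of that advice, the quoted names from the question could be renamed once so that later statements need no double quotes. The table name tbl and the exact spelling of the existing column names are assumptions here; adjust them to what \d tbl actually shows.

```sql
-- Assumed existing names, copied from the question; verify before running.
ALTER TABLE tbl RENAME COLUMN "Survival"       TO survival;
ALTER TABLE tbl RENAME COLUMN "Survival(Days)" TO survival_days;
ALTER TABLE tbl RENAME COLUMN "1-yr Survival"  TO one_year_survival;
```

Unquoted identifiers are folded to lower case, so after the rename survival_days, Survival_Days and SURVIVAL_DAYS all refer to the same column.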

Are you aware that you can export the results of a query as CSV with COPY?
Example:

COPY (SELECT *, (survival OR survival_days >= 365) AS one_year_survival FROM tbl)
TO '/path/to/file.csv' WITH (FORMAT csv);

You wouldn't need the redundant column this way to begin with.
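
If the derived value is needed in more than one query, a view is one way to keep the expression in a single place. This is only a sketch: the view name tbl_with_survival is made up for illustration, and it assumes tbl does not already contain a stored one_year_survival column (otherwise the view would have two columns with the same name and CREATE VIEW would fail).

```sql
-- Hypothetical view name; the expression is the one used in the UPDATE above.
CREATE VIEW tbl_with_survival AS
SELECT *, (survival OR survival_days >= 365) AS one_year_survival
FROM   tbl;

-- COPY cannot read a view directly, so wrap it in a SELECT.
COPY (SELECT * FROM tbl_with_survival)
TO '/path/to/file.csv' WITH (FORMAT csv, HEADER);
```

Note that SELECT * inside a view is expanded when the view is created, so columns added to tbl later do not show up in the view automatically.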


Additional answer to comment

To avoid empty updates:

UPDATE tbl
SET    "Dead after 1-yr" = (dead AND my_survival_col < 365)
      ,"Dead after 2-yrs" = (dead AND my_survival_col < 730)
....
WHERE  "Dead after 1-yr" IS DISTINCT FROM (dead AND my_survival_col < 365)
   OR  "Dead after 2-yrs" IS DISTINCT FROM (dead AND my_survival_col < 730)
...
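
If a single text label is preferred over several boolean columns, a CASE expression is the usual tool. The following is a sketch under assumptions, not the asker's actual schema: the column survival_label and the label strings are invented for illustration, and survival / survival_days are the names used in the first UPDATE above.

```sql
-- Hypothetical label column; not part of the original table.
ALTER TABLE tbl ADD COLUMN survival_label text;

UPDATE tbl
SET    survival_label = CASE
          WHEN survival                              THEN 'Alive'
          WHEN NOT survival AND survival_days <  365 THEN 'Dead within 1 yr'
          WHEN NOT survival AND survival_days <  730 THEN 'Dead within 2 yrs'
          WHEN NOT survival AND survival_days < 1095 THEN 'Dead within 3 yrs'
          WHEN NOT survival AND survival_days >= 1095 THEN 'Dead after 3 yrs'
          -- No ELSE branch: rows with unknown (NULL) inputs stay NULL.
       END;
```

The branches are evaluated top to bottom and the first match wins, which is why each threshold only needs an upper bound.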

Personally, I would only add such redundant columns if I had a compelling reason. Normally I wouldn't. If it's about performance: are you aware of indexes on expressions and partial indexes?
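
For reference, the two index types mentioned could look roughly like this. The index names are invented, and whether either index helps depends on the data distribution and the queries, so treat this as a sketch rather than a recommendation.

```sql
-- Index on an expression: note the extra pair of parentheses around it.
CREATE INDEX tbl_one_year_survival_idx
ON     tbl ((survival OR survival_days >= 365));

-- Partial index: only covers rows that died within the first year.
CREATE INDEX tbl_early_death_idx
ON     tbl (survival_days)
WHERE  NOT survival AND survival_days < 365;
```

A query can only use these indexes if its WHERE clause repeats the indexed expression or implies the partial index predicate.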


5 Comments

Yes, the COPY option makes more sense I guess. Thanks a lot!
I just have another brief question. What if I have a non-boolean X_year_Survival (as opposed to the binary one_year_survival) in the scenario above, and I want to explicitly label "Dead after 1-yr", "Dead after 2-yrs" and "Dead after 3-yrs" as the column values conditioned on the Survival_days column? We cannot use "SET one_year_survival = (survival OR survival_days >= 365);" in this case. What is the syntax to explicitly label based on conditioning? Thanks a lot.
@Berkan: binary != boolean. I suggest you ask a new question. You can always refer to this one, to save some typing.
Thanks, what I am trying to ask is if I want the possible values of the derived 1_year_survival (column) to be explicitly stated and possibly more than two labels instead of just TRUE/FALSE, what would be the syntax that I use? I don't think I should ask a new question for this because the definition and title of the original question (of this page) does not restrict One-Year_Survival to be a boolean column. (However I understand that you have suggested a boolean solution since it made more sense given the context.)
@Berkan: I added a bit to my answer.
6

Honestly, I think you are better off not storing data in the db which is quickly and easily calculated from stored data. A better option would be to simulate a calculated field (gotchas noted below, however). In this case you would do the following (changing spaces etc. to underscores for easier maintenance):

-- The function name must match the attribute name used in the query below.
CREATE FUNCTION one_year_survival(mytable)
RETURNS BOOL
IMMUTABLE
LANGUAGE SQL AS $$
SELECT $1.survival OR $1.survival_days >= 365;
$$;

Then you can actually run:

SELECT *, m.one_year_survival FROM mytable m;

and it will "just work." Note the following gotchas:

  • mytable.one_year_survival is not returned by the default column list (SELECT *), and
  • you cannot omit the table identifier (m in the above example) because the parser converts this into one_year_survival(m).
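
To make the second gotcha concrete, the attribute-style call is just another spelling of an ordinary function call on the row. The sketch below assumes the function was created under the name one_year_survival, the name the query above uses.

```sql
-- Attribute notation and functional notation are equivalent here.
SELECT m.*, m.one_year_survival    FROM mytable m;
SELECT m.*, one_year_survival(m)   FROM mytable m;

-- This fails: without the table alias there is nothing to rewrite
-- into a function call, and no such column exists.
-- SELECT one_year_survival FROM mytable;
```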

However, the benefit is that the value can be proven never to get out of sync with the other values. Otherwise you end up with a rat's nest of check constraints.

You can actually take this approach quite far. See http://ledgersmbdev.blogspot.com/2012/08/postgresql-or-modelling-part-2-intro-to.html

1 Comment

This seems like a good idea, especially if this column isn't in one's source data (like a scientific study with a fixed dataset). If one ever needs to recreate the database capabilities from source, a function seems like a healthy "reminder." Also, by saving the tablespace, the function (and hence logic) will always remain in the tablespace.
