78

There is a column of type text in a table in Postgres 9.1. I'd like to know the impact of just that column on the disk space needed. It doesn't need to be precise, but I'd like to get an idea if that column is responsible for 20%/30%/... of the disk space consumed by the database.

I know pg_relation_size, but it only operates at table level.

I have many databases with this same schema. I dumped a smaller one and cut out the column with grep and cut and compared the size of the plain text dumps. But this is not necessarily a good indicator of space requirements in the live db, and it's also more difficult to do that for large databases.

3 Answers 3

102
select
    sum(pg_column_size(the_text_column)) as total_size,
    avg(pg_column_size(the_text_column)) as average_size,
    sum(pg_column_size(the_text_column)) * 100.0 / pg_total_relation_size('t') as percentage
from t;

See official documentation of pg_total_relation_size at:
https://www.postgresql.org/docs/current/functions-admin.html#FUNCTIONS-ADMIN-DBOBJECT

Interesting related question:
What's the difference between pg_table_size, pg_relation_size & pg_total_relation_size? (PostgreSQL)

Sign up to request clarification or add additional context in comments.

3 Comments

pg_relation_size returns the size of the base table without including the toast table. When someone cares about disk usage of each column, it is likely that a toast table is involved, so you probably want to use pg_table_size instead.
You can also use pg_total_relation_size if you want to count the toast table plus the size of the indexes.
This seems to do a full Seq Scan on the table so be careful when the table is very large.
56

Slight improvement on the accepted answer: pretty print the size and use pg_total_relation_size to be more accurate.

select
    pg_size_pretty(sum(pg_column_size(column_name))) as total_size,
    pg_size_pretty(avg(pg_column_size(column_name))) as average_size,
    sum(pg_column_size(column_name)) * 100.0 / pg_total_relation_size('table_name') as percentage
from table_name;

Comments

21

If you want a report for all the columns in a database sorted by size then here is the way

BEGIN;
CREATE FUNCTION tc_column_size(table_name text, column_name text)
    RETURNS BIGINT AS
$$
    declare response BIGINT;
BEGIN
    EXECUTE 'select sum(pg_column_size(t."' || column_name || '")) from ' || table_name || ' t ' into response;
    return response;
END;
$$
    LANGUAGE plpgsql;

SELECT
    z.table_name,
    z.column_name,
    pg_size_pretty(z.size)
FROM (
    SELECT
        table_name,
        column_name,
        tc_column_size(table_name, column_name) size
    FROM
        information_schema.columns
    WHERE
        table_schema = 'public') AS z
WHERE
    size IS NOT NULL
    -- and z.table_name = 'my_table' -- <--- uncomment to filter a table
ORDER BY
    z.size DESC;

ROLLBACK; -- <--- You may not want to keep that function

2 Comments

Should also quote the table_name on line 7: EXECUTE 'select sum(pg_column_size(t."' || column_name || '")) from "' || table_name || '" t ' into response;
It is much faster to filter the table in the inner query (i.e. move the comment to the schema's filter)

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.