Pandas read_sql columns not working when using index_col - returns all columns instead

Question

I'm using pandas.read_sql() to get data from my PostgreSQL database. The SQL query is generated generically and selects many columns, of which I only want specific ones, using one column as the index. As an example, take a table test_table like this:

column1 column2 column3
1       2       3
2       4       6
3       6       9

I tried the index_col and columns parameters of pandas.read_sql() to get column1 as the index and column2 as data (dropping column3). But it always returns the whole table. Passing columns=['column1', 'column2'] changes nothing either.

I'm using Python 2.7.6 with pandas 0.17.1. Thanks for any help!

Example Code:

import pandas
import psycopg2
import sqlalchemy


def connect():
    connString = (
        "dbname=test_db "
        "host=localhost "
        "port=5432 "
        "user=postgres "
        "password=password"
    )
    return psycopg2.connect(connString)

engine = sqlalchemy.create_engine(
            'postgresql://',
            creator=connect)
sql = (
    'SELECT '
    'column1, '
    'column2, '
    'column3 '
    'FROM test_table'
)
data = pandas.read_sql(
    sql,
    engine,
    index_col=['column1'],
    columns=['column2'])
print(data)

Why don't you want to change your SELECT query? Also, I guess you want to use pandas.read_sql_query() instead.
– MaxU - stand with Ukraine, Commented Mar 11, 2016 at 10:30
The SQL query should be built only once and then reused by different functions, each picking specific columns from it. I did not use read_sql_query() because it has no columns parameter (although that parameter is not doing what I want at the moment). For my code, read_sql() and read_sql_query() behave the same.
– Henhuy, Commented Mar 11, 2016 at 11:56

miriamsimone · Accepted Answer · 2017-06-26 06:12:21Z

8

I think the columns argument had no effect because you passed a SQL statement rather than a table name.

From the pandas documentation:

columns : list, default: None List of column names to select from sql table (only used when reading a table).

So if you pass the table name instead:

pandas.read_sql('test_table', engine, index_col=['column1'], columns=['column2'])

the columns argument should take effect.
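If the query string has to stay as it is (as the question's comments suggest), one possible workaround is to load the full result and select the wanted columns afterwards. The sketch below is only an illustration under assumptions: it uses an in-memory SQLite database as a stand-in for the PostgreSQL table so that it runs without a server, and read_columns is a hypothetical helper name, not part of pandas.

```python
import sqlite3

import pandas

# Assumption: an in-memory SQLite table stands in for the PostgreSQL
# test_table from the question, so the example runs without a server.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE test_table (column1 INTEGER, column2 INTEGER, column3 INTEGER)"
)
conn.executemany(
    "INSERT INTO test_table VALUES (?, ?, ?)",
    [(1, 2, 3), (2, 4, 6), (3, 6, 9)],
)
conn.commit()

# The generic query is built once, as in the question.
sql = "SELECT column1, column2, column3 FROM test_table"


def read_columns(sql, con, index_col, columns):
    """Hypothetical helper: run the full query, then keep only `columns`."""
    frame = pandas.read_sql_query(sql, con, index_col=index_col)
    # Column selection happens in pandas, after the whole result is loaded.
    return frame[columns]


data = read_columns(sql, conn, index_col="column1", columns=["column2"])
print(data)
```

Note that this still transfers every column from the database and only drops the unwanted ones in memory. For wide tables, passing the table name as shown above, or narrowing the SELECT list, avoids that cost.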

answered Jun 26, 2017 at 4:36 by mdls; edited Jun 26, 2017 at 6:12 by miriamsimone


1 Comment

Georgy Over a year ago

It's a pity that it doesn't work with sql statements
