
I know how to read PostgreSQL tables on a remote server with psycopg2, SQLAlchemy, and Dask, but I am not satisfied with the time it takes to read the tables, so I started researching faster alternatives and found asyncpg, which is reported to be about 7x faster than all of them. However, the documentation for asyncpg is sparse compared to the libraries above, which have plenty of examples.

My question is: how do I read PostgreSQL tables efficiently with asyncpg?

I have tried as below:

import asyncio
import asyncpg
import pandas as pd

# SSH tunnel (like a PuTTY connection); create_logger lets you follow the running processes
from sshtunnel import SSHTunnelForwarder, create_logger

server = SSHTunnelForwarder(
        ('IP_detail', Port_number),
        ssh_private_key=r'path_to_the_ssh_key_in_my_computer',
        ssh_username="username",
        #ssh_password="password",
        remote_bind_address=('localhost', port_number),
        local_bind_address=('localhost', port_number),
        logger=create_logger(loglevel=1) #Makes the processes being run displayed
)
server.start() #The tunnel must be assigned and started before connecting

async def main():
    conn = await asyncpg.connect(user='username', password='password',
                                 database='database_name', host='127.0.0.1', port='port')
    values = await conn.fetch('''SELECT * FROM table_name''')
    await conn.close()
    return values

# 'await' is only valid inside a coroutine (or a Jupyter cell);
# in a plain script, drive the coroutine with asyncio.run
values = asyncio.run(main())

values = pd.DataFrame(values)
values

With the code above I get all the row values for every column of the PostgreSQL table, but the resulting DataFrame shows numeric column indices instead of the proper column names. How can I correct this?

4 Answers


Use dict() on each record to see the key-value pairs of column names and payload.
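A minimal sketch of that tip, using plain dicts as stand-ins for asyncpg Record objects (which also support dict(record)):

```python
import pandas as pd

# conn.fetch() returns a list of Record objects; each supports
# dict(record), yielding {column_name: value}. Plain dicts stand in here.
rows = [
    {"id": 1, "name": "alice"},
    {"id": 2, "name": "bob"},
]
df = pd.DataFrame([dict(row) for row in rows])
print(df.columns.tolist())  # column names are preserved
```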


1 Comment

Sorry, I did not understand your tip. Could you be more detailed?

First, extract your column names:

columns = [c.name for c in values.get_attributes()]

Then, create your dataframe:

values = pd.DataFrame(values, columns=columns)

See https://github.com/MagicStack/asyncpg/issues/173#issuecomment-538055841
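Once you have the names, passing an explicit columns list to the pandas DataFrame constructor is enough; a minimal sketch with dummy tuples standing in for the fetched records:

```python
import pandas as pd

# Dummy row tuples stand in for asyncpg records; in real code the names
# would come from get_attributes() on a prepared statement.
columns = ["id", "name"]
rows = [(1, "alice"), (2, "bob")]
df = pd.DataFrame(rows, columns=columns)
print(df.columns.tolist())
```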



The link in hellycopterinjuneer's answer is correct, but that answer does not indicate that it is necessary to create a prepared statement first. For convenience, here is the full code from the link.

import asyncpg
import pandas as pd

async def fetch_as_dataframe(conn: asyncpg.Connection, query: str, *args):
    stmt = await conn.prepare(query)
    columns = [a.name for a in stmt.get_attributes()]
    data = await stmt.fetch(*args)
    return pd.DataFrame(data, columns=columns)



This works for me in my FastAPI application, without get_attributes():

values = await app.state.db.fetch("SELECT * FROM ... ")
df = DataFrame([dict(row) for row in values])

