
I'm trying to create a pandas DataFrame using the Snowflake connector package in Python.

I run a query:

sf_cur = get_sf_connector()
sf_cur.execute("USE WAREHOUSE Warehouse;")
sf_cur.execute("""select Query""")

print('done')

The output is roughly 21k rows. Then using

df = pd.DataFrame(sf_cur.fetchall())

takes forever, even on a sample limited to only 100 rows. Is there a way to optimize this? Ideally the bigger query would be run in a loop, so handling even larger data sets would be possible.

2 Answers


Since fetchall() copies the entire result into memory, you should try iterating over the cursor object directly and building the data frame inside the for block:

cursor.execute(query)
rows = []
for row in cursor:
    rows.append(row)  # collect rows incrementally
df = pd.DataFrame(rows)

Another example, just to illustrate:

query = "Select ID from Users"
cursor.execute(query)
list_ids = []
for row in cursor:
    # row["ID"] requires a DictCursor; with the default tuple cursor use row[0]
    list_ids.append(row["ID"])
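To scale this approach to the full 21k-row result, the same idea can pull rows in fixed-size batches with fetchmany() and concatenate the batches into one DataFrame. A minimal sketch, using an in-memory sqlite3 database as a stand-in for any DB-API cursor (a Snowflake cursor exposes the same fetchmany() interface; the table and column names here are made up for the demo):

```python
import sqlite3
import pandas as pd

# Stand-in data source; in practice this would be the cursor
# returned by get_sf_connector().
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE Users (ID INTEGER)")
cur.executemany("INSERT INTO Users VALUES (?)", [(i,) for i in range(250)])
conn.commit()

cur.execute("SELECT ID FROM Users")
chunks = []
while True:
    batch = cur.fetchmany(100)  # pull 100 rows at a time instead of fetchall()
    if not batch:
        break
    chunks.append(pd.DataFrame(batch, columns=["ID"]))
df = pd.concat(chunks, ignore_index=True)
```

The batch size bounds peak memory: only one batch of raw rows is held outside the DataFrame at any time.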

2 Comments

Can you explain "build the data frame"? I'm not sure how to do this other than fetchall().
@JeffreyForbes For example, say you have a list of user IDs (just to show the case): query = "Select ID from Users"; cursor.execute(query); for row in cursor: list_ids.append(row["ID"])

Use df = cur.fetch_pandas_all() to build a pandas DataFrame directly from the results. Note this requires the pandas extra of the Snowflake connector (pip install "snowflake-connector-python[pandas]").
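When the result is too large to materialize at once, the connector also provides fetch_pandas_batches(), which yields the result set as a sequence of DataFrame chunks. A small sketch wrapping it in a helper (query_to_dataframe is a hypothetical name, not part of the connector's API):

```python
import pandas as pd

def query_to_dataframe(cur, query):
    """Run a query and concatenate the DataFrame batches the cursor yields.

    `cur` is assumed to be a Snowflake cursor (or anything exposing the
    same execute() / fetch_pandas_batches() interface).
    """
    cur.execute(query)
    # fetch_pandas_batches() streams the result as DataFrame chunks,
    # so there is never one giant fetchall() copy in memory.
    return pd.concat(cur.fetch_pandas_batches(), ignore_index=True)
```

For truly huge results you could also process each batch inside the loop instead of concatenating, keeping memory flat.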

