
Can we write data directly into a Snowflake table without using a Snowflake internal stage, using Python?

It seems like an auxiliary task to write to a stage first, then transform the data, and then load it into a table. Can it be done in a single step, just like a JDBC connection to an RDBMS?

3 Answers


The absolute fastest way to load data into Snowflake is from a file on either internal or external stage. Period. All connectors have the ability to insert the data with standard insert commands, but this will not perform as well. That said, many of the Snowflake drivers are now transparently using PUT/COPY commands to load large data to Snowflake via internal stage. If this is what you are after, then you can leverage the pandas write_pandas command to load data from a pandas dataframe to Snowflake in a single command. Behind the scenes, it will execute the PUT and COPY INTO for you.

https://docs.snowflake.com/en/user-guide/python-connector-api.html#label-python-connector-api-write-pandas

I highly recommend this pattern over INSERT commands in any driver. And I would also recommend transforms be done AFTER loading to Snowflake, not before.
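
A minimal sketch of that pattern with the Python connector (placeholder credentials; the table and column names are just examples, and auto_create_table assumes a reasonably recent connector version):

import pandas as pd
import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas

# Connect with the Snowflake Python connector (placeholder credentials)
conn = snowflake.connector.connect(
    account="<account>",
    user="<user>",
    password="<password>",
    database="<database>",
    schema="<schema>",
    warehouse="<warehouse>",
)

df = pd.DataFrame({"ID": [1, 2, 3], "NAME": ["a", "b", "c"]})

# write_pandas stages the frame as Parquet files via PUT and runs COPY INTO for you
success, num_chunks, num_rows, _ = write_pandas(
    conn, df, table_name="MY_TABLE", auto_create_table=True
)
print(success, num_chunks, num_rows)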


1 Comment

Also, check out the parallel and chunk_size params for write_pandas. Tuning these can have a huge impact on total load time.
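
For example, a hypothetical tuning of those two parameters (reusing a conn and df like the ones above):

# Split the frame into 100k-row chunks and upload the staged files with 8 parallel PUT threads
write_pandas(conn, df, table_name="MY_TABLE", chunk_size=100_000, parallel=8)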

If you are having issues with large datasets, try using Dask instead and generate your dataframe partitioned into chunks. You can then use dask.delayed with SQLAlchemy. Here we use Snowflake's native connector method, pd_writer, which under the hood uses write_pandas and ultimately a PUT/COPY INTO of compressed Parquet files. In the end it honestly comes down to your I/O bandwidth: the more throughput you have, the faster the data gets loaded into the Snowflake table. But this snippet provides a decent amount of parallelism overall.

import functools

import dask
import dask.dataframe as dd
from dask.diagnostics import ProgressBar
from snowflake.connector.pandas_tools import pd_writer

# engine (a snowflake-sqlalchemy engine), csv_file_path, table_name, schema_name
# and if_exists are assumed to be defined elsewhere

# Read the CSV into 64 MB partitions so each chunk can be written in parallel
df = dd.read_csv(csv_file_path, blocksize='64MB')

# Build delayed to_sql tasks; pd_writer stages each chunk as compressed Parquet
# and runs PUT / COPY INTO behind the scenes
ddf_delayed = df.to_sql(
    table_name.lower(),
    uri=str(engine.url),
    schema=schema_name,
    if_exists=if_exists,
    index=False,
    method=functools.partial(pd_writer, quote_identifiers=False),
    compute=False,
    parallel=True,
)

# Execute all partitions on a thread pool, retrying transient failures
with ProgressBar():
    dask.compute(ddf_delayed, scheduler='threads', retries=3)

2 Comments

Thanks for replying... but again, it is using a stage to load the data.
Yes, but that's what Snowflake recommends, as it is the fastest way to get your data into a Snowflake table. That said, one could certainly use SQLAlchemy along with Dask's to_sql method, just like pandas; see the sketch below.
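
A minimal pandas-only sketch of that variant (placeholder connection URL; assumes snowflake-sqlalchemy is installed):

import pandas as pd
from snowflake.connector.pandas_tools import pd_writer
from sqlalchemy import create_engine

# snowflake-sqlalchemy connection URL with placeholder values
engine = create_engine(
    "snowflake://<user>:<password>@<account>/<database>/<schema>?warehouse=<warehouse>"
)

df = pd.DataFrame({"ID": [1, 2], "NAME": ["a", "b"]})

# to_sql with method=pd_writer also stages the data and runs COPY INTO under the hood
df.to_sql("my_table", con=engine, index=False, if_exists="append", method=pd_writer)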

Java:

Load Driver Class:

Class.forName("net.snowflake.client.jdbc.SnowflakeDriver")

Maven:

Add the following block as a dependency:

<dependency>
  <groupId>net.snowflake</groupId>
  <artifactId>snowflake-jdbc</artifactId>
  <version>{version}</version>
</dependency>

Spring:

application.yml:

spring:
  datasource:
    hikari:
      maximumPoolSize: 4 # Maximum pool size
      minimumIdle: 1 # Minimum pool size
      driver-class-name: net.snowflake.client.jdbc.SnowflakeDriver

Python:

import pyodbc

# pyodbc connection string
conn = pyodbc.connect("Driver={SnowflakeDSIIDriver}; Server=XXX.us-east-2.snowflakecomputing.com; Database=VAQUARKHAN_DB; schema=public; UID=username; PWD=password")

# Create a cursor
cus = conn.cursor()

# Execute a SQL statement to get the current date and store the result in the cursor
cus.execute("select current_date;")

# Display the content of cursor
row = cus.fetchone()

print(row)
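
For writing without a stage over the same ODBC connection, plain parameterized INSERTs also work, though they are slower for large volumes (my_table and its columns are placeholders):

# Direct, stage-less INSERT via the ODBC driver
rows = [(1, 'a'), (2, 'b'), (3, 'c')]
cus.executemany("insert into my_table (id, name) values (?, ?)", rows)
conn.commit()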

Apache Spark:

<dependency>
  <groupId>net.snowflake</groupId>
  <artifactId>spark-snowflake_2.11</artifactId>
  <version>2.5.9-spark_2.4</version>
</dependency>

Code

import org.apache.spark.sql.DataFrame

// Use secrets DBUtil to get Snowflake credentials.
val user = dbutils.secrets.get("data-warehouse", "<snowflake-user>")
val password = dbutils.secrets.get("data-warehouse", "<snowflake-password>")

val options = Map(
    "sfUrl" -> "<snowflake-url>",
    "sfUser" -> user,
    "sfPassword" -> password,
    "sfDatabase" -> "<snowflake-database>",
    "sfSchema" -> "<snowflake-schema>",
    "sfWarehouse" -> "<snowflake-cluster>"
)

// Generate a simple dataset containing five values and write the dataset to Snowflake.
spark.range(5).write
    .format("snowflake")
    .options(options)
    .option("dbtable", "<snowflake-database>")
    .save()

// Read the data written by the previous cell back.
val df: DataFrame = spark.read
    .format("snowflake")
    .options(options)
    .option("dbtable", "<snowflake-database>")
    .load()

display(df)

The fastest way to load data into Snowflake is from a file.

