
Can we write data directly into a Snowflake table without using a Snowflake internal stage, using Python?

It seems like an auxiliary task to write to a stage first, then transform the data, and then load it into a table. Can it be done in a single step, just like a JDBC connection to an RDBMS?

3 Answers


The absolute fastest way to load data into Snowflake is from a file on either internal or external stage. Period. All connectors have the ability to insert the data with standard insert commands, but this will not perform as well. That said, many of the Snowflake drivers are now transparently using PUT/COPY commands to load large data to Snowflake via internal stage. If this is what you are after, then you can leverage the pandas write_pandas command to load data from a pandas dataframe to Snowflake in a single command. Behind the scenes, it will execute the PUT and COPY INTO for you.

https://docs.snowflake.com/en/user-guide/python-connector-api.html#label-python-connector-api-write-pandas

I highly recommend this pattern over INSERT commands in any driver. And I would also recommend transforms be done AFTER loading to Snowflake, not before.
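
A minimal sketch of that pattern with the Python connector (placeholder credentials; the table and column names are just examples, and auto_create_table assumes a reasonably recent connector version):

import pandas as pd
import snowflake.connector
from snowflake.connector.pandas_tools import write_pandas

# Connect with the Snowflake Python connector (placeholder credentials)
conn = snowflake.connector.connect(
    account="<account>",
    user="<user>",
    password="<password>",
    database="<database>",
    schema="<schema>",
    warehouse="<warehouse>",
)

df = pd.DataFrame({"ID": [1, 2, 3], "NAME": ["a", "b", "c"]})

# write_pandas stages the frame as Parquet files via PUT and runs COPY INTO for you
success, num_chunks, num_rows, _ = write_pandas(
    conn, df, table_name="MY_TABLE", auto_create_table=True
)
print(success, num_chunks, num_rows)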


1 Comment

Also, check out the parallel and chunk_size params for write_pandas. Tuning these can have a huge impact on total load time.
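
For example, a hypothetical tuning of those two parameters (reusing a conn and df like the ones above):

# Split the frame into 100k-row chunks and upload the staged files with 8 parallel PUT threads
write_pandas(conn, df, table_name="MY_TABLE", chunk_size=100_000, parallel=8)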

If you are having issues with large datasets, try using Dask instead and generate your dataframe partitioned into chunks. You can then use dask.delayed with SQLAlchemy. Here we use Snowflake's native connector method, pd_writer, which under the hood uses write_pandas and ultimately a PUT/COPY INTO of compressed Parquet files. In the end it honestly comes down to your I/O bandwidth: the more throughput you have, the faster the data gets loaded into the Snowflake table. But this snippet provides a decent amount of parallelism overall.

import functools

import dask
import dask.dataframe as dd
from dask.diagnostics import ProgressBar
from snowflake.connector.pandas_tools import pd_writer

# engine (a snowflake-sqlalchemy engine), csv_file_path, table_name, schema_name
# and if_exists are assumed to be defined elsewhere

# Read the CSV into 64 MB partitions so each chunk can be written in parallel
df = dd.read_csv(csv_file_path, blocksize='64MB')

# Build delayed to_sql tasks; pd_writer stages each chunk as compressed Parquet
# and runs PUT / COPY INTO behind the scenes
ddf_delayed = df.to_sql(
    table_name.lower(),
    uri=str(engine.url),
    schema=schema_name,
    if_exists=if_exists,
    index=False,
    method=functools.partial(pd_writer, quote_identifiers=False),
    compute=False,
    parallel=True,
)

# Execute all partitions on a thread pool, retrying transient failures
with ProgressBar():
    dask.compute(ddf_delayed, scheduler='threads', retries=3)

2 Comments

Thanks for replying... but again, it is using a stage to load the data.
Yes, but that's what Snowflake recommends, as it is the fastest way to get your data into a Snowflake table. That said, one could certainly use SQLAlchemy along with Dask's to_sql method, just like pandas; see the sketch below.
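
A minimal pandas-only sketch of that variant (placeholder connection URL; assumes snowflake-sqlalchemy is installed):

import pandas as pd
from snowflake.connector.pandas_tools import pd_writer
from sqlalchemy import create_engine

# snowflake-sqlalchemy connection URL with placeholder values
engine = create_engine(
    "snowflake://<user>:<password>@<account>/<database>/<schema>?warehouse=<warehouse>"
)

df = pd.DataFrame({"ID": [1, 2], "NAME": ["a", "b"]})

# to_sql with method=pd_writer also stages the data and runs COPY INTO under the hood
df.to_sql("my_table", con=engine, index=False, if_exists="append", method=pd_writer)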

Java:

Load Driver Class:

Class.forName("net.snowflake.client.jdbc.SnowflakeDriver")

Maven:

Add the following block as a dependency:

<dependency>
  <groupId>net.snowflake</groupId>
  <artifactId>snowflake-jdbc</artifactId>
  <version>{version}</version>
</dependency>

Spring:

application.yml:

spring:
  datasource:
    hikari:
      maximumPoolSize: 4 # Maximum pool size
      minimumIdle: 1 # Minimum pool size
      driver-class-name: net.snowflake.client.jdbc.SnowflakeDriver

Python:

import pyodbc

# pyodbc connection string
conn = pyodbc.connect("Driver={SnowflakeDSIIDriver}; Server=XXX.us-east-2.snowflakecomputing.com; Database=VAQUARKHAN_DB; schema=public; UID=username; PWD=password")

# Create a cursor
cus = conn.cursor()

# Execute a SQL statement to get the current date and store the result in the cursor
cus.execute("select current_date;")

# Display the content of cursor
row = cus.fetchone()

print(row)
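
For writing without a stage over the same ODBC connection, plain parameterized INSERTs also work, though they are slower for large volumes (my_table and its columns are placeholders):

# Direct, stage-less INSERT via the ODBC driver
rows = [(1, 'a'), (2, 'b'), (3, 'c')]
cus.executemany("insert into my_table (id, name) values (?, ?)", rows)
conn.commit()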

Apache Spark:

<dependency>
  <groupId>net.snowflake</groupId>
  <artifactId>spark-snowflake_2.11</artifactId>
  <version>2.5.9-spark_2.4</version>
</dependency>

Code

import org.apache.spark.sql.DataFrame

// Use secrets DBUtil to get Snowflake credentials.
val user = dbutils.secrets.get("data-warehouse", "<snowflake-user>")
val password = dbutils.secrets.get("data-warehouse", "<snowflake-password>")

val options = Map(
    "sfUrl" -> "<snowflake-url>",
    "sfUser" -> user,
    "sfPassword" -> password,
    "sfDatabase" -> "<snowflake-database>",
    "sfSchema" -> "<snowflake-schema>",
    "sfWarehouse" -> "<snowflake-cluster>"
)

// Generate a simple dataset containing five values and write the dataset to Snowflake.
spark.range(5).write
    .format("snowflake")
    .options(options)
    .option("dbtable", "<snowflake-database>")
    .save()

// Read the data written by the previous cell back.
val df: DataFrame = spark.read
    .format("snowflake")
    .options(options)
    .option("dbtable", "<snowflake-database>")
    .load()

display(df)

The fastest way to load data into Snowflake is from a file.

