
I am currently looping through a json response and inserting each row one by one.

This is very slow even for an insert of just a few thousand rows.

What is the most efficient way to insert the data?

Here is my code.

from module import usr, pwd, acct, db, schem, api_key
import snowflake.connector
import requests
import datetime

end_point = 'users'

def snowflake_connect():
    global cursor, mydb
    mydb = snowflake.connector.connect(
        user=usr,
        password=pwd,
        account=acct,
        database=db,
        schema=schem,
    )

def snowflake_insert(id, activated, name):
    global cursor
    snowflake_connect()
    cursor = mydb.cursor()
    sql_insert_query = """ INSERT INTO USERS(ID, ACTIVATED, NAME) VALUES (%s, %s, %s)"""
    insert_tuple = (id, activated, name)
    cursor.execute(sql_insert_query, insert_tuple)
    return cursor

def get_users():
    url = 'https://company.pipedrive.com/v1/{}?&api_token={}'.format(end_point,api_key)
    response = requests.request("GET", url).json()
    read_users(response)

def read_users(response):   
    for data in response['data']:
        id = data['id']
        activated = data['activated']
        name = data['name']     
        snowflake_insert(id, activated, name)

if __name__ == "__main__":  
    snowflake_truncate()
    get_users()
cursor.close()

1 Answer


As noted by others in the comments, the most efficient approach, especially for a continual load, is to stage formatted data files and bulk-load them directly into Snowflake (via PUT and COPY INTO) instead of running INSERT statements; that is the documented best practice.

However, the code in the description can also be improved to minimise the overhead incurred per inserted row. A few key observations:

- The original code opens a fresh Snowflake connection for every single row; the connection should be created once and reused.
- Each row is sent as its own INSERT statement; `executemany` lets the connector send all rows in one batched statement.
- The connection is never explicitly closed when the script finishes.

A modified code version:

from module import usr, pwd, acct, db, schem, api_key
import snowflake.connector
import requests
import datetime

end_point = 'users'
MYDB = None

def snowflake_connect():
    # Connect once and reuse the connection for all subsequent work
    global MYDB
    if MYDB is None:
        MYDB = snowflake.connector.connect(
            user=usr,
            password=pwd,
            account=acct,
            database=db,
            schema=schem,
        )

def snowflake_insert_all(rows):
    snowflake_connect()
    cursor = MYDB.cursor()
    # %s placeholders match the connector's default (pyformat) paramstyle;
    # ? placeholders only work after opting in to qmark binding
    sql_insert_query = "INSERT INTO USERS(ID, ACTIVATED, NAME) VALUES (%s, %s, %s)"
    cursor.executemany(sql_insert_query, rows)
    return cursor

def get_users():
    url = 'https://company.pipedrive.com/v1/{}?&api_token={}'.format(end_point,api_key)
    response = requests.request("GET", url).json()
    read_users(response)

def read_users(response):
    # Collect every row first, then insert them in a single executemany call
    all_data = [(data['id'], data['activated'], data['name']) for data in response['data']]
    snowflake_insert_all(all_data)

if __name__ == "__main__":  
    snowflake_truncate()
    get_users()
    if MYDB is not None:
        MYDB.close()

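If the dataset grows well beyond a few thousand rows, one further refinement (my own suggestion, not part of the answer above) is to split the rows into fixed-size chunks so that each `executemany` call stays a manageable size:

```python
def chunked(rows, size=1000):
    # Yield successive fixed-size slices of the row list;
    # the final chunk may be shorter than `size`.
    for i in range(0, len(rows), size):
        yield rows[i:i + size]

# Each chunk would then be passed to cursor.executemany() in turn.
```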
Note: I've only focussed on improving the Snowflake and DB-API interaction here, but there are other faults in how this script is written (variable and method naming, unnecessary use of globals, resource handling, etc.) that Code Review could help with, if you're looking to improve the program further.
