
I am currently looping through a json response and inserting each row one by one.

This is very slow even for an insert of just a few thousand rows.

What is the most efficient way to insert the data?

Here is my code.

from module import usr, pwd, acct, db, schem, api_key
import snowflake.connector
import requests
import datetime

end_point = 'users'

def snowflake_connect():
    global cursor, mydb
    mydb = snowflake.connector.connect(
        user=usr,
        password=pwd,
        account=acct,
        database=db,
        schema=schem,
    )

def snowflake_insert(id, activated, name):
    global cursor
    snowflake_connect()
    cursor = mydb.cursor()
    sql_insert_query = """ INSERT INTO USERS(ID, ACTIVATED, NAME) VALUES (%s, %s, %s)"""
    insert_tuple = (id, activated, name)
    cursor.execute(sql_insert_query, insert_tuple)
    return cursor

def get_users():
    url = 'https://company.pipedrive.com/v1/{}?&api_token={}'.format(end_point,api_key)
    response = requests.request("GET", url).json()
    read_users(response)

def read_users(response):   
    for data in response['data']:
        id = data['id']
        activated = data['activated']
        name = data['name']     
        snowflake_insert(id, activated, name)

if __name__ == "__main__":  
    snowflake_truncate()
    get_users()
cursor.close()

1 Answer


As noted by others in the comments, the most efficient approach, especially for a continual load, is to stage formatted data files and bulk-load them directly into Snowflake (via PUT and COPY INTO) instead of running INSERT statements; that is the documented best practice.

However, the code in the description can also be improved to minimise the overhead incurred per inserted row. A few key observations:

- The original code opens a fresh Snowflake connection for every single row; the connection should be created once and reused.
- Each row is sent as its own INSERT statement; `executemany` lets the connector send all rows in one batched statement.
- The connection is never explicitly closed when the script finishes.

A modified code version:

from module import usr, pwd, acct, db, schem, api_key
import snowflake.connector
import requests
import datetime

end_point = 'users'
MYDB = None

def snowflake_connect():
    # Connect once and reuse the connection for all subsequent work
    global MYDB
    if MYDB is None:
        MYDB = snowflake.connector.connect(
            user=usr,
            password=pwd,
            account=acct,
            database=db,
            schema=schem,
        )

def snowflake_insert_all(rows):
    snowflake_connect()
    cursor = MYDB.cursor()
    # %s placeholders match the connector's default (pyformat) paramstyle;
    # ? placeholders only work after opting in to qmark binding
    sql_insert_query = "INSERT INTO USERS(ID, ACTIVATED, NAME) VALUES (%s, %s, %s)"
    cursor.executemany(sql_insert_query, rows)
    return cursor

def get_users():
    url = 'https://company.pipedrive.com/v1/{}?&api_token={}'.format(end_point,api_key)
    response = requests.request("GET", url).json()
    read_users(response)

def read_users(response):
    # Collect every row first, then insert them in a single executemany call
    all_data = [(data['id'], data['activated'], data['name']) for data in response['data']]
    snowflake_insert_all(all_data)

if __name__ == "__main__":  
    snowflake_truncate()
    get_users()
    if MYDB is not None:
        MYDB.close()

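If the dataset grows well beyond a few thousand rows, one further refinement (my own suggestion, not part of the answer above) is to split the rows into fixed-size chunks so that each `executemany` call stays a manageable size:

```python
def chunked(rows, size=1000):
    # Yield successive fixed-size slices of the row list;
    # the final chunk may be shorter than `size`.
    for i in range(0, len(rows), size):
        yield rows[i:i + size]

# Each chunk would then be passed to cursor.executemany() in turn.
```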
Note: I've only focussed on improving the Snowflake and DB-API interaction here, but there are other faults in how this script is written (variable and method naming, unnecessary use of globals, resource handling, etc.) that Code Review could help with, if you're looking to improve the program further.
