
I want to import a CSV file into Cassandra using a Python script. I already know how to import a CSV file into Cassandra using the COPY command in cqlsh.

My CSV file has columns with these data types: timestamp, timestamp, decimal, decimal, decimal, decimal, decimal, decimal

I already have the code below; the only part I am missing is how to import the CSV file into Cassandra:

import logging
from cassandra.cluster import Cluster

log = logging.getLogger(__name__)

KEYSPACE = "test_cassandra"

cluster = Cluster(['127.0.0.1'])
session = cluster.connect()

log.info("creating keyspace . . .")
session.execute("""
    CREATE KEYSPACE IF NOT EXISTS %s 
    WITH replication = {'class': 'SimpleStrategy', 'replication_factor': '1'} 
    """ % KEYSPACE)

log.info("setting keyspace . . .")
session.set_keyspace(KEYSPACE)

log.info("creating table . . .")
session.execute("""
    CREATE TABLE IF NOT EXISTS test_NYC_taxi (
        pickup timestamp, 
        dropoff timestamp, 
        distance decimal, 
        fare decimal, 
        p_long decimal, 
        p_lat decimal, 
        d_long decimal, 
        d_lat decimal, 
        PRIMARY KEY(pickup, dropoff, distance));
    """)



prepared = session.prepare("""
        INSERT INTO test_nyc_taxi (pickup, dropoff, distance, fare, p_long, p_lat, d_long, d_lat)
        VALUES (?, ?, ?, ?, ?, ?, ?, ?)
        """)


Any help would be very much appreciated. :)

2 Answers


You're on the right path. The only things you need to add are a file reader (I saved your data as a .csv file and got it to work), a for-loop to iterate through it, and an import of the datetime package to convert your dates for the prepared statement.

This worked for me:

import csv
import datetime
from decimal import Decimal

with open("/home/aaron/Documents/stackoverflow/test_NYC_taxi.csv", "r", newline="") as fares:
    for columns in csv.reader(fares):
        # The format string must match the dates in the CSV file.
        pickup = datetime.datetime.strptime(columns[0], "%Y-%m-%d").date()
        dropoff = datetime.datetime.strptime(columns[1], "%Y-%m-%d").date()
        # Convert the remaining columns to Decimal to match the table schema.
        distance, fare, p_long, p_lat, d_long, d_lat = (Decimal(c) for c in columns[2:8])

        session.execute(prepared, [pickup, dropoff, distance, fare, p_long, p_lat, d_long, d_lat])

# The with block closes the file automatically, so no explicit close() is needed.

# closing Cassandra connection
session.shutdown()
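The question describes the first two columns as timestamps, while the loop above parses dates only. If the file carries full timestamps, a conversion helper along these lines might be a starting point. This is a minimal sketch: the `TIMESTAMP_FORMAT` value, the `parse_row` and `read_rows` names, and the sample line are all assumptions for illustration and are not taken from the original data.

```python
import csv
import datetime
import io
from decimal import Decimal

# Assumed timestamp layout; adjust it to match the actual CSV file.
TIMESTAMP_FORMAT = "%Y-%m-%d %H:%M:%S"


def parse_row(columns):
    """Convert one CSV row (a list of 8 strings) into typed values."""
    pickup = datetime.datetime.strptime(columns[0].strip(), TIMESTAMP_FORMAT)
    dropoff = datetime.datetime.strptime(columns[1].strip(), TIMESTAMP_FORMAT)
    decimals = [Decimal(value.strip()) for value in columns[2:8]]
    return [pickup, dropoff] + decimals


def read_rows(csv_file):
    """Yield parsed rows from an open CSV file object, skipping blank lines."""
    for columns in csv.reader(csv_file):
        if columns:
            yield parse_row(columns)


# Hypothetical sample line, used only to demonstrate the parsing.
sample = io.StringIO(
    "2016-01-01 00:12:22,2016-01-01 00:29:14,3.2,14.0,-73.98,40.75,-73.95,40.77\n"
)
rows = list(read_rows(sample))
```

Each list produced by `read_rows` has the same column order as the prepared INSERT, so it could be passed as the parameter list to `session.execute(prepared, row)`.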

1 Comment

Wow, thank you! I'd had a hard time figuring out how to convert those dates.

Open the CSV file with a reader in your Python code, iterate over the lines, and execute the insert statement for each one. See the linked answer below.

https://stackoverflow.com/a/19828117/1358551
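The read-iterate-insert loop described here could be sketched roughly as follows. The `load_csv` function, its `convert` and `has_header` parameters, and the `RecordingSession` stand-in are hypothetical names introduced for illustration; with the real driver, `session` and `prepared` would be the objects created in the question.

```python
import csv
import io


def load_csv(csv_file, session, prepared, convert=lambda row: row, has_header=False):
    """Execute `prepared` once per CSV row and return the number of rows sent."""
    reader = csv.reader(csv_file)
    if has_header:
        next(reader, None)  # skip the header line if the file has one
    count = 0
    for row in reader:
        if not row:
            continue  # ignore blank lines
        session.execute(prepared, convert(row))
        count += 1
    return count


class RecordingSession:
    """Stand-in for a driver session; it only records the calls it receives."""

    def __init__(self):
        self.calls = []

    def execute(self, statement, parameters):
        self.calls.append((statement, parameters))


# Demonstration with made-up data and the stand-in session.
demo_session = RecordingSession()
demo_file = io.StringIO("a,b\n1,2\n\n3,4\n")
inserted = load_csv(demo_file, demo_session, "INSERT ...", has_header=True)
```

The `convert` argument is where string values would be turned into the types the table expects, such as datetime and Decimal objects.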
