
I have a Python application built with Flask that imports large numbers of data records (anywhere from 10k to 250k+ records at a time). Right now it inserts them into a Cassandra database one record at a time, like this:

for transaction in transactions:
    self.transaction_table.insert_record(transaction)

This process is incredibly slow. Is there a best-practice approach for inserting this bulk data more efficiently?

  • The obvious thing would be to try batch (bulk) insert queries, which Cassandra does support. Also, preparing a query once and reusing it might actually be faster than batch inserts. But do not expect importing 250k records to be fast. Maybe you should delegate the job to one or more workers? Commented Aug 9, 2016 at 14:06

2 Answers


You can use batch statements for this; an example and documentation are available in the DataStax documentation. You can also use child workers and/or asynchronous queries on top of this.

In terms of best practices, a batch is most efficient when it contains only one partition key. A batch that spans many partition keys makes a single coordinator node forward the writes to every replica involved, which concentrates load on that node; it is faster to contact the node that owns each partition directly.
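The "one partition key per batch" rule can be sketched as a grouping step followed by one UNLOGGED batch per group. This assumes each record is a dict, that the partition key column is called `account_id` (a hypothetical name), and that a prepared INSERT statement already exists; `to_params` is a hypothetical callable that turns a record into the bound values in the statement's column order. The cap of 50 rows per batch is an arbitrary illustration, chosen because Cassandra warns about oversized batches. `BatchStatement` and `BatchType` come from the DataStax Python driver (`cassandra-driver`).

```python
from collections import defaultdict
from typing import Callable, Dict, Hashable, Iterable, Iterator, List, Tuple


def single_partition_batches(
    records: Iterable[dict], partition_key: str, max_batch_size: int = 50
) -> Iterator[Tuple[Hashable, List[dict]]]:
    """Yield (key, rows) chunks in which every row shares one partition key."""
    if max_batch_size < 1:
        raise ValueError("max_batch_size must be >= 1")
    groups: Dict[Hashable, List[dict]] = defaultdict(list)
    for record in records:
        groups[record[partition_key]].append(record)
    for key, rows in groups.items():
        # Split very large partitions so no single batch grows without bound.
        for start in range(0, len(rows), max_batch_size):
            yield key, rows[start:start + max_batch_size]


def send_batches(session, prepared, batches, to_params: Callable[[dict], tuple]) -> None:
    """Execute each single-partition chunk as one UNLOGGED batch."""
    # Imported here so the grouping helper above works without the driver installed.
    from cassandra.query import BatchStatement, BatchType

    for _key, rows in batches:
        batch = BatchStatement(batch_type=BatchType.UNLOGGED)
        for row in rows:
            batch.add(prepared, to_params(row))
        session.execute(batch)
```

Usage might look like `send_batches(session, prepared, single_partition_batches(records, "account_id"), to_params)`. UNLOGGED is used because the goal here is throughput, not atomicity across partitions.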

If each record has a different partition key, a single prepared statement executed by a few child workers may work out better.
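For records with different partition keys, the "single prepared statement" approach can be sketched as: prepare the INSERT once, then keep a bounded number of asynchronous requests in flight instead of waiting for each insert to finish. The table `transactions` and its columns are hypothetical; `session.prepare`, `session.execute_async`, and `ResponseFuture.result()` are DataStax Python driver calls, and `max_in_flight=100` is an arbitrary starting value to tune against your cluster.

```python
from collections import deque
from typing import Iterable, Sequence

# Hypothetical table and columns; adjust to your schema.
INSERT_CQL = (
    "INSERT INTO transactions (transaction_id, account_id, amount) VALUES (?, ?, ?)"
)


def insert_concurrently(session, rows: Iterable[Sequence], max_in_flight: int = 100) -> int:
    """Prepare once, then keep at most `max_in_flight` async inserts outstanding."""
    prepared = session.prepare(INSERT_CQL)  # parsed by the cluster only once
    in_flight = deque()
    count = 0
    for row in rows:
        if len(in_flight) >= max_in_flight:
            # Wait for the oldest request; .result() raises if that insert failed.
            in_flight.popleft().result()
        in_flight.append(session.execute_async(prepared, row))
        count += 1
    # Drain whatever is still outstanding.
    while in_flight:
        in_flight.popleft().result()
    return count
```

The driver also ships a helper for the same pattern, `cassandra.concurrent.execute_concurrent_with_args(session, prepared, rows, concurrency=...)`, which may be simpler than managing the window by hand.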

You may also want to consider a TokenAware load-balancing policy, which lets the driver contact a node that owns the data directly instead of having the request coordinated through another node.
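A token-aware setup with the DataStax Python driver can be sketched with an execution profile that wraps `DCAwareRoundRobinPolicy` in `TokenAwarePolicy`. The contact points, data-center name, and keyspace passed in are placeholders for your own cluster. Token-aware routing only helps when the driver knows the partition key of each request, which it can work out for prepared statements that bind the partition key columns. Recent driver versions may already use a token-aware policy by default, so check the documentation for your version before adding this.

```python
from typing import List


def connect_token_aware(contact_points: List[str], local_dc: str, keyspace: str):
    """Return a Session that routes each request to a replica owning its partition."""
    # Imported here so this sketch can be defined without cassandra-driver installed.
    from cassandra.cluster import EXEC_PROFILE_DEFAULT, Cluster, ExecutionProfile
    from cassandra.policies import DCAwareRoundRobinPolicy, TokenAwarePolicy

    profile = ExecutionProfile(
        load_balancing_policy=TokenAwarePolicy(DCAwareRoundRobinPolicy(local_dc=local_dc))
    )
    cluster = Cluster(contact_points, execution_profiles={EXEC_PROFILE_DEFAULT: profile})
    return cluster.connect(keyspace)
```

A call such as `connect_token_aware(["10.0.0.1"], "dc1", "my_keyspace")` (placeholder values) would then return the session used for the inserts.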


The easiest solution is to generate CSV files from your data and import them with the cqlsh COPY command. That should work well for up to a few million rows. For more complicated scenarios, you could use the sstableloader tool.
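Generating the CSV file can be sketched with the standard `csv` module: write a header plus one row per record in a fixed column order, then point `COPY ... FROM` at the file. The column list and the keyspace/table names in the comment are hypothetical. When writing to a real file, open it with `newline=""` as the `csv` documentation recommends.

```python
import csv
from typing import Iterable, Sequence, TextIO

# Hypothetical column order; it must match the column list given to COPY.
COLUMNS = ["transaction_id", "account_id", "amount"]


def write_copy_csv(records: Iterable[dict], out: TextIO, columns: Sequence[str] = COLUMNS) -> int:
    """Write records as CSV with a header row; extra keys in a record are ignored."""
    writer = csv.DictWriter(out, fieldnames=list(columns), extrasaction="ignore")
    writer.writeheader()
    count = 0
    for record in records:
        writer.writerow(record)
        count += 1
    return count


# Then load the file from a shell, for example:
#   cqlsh -e "COPY my_keyspace.transactions (transaction_id, account_id, amount) FROM 'transactions.csv' WITH HEADER = true"
```

Writing the file and running COPY as a separate step also keeps the slow part out of the Flask request entirely.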
