Python script not responding for large volume Cassandra queries

Question

When I run my Python script to fetch data from Cassandra with

SELECT * FROM my_keyspace LIMIT 5000000;

(a limit of 5 million), the records show up after 22 minutes of processing. But when I raise the limit to 10 million records and run the query, the script keeps waiting for a very long time and I still haven't received a response. What could be the issue?

Mikhail Baksheev · Accepted Answer · 2016-04-20 09:08:49Z


You didn't specify a partition key in your query, so the coordinator node has to request data from all nodes. The coordinator also collects all of those millions of rows before passing the result to your Python script, which can trigger a lot of garbage collection on the coordinator.
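On the client side, one way to keep memory bounded (an assumption about your script, since the question doesn't show it) is to let the DataStax Python driver (`cassandra-driver`) page the results and consume them incrementally instead of collecting them into one big list. With that driver, `session.execute(SimpleStatement(query, fetch_size=5000))` returns a `ResultSet` that requests the next page as you iterate over it. The sketch below is driver-agnostic: `batched` and `process_rows` are hypothetical helper names, and `handle_batch` stands for whatever your script does with each chunk of rows.

```python
from itertools import islice
from typing import Callable, Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def batched(rows: Iterable[T], size: int) -> Iterator[List[T]]:
    """Yield lists of at most `size` rows, pulling from `rows` lazily."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(rows)
    while True:
        # islice only pulls `size` items, so a paged ResultSet fetches
        # further pages on demand rather than all at once.
        batch = list(islice(it, size))
        if not batch:
            return
        yield batch


def process_rows(
    rows: Iterable[T],
    handle_batch: Callable[[List[T]], None],
    batch_size: int = 5000,
) -> int:
    """Hand rows to `handle_batch` chunk by chunk; return the total count."""
    total = 0
    for batch in batched(rows, batch_size):
        handle_batch(batch)  # e.g. write to a file, aggregate, etc.
        total += len(batch)
    return total
```

Paging bounds how much each request returns, but an unrestricted SELECT still touches every node, so this does not remove the cluster-wide cost described above.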

To avoid these performance issues, avoid queries that don't restrict the partition key.
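If you genuinely need every row, a commonly used alternative (not part of this answer) is to split the full scan into many smaller queries over contiguous token ranges, so each query is served by the replicas that own that range. This sketch assumes the default `Murmur3Partitioner`, whose tokens run from -2**63 to 2**63 - 1; the table `my_keyspace.my_table` and the partition key column `id` are hypothetical placeholders. `session` is assumed to be an object with the driver's `execute(query, parameters)` method.

```python
from __future__ import annotations

from typing import Any, Iterator

# Token bounds of Murmur3Partitioner (assumption: your cluster uses it).
MIN_TOKEN = -(2**63)
MAX_TOKEN = 2**63 - 1

# Hypothetical table and partition key column; substitute your own.
RANGE_QUERY = (
    "SELECT * FROM my_keyspace.my_table "
    "WHERE token(id) > %s AND token(id) <= %s"
)


def token_ranges(splits: int) -> list[tuple[int, int]]:
    """Split the token ring into `splits` contiguous (start, end] ranges."""
    if splits < 1:
        raise ValueError("splits must be >= 1")
    span = MAX_TOKEN - MIN_TOKEN
    bounds = [MIN_TOKEN + (span * i) // splits for i in range(splits + 1)]
    return list(zip(bounds[:-1], bounds[1:]))


def scan_all(session: Any, splits: int = 1024) -> Iterator[Any]:
    """Yield every row by running one bounded query per token range."""
    for start, end in token_ranges(splits):
        for row in session.execute(RANGE_QUERY, (start, end)):
            yield row
```

Because each query covers only part of the ring, no single coordinator has to gather the whole table. If you run the range query many times, a prepared statement (`session.prepare(...)` with `?` markers) is usually preferable to the `%s` form used here.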

See the Cassandra documentation on the read path for more details.
