I have a problem inserting a big list into Cassandra using Python. I have a list of 3,200 strings that I want to save in this table:

CREATE TABLE IF NOT EXISTS my_test (
    id bigint PRIMARY KEY,
    list_strings list<text>
);

When I shorten the list, the insert works without any problem:

prepared_statement = session.prepare("INSERT INTO my_test (id, list_strings) VALUES (?, ?)")
session.execute(prepared_statement, [id, strings[:5]])

But if I insert the whole list, I get an error:

Error from server: code=1500 [Replica(s) failed to execute write] message="Operation failed - received 0 responses and 1 failures" info={'required_responses': 1, 'consistency': 'LOCAL_ONE', 'received_responses': 0, 'failures': 1}

How can I insert a big list into Cassandra?

  • Inserting a 3,200-string list into a single cell in Cassandra may be the wrong approach. Why don't you insert each string as a separate row? Commented Mar 13, 2017 at 14:30
  • So my model is wrong? I should just insert one row per string: id string1, id string2, ...? Commented Mar 13, 2017 at 14:32
  • Probably, but without knowing more about this table I can't be 100% sure. Could you give me some context so that I can help you more? Commented Mar 13, 2017 at 14:33
  • Well, the id is a Twitter user and the list of strings is the list of their tweets (JSON). Commented Mar 13, 2017 at 14:34
  • OK then. IMO, you should use two primary key columns. The first should still be the user id and should be the partition key. The second should be a clustering key containing either an integer (its size would depend on the data) or a timestamp, if it carries useful information and you have it with enough precision that no two tweets from the same user share a timestamp. The last column, outside the primary key, would be a plain text column where you store one tweet. Are you familiar with partition and clustering keys? Commented Mar 13, 2017 at 14:39

1 Answer

A DB array type is not supposed to hold that amount of data. Storing each string in its own row of the table would be better:

    id     |    time    | strings
-----------+------------+---------
  bigint   | timestamp  |  text
 partition | clustering |

Using id as the clustering key would be a bad solution: requesting all the tweets for a given user id would then require reads on multiple nodes, whereas with id as the partition key all of a user's tweets live in one partition, so the read only has to go to one node per user.
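
As a rough sketch of the layout above, the insert could look something like the following with the Python driver. The table name my_test_rows and the helper names build_rows and insert_tweets are made up for illustration, and the code assumes each tweet comes with a timestamp that is unique per user (otherwise a later row would overwrite an earlier one). The session is passed in, so it is whatever Cluster(...).connect(...) returned in your existing code.

```python
# Hypothetical table name; the columns follow the layout sketched above.
CREATE_TABLE_CQL = """
CREATE TABLE IF NOT EXISTS my_test_rows (
    id bigint,
    time timestamp,
    strings text,
    PRIMARY KEY ((id), time)
)
"""

INSERT_CQL = "INSERT INTO my_test_rows (id, time, strings) VALUES (?, ?, ?)"


def build_rows(user_id, tweets):
    """Turn (timestamp, json_text) pairs into one parameter tuple per row."""
    return [(user_id, created_at, text) for created_at, text in tweets]


def insert_tweets(session, user_id, tweets):
    """Insert every tweet as its own row instead of one large list cell.

    `session` is assumed to be an already connected driver session, e.g.
        from cassandra.cluster import Cluster
        session = Cluster(["127.0.0.1"]).connect("my_keyspace")
    where the contact point and keyspace are placeholders.
    """
    session.execute(CREATE_TABLE_CQL)
    prepared = session.prepare(INSERT_CQL)  # prepare once, reuse per row
    rows = build_rows(user_id, tweets)
    for row in rows:
        session.execute(prepared, row)
    return len(rows)
```

Each execute call here writes one small row, so no single write has to carry the whole 3,200-element list. Reading a user's tweets back is then a single-partition query such as SELECT strings FROM my_test_rows WHERE id = ?.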
