We have a 7-node Cassandra 3.11.3 production cluster. Ticket details are dumped to a mid server as a .csv file, and I need to read that file and import its data into a Cassandra table. I tried Ruby, which was easy for me to write, but my code does not handle all the column values correctly: the .csv contains special characters, line breaks inside fields, UTF encoding issues, and long free-text descriptions (as entered in the ticketing tool), and the data varies from row to row.

I want to know whether Ruby or Python is a good choice for this activity in production, and whether anyone has good sample code that handles the issues mentioned above in a production environment.

Please provide a snippet of example data and your code; without those, no one can answer your question. Also, you've asked two questions: 1) how-to/example code and 2) is Python or Ruby good enough? Please edit your post to ask only one question. Commented May 9, 2019 at 13:07

1 Answer

Both Ruby and Python are well suited to this kind of task, but if your source file is badly formatted, any tool can fail. There is no magic-button tool that can deduce the intended content from a broken data file and fix all the problems for you automatically.

I'd suggest splitting the task into two parts: 1) fix the encoding and data-quality problems (and perform any necessary data transformations), and then 2) import the clean data.
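As a rough illustration of part 1, here is a minimal Python sketch using only the standard `csv` module. The function name `clean_csv`, the choice to replace embedded line breaks with a space, and the choice to fold surplus cells into the last column are all assumptions made for the example; the right rules depend on what your export actually looks like.

```python
import csv
import io


def clean_csv(src, dst, newline_replacement=" "):
    """Copy CSV rows from src to dst, normalising each row.

    src and dst are text file objects opened with newline="".
    Returns the number of data rows written.
    """
    # csv.reader understands quoted fields that span several physical lines.
    reader = csv.reader(src)
    writer = csv.writer(dst, quoting=csv.QUOTE_MINIMAL, lineterminator="\n")

    header = next(reader)
    writer.writerow(header)
    width = len(header)

    written = 0
    for row in reader:
        if not row:  # skip completely blank lines
            continue
        if len(row) < width:
            # Assumption: missing trailing columns become empty strings.
            row = row + [""] * (width - len(row))
        elif len(row) > width:
            # Assumption: surplus cells belong to the last column.
            row = row[:width - 1] + [",".join(row[width - 1:])]
        # Assumption: line breaks inside a field can be flattened.
        row = [
            cell.replace("\r\n", "\n")
                .replace("\r", "\n")
                .replace("\n", newline_replacement)
                .strip()
            for cell in row
        ]
        writer.writerow(row)
        written += 1
    return written
```

For real files, one option is to open the source with `open(path, encoding="utf-8", errors="replace", newline="")`, so that undecodable bytes become the Unicode replacement character instead of aborting the run. Whether silently replacing bad bytes is acceptable is a judgment call for your data.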

Task 2 can be done easily in almost any programming language that has a Cassandra driver available, but if you have a well-formatted CSV source you probably don't need any custom code at all (depending on the use case, of course): cqlsh supports the COPY ... FROM command, which imports data from a CSV file directly (https://docs.datastax.com/en/cql/3.3/cql/cql_reference/cqlshCopy.html).
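If you do go the driver route, the import step could look roughly like the sketch below. It assumes the DataStax Python driver (`cassandra-driver`) and uses its `Cluster`, `session.prepare` and `session.execute` calls. The helper names, the `converters` argument and the table and column names are hypothetical. Values read from a CSV are strings, so anything bound to a non-text column needs converting to the matching Python type first.

```python
def build_insert_cql(table, columns):
    """Build a parameterised INSERT statement for the given columns."""
    placeholders = ", ".join("?" for _ in columns)
    return "INSERT INTO {} ({}) VALUES ({})".format(
        table, ", ".join(columns), placeholders
    )


def load_rows(session, table, columns, rows, converters=None):
    """Insert dict-like rows through a prepared statement.

    converters maps a column name to a callable that turns the CSV
    string into the Python type the column expects (hypothetical helper).
    Returns the number of rows sent.
    """
    converters = converters or {}
    prepared = session.prepare(build_insert_cql(table, columns))
    count = 0
    for row in rows:
        params = []
        for col in columns:
            value = row.get(col)
            if value is None or value == "":
                # Assumption: empty cells are stored as null.
                params.append(None)
            else:
                params.append(converters.get(col, lambda v: v)(value))
        session.execute(prepared, params)
        count += 1
    return count


def connect(contact_points, keyspace):
    """Open a session; imported lazily so the helpers above work without the driver."""
    from cassandra.cluster import Cluster

    cluster = Cluster(contact_points)
    return cluster, cluster.connect(keyspace)
```

A call might look like `load_rows(session, "tickets", ["number", "priority"], csv.DictReader(f), converters={"priority": int})`, where `tickets` and its columns are placeholders for your own schema. Row-by-row synchronous inserts are the simplest form; for large files you would probably want concurrency and retries on top of this.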

1 Comment

Thank you for sharing the above information. I have written Ruby code to complete the activity and have also used the COPY command in cqlsh. I wanted to know whether I can get some sample code to improve mine. As mentioned, we have good-quality data coming in from the ticketing system (in this case, ServiceNow). In some cases data is not available for all the columns, and I need to handle that in the code. My main difficulty is the "timestamp" column, which does not work in my Ruby code. So I was looking for best practices to mitigate this. I can share my code if needed.
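For the two issues raised in this comment (columns with no data, and timestamps), a small Python sketch is shown here. It assumes the export writes timestamps as `YYYY-MM-DD HH:MM:SS` in UTC; neither the format nor the time zone is confirmed anywhere in this thread, so check both against your actual file. The function and column names are hypothetical.

```python
from datetime import datetime, timezone


def parse_ticket_timestamp(value, fmt="%Y-%m-%d %H:%M:%S"):
    """Turn a CSV cell into a timezone-aware UTC datetime, or None if empty.

    Assumption: the export format is fmt and the values are in UTC.
    A non-empty value that does not match fmt raises ValueError,
    so bad rows are noticed instead of being loaded silently.
    """
    if value is None:
        return None
    value = value.strip()
    if not value:
        return None
    return datetime.strptime(value, fmt).replace(tzinfo=timezone.utc)


def normalise_row(row, columns, timestamp_columns):
    """Return a dict with every expected column present.

    Missing or empty cells become None; cells in timestamp_columns
    are parsed with parse_ticket_timestamp.
    """
    result = {}
    for col in columns:
        value = row.get(col)
        if col in timestamp_columns:
            result[col] = parse_ticket_timestamp(value)
        elif value is None or value.strip() == "":
            result[col] = None
        else:
            result[col] = value
    return result
```

Converting the string to a real date/time object before binding it is the usual fix when a timestamp column rejects CSV input. The same idea applies in Ruby by parsing the cell into a `Time` before passing it to the driver.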
