0

Here is a my scenario i have a data in hive warehouse and i want to export this data into a table named "sample" of "test" database in mysql. What happens if one column is primary key in sample.test and and the data in hive(which we are exporting) is having duplicate values under that key ,then obviously the job will fail , so how could i handle this kind of scenario ?

Thanks in Advance

1
  • you will need to remove duplicates in that case..... or if you need further help., paste schema and sample records. Commented Jan 13, 2015 at 8:51

2 Answers 2

2

If you want your mysql table to contain only the last row among the duplicates, you can use the following:

sqoop export --connect jdbc:mysql://<*ip*>/test -table sample --username root -P --export-dir /user/hive/warehouse/sample --update-key <*primary key column*> --update-mode allowinsert

While exporting, Sqoop converts each row into an insert statement by default. By specifying --update-key, each row can be converted into an update statement. However, if a particular row is not present for update, the row is skipped by default. This can be overridden by using --update-mode allowinsert, which allows such rows to be converted into insert statements.

Sign up to request clarification or add additional context in comments.

Comments

0

Beforing performing export operation ,massage your data by removing duplicates from primary key. Take distinct on that primary column and then export to mysql.

Comments

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.