I have two Spark DataFrames in Scala that I am testing for similarity. I want to be able to pick a specific row number and compare each value in that row between the two DataFrames. For example:

Dataframe 1: df1

+------+-----+-----------+
| Name | Age | Eye Color |
+------+-----+-----------+
| Bob  | 12  |   Blue    |
| Bil  | 17  |   Red     |
| Ron  | 13  |   Brown   |
+------+-----+-----------+

Dataframe 2: df2

+------+-----+-----------+
| Name | Age | Eye Color |
+------+-----+-----------+
| Bob  | 12  |   Blue    |
| Bil  | 14  |   Blue    |
| Ron  | 13  |   Brown   |
+------+-----+-----------+

Input: row 2. Output: Age, Eye Color.

Ideally, the output would also show the values that differ. I have considered the option here, but my DataFrames are very large (in excess of 200,000 rows), so this takes far too long. Is there a simpler way to select the values of a specific row of a DataFrame in Scala?

  • The outcome in the sample you have given compares two rows based on Name property. Is that what you want to do? Or you strictly want to give your program a row number? Commented Oct 22, 2020 at 16:44
  • zipWithIndex is the only way you can get continuous incrementing values across 2 different DFs. It should have worked though as it is parallelised. Commented Oct 22, 2020 at 17:03
  • Secondly, your usecase of comparing 2 rows of 2 different dataframes makes sense, only if you are sorting both dataframes first by some common column. Commented Oct 22, 2020 at 17:04
  • @jrook I want to strictly give the program a row number as I need to compare all fields in that row Commented Oct 23, 2020 at 8:47
  • @Sanket9394 Both databases are sorted and should be identical so that shouldn't be an issue. I will try using zipWithIndex and see how long it takes. Thanks Commented Oct 23, 2020 at 8:48
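For reference, the zipWithIndex idea from the comments could look roughly like the sketch below. It is only an illustration, not a tested solution for the real data. The names `RowDiff`, `rowAt` and `diffAt` are made up for this example, and it assumes that both DataFrames share the same schema and are already sorted in the same deterministic order, since a row number means nothing otherwise. Row numbers are taken as 1-based to match the example in the question.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}

object RowDiff {

  // Hypothetical helper: fetch the row at a 1-based position.
  // zipWithIndex assigns consecutive indices in partition order, so the
  // result is only meaningful if the DataFrame has a stable ordering.
  def rowAt(df: DataFrame, rowNumber: Long): Option[Row] =
    df.rdd
      .zipWithIndex()
      .filter { case (_, idx) => idx == rowNumber - 1 }
      .map { case (row, _) => row }
      .take(1)
      .headOption

  // Hypothetical helper: for the given row number, return
  // (column name, value in df1, value in df2) for every column that differs.
  // Assumes both DataFrames have the same columns in the same order.
  def diffAt(df1: DataFrame, df2: DataFrame, rowNumber: Long): Seq[(String, Any, Any)] =
    (rowAt(df1, rowNumber), rowAt(df2, rowNumber)) match {
      case (Some(r1), Some(r2)) =>
        df1.columns.toSeq.zipWithIndex.collect {
          case (name, i) if r1.get(i) != r2.get(i) => (name, r1.get(i), r2.get(i))
        }
      case _ => Seq.empty // row number out of range in at least one DataFrame
    }

  def main(args: Array[String]): Unit = {
    // Local session and sample data from the question, for illustration only.
    val spark = SparkSession.builder().master("local[*]").appName("row-diff").getOrCreate()
    import spark.implicits._

    val df1 = Seq(("Bob", 12, "Blue"), ("Bil", 17, "Red"), ("Ron", 13, "Brown"))
      .toDF("Name", "Age", "Eye Color")
    val df2 = Seq(("Bob", 12, "Blue"), ("Bil", 14, "Blue"), ("Ron", 13, "Brown"))
      .toDF("Name", "Age", "Eye Color")

    diffAt(df1, df2, 2).foreach { case (column, left, right) =>
      println(s"$column: df1=$left, df2=$right")
    }

    spark.stop()
  }
}
```

Each call to `rowAt` still scans the DataFrame once, so if many row numbers need checking it is probably cheaper to index both DataFrames once, cache the result, and join on the index rather than calling this repeatedly.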
