I have two DataFrames. Here is df1:
+----------+------+---------+--------+------+
|     OrgId|ItemId|segmentId|Sequence|Action|
+----------+------+---------+--------+------+
|4295877341|   136|        9|       1|  I|!||
|4295877342|   111|        4|       2|  I|!||
|4295877343|   138|        2|       1|  I|!||
|4295877344|   141|        4|       1|  I|!||
|4295877345|   143|        2|       1|  I|!||
|4295877346|   145|       14|       1|  d|!||
+----------+------+---------+--------+------+
Here is df2:
+----------+------+---------+--------+------+
|     OrgId|ItemId|segmentId|Sequence|Action|
+----------+------+---------+--------+------+
|4295877341|   136|        4|       1|  I|!||
|4295877342|   136|        4|       1|  I|!||
|4295877343|   900|        2|       1|  K|!||
|4295877344|   141|        4|       1|  D|!||
|4295877345|   111|        2|       1|  I|!||
|4295877346|   145|       14|       1|  I|!||
|4295877347|   145|       14|       1|  I|!||
+----------+------+---------+--------+------+
What I need is only the column values in df1 that differ from those in df2, like below:
4295877341|^|segmentId=9,segmentId=4|^|1|^|I|!|
4295877342|^|ItemId=111,ItemId=136|^|Sequence=2,Sequence=1|^|I|!|
And so on for each row.
Here OrgId is the primary key of both DataFrames.
So basically, for each OrgId I need to collect both versions of just the columns whose values changed.
Here is what I have tried so far:
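// Names of all columns in df1
val columns = df1.schema.fields.map(_.name)
// For each column, the values that occur in df1 but not in df2
val selectiveDifferences = columns.map(col =>
  df1.select(col).except(df2.select(col)))
// Show only the non-empty differences (foreach, since this is a side effect)
selectiveDifferences.foreach(diff => if (diff.count() > 0) diff.show())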
But this gives me the except output for only one column at a time.
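One possible direction, shown only as a rough sketch: instead of running except column by column, join the two DataFrames on OrgId and compare every other column side by side, so each difference stays attached to its key. The object name ColumnDiffSketch and the helper columnDiff are hypothetical names made up for this illustration. The two example lines above are not consistent about whether unchanged columns should appear, so this sketch assumes unchanged columns are kept as plain values and changed ones are written as name=old,name=new. It also assumes both DataFrames have the same columns and that OrgId is unique in each.

```scala
import org.apache.spark.sql.DataFrame
import org.apache.spark.sql.functions.{col, concat, concat_ws, lit, when}

object ColumnDiffSketch {
  // Hypothetical helper, not part of Spark: one output string per key that
  // exists in both frames and has at least one changed column.
  def columnDiff(df1: DataFrame, df2: DataFrame, key: String = "OrgId"): DataFrame = {
    val valueCols = df1.columns.toSeq.filterNot(_ == key)

    // Inner join: keys present in only one frame (e.g. 4295877347) are dropped.
    val joined = df1.as("a").join(df2.as("b"), col(s"a.$key") === col(s"b.$key"))

    // Assumption: unchanged column -> plain value, changed -> "c=old,c=new".
    // Nulls are not treated specially; concat returns null if either side is null.
    val rendered = valueCols.map { c =>
      val oldV = col(s"a.$c").cast("string")
      val newV = col(s"b.$c").cast("string")
      when(oldV <=> newV, oldV)
        .otherwise(concat(lit(s"$c="), oldV, lit(s",$c="), newV))
    }

    // True when at least one non-key column differs (null-safe comparison).
    val anyChange = valueCols
      .map(c => !(col(s"a.$c") <=> col(s"b.$c")))
      .reduce(_ || _)

    val parts = col(s"a.$key").cast("string") +: rendered
    joined.filter(anyChange).select(concat_ws("|^|", parts: _*).as("diff"))
  }
}
```

Calling ColumnDiffSketch.columnDiff(df1, df2).show(false) on the sample data should give, for the first OrgId, 4295877341|^|136|^|segmentId=9,segmentId=4|^|1|^|I|!|. Dropping unchanged columns instead would mean rendering them as null, which concat_ws skips.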
Regards, Sudarshan
OrgIds in the two dataframes - these won't show up (because except would remove X) but they appeared for different OrgIds.
nulls where there was no diff? Or do you want to "merge" all columns into one array/map column? Please define the EXACT structure of the desired output.