There are two pandas DataFrames:
```python
df1 = pd.DataFrame({
    'name': ['ann', 'maxim', 'ann', 'maxim'],
    'surname': ['smith', 'shwarz', 'smith', 'shwarz'],
    'date': ['2020.01.01', '2020.01.01', '2020.03.05', '2020.03.05'],
    'mark_1': [None, 'B', 'A', None],
    'mark_2': [None, 'B', None, 'A'],
    'mark_3': [None, None, 'A', 'C']
})
```
| name | surname | date | mark_1 | mark_2 | mark_3 |
|---|---|---|---|---|---|
| ann | smith | 2020.01.01 | None | None | None |
| maxim | shwarz | 2020.01.01 | B | B | None |
| ann | smith | 2020.03.05 | A | None | A |
| maxim | shwarz | 2020.03.05 | None | A | C |
```python
df2 = pd.DataFrame({
    'name': ['ann', 'maxim'],
    'surname': ['smith', 'shwarz'],
    'mark_1': ['Z', 'X'],
    'mark_2': ['H', 'F'],
    'mark_3': ['P', 'Y']
})
```
| name | surname | mark_1 | mark_2 | mark_3 |
|---|---|---|---|---|
| ann | smith | Z | H | P |
| maxim | shwarz | X | F | Y |
I need every `None` in df1's mark columns filled with the value from df2 for the row with the same `name` and `surname`:
| name | surname | date | mark_1 | mark_2 | mark_3 |
|---|---|---|---|---|---|
| ann | smith | 2020.01.01 | Z | H | P |
| maxim | shwarz | 2020.01.01 | B | B | Y |
| ann | smith | 2020.03.05 | A | H | A |
| maxim | shwarz | 2020.03.05 | X | A | C |
But `df1.fillna(df2)` fills only the first rows with the matching names and surnames:
| name | surname | date | mark_1 | mark_2 | mark_3 |
|---|---|---|---|---|---|
| ann | smith | 2020.01.01 | Z | H | P |
| maxim | shwarz | 2020.01.01 | B | B | Y |
| ann | smith | 2020.03.05 | A | None | A |
| maxim | shwarz | 2020.03.05 | None | A | C |
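Here is a minimal, runnable reproduction of the attempt. My understanding (which may be wrong) is that `fillna` aligns on the row index rather than on `name`/`surname`, which would explain why only rows 0 and 1 are filled:

```python
import pandas as pd

df1 = pd.DataFrame({
    'name': ['ann', 'maxim', 'ann', 'maxim'],
    'surname': ['smith', 'shwarz', 'smith', 'shwarz'],
    'date': ['2020.01.01', '2020.01.01', '2020.03.05', '2020.03.05'],
    'mark_1': [None, 'B', 'A', None],
    'mark_2': [None, 'B', None, 'A'],
    'mark_3': [None, None, 'A', 'C']
})
df2 = pd.DataFrame({
    'name': ['ann', 'maxim'],
    'surname': ['smith', 'shwarz'],
    'mark_1': ['Z', 'X'],
    'mark_2': ['H', 'F'],
    'mark_3': ['P', 'Y']
})

# fillna(df2) aligns on the row index and column labels,
# so only rows 0 and 1 (the index values present in df2) are filled;
# rows 2 and 3 keep their NaN values.
result = df1.fillna(df2)
print(result)
```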
As I understand it, this calls for something like a CASE statement in SQL, but I can't find the equivalent pandas approach.
Bonus points if you can show how to do the same for two PySpark DataFrames!