Python Pandas Dataframe, merge df rows while skipping overlapping rows

Question

The title doesn't make a clear sense, but it was the best I could describe.

I have DataFrames that look like these:

data1 = np.matrix([[4,75,2,5,84,2,6,5,554],[4,6,67,6,4,5,8,5,8]]).T
data2 = np.matrix([[3,46,4,555,556,557,558,559,560],[1,2,4,1,3,5,3,1,5]]).T

data1 = pd.DataFrame(data1)
data2 = pd.DataFrame(data2)

>>> data1
     0   1
0    4   4
1   75   6
2    2  67
3    5   6
4   84   4
5    2   5
6    6   8
7    5   5
8  554   8


>>> data2
     0  1
0    3  1
1   46  2
2    4  4
3  555  1
4  556  3
5  557  5
6  558  3
7  559  1
8  560  5

I want to append data2 to the bottom of data1. However, I want to append rows of data2 whose column1 values are greater than or equal to the 554, which is the last row of column1 of data1

Here is the output I want:

>>> merged_df
         0   1
    0    4   4
    1   75   6
    2    2  67
    3    5   6
    4   84   4
    5    2   5
    6    6   8
    7    5   5
    8  554   8
    9  555   1
   10  556   3
   11  557   5
   12  558   3
   13  559   1
   14  560   5

So, some of the first rows of data2 will be skipped when appending to data1.

It is assumed that the last row of data1 is its maximum, and the rows of data2 are sorted after the last value of data1, which is 554

Is there any elegant way to do this job using Pandas toolbox?

jezrael · Accepted Answer · 2018-04-07 09:40:18Z

3

Use concat with filter second DataFrame by boolean indexing:

print (data1.iloc[-1, 0])
554

df = pd.concat([data1, data2[data2[0] > data1.iloc[-1, 0]]], ignore_index=True)
print (df)
      0   1
0     4   4
1    75   6
2     2  67
3     5   6
4    84   4
5     2   5
6     6   8
7     5   5
8   554   8
9   555   1
10  556   3
11  557   5
12  558   3
13  559   1
14  560   5

Also for general solution compare by max value:

df = pd.concat([data1, data2[data2[0] > data1[0].max()]], ignore_index=True)

Solution if custom columns names:

data1 = pd.DataFrame(data1, columns=list('ab'))
data2 = pd.DataFrame(data2, columns=list('ab'))

print (data1.iloc[-1, data1.columns.get_loc('a')])
554

data22 = data2[data2['a'] > data1.iloc[-1, data1.columns.get_loc('a')]]
df = pd.concat([data1, data22], ignore_index=True)
print (df)
      a   b
0     4   4
1    75   6
2     2  67
3     5   6
4    84   4
5     2   5
6     6   8
7     5   5
8   554   8
9   555   1
10  556   3
11  557   5
12  558   3
13  559   1
14  560   5

edited Apr 7, 2018 at 9:40

answered Apr 7, 2018 at 9:17

jezrael

868k103 gold badges1.4k silver badges1.3k bronze badges

Sign up to request clarification or add additional context in comments.

1 Comment

jezrael Over a year ago

@EricKim - Glad can help!

Collectives™ on Stack Overflow

Python Pandas Dataframe, merge df rows while skipping overlapping rows

1 Answer 1

1 Comment

Your Answer

Hot Network Questions

Collectives™ on Stack Overflow

1 Answer 1

1 Comment

Your Answer

Sign up or log in

Post as a guest

Related