3

The title doesn't make a clear sense, but it was the best I could describe.

I have DataFrames that look like these:

data1 = np.matrix([[4,75,2,5,84,2,6,5,554],[4,6,67,6,4,5,8,5,8]]).T
data2 = np.matrix([[3,46,4,555,556,557,558,559,560],[1,2,4,1,3,5,3,1,5]]).T

data1 = pd.DataFrame(data1)
data2 = pd.DataFrame(data2)

>>> data1
     0   1
0    4   4
1   75   6
2    2  67
3    5   6
4   84   4
5    2   5
6    6   8
7    5   5
8  554   8


>>> data2
     0  1
0    3  1
1   46  2
2    4  4
3  555  1
4  556  3
5  557  5
6  558  3
7  559  1
8  560  5

I want to append data2 to the bottom of data1. However, I want to append rows of data2 whose column1 values are greater than or equal to the 554, which is the last row of column1 of data1

Here is the output I want:

>>> merged_df
         0   1
    0    4   4
    1   75   6
    2    2  67
    3    5   6
    4   84   4
    5    2   5
    6    6   8
    7    5   5
    8  554   8
    9  555   1
   10  556   3
   11  557   5
   12  558   3
   13  559   1
   14  560   5

So, some of the first rows of data2 will be skipped when appending to data1.

It is assumed that the last row of data1 is its maximum, and the rows of data2 are sorted after the last value of data1, which is 554

Is there any elegant way to do this job using Pandas toolbox?

1 Answer 1

3

Use concat with filter second DataFrame by boolean indexing:

print (data1.iloc[-1, 0])
554

df = pd.concat([data1, data2[data2[0] > data1.iloc[-1, 0]]], ignore_index=True)
print (df)
      0   1
0     4   4
1    75   6
2     2  67
3     5   6
4    84   4
5     2   5
6     6   8
7     5   5
8   554   8
9   555   1
10  556   3
11  557   5
12  558   3
13  559   1
14  560   5

Also for general solution compare by max value:

df = pd.concat([data1, data2[data2[0] > data1[0].max()]], ignore_index=True)

Solution if custom columns names:

data1 = pd.DataFrame(data1, columns=list('ab'))
data2 = pd.DataFrame(data2, columns=list('ab'))

print (data1.iloc[-1, data1.columns.get_loc('a')])
554

data22 = data2[data2['a'] > data1.iloc[-1, data1.columns.get_loc('a')]]
df = pd.concat([data1, data22], ignore_index=True)
print (df)
      a   b
0     4   4
1    75   6
2     2  67
3     5   6
4    84   4
5     2   5
6     6   8
7     5   5
8   554   8
9   555   1
10  556   3
11  557   5
12  558   3
13  559   1
14  560   5
Sign up to request clarification or add additional context in comments.

1 Comment

@EricKim - Glad can help!

Your Answer

By clicking “Post Your Answer”, you agree to our terms of service and acknowledge you have read our privacy policy.

Start asking to get answers

Find the answer to your question by asking.

Ask question

Explore related questions

See similar questions with these tags.