I have two data frames, df1 and df2.

df1 has the following data (N rows):

  Time(s)   sv-01 sv-02 sv-03  Val1  val2  val3 
   1339.4   1     4      12     1.6   0.6   1.3
   1340.4   1     12     4     -0.5  0.5   1.4
   1341.4   1     6      8      0.4   5     1.6
   1342.4   2     5     14      1.2   3.9    11
   ......           .....      ....   ...    ..

df2 has the following data and has more rows than df1:

Time(msec)   channel  svid    value-1   value-2 valu-03
1000          1       2       0          5       1
1000          2       5       1          4       2
1000          3       2       3          4       7
.....         .....................................
1339400      1        1       1.6        0.4     5.3
1339400      2        12      0.5       1.8      -4.4
1339400      3        4       -0.20     1.6      -7.9
1340400      1        1      0.3       0.3      1.5
1340400      2        6      2.3      -4.3      1.0
1340400      3        4      2.0       1.1      -0.45
1341400      1       1       2         2.1      0
1341400      2       8       3.4      -0.3       1
1341400      3       6       0         4.1      2.3
....       ....      ..      ...        ...      ...

What I am trying to achieve:

1. First, multiply the Time(s) column by 1000 so that it matches the millisecond column of df2.

2. In df1, sv-01, sv-02 and sv-03 are separate columns, but in df2 those sv values all appear in the single svid column.

So the goal is: when a time in df1 (after conversion) matches a time in df2, copy the three consecutive rows for that time, i.e. all rows of df2 for that time instant.

Basically, I want to look up each df1 time in the df2 time column and, if there is a match, copy the three matching rows to a new df.

I have seen examples using the pandas merge function, but in my case the two data frames have different headers.

Thanks.

1 Answer

I think you need boolean indexing twice. First filter df2 with isin, using mul to multiply the df1 times by 1000. Then count the rows within each Time(msec) group with cumcount and keep only the first 3:

df = df2[df2['Time(msec)'].isin(df1['Time(s)'].mul(1000))]
df = df[df.groupby('Time(msec)').cumcount() < 3]
print (df)
    Time(msec)  channel  svid  value-1  value-2  valu-03
3      1339400        1     1      1.6      0.4     5.30
4      1339400        2    12      0.5      1.8    -4.40
5      1339400        3     4     -0.2      1.6    -7.90
6      1340400        1     1      0.3      0.3     1.50
7      1340400        2     6      2.3     -4.3     1.00
8      1340400        3     4      2.0      1.1    -0.45
9      1341400        1     1      2.0      2.1     0.00
10     1341400        2     8      3.4     -0.3     1.00
11     1341400        3     6      0.0      4.1     2.30

Detail:

print (df.groupby('Time(msec)').cumcount())
3     0
4     1
5     2
6     0
7     1
8     2
9     0
10    1
11    2
dtype: int64
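
For anyone who wants to try the two steps end to end, here is a minimal self-contained sketch. The sample frames are cut down from the question's data, and the round().astype("int64") step is my own assumption (it is not part of the answer's code above); it is there so that the converted times are exact integers before they are compared with the integer Time(msec) column.

```python
import pandas as pd

# Cut-down sample data based on the question (only a few columns kept).
df1 = pd.DataFrame({
    "Time(s)": [1339.4, 1340.4, 1341.4],
    "sv-01": [1, 1, 1],
    "sv-02": [4, 12, 6],
    "sv-03": [12, 4, 8],
})

df2 = pd.DataFrame({
    "Time(msec)": [1000, 1000, 1000,
                   1339400, 1339400, 1339400,
                   1340400, 1340400, 1340400,
                   1341400, 1341400, 1341400],
    "channel": [1, 2, 3] * 4,
    "svid": [2, 5, 2, 1, 12, 4, 1, 6, 4, 1, 8, 6],
})

# Assumption: round to whole milliseconds before casting to integers,
# so tiny floating point errors from the multiplication cannot block a match.
times_ms = df1["Time(s)"].mul(1000).round().astype("int64")

# Step 1: keep only df2 rows whose time occurs in df1.
matched = df2[df2["Time(msec)"].isin(times_ms)]

# Step 2: number the rows inside each time group and keep the first 3.
result = matched[matched.groupby("Time(msec)").cumcount() < 3]
print(result)
```

With this sample data the rows at 1000 ms are dropped and the nine rows for the three matching time instants are kept, with their original df2 index.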
6 Comments

@Poka - yes, no problem ;)
Can you check df = df2[df2['Time(msec)'].isin(df1['Time(s)'].round(1).mul(1000).astype(int))] ?
Hmm, then the best option is to convert the column in df1 to ms so the data match, and then use df = pd.merge(df1, df, left_on='Time(s)', right_on='Time(msec)', how='right').
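
A rough sketch of what the merge suggestion could look like, under some assumptions of mine: the helper column time_ms is a hypothetical name, the sample frames are cut down from the question, and I use an inner join rather than how='right' so that only matching time instants survive. The different headers are not a problem because left_on and right_on name the key column on each side.

```python
import pandas as pd

# Cut-down sample data based on the question.
df1 = pd.DataFrame({
    "Time(s)": [1339.4, 1340.4],
    "Val1": [1.6, -0.5],
})
df2 = pd.DataFrame({
    "Time(msec)": [1000, 1339400, 1339400, 1340400, 1340400],
    "svid": [2, 1, 12, 1, 6],
    "value-1": [0.0, 1.6, 0.5, 0.3, 2.3],
})

# Hypothetical helper column: df1 times as integer milliseconds.
df1_ms = df1.assign(time_ms=df1["Time(s)"].mul(1000).round().astype("int64"))

# Join on differently named key columns; each df1 row is repeated
# once per df2 row that shares its time instant.
merged = pd.merge(df1_ms, df2, left_on="time_ms", right_on="Time(msec)", how="inner")
print(merged)
```

Unlike the isin filter, this keeps the df1 columns next to the matching df2 rows, which may or may not be what is wanted here.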
@Jezrael. df2 has all the time instants of df1, so df = df2[df2['Time(msec)'].isin(df1['Time(s)'].mul(1000))] should return every df1 time instant, but it outputs fewer rows than df1 has. After some iterations it jumps from 59 to 63, as shown: 5455 32945600 5457 32945800 5459 32946000 5463 32946400. It is not picking up 3294200 even though both dataframes have this value.
Isn't there a problem with float accuracy, so there is no match?
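
To illustrate the float accuracy concern, here is a small sketch. The values are hypothetical: I build them with np.nextafter so that each one sits exactly one representable step away from a whole millisecond count, which is the kind of error a float multiplication can leave behind. It also shows why I would round before casting, since astype on its own truncates toward zero.

```python
import numpy as np
import pandas as pd

# Hypothetical converted times: one float just above and one just below
# an exact millisecond value.
above = np.nextafter(1339400.0, np.inf)
below = np.nextafter(1340400.0, 0.0)
converted = pd.Series([above, below])

# Integer millisecond column, as in df2.
target = pd.Series([1339400, 1340400])

# Comparing against the raw floats: neither value is exactly equal.
exact = target.isin(converted)

# Casting without rounding truncates, so the value just below is lost.
truncated = target.isin(converted.astype("int64"))

# Rounding first recovers both matches.
rounded = target.isin(converted.round().astype("int64"))

print(exact.tolist(), truncated.tolist(), rounded.tolist())
```

If some time instants go missing from the isin result, comparing the converted df1 values before and after rounding is a quick way to check whether this is the cause.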