I have two data frames, each with five columns. The first four column names are identical; only the last column differs. The last column holds the flag "T" to mark an outlier: for the average (Outlier_Avg) in df1 and for sigma (Outlier_Sig) in df2.

My first data frame - df1

    TimeStamp <- c("2015-04-01 11:40:13", "2015-04-03 02:54:45")
    ID <- c("DL1X8", "DL202")
    Avg <- c(38.1517, 0.7131)
    Sig <- c(11.45880000, 0.01257816)
    Outlier_Avg <- c("T","T")
    df1 <- data.frame(TimeStamp, ID, Avg, Sig, Outlier_Avg)


    +---------------------+-------+---------+-------------+-------------+
    |      TimeStamp      |  ID   |   Avg   |     Sig     | Outlier_Avg |
    +---------------------+-------+---------+-------------+-------------+
    | 2015-04-01 11:40:13 | DL1X8 | 38.1517 | 11.45880000 | T           |
    | 2015-04-03 02:54:45 | DL202 | 0.7131  | 0.01257816  | T           |
    +---------------------+-------+---------+-------------+-------------+

My second data frame - df2

    TimeStamp <- c("2015-04-01 11:40:13", "2015-04-04 02:57:45", "2015-04-06 09:54:45")
    ID <- c("DL1X8", "DP308", "DM3X8")
    Avg <- c(38.1517, 24.7131, 0.0234)
    Sig <- c(11.4588, 6.0175, 0.0665)
    Outlier_Sig <- c("T", "T", "T")
    df2 <- data.frame(TimeStamp, ID, Avg, Sig, Outlier_Sig)

    +---------------------+-------+---------+---------+-------------+
    |      TimeStamp      |  ID   |   Avg   |   Sig   | Outlier_Sig |
    +---------------------+-------+---------+---------+-------------+
    | 2015-04-01 11:40:13 | DL1X8 | 38.1517 | 11.4588 | T           |
    | 2015-04-04 02:57:45 | DP308 | 24.7131 | 6.0175  | T           |
    | 2015-04-06 09:54:45 | DM3X8 | 0.0234  | 0.0665  | T           |
    +---------------------+-------+---------+---------+-------------+

Desired Output:

I am trying to get a df3 that looks like this:

    +---------------------+-------+---------+-------------+-------------+-------------+
    |      TimeStamp      |  ID   |   Avg   |     Sig     | Outlier_Avg | Outlier_Sig |
    +---------------------+-------+---------+-------------+-------------+-------------+
    | 2015-04-01 11:40:13 | DL1X8 | 38.1517 | 11.45880000 | T           | T           |
    | 2015-04-03 02:54:45 | DL202 | 0.7131  | 0.01257816  | T           | N/A         |
    | 2015-04-04 02:57:45 | DP308 | 24.7131 | 6.0175      | N/A         | T           |
    | 2015-04-06 09:54:45 | DM3X8 | 0.0234  | 0.0665      | N/A         | T           |
    +---------------------+-------+---------+-------------+-------------+-------------+

I tried merge(df1, df2), but it returns only the rows that match in both data frames, so only one row comes back. I need all rows from both data frames, with N/A filling the missing outlier flags as shown above. Could you help me with this?

1 Answer

Use the all argument:

    merge(df1, df2, all = TRUE)
    #             TimeStamp    ID     Avg         Sig Outlier_Avg Outlier_Sig
    # 1 2015-04-01 11:40:13 DL1X8 38.1517 11.45880000           T           T
    # 2 2015-04-03 02:54:45 DL202  0.7131  0.01257816           T        <NA>
    # 3 2015-04-04 02:57:45 DP308 24.7131  6.01750000        <NA>           T
    # 4 2015-04-06 09:54:45 DM3X8  0.0234  0.06650000        <NA>           T

This is shorthand for setting both all.x = TRUE and all.y = TRUE. These are separate arguments that control which rows from x (df1 in your case) and y (df2 in your case) are kept in the merged data.frame even when they have no match in the other input. See, for example:

    merge(df1, df2, all.x = TRUE)
    #             TimeStamp    ID     Avg         Sig Outlier_Avg Outlier_Sig
    # 1 2015-04-01 11:40:13 DL1X8 38.1517 11.45880000           T           T
    # 2 2015-04-03 02:54:45 DL202  0.7131  0.01257816           T        <NA>

    merge(df1, df2, all.y = TRUE)
    #             TimeStamp    ID     Avg     Sig Outlier_Avg Outlier_Sig
    # 1 2015-04-01 11:40:13 DL1X8 38.1517 11.4588           T           T
    # 2 2015-04-04 02:57:45 DP308 24.7131  6.0175        <NA>           T
    # 3 2015-04-06 09:54:45 DM3X8  0.0234  0.0665        <NA>           T
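
If it helps to see the pieces together, here is a minimal self-contained sketch that rebuilds the two data frames from the question and compares the default (matched rows only) merge with the all = TRUE merge. The variable names keys, inner and full are my own, chosen for illustration. The by argument is written out only to make the join key visible; when by is omitted, merge() uses the column names the two inputs have in common, which is what the calls above rely on.

```r
# Rebuild the two data frames from the question.
df1 <- data.frame(
  TimeStamp   = c("2015-04-01 11:40:13", "2015-04-03 02:54:45"),
  ID          = c("DL1X8", "DL202"),
  Avg         = c(38.1517, 0.7131),
  Sig         = c(11.4588, 0.01257816),
  Outlier_Avg = c("T", "T"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  TimeStamp   = c("2015-04-01 11:40:13", "2015-04-04 02:57:45", "2015-04-06 09:54:45"),
  ID          = c("DL1X8", "DP308", "DM3X8"),
  Avg         = c(38.1517, 24.7131, 0.0234),
  Sig         = c(11.4588, 6.0175, 0.0665),
  Outlier_Sig = c("T", "T", "T"),
  stringsAsFactors = FALSE
)

# The shared columns (TimeStamp, ID, Avg, Sig) act as the join key.
# "keys" is an illustrative name, not something from the question.
keys <- intersect(names(df1), names(df2))

inner <- merge(df1, df2, by = keys)              # matched rows only (the default)
full  <- merge(df1, df2, by = keys, all = TRUE)  # every row from both inputs

nrow(inner)  # number of rows present in both data frames
nrow(full)   # number of rows present in either data frame

# all = TRUE should give the same result as setting both flags separately.
identical(full, merge(df1, df2, by = keys, all.x = TRUE, all.y = TRUE))
```

With the question's data, inner should have one row and full should have four, matching the outputs shown above. Note that the missing flags come back as NA values, not as the literal string "N/A".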

2 Comments

Thanks, Thomas! This is exactly what I wanted. The info about all.x and all.y is useful.
@Thomas Can you help me with this question [stackoverflow.com/questions/35484595/…
