Greetings, and thanks in advance for all help. I have many data frames that resemble the ones below:

df1

   name info
1  john    A
2   jim    B
3   tom    B
4  bill    B

dframe

  name other
1  sam   pro
2  dad   mo1
3  mom  Bxxx

frame3

   name otherinfo
1   jus         A
2    do         7
3 r pro         B
4   sir         B
5  real        na
6  pete       yes

OLFrame

   name information
1  ally          x1
2   mom          B9
3 r pro         s3B
4   tom         Bd0
5 kelly          ot
6  jojo         who
7    na          11

I would like to:

  1. take each name from the "name" column of data frame "OLFrame" and check whether it exists in the "name" column of "df1"
  2. create a column vector named "df1" containing "1" if the name from "OLFrame" exists in "df1" and "0" if not
  3. repeat steps 1 and 2 using "dframe" and "frame3"
  4. create a new data frame called "newOLFrame" consisting of "OLFrame" plus the new columns named "df1", "dframe" and "frame3"

The desired result should look like

newOLFrame

   name information df1 dframe frame3
1  ally          x1   0      0      0
2   mom          B9   0      1      0
3 r pro         s3B   0      0      1
4   tom         Bd0   1      0      0
5 kelly          ot   0      0      0
6  jojo         who   0      0      0
7    na          11   0      0      0

I can do one at a time (below), but I have over a hundred files to look through:

newOLFrame<-OLFrame
newOLFrame[,"df1"]<-ifelse(newOLFrame$name %in% df1$name, 1, 0)

Please help. Thanks again

1 Answer

Consider a chain merge: first build a list of data frames, each one the result of left joining OLFrame to a flagged lookup frame, then merge all of them together at the end with Reduce:

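df_list <- lapply(c("df1", "dframe", "frame3"), function(nm) {
  df <- get(nm)        # look up the data frame by its name
  df[[nm]] <- 1        # flag every row of the lookup frame

  # left join keeps all OLFrame rows; unmatched names get NA in the flag column
  df <- merge(OLFrame, df[c("name", nm)], by="name", all.x=TRUE)
  df[[nm]] <- ifelse(is.na(df[[nm]]), 0, 1)

  return(df)
})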

# MERGE ALL DFs
final_df <- Reduce(function(x, y) merge(x, y, by=c("name", "information")), df_list)
final_df
#    name information df1 dframe frame3
# 1  ally          x1   0      0      0
# 2  jojo         who   0      0      0
# 3 kelly          ot   0      0      0
# 4   mom          B9   0      1      0
# 5    na          11   0      0      0
# 6 r pro         s3B   0      0      1
# 7   tom         Bd0   1      0      0

Alternatively, consider do.call, since Reduce can have performance issues for large lists. Here you order each merged data frame, keep only the flag column, and column-bind all the items at the end:

df_list <- lapply(c("df1", "dframe", "frame3"), function(nm) {

  df <- get(nm)
  df[[nm]] <- 1

  df <- merge(OLFrame, df[c("name", nm)], by="name", all.x=TRUE, sort=FALSE) 
  df[[nm]] = ifelse(is.na(df[[nm]]), 0, 1)

  df <- with(df, df[order(name, information),])        # ORDER DATA FRAME
  small_df <- setNames(as.data.frame(df[[nm]]), nm)    # SUBSET ONE COLUMN

  return(small_df)
})

# ORDER DATA FRAME
OLFrame <- with(OLFrame, OLFrame[order(name, information),])

final_df <- do.call(cbind, c(OLFrame, df_list))
final_df

#    name information df1 dframe frame3
# 1  ally          x1   0      0      0
# 2  jojo         who   0      0      0
# 3 kelly          ot   0      0      0
# 4   mom          B9   0      1      0
# 5    na          11   0      0      0
# 6 r pro         s3B   0      0      1
# 7   tom         Bd0   1      0      0
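
Since the goal is only a 0/1 membership flag, a merge may not be needed at all. The following is a minimal sketch, not part of the original answer, that generalizes the asker's one-at-a-time `%in%` line. It assumes the lookup frames are already loaded under the names listed in `frame_names` (a hypothetical variable), and it keeps OLFrame's original row order:

```r
# Sample data copied from the question
OLFrame <- data.frame(
  name = c("ally", "mom", "r pro", "tom", "kelly", "jojo", "na"),
  information = c("x1", "B9", "s3B", "Bd0", "ot", "who", "11"),
  stringsAsFactors = FALSE
)
df1 <- data.frame(name = c("john", "jim", "tom", "bill"),
                  info = c("A", "B", "B", "B"), stringsAsFactors = FALSE)
dframe <- data.frame(name = c("sam", "dad", "mom"),
                     other = c("pro", "mo1", "Bxxx"), stringsAsFactors = FALSE)
frame3 <- data.frame(name = c("jus", "do", "r pro", "sir", "real", "pete"),
                     otherinfo = c("A", "7", "B", "B", "na", "yes"),
                     stringsAsFactors = FALSE)

# Hypothetical vector of lookup frame names; extend it to cover all files
frame_names <- c("df1", "dframe", "frame3")

# mget() fetches the frames by name and returns them as a named list
frames <- mget(frame_names, mode = "list", inherits = TRUE)

# One integer 0/1 vector per lookup frame, one element per OLFrame row
flags <- lapply(frames, function(df) as.integer(OLFrame$name %in% df$name))

# The list names become the new column names
newOLFrame <- cbind(OLFrame, as.data.frame(flags))
newOLFrame
```

Because `%in%` only tests membership, duplicate names in a lookup frame cannot add rows, so every flag vector has exactly nrow(OLFrame) elements.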

8 Comments

Thank you. How should the code be tweaked if the "name" column were two columns, "firstname" and "lastname"?
Replace name with those two columns, both in the df column selection and in the by argument.
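
As an illustration of that reply, here is a rough sketch of one pass of the merge with a two-column key. The sample rows and the `keys` variable are made up for this example and do not come from the thread:

```r
# Hypothetical sample data with a two-column key
OLFrame <- data.frame(firstname = c("tom", "tom", "mom"),
                      lastname = c("smith", "jones", "lee"),
                      information = c("Bd0", "x1", "B9"),
                      stringsAsFactors = FALSE)
df1 <- data.frame(firstname = c("tom", "jim"),
                  lastname = c("jones", "smith"),
                  info = c("A", "B"),
                  stringsAsFactors = FALSE)

keys <- c("firstname", "lastname")   # replaces the single "name" key
nm <- "df1"

df <- df1
df[[nm]] <- 1                        # flag every row of the lookup frame

# Both key columns go into the column selection and into the by argument
flagged <- merge(OLFrame, df[c(keys, nm)], by = keys, all.x = TRUE)
flagged[[nm]] <- ifelse(is.na(flagged[[nm]]), 0, 1)
flagged
```

Only "tom jones" is flagged here; "tom smith" is not, because a row matches only when both key columns agree.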
I did that and it worked well, thanks. On large files it is slow; can anything be done to make it faster? It also gives a memory error. Other than reducing the number of files in the environment, can anything else be done?
Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 91674, 843779, 52800
The results from the various files have differing lengths, and this hinders the final merge: "final_df <- do.call(cbind, c(OLFrame, df_list))" gives the message "Error in data.frame(..., check.names = FALSE) : arguments imply differing number of rows: 91674, 843779, 52800". Please help.
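
One plausible cause of the differing row counts, which cannot be confirmed from the thread alone, is repeated key values in the lookup frames: a left join returns one row per matching pair, so a name that appears twice in a lookup frame doubles the corresponding OLFrame row. A small sketch with made-up data shows the effect and one possible workaround (deduplicating the key before merging):

```r
OLFrame <- data.frame(name = c("ally", "tom", "mom"),
                      information = c("x1", "Bd0", "B9"),
                      stringsAsFactors = FALSE)

# Hypothetical lookup frame in which "tom" appears twice
df1 <- data.frame(name = c("tom", "tom", "jim"),
                  info = c("A", "B", "B"),
                  stringsAsFactors = FALSE)
df1$df1 <- 1

# The duplicated key yields an extra row in the left join
dup_merge <- merge(OLFrame, df1[c("name", "df1")], by = "name", all.x = TRUE)
nrow(dup_merge)   # 4, one more than nrow(OLFrame)

# Deduplicating the key and flag columns first keeps one row per OLFrame row
dedup <- unique(df1[c("name", "df1")])
ok_merge <- merge(OLFrame, dedup, by = "name", all.x = TRUE)
nrow(ok_merge)    # 3
```

This sketch assumes the keys in OLFrame itself are unique; if they are not, the row order used for the final cbind would also need checking.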