
I have a translation table (trans_df):

trans_df <- read.table(text = "rs1065852 rs201377835 rs28371706 rs5030655 rs5030865 rs3892097 rs35742686 rs5030656 rs5030867 rs28371725 rs59421388
                       G           C          G         A         C         C          T        CTT         T          C          C
                       G           C          G         A         C         C        del        CTT         T          C          C
                       A           C          G         A         C         T          T        CTT         T          C          C
                     del         del        del       del       del       del        del        del       del        del        del
                       G           C          G       del         C         C          T        CTT         T          C          C
                       G           C          G         A         C         C          T        CTT         G          C          C
                       G           C          G         A         C         C          T        del         T          C          C
                       A           C          G         A         C         C          T        CTT         T          C          C
                       G           C          A         A         C         C          T        CTT         T          C          C
                       G           C          G         A         C         C          T        CTT         T          C          T
                       G           C          G         A         C         C          T        CTT         T          T          C",header=TRUE, stringsAsFactors = FALSE, colClasses = "character")

and an input row (input):

input <- read.table(text = "rs1065852 rs201377835 rs28371706 rs5030655 rs5030865 rs3892097 rs35742686 rs5030656 rs5030867 rs28371725 rs59421388
G|A           C        G|A         A         C       T|C          T  CTT         T        C|T          C", header = TRUE, stringsAsFactors = FALSE, colClasses = "character")

I want to find the rows of trans_df that match input, treating each input value as a regular expression (so G|A means G or A). I have achieved it by looping over the column positions:

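# For each column i, find the rows of trans_df whose value matches input[, i],
# then keep only the rows that matched in every column.
Reduce(intersect, lapply(seq_len(ncol(trans_df)),
                         function(i) grep(pattern = input[, i], trans_df[, i])))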

Is there a way to do this by passing input as the pattern directly, instead of looping over column positions? Please advise.
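One caveat with the grep-based approach: grep matches substrings, so an unanchored pattern such as C would also match a cell containing CTT. The columns in this data happen not to mix such values, but a minimal sketch of a stricter variant wraps each pattern in ^( and )$ so that the whole cell must equal one of the alternatives. anchor_pattern is a hypothetical helper name and cells is made-up illustration data.

```r
# Hypothetical helper (not part of base R): force a whole-cell match by
# anchoring the pattern and grouping its alternatives, e.g. "C|T" -> "^(C|T)$".
anchor_pattern <- function(pattern) paste0("^(", pattern, ")$")

# Illustration values only
cells <- c("C", "CTT", "del", "T")

loose  <- grep("C", cells)                    # substring match: 1 2
strict <- grep(anchor_pattern("C"), cells)    # whole-cell match: 1
either <- grep(anchor_pattern("C|T"), cells)  # alternatives still work: 1 4

print(loose)
print(strict)
print(either)
```

In the loop from the question, the change would be grep(pattern = anchor_pattern(input[, i]), trans_df[, i]).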

  • Please make sure that you provide reproducible examples so it is easier to help you. Commented Dec 27, 2017 at 10:07
  • @Sotos in this case you have an example and all the data, what is wrong? Commented Dec 27, 2017 at 10:08
  • Go through the link I gave you. It needs to be reproducible, i.e. something that we can just copy/paste into our R sessions. Commented Dec 27, 2017 at 10:08
  • @Sotos give me a second, I will fix it. Commented Dec 27, 2017 at 10:12
  • No problem. Take your time. Commented Dec 27, 2017 at 10:13

2 Answers


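You can use Map in place of the lapply() call to achieve that. It returns one vector of matching row indices per column, which you can then combine with Reduce(intersect, ...) as before, i.e.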

Map(grep, input, trans_df)

However, this assumes that the columns of input and trans_df are in the same order. If they are not, you can use match to align them by name, i.e.

Map(grep, input[match(names(trans_df), names(input))], trans_df)
#or, equivalently, reorder trans_df instead to keep input intact:
Map(grep, input, trans_df[match(names(input), names(trans_df))])

However, I think that would defeat your purpose.
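A self-contained sketch of the Map approach, assuming the two tables as posted in the question: Reduce(intersect, ...) combines the per-column matches, and a copy of input with its columns reversed shows that aligning by name with match(names(trans_df), names(...)) gives the same rows. The names rows, shuffled and aligned are illustrative.

```r
# Rebuild the two tables from the question (colClasses keeps "T" as text).
trans_df <- read.table(text = "rs1065852 rs201377835 rs28371706 rs5030655 rs5030865 rs3892097 rs35742686 rs5030656 rs5030867 rs28371725 rs59421388
G C G A C C T CTT T C C
G C G A C C del CTT T C C
A C G A C T T CTT T C C
del del del del del del del del del del del
G C G del C C T CTT T C C
G C G A C C T CTT G C C
G C G A C C T del T C C
A C G A C C T CTT T C C
G C A A C C T CTT T C C
G C G A C C T CTT T C T
G C G A C C T CTT T T C", header = TRUE, stringsAsFactors = FALSE, colClasses = "character")

input <- read.table(text = "rs1065852 rs201377835 rs28371706 rs5030655 rs5030865 rs3892097 rs35742686 rs5030656 rs5030867 rs28371725 rs59421388
G|A C G|A A C T|C T CTT T C|T C", header = TRUE, stringsAsFactors = FALSE, colClasses = "character")

# Map pairs column i of input (the pattern) with column i of trans_df;
# Reduce(intersect, ...) keeps only rows that matched in every column.
rows <- Reduce(intersect, Map(grep, input, trans_df))
print(rows)  # 1 3 8 9 11

# If input's columns come in a different order, align them by name first.
shuffled <- input[rev(names(input))]
aligned  <- shuffled[match(names(trans_df), names(shuffled))]
rows_aligned <- Reduce(intersect, Map(grep, aligned, trans_df))
print(identical(rows, rows_aligned))  # TRUE
```

trans_df[rows, ] then returns the matching rows themselves.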



I would just use subset() here and pass it the condition for a matching row. In this case, the condition checks each column of the data frame against a known value. Assuming that input is a named vector, we can try the following code (if input is the one-row data frame from the question, use input$rs1065852 and so on instead):

subset(trans_df, rs1065852 == input["rs1065852"] & rs201377835 == input["rs201377835"] &
       ... & rs59421388 == input["rs59421388"])
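Rather than typing out one clause per column, a minimal sketch can build the same &-combined condition from the names of a named vector. To keep it short it uses a three-column slice of trans_df from the question, typed out by hand; trans_slice and wanted are illustrative names, and because it tests exact equality it does not handle alternatives such as G|A.

```r
# Three columns of trans_df from the question, typed out directly.
trans_slice <- data.frame(
  rs1065852  = c("G", "G", "A", "del", "G", "G", "G", "A", "G", "G", "G"),
  rs28371706 = c("G", "G", "G", "del", "G", "G", "G", "G", "A", "G", "G"),
  rs3892097  = c("C", "C", "T", "del", "C", "C", "C", "C", "C", "C", "C"),
  stringsAsFactors = FALSE
)

# Named vector of exact values to look for, one per column.
wanted <- c(rs1065852 = "G", rs28371706 = "G", rs3892097 = "C")

# One == test per column (matched by name), combined with &,
# which is what the hand-written subset() condition spells out.
keep <- Reduce(`&`, Map(`==`, trans_slice[names(wanted)], wanted))
matched <- subset(trans_slice, keep)
print(which(keep))  # 1 2 5 6 7 10 11
```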

2 Comments

If rs1065852 is A or G (i.e. "A|G"), I need to check whether trans_df has A or G in that column. How can this be achieved using your solution?
@Dr.RichardTennen It starts getting ugly now: Use (rs1065852 == 'A' | rs1065852 == 'G') & <other conditions> ... but this is already a departure from your question. Your question implies that you have a vector or maybe data frame of values, one for each column, and you want to extract rows from trans_df using that input.
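A minimal sketch of that route generalized to every column, using %in% (equivalent to the chained == ... | == ... test): each input value is split on a literal | into its allowed alleles, and each trans_df column is tested with %in%, so no regular expressions are involved. It assumes the two tables as posted in the question; allowed, keep and matched are illustrative names.

```r
trans_df <- read.table(text = "rs1065852 rs201377835 rs28371706 rs5030655 rs5030865 rs3892097 rs35742686 rs5030656 rs5030867 rs28371725 rs59421388
G C G A C C T CTT T C C
G C G A C C del CTT T C C
A C G A C T T CTT T C C
del del del del del del del del del del del
G C G del C C T CTT T C C
G C G A C C T CTT G C C
G C G A C C T del T C C
A C G A C C T CTT T C C
G C A A C C T CTT T C C
G C G A C C T CTT T C T
G C G A C C T CTT T T C", header = TRUE, stringsAsFactors = FALSE, colClasses = "character")

input <- read.table(text = "rs1065852 rs201377835 rs28371706 rs5030655 rs5030865 rs3892097 rs35742686 rs5030656 rs5030867 rs28371725 rs59421388
G|A C G|A A C T|C T CTT T C|T C", header = TRUE, stringsAsFactors = FALSE, colClasses = "character")

# Split each input value on a literal "|" into its allowed alleles,
# e.g. "G|A" -> c("G", "A"); fixed = TRUE avoids treating "|" as regex.
allowed <- lapply(input, function(v) strsplit(v, "|", fixed = TRUE)[[1]])

# Test every trans_df column (matched by name) with %in%, then combine with &.
keep <- Reduce(`&`, Map(`%in%`, trans_df[names(allowed)], allowed))
matched <- subset(trans_df, keep)
print(which(keep))  # 1 3 8 9 11
```

With this data it selects the same rows as the grep-based versions, while sidestepping the substring issue of unanchored patterns.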
